Lecture 5

Summary: key points about performance measures for multi-class classification:
• The confusion matrix is extended to an N×N matrix, where N is the number of classes.
• Diagonal elements represent correctly classified samples for each class.
• Off-diagonal elements represent samples misclassified between classes.
• Accuracy is calculated as the sum of all diagonal elements divided by the total number of samples.
• Precision, recall, and F1 score can be calculated individually for each class.
• Overall precision/recall averages the scores across all classes.
• Specificity keeps the same definition as in binary classification.
The confusion matrix provides insight into which classes are most often confused or misclassified by the model. Focusing on improving predictions for those classes can enhance multi-class classification performance.


Lecture 5: Performance Measures
By: Dr. Eman Ahmed
Contents
• Confusion Matrix
• Accuracy
• Precision
• Sensitivity
• Specificity
• F1-Score
Confusion Matrix
• A confusion matrix is a performance evaluation tool in machine learning.
• It displays the number of true positives, true negatives, false positives, and false negatives.
• A confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the total number of classes. The matrix compares the actual target values with those predicted by the machine learning model.
Confusion Matrix
• For a binary classification problem, we would have a 2 x 2 matrix:

                          Total Actual Positive    Total Actual Negative
Total Predicted Positive           TP                        FP
Total Predicted Negative           FN                        TN

• The class variable has two values: Positive or Negative.
• The columns represent the classes' actual values.
• The rows represent the classes' predicted values.
• True Positive (TP)
• The predicted class matches the actual class.
• The actual class was positive, and the model predicted a positive class.

• True Negative (TN)


• The predicted class matches the actual class.
• The actual class was negative, and the model predicted a negative class.
• False Positive (FP) – Type I Error
• The predicted class was falsely predicted.
• The actual class was negative, but the model predicted a positive class.

• False Negative (FN) – Type II Error


• The predicted class was falsely predicted.
• The actual value was positive, but the model predicted a negative value.
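As a quick illustration of these four counts, here is a minimal Python sketch; the label lists are made up for the example and are not from the lecture:

```python
# Counting TP, TN, FP, FN for a binary classifier.
# The label lists below are made up purely for illustration.
actual    = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg", "pos", "neg"]

tp = tn = fp = fn = 0
for a, p in zip(actual, predicted):
    if a == "pos" and p == "pos":
        tp += 1      # actual positive, predicted positive
    elif a == "neg" and p == "neg":
        tn += 1      # actual negative, predicted negative
    elif a == "neg" and p == "pos":
        fp += 1      # actual negative, predicted positive (Type I error)
    else:
        fn += 1      # actual positive, predicted negative (Type II error)

print(tp, tn, fp, fn)   # 3 3 1 1 for the labels above
```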
Example of Confusion Matrix (Sick or Healthy)
Example
• Get the values of TP, TN, FP, FN.
• How many samples are there in the test set?
Example
• True Positive (TP) = 560, meaning the model correctly classified
560 positive class data points.
• True Negative (TN) = 330, meaning the model correctly
classified 330 negative class data points.
• False Positive (FP) = 60, meaning the model incorrectly classified
60 negative class data points as belonging to the positive class.
• False Negative (FN) = 50, meaning the model incorrectly
classified 50 positive class data points as belonging to the
negative class.
• Total Number of samples = TP + TN + FN + FP = 1000
Performance Measures
• Accuracy
Accuracy = (TP + TN) / (TP + TN + FN + FP)
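Continuing the worked example above (TP = 560, TN = 330, FP = 60, FN = 50), a minimal sketch of the accuracy calculation:

```python
tp, tn, fp, fn = 560, 330, 60, 50          # counts from the example confusion matrix

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                            # 0.89 -> 890 of the 1000 samples are classified correctly
```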
Performance Measures
• Precision: It tells us how many of the positively predicted cases
actually turned out to be positive.

Precision = TP / (TP + FP)

• Precision tells us how reliable the model's positive predictions are.
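A minimal sketch of the precision calculation with the same example counts:

```python
tp, fp = 560, 60                 # counts from the example confusion matrix

precision = tp / (tp + fp)
print(round(precision, 4))       # 0.9032 -> about 90% of the predicted positives are truly positive
```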


Performance Measures
• Sensitivity or Recall or True Positive Rate (TPR):
• It tells us how many of the actual positive cases we were able to
predict correctly with our model.
TPR = TP / (TP + FN)

Example: In medicine, sensitivity refers to a test's ability to classify an individual with a disease as positive. A highly sensitive test produces few false negative results, and thus fewer cases of disease are missed.
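A minimal sketch of recall (sensitivity) with the same example counts:

```python
tp, fn = 560, 50                 # counts from the example confusion matrix

recall = tp / (tp + fn)          # also called sensitivity or true positive rate (TPR)
print(round(recall, 4))          # 0.918 -> about 92% of the actual positives are detected
```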
Get Precision and Recall for the given confusion matrix
Result and Comment
Precision = 50%: among all the samples predicted as positive, 50% were actually positive.
Recall = 75%: among all the samples that are actually positive, 75% were correctly predicted as positive.

Precision is a useful metric in cases where False Positives are a higher concern than False Negatives.
Example: Music recommendation systems or e-commerce. There are two classes: recommended (positive) and not recommended (negative). Many false positives means treating music that should not be recommended as recommended. Customers will get bored and stop using the app, causing loss of business.

Recall is a useful metric in cases where False Negatives are a higher concern than False Positives.
Example: Medical applications. There are two classes: sick (+ve) or healthy (-ve). A false negative means considering a sick patient as healthy, putting their life at risk because they will not take medications.
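The confusion matrix for this exercise appears only as a figure, so the counts below are made up to be consistent with the stated results (50% precision, 75% recall); the sketch just reproduces the two calculations:

```python
tp, fp, fn = 30, 30, 10          # illustrative counts only, not taken from the slide

precision = tp / (tp + fp)       # 30 / 60 = 0.50
recall    = tp / (tp + fn)       # 30 / 40 = 0.75
print(precision, recall)         # 0.5 0.75
```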
Performance Measures
• F1-Score (for the positive class): the harmonic mean of the precision and recall scores obtained for the positive class.

F1 = 2 × (Precision × Recall) / (Precision + Recall)
In a binary classification model, an F1 score close to 1 indicates excellent precision and recall, while a low score indicates poor model performance. In general, a higher F1 score suggests better model performance.

It is used when it is not clear which of the precision or recall is most important for a given problem.
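A minimal sketch of the F1 computation, reusing the example counts from the earlier slides:

```python
tp, fp, fn = 560, 60, 50                              # counts from the example confusion matrix

precision = tp / (tp + fp)                            # ~0.9032
recall    = tp / (tp + fn)                            # ~0.9180
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of precision and recall
print(round(f1, 4))                                   # 0.9106
```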
Performance Measures
• Specificity: the number of samples correctly predicted to be in the negative class out of all the samples in the dataset that actually belong to the negative class. Also called the True Negative Rate (TNR).

Specificity = TN / (TN + FP)

Example: For a medical application, the specificity of a test is its ability to classify an individual who does not have a disease as negative.
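A minimal sketch of specificity with the same example counts:

```python
tn, fp = 330, 60                 # counts from the example confusion matrix

specificity = tn / (tn + fp)     # true negative rate (TNR)
print(round(specificity, 4))     # 0.8462 -> about 85% of the actual negatives are correctly identified
```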
Confusion Matrix for a Multi-class problem
• Assume you have 4 classes with the following confusion matrix (M), where each cell is named M_ij (rows = predicted class, columns = actual class).
[Figure: 4×4 confusion matrix, columns labeled Actual]
• What is the accuracy of this classifier?
• What is the class for which the
classifier has the best performance?
• What is the class for which the
classifier has the worst performance?
• The diagonal elements are the correctly predicted samples. A total of
145 samples were correctly predicted out of the total 191 samples.
Thus, the overall accuracy is 75.92%.
• M_24=0 implies that the model does not confuse samples originally
belonging to class-4 with class-2, i.e., the classification boundary
between classes 2 and 4 was learned well by the classifier.
• To improve the model’s performance, one should focus on the
predictive results in class-3. A total of 18 samples (adding the
numbers in the red boxes of column 3) were misclassified by the
classifier, which is the highest misclassification rate among all the
classes. Accuracy in prediction for class-3 is, thus, 58.14% only.
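The slide's 4×4 matrix itself appears only as a figure, so the cell values below are placeholders chosen to be consistent with the figures quoted above (191 samples, 145 on the diagonal, M_24 = 0, and 18 misclassified samples in the class-3 column); the real matrix may differ cell by cell. The sketch shows how the overall and per-class accuracies are computed:

```python
import numpy as np

# Placeholder 4-class confusion matrix (rows = predicted class, columns = actual class).
M = np.array([
    [50,  5,  6,  4],
    [ 3, 30,  8,  0],
    [ 4,  6, 25,  4],
    [ 1,  1,  4, 40],
])

overall_accuracy = np.trace(M) / M.sum()            # 145 / 191
per_class_accuracy = np.diag(M) / M.sum(axis=0)     # per actual class (column-wise)

print(round(overall_accuracy, 4))                   # 0.7592
print(per_class_accuracy.round(4))                  # [0.8621 0.7143 0.5814 0.8333]
```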
What is TP, TN, FP, FN for class 1?
[Figure: the 4×4 confusion matrix (columns = Actual), partitioned for class 1 into the 2×2 blocks
TP  FP
FN  TN]
What is TP, TN, FP, FN for class 2?
[Figure: the same 4×4 confusion matrix, partitioned for class 2]
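A minimal sketch (reusing the placeholder matrix above) of how TP, FP, FN, and TN are read off for a particular class when it is treated as the positive class:

```python
import numpy as np

# Same placeholder matrix as above (rows = predicted class, columns = actual class).
M = np.array([
    [50,  5,  6,  4],
    [ 3, 30,  8,  0],
    [ 4,  6, 25,  4],
    [ 1,  1,  4, 40],
])

def one_vs_rest_counts(M, k):
    """TP, FP, FN, TN for class index k, treating class k as the positive class."""
    tp = M[k, k]                    # predicted k and actually k
    fp = M[k, :].sum() - tp         # predicted k but actually another class
    fn = M[:, k].sum() - tp         # actually k but predicted as another class
    tn = M.sum() - tp - fp - fn     # everything not involving class k
    return int(tp), int(fp), int(fn), int(tn)

print(one_vs_rest_counts(M, 0))     # class 1 -> (50, 15, 8, 118)
print(one_vs_rest_counts(M, 1))     # class 2 -> (30, 11, 12, 138)
```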
Performance Measures for Multi-class Classification
