Performance Evaluation
Lili Ayu Wulandhari Ph.D.
Test Sets
• The learning process is carried out to recognize patterns in a dataset.
• To avoid biased results in pattern recognition, the data are divided
into in-sample and out-of-sample sets, and these sets must be
independent of each other.
• In-sample (training) data are used to obtain the learning model, such
as the number of hidden layers, the number of neurons per hidden layer,
and acceptable weights that fit the pattern.
• Out-of-sample data are used for validation, which selects the model,
and for testing, which evaluates the model and the weights obtained
from training.
• In-sample and out-of-sample data can be separated using several
methods, for instance (see the sketch after this list):
1. Cross validation
2. Split sample (hold-out)
• The appropriate model and weights are determined by high accuracy,
recall and precision.
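As a rough sketch of these splitting methods (assuming Python with scikit-learn; the iris dataset and the k-NN classifier are placeholders chosen only for illustration):

```python
# Minimal sketch: hold-out split and k-fold cross-validation with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Split sample (hold-out): 70% in-sample (training), 30% out-of-sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

# 5-fold cross-validation: every sample is used once as out-of-sample data.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print("cross-validation accuracy per fold:", scores)
```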
Confusion Matrix
Table 5.1: Confusion Matrix
                    Predicted Positives   Predicted Negatives
Actual Positives             a                     b
Actual Negatives             c                     d
Confusion Matrix
• As an example, assume a sample of 23 fruits divided into 3 classes,
namely 8 oranges, 10 apples and 5 grapes. The confusion matrix can be
described as in Table 5.2 below.
Table 5.2: Classification Matrix for 3-Class Classification
                    Predicted Oranges   Predicted Apples   Predicted Grapes
Actual Oranges              6                   1                  1
Actual Apples               2                   8                  0
Actual Grapes               1                   1                  3
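A minimal sketch of how such a matrix could be computed (assuming Python with scikit-learn; the label lists below are constructed by hand purely to reproduce the counts of Table 5.2):

```python
# Minimal sketch: building the 3-class confusion matrix of Table 5.2.
from sklearn.metrics import confusion_matrix

labels = ["orange", "apple", "grape"]

y_true = ["orange"] * 8 + ["apple"] * 10 + ["grape"] * 5
y_pred = (["orange"] * 6 + ["apple"] * 1 + ["grape"] * 1 +   # 8 actual oranges
          ["orange"] * 2 + ["apple"] * 8 +                   # 10 actual apples
          ["orange"] * 1 + ["apple"] * 1 + ["grape"] * 3)    # 5 actual grapes

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[6 1 1]
#  [2 8 0]
#  [1 1 3]]
```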
Confusion Matrix
Table 5.3 Confusion Table of Orange class
6 true positives                          2 false negatives
(actual oranges which are correctly       (oranges which are incorrectly
classified as oranges)                    identified as apples and grapes)
3 false positives                         12 true negatives
(non-oranges which are incorrectly        (all the remaining fruits which are
identified as oranges)                    correctly identified as non-oranges)
Confusion Matrix
Table 5.4 Confusion Table of Apple class
8 true positives                          2 false negatives
(actual apples which are correctly        (apples which are incorrectly
classified as apples)                     identified as oranges)
2 false positives                         11 true negatives
(non-apples which are incorrectly         (all the remaining fruits which are
identified as apples)                     correctly identified as non-apples)
Confusion Matrix
Table 5.5 Confusion Table of Grape class
3 true positives                          2 false negatives
(actual grapes which are correctly        (grapes which are incorrectly
classified as grapes)                     identified as oranges and apples)
1 false positive                          17 true negatives
(non-grapes which are incorrectly         (all the remaining fruits which are
identified as grapes)                     correctly identified as non-grapes)
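The per-class counts in Tables 5.3 to 5.5 can be derived from Table 5.2 in a one-vs-rest fashion; a minimal sketch (assuming Python with NumPy, with cm holding the matrix of Table 5.2):

```python
# Minimal sketch: one-vs-rest TP/FN/FP/TN per class from the 3x3 matrix.
import numpy as np

labels = ["orange", "apple", "grape"]
cm = np.array([[6, 1, 1],
               [2, 8, 0],
               [1, 1, 3]])

for i, label in enumerate(labels):
    tp = cm[i, i]                  # correctly predicted as this class
    fn = cm[i, :].sum() - tp       # this class predicted as another class
    fp = cm[:, i].sum() - tp       # other classes predicted as this class
    tn = cm.sum() - tp - fn - fp   # everything else
    print(f"{label}: TP={tp} FN={fn} FP={fp} TN={tn}")
# orange: TP=6 FN=2 FP=3 TN=12
# apple:  TP=8 FN=2 FP=2 TN=11
# grape:  TP=3 FN=2 FP=1 TN=17
```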
Classification Accuracy
• Accuracy shows the proportion of predictions from the model that agree
with the actual classes. Accuracy is calculated using:
                    Predicted Positives   Predicted Negatives
Actual Positives            TP                    FN
Actual Negatives            FP                    TN

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
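For the fruit example, a minimal sketch of this calculation (Python; the binary counts come from Table 5.3 for the orange-vs-non-orange view, and the diagonal of Table 5.2 gives the overall multi-class accuracy):

```python
# Minimal sketch: accuracy for the fruit example.
import numpy as np

# Binary view (orange vs. non-orange), counts from Table 5.3.
TP, FN, FP, TN = 6, 2, 3, 12
accuracy_orange = (TP + TN) / (TP + TN + FP + FN) * 100
print(f"Orange-vs-rest accuracy = {accuracy_orange:.1f}%")   # 78.3%

# Overall multi-class accuracy: correct predictions (diagonal) over all samples.
cm = np.array([[6, 1, 1],
               [2, 8, 0],
               [1, 1, 3]])
accuracy_all = np.trace(cm) / cm.sum() * 100
print(f"Overall accuracy = {accuracy_all:.1f}%")              # 73.9%
```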
Recall
Recall, or true positive rate (TPR), is the ratio of correct positive
predictions to all actual positive cases:
                    Predicted Positives   Predicted Negatives
Actual Positives            TP                    FN
Actual Negatives            FP                    TN

Recall = TP / (TP + FN) × 100%
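Applying this to the fruit example (a small sketch in Python; the TP and FN counts per class come from Tables 5.3 to 5.5):

```python
# Minimal sketch: recall (TPR) per class, using (TP, FN) from Tables 5.3-5.5.
per_class = {"orange": (6, 2), "apple": (8, 2), "grape": (3, 2)}

for label, (tp, fn) in per_class.items():
    recall = tp / (tp + fn) * 100
    print(f"{label}: Recall = {recall:.1f}%")   # 75.0%, 80.0%, 60.0%
```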
Precision
Precision is the ratio of correct positive predictions to all predicted
positive cases:
                    Predicted Positives   Predicted Negatives
Actual Positives            TP                    FN
Actual Negatives            FP                    TN

Precision = TP / (TP + FP) × 100%
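Again for the fruit example (a small sketch in Python; the TP and FP counts per class come from Tables 5.3 to 5.5):

```python
# Minimal sketch: precision per class, using (TP, FP) from Tables 5.3-5.5.
per_class = {"orange": (6, 3), "apple": (8, 2), "grape": (3, 1)}

for label, (tp, fp) in per_class.items():
    precision = tp / (tp + fp) * 100
    print(f"{label}: Precision = {precision:.1f}%")   # 66.7%, 80.0%, 75.0%
```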
F1 Score
The F1 score presents the balance between precision and recall (it is the
harmonic mean of the two):

F1 Score = 2 × (Recall × Precision) / (Recall + Precision)
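A short sketch combining the two metrics for the fruit example (Python; the recall and precision values are the ones computed above):

```python
# Minimal sketch: F1 score per class from the recall and precision values above.
metrics = {"orange": (0.750, 0.667), "apple": (0.800, 0.800), "grape": (0.600, 0.750)}

for label, (recall, precision) in metrics.items():
    f1 = 2 * (recall * precision) / (recall + precision)
    print(f"{label}: F1 = {f1:.3f}")   # about 0.706, 0.800, 0.667
```

In practice, scikit-learn's classification_report (or precision_recall_fscore_support) returns precision, recall and F1 per class in a single call.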
References
• Page, David. (2017) Evaluating Machine Learning Methods,
Department of Biostatistics and Medical Informatics
and Department of Computer Sciences, School of Medicine
and Public Health University of Wisconsin-Madison.
http://pages.cs.wisc.edu/~dpage/cs760/evaluating.pdf
• Wulandhari, Lili Ayu. (2014). Enhanced Genetic Algorithm-
based Back Propagation Neural Network to Diagnose
Conditions of Multiple-bearing System, Ph.D. Thesis, Universiti
Teknologi Malaysia.
• Machart, P., and Ralaivola, L. (2012). Confusion Matrix Stability
Bounds for Multiclass Classification. Aix-Marseille Univ., LIF-
QARMA, CNRS, UMR 7279, F-13013, Marseille, France.
• Kohavi, R., and Provost, F. (1998). Glossary of Terms: Special
Issue on Applications of Machine Learning and the Knowledge
Discovery Process. Machine Learning, 30, 271-274.
• http://scikit-learn.org/stable/modules/cross_validation.html