
Performance Measures

Uploaded by

shashanks2493
Classification Metrics
• Classification Accuracy

• Logarithmic Loss

• Confusion Matrix

• Area Under Curve (AUC)


• F1 Score
Regression Metrics

• Mean Absolute Error

• Mean Squared Error


• R-Squared
Classification Accuracy
• Classification Accuracy is what we usually mean when we use the term accuracy.
It is the ratio of the number of correct predictions to the total number of input samples.

• It is a reliable measure only when each class has roughly the same number of samples.

• For example, suppose 98% of the samples in our training set belong to class A and 2%
to class B. A model can then reach 98% training accuracy simply by predicting
class A for every training sample.
• When the same model is tested on a test set with 60% samples of class A and 40%
samples of class B, the test accuracy drops to 60%.
Classification Accuracy is simple and intuitive, but on imbalanced data it gives a
false sense of high performance.
• The real problem arises when the cost of misclassifying the minority class
is very high. If we deal with a rare but fatal disease, the cost of failing to
diagnose a sick person is much higher than the cost of sending a
healthy person for more tests.
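The 98% / 60% figures above can be reproduced with a few lines of plain Python. This is a minimal sketch: the label lists are hypothetical data built to match the proportions in the text, and `always_a` is an illustrative stand-in for a model that ignores its input.

```python
def accuracy(y_true, y_pred):
    """Ratio of correct predictions to the total number of samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def always_a(labels):
    """A degenerate 'model' that predicts class A for every sample."""
    return ["A"] * len(labels)

# Hypothetical label sets with the class proportions used in the text
train_labels = ["A"] * 98 + ["B"] * 2
test_labels = ["A"] * 60 + ["B"] * 40

train_acc = accuracy(train_labels, always_a(train_labels))
test_acc = accuracy(test_labels, always_a(test_labels))
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

The model never identifies a single class B sample, yet its training accuracy looks excellent.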
Python Example
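import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

# Column names for the Pima Indians diabetes CSV (the file has no header row)
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]  # the eight feature columns
Y = dataframe.iloc[:, 8]    # the class label
seed = 7
# Recent scikit-learn versions require shuffle=True when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
# max_iter is raised from the default to help the solver converge on unscaled features
model = LogisticRegression(max_iter=1000)
scoring = 'accuracy'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
print(results.std())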
Logarithmic Loss

• Logarithmic Loss, or Log Loss, penalizes false classifications, and the penalty
grows the more confident the wrong prediction is. It works well for multi-class
classification. When working with Log Loss, the classifier must assign a probability
to each class for every sample. Suppose there are N samples belonging to M classes;
then the Log Loss is calculated as:

$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$$

where,
• y_ij, indicates whether sample i belongs to class j or not
• p_ij, indicates the probability of sample i belonging to class j
• Log Loss has no upper bound; it lies in the range [0, ∞). A Log Loss nearer to
0 indicates predicted probabilities close to the true labels, whereas a Log Loss
far from 0 indicates poor or overconfident predictions.
• In general, minimising Log Loss gives greater accuracy for the classifier.
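To make the definition concrete, here is a minimal pure-Python sketch of Log Loss in its standard form: the negative average, over samples, of the sum of y_ij · log(p_ij). The `eps` clipping parameter and the two small probability tables are assumptions added for illustration; clipping only guards against log(0).

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Log Loss for one-hot labels y_true and per-class probabilities y_prob."""
    total = 0.0
    for y_row, p_row in zip(y_true, y_prob):
        for y_ij, p_ij in zip(y_row, p_row):
            p = min(max(p_ij, eps), 1 - eps)  # keep log() finite
            total += y_ij * math.log(p)
    return -total / len(y_true)

# Two samples, three classes (hypothetical data)
y_true = [[1, 0, 0], [0, 1, 0]]
confident_right = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]]
confident_wrong = [[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]]

loss_right = log_loss(y_true, confident_right)
loss_wrong = log_loss(y_true, confident_wrong)
print("confident and right: %.3f" % loss_right)
print("confident and wrong: %.3f" % loss_wrong)
```

Only the probability assigned to the true class contributes to the sum, so confident mistakes are penalised far more heavily than confident correct answers.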
Python Example
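import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y = dataframe.iloc[:, 8]
seed = 7
# Recent scikit-learn versions require shuffle=True when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
# max_iter is raised from the default to help the solver converge on unscaled features
model = LogisticRegression(max_iter=1000)
# scikit-learn reports the negated log loss, so values closer to 0 are better
scoring = 'neg_log_loss'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
# The % formatting must be applied inside the print() call in Python 3
print("Logloss: %.3f (%.3f)" % (results.mean(), results.std()))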
Confusion Matrix
• A Confusion Matrix, as the name suggests, gives us a matrix as output and
describes the complete performance of the model.
• Let's assume we have a binary classification problem. We have some
samples belonging to two classes: YES or NO. We also have our own
classifier, which predicts a class for a given input sample. On testing our
model on 165 samples, we get the following result.
[Figure: confusion matrix for the 165 test samples]

• There are 4 important terms:
• True Positives: The cases in which we predicted YES and the actual
output was also YES.
• True Negatives: The cases in which we predicted NO and the actual
output was NO.
• False Positives: The cases in which we predicted YES and the actual
output was NO.
• False Negatives: The cases in which we predicted NO and the actual
output was YES.
• Accuracy for the matrix can be calculated by summing the values
lying along the “main diagonal” and dividing by the total number of samples, i.e.

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total number of samples}}$$
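The four terms can be counted directly from paired actual and predicted labels. The sketch below uses a small hypothetical set of eight samples (not the 165-sample result discussed above), and `confusion_counts` is an illustrative helper name.

```python
def confusion_counts(actual, predicted, positive="YES"):
    """Return (TP, TN, FP, FN) for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted YES, actually YES
        elif p != positive and a != positive:
            tn += 1   # predicted NO, actually NO
        elif p == positive and a != positive:
            fp += 1   # predicted YES, actually NO
        else:
            fn += 1   # predicted NO, actually YES
    return tp, tn, fp, fn

# Hypothetical labels for eight samples
actual    = ["YES", "YES", "NO", "NO",  "YES", "NO", "YES", "NO"]
predicted = ["YES", "NO",  "NO", "YES", "YES", "NO", "YES", "NO"]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)  # main diagonal over total
print("TP=%d TN=%d FP=%d FN=%d" % (tp, tn, fp, fn))
print("accuracy:", accuracy)
```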
Python Example
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix  # was missing in the original listing

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y = dataframe.iloc[:, 8]
seed = 7
# Hold out 20% of the samples for testing
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=0.2, random_state=seed)
# max_iter is raised from the default to help the solver converge on unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
# Rows are actual classes, columns are predicted classes
matrix = confusion_matrix(Y_test, predicted)
print(matrix)
Area Under Curve
• Area Under Curve (AUC) is one of the most widely used metrics for
evaluation. It is used for binary classification problems. The AUC of a classifier is
equal to the probability that the classifier will rank a randomly chosen
positive example higher than a randomly chosen negative example. Before
defining AUC, let us understand two basic terms:
• True Positive Rate (Sensitivity): True Positive Rate is defined as TP /
(FN + TP). It is the proportion of positive data points that are correctly
considered positive, with respect to all positive data points.
Contd..
• False Positive Rate (1 - Specificity): False Positive Rate is defined as FP / (FP + TN). It is the
proportion of negative data points that are mistakenly considered positive,
with respect to all negative data points.

• Specificity is also known as the True Negative Rate, TN / (TN + FP), and False Positive Rate = 1 - Specificity.
• So,

$$\text{FPR} = \frac{FP}{FP + TN} = 1 - \frac{TN}{TN + FP} = 1 - \text{Specificity}$$
Contd..
• False Positive Rate and True Positive Rate both have values in the range
[0, 1]. FPR and TPR are both computed at a series of threshold values such as
(0.00, 0.02, 0.04, …, 1.00) and a graph is drawn. AUC is the area under the
curve obtained by plotting True Positive Rate (vertical axis) against
False Positive Rate (horizontal axis) over these thresholds.
• AUC therefore has a range of [0, 1]. The greater the value, the better
the performance of our model; a value of 0.5 corresponds to ranking
positives and negatives at random.
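The two rates and the ranking interpretation of AUC can be sketched in plain Python. The labels and scores below are hypothetical, and `auc_by_ranking` computes AUC by comparing every positive/negative pair (ties count as half), which is an assumption chosen for clarity over speed, not the curve-integration route a library would normally take.

```python
def tpr_fpr(labels, scores, threshold):
    """TPR and FPR when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

def auc_by_ranking(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical true labels (1 = positive) and classifier scores
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

tpr, fpr = tpr_fpr(labels, scores, threshold=0.5)
auc = auc_by_ranking(labels, scores)
print("TPR=%.3f FPR=%.3f at threshold 0.5" % (tpr, fpr))
print("AUC=%.3f" % auc)
```

Only one of the nine positive/negative pairs is ranked the wrong way round (the positive scored 0.6 against the negative scored 0.7), which is why the AUC falls just short of 1.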
Python example
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y = dataframe.iloc[:, 8]
seed = 7
# Recent scikit-learn versions require shuffle=True when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
# max_iter is raised from the default to help the solver converge on unscaled features
model = LogisticRegression(max_iter=1000)
scoring = 'roc_auc'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results)         # AUC for each of the 10 folds
print(results.mean())
print(results.std())
F1 Score

• F1 Score is used to measure a test's accuracy.

• F1 Score is the Harmonic Mean of precision and recall. The range
for F1 Score is [0, 1]. It tells you how precise your classifier is (how many
of the instances it labels positive are actually positive), as well as how robust
it is (it does not miss a significant number of instances).
• High precision with lower recall gives you an extremely accurate classifier, but
one that misses a large number of instances that are difficult to classify. The
greater the F1 Score, the better the performance of our model.
Mathematically, it can be expressed as:

$$F_1 = 2 \cdot \frac{1}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}}} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
Contd..
F1 Score tries to find the balance between precision and recall.
• Precision : It is the number of correct positive results divided by the number of
positive results predicted by the classifier.
• Recall : It is the number of correct positive results divided by the number of all
relevant samples (all samples that should have been identified as positive).
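Precision, recall and their harmonic mean follow directly from the confusion-matrix counts. The sketch below uses hypothetical counts, and `precision_recall_f1` is an illustrative helper name; it shows how a classifier with high precision but low recall ends up with a moderate F1 Score.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp)   # correct positives / predicted positives
    recall = tp / (tp + fn)      # correct positives / all actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 10 samples predicted positive, 16 actually positive
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print("precision=%.3f recall=%.3f F1=%.3f" % (precision, recall, f1))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F1 Score here sits closer to the recall than a simple average would.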
Python example
# Classification report on a held-out test split
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y = dataframe.iloc[:, 8]
seed = 7
# test_size was undefined in the original listing; 0.2 is assumed here to
# match the confusion matrix example
test_size = 0.2
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=test_size, random_state=seed)
# max_iter is raised from the default to help the solver converge on unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
# The report lists precision, recall and F1 score for each class
report = classification_report(Y_test, predicted)
print(report)
Mean Absolute Error
• Mean Absolute Error is the average of the absolute differences between the
Original Values and the Predicted Values.
• It gives us a measure of how far the predictions were from the actual output.
However, it does not give us any idea of the direction of the error, i.e. whether we
are under-predicting or over-predicting the data.
• Mathematically, it is represented as:

$$\text{MAE} = \frac{1}{N} \sum_{j=1}^{N} \left| y_j - \hat{y}_j \right|$$

where y_j is the original value and ŷ_j the predicted value for sample j.
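A short plain-Python sketch illustrates why Mean Absolute Error hides the direction of the error. The values are hypothetical; `mean_signed_error` is an extra helper, included here only for contrast, that keeps the sign.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_signed_error(actual, predicted):
    """Average of (predicted - actual); positive means over-prediction."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values: one model over-predicts by 2, the other under-predicts by 2
actual = [10.0, 20.0, 30.0]
over = [12.0, 22.0, 32.0]
under = [8.0, 18.0, 28.0]

mae_over = mean_absolute_error(actual, over)
mae_under = mean_absolute_error(actual, under)
print("MAE:", mae_over, mae_under)
print("signed:", mean_signed_error(actual, over), mean_signed_error(actual, under))
```

Both sets of predictions have the same MAE, while the signed error separates the over-predicting model from the under-predicting one.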
Boston Housing dataset
Attribute Information:

1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centres
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per $10,000
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT % lower status of the population
14. MEDV Median value of owner-occupied homes in $1000’s
(The dataset is available on Kaggle.com as Boston Housing Dataset)
Python Example
# Cross Validation Regression MAE
import pandas
from sklearn import model_selection
from sklearn.linear_model import LinearRegression

names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# The file is whitespace-delimited; sep=r"\s+" replaces the deprecated delim_whitespace=True
dataframe = pandas.read_csv("housing.data", sep=r"\s+", names=names)
array = dataframe.values
X = array[:, 0:13]  # the 13 input attributes
Y = array[:, 13]    # MEDV, the target
seed = 7
# Recent scikit-learn versions require shuffle=True when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = LinearRegression()
# scikit-learn reports the negated MAE, so values closer to 0 are better
scoring = 'neg_mean_absolute_error'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
# The % formatting must be applied inside the print() call in Python 3
print("MAE: %.3f (%.3f)" % (results.mean(), results.std()))
Mean Squared Error
• Mean Squared Error (MSE) is quite similar to Mean Absolute Error, the
only difference being that MSE takes the average of the square of the
difference between the original values and the predicted values.
• The advantage of MSE is that it is smooth, so its gradient is easy to compute.
The absolute error is not differentiable at zero, so minimising Mean Absolute
Error typically calls for subgradient or linear programming methods.
• Because we square the error, larger errors become more
pronounced than smaller ones, so the model focuses more on
the larger errors.
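The effect of squaring can be seen by comparing two hypothetical sets of predictions whose total absolute error is the same: one spreads the error evenly, the other concentrates it in a single large miss.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 over all samples."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values: both prediction sets are off by 4 units in total
actual = [0.0, 0.0, 0.0, 0.0]
spread_out = [1.0, 1.0, 1.0, 1.0]   # four small errors
one_big = [0.0, 0.0, 0.0, 4.0]      # one large error

mae_a = mean_absolute_error(actual, spread_out)
mae_b = mean_absolute_error(actual, one_big)
mse_a = mean_squared_error(actual, spread_out)
mse_b = mean_squared_error(actual, one_big)
print("MAE:", mae_a, mae_b)
print("MSE:", mse_a, mse_b)
```

MAE cannot tell the two cases apart, while MSE is four times higher for the single large error.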
Python Example
# Cross Validation Regression MSE
import pandas
from sklearn import model_selection
from sklearn.linear_model import LinearRegression

names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# The file is whitespace-delimited; sep=r"\s+" replaces the deprecated delim_whitespace=True
dataframe = pandas.read_csv("housing.data", sep=r"\s+", names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
seed = 7
# Recent scikit-learn versions require shuffle=True when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = LinearRegression()
# scikit-learn reports the negated MSE, so values closer to 0 are better
scoring = 'neg_mean_squared_error'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
# The % formatting must be applied inside the print() call in Python 3
print("MSE: %.3f (%.3f)" % (results.mean(), results.std()))
R-Squared Metric
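• The R^2 (or R Squared) metric provides an indication of the goodness of fit of a set
of predictions to the actual values. In statistical literature, this measure is called the
coefficient of determination.
• A value of 1 indicates a perfect fit and a value of 0 indicates a fit no better than
always predicting the mean. On held-out data the value can also be negative,
when the model does worse than predicting the mean.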
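R^2 is commonly computed as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below applies that standard definition to hypothetical values; the three prediction lists are chosen to show a perfect fit, a constant mean prediction, and a deliberately bad fit.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical values
actual = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]        # predictions equal the actual values
mean_only = [2.5, 2.5, 2.5, 2.5]      # always predict the mean of the actual values
reversed_fit = [4.0, 3.0, 2.0, 1.0]   # predictions trend the wrong way

r2_perfect = r_squared(actual, perfect)
r2_mean = r_squared(actual, mean_only)
r2_bad = r_squared(actual, reversed_fit)
print(r2_perfect, r2_mean, r2_bad)
```

The third case shows that the score is not bounded below by 0 once the predictions are worse than the mean.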

Python Example
# Cross Validation Regression R^2
import pandas
from sklearn import model_selection
from sklearn.linear_model import LinearRegression

# Alternative source for the same file; pass url to read_csv in place of the local path
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.data"
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# The file is whitespace-delimited; sep=r"\s+" replaces the deprecated delim_whitespace=True
dataframe = pandas.read_csv("housing.data", sep=r"\s+", names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
seed = 7
# Recent scikit-learn versions require shuffle=True when random_state is set
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = LinearRegression()
scoring = 'r2'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
# The % formatting must be applied inside the print() call in Python 3
print("R^2: %.3f (%.3f)" % (results.mean(), results.std()))

Thank You
