Performance Measures
Classification Metrics
• Classification Accuracy
• Logarithmic Loss
• Confusion Matrix
• Area Under Curve (AUC)
• F1 Score
Regression Metrics
• Mean Absolute Error
• Mean Squared Error
• R-Squared
Classification Accuracy
• Classification Accuracy is what we usually mean when we use the term accuracy. It is the ratio of the number of correct predictions to the total number of input samples.
• It works well only when there is a roughly equal number of samples in each class.
• For example, suppose 98% of the samples in our training set belong to class A and only 2% to class B. Our model can then easily reach 98% training accuracy simply by predicting class A for every training sample.
• When the same model is tested on a test set with 60% class A and 40% class B samples, the test accuracy drops to 60%. Classification accuracy therefore gives us a false sense of achieving high performance, as the sketch after this list illustrates.
• The real problem arises when the cost of misclassifying the minority-class samples is very high. If we deal with a rare but fatal disease, the cost of failing to diagnose a sick person is much higher than the cost of sending a healthy person for more tests.
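A minimal sketch of this imbalance trap (illustrative labels only, using scikit-learn's accuracy_score): a classifier that blindly predicts the majority class still reports 98% accuracy.
from sklearn.metrics import accuracy_score
y_true = ['A'] * 98 + ['B'] * 2        # 98% of samples are class A, 2% are class B
y_pred = ['A'] * 100                   # the model predicts class A for every sample
print(accuracy_score(y_true, y_pred))  # 0.98, despite never detecting class B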
Python Example
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y= dataframe.iloc[:, 8]
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)  # shuffle is needed for random_state to take effect
model = LogisticRegression()
scoring = 'accuracy'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
print(results.std())
Logarithmic Loss
• Logarithmic Loss, or Log Loss, works by penalising false classifications. It works well for multi-class classification. When working with Log Loss, the classifier must assign a probability to each class for every sample. Suppose there are N samples belonging to M classes; then the Log Loss is calculated as:
  Log Loss = -(1/N) * Σ_i Σ_j [ y_ij * log(p_ij) ]   (i = 1..N samples, j = 1..M classes)
where,
• y_ij indicates whether sample i belongs to class j or not
• p_ij is the predicted probability of sample i belonging to class j
• Log Loss has no upper bound; it lies in the range [0, ∞). A Log Loss close to 0 indicates higher accuracy, whereas a larger Log Loss indicates lower accuracy.
• In general, minimising Log Loss gives greater accuracy for the classifier.
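As a minimal sketch (illustrative values only, using scikit-learn's log_loss), this is the formula above applied to three samples and two classes:
from sklearn.metrics import log_loss
y_true = [1, 0, 1]                 # true class of each sample
probs = [[0.1, 0.9],               # p_ij: predicted probability of class 0 and class 1
         [0.8, 0.2],
         [0.3, 0.7]]
print(log_loss(y_true, probs))     # closer to 0 means better predictions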
Python Example
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y= dataframe.iloc[:, 8]
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = LogisticRegression()
scoring = 'neg_log_loss'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("Logloss: %.3f (%.3f)") % (results.mean(), results.std())
Confusion Matrix
• Confusion Matrix, as the name suggests, gives us a matrix as output and describes the complete performance of the model.
• Let's assume we have a binary classification problem with samples belonging to two classes: YES or NO. We also have a classifier which predicts a class for a given input sample. On testing our model on 165 samples, we get the following result.
(Confusion matrix table: predicted YES/NO versus actual YES/NO counts for the 165 samples.)
• There are 4 important terms :
• True Positives : The cases in which we predicted YES and the actual
  output was also YES.
• True Negatives : The cases in which we predicted NO and the actual
  output was NO.
• False Positives : The cases in which we predicted YES and the actual
  output was NO.
• False Negatives : The cases in which we predicted NO and the actual
  output was YES.
• Accuracy can be computed from the matrix by summing the values on the “main diagonal” (TP + TN) and dividing by the total number of samples, i.e. Accuracy = (TP + TN) / (TP + TN + FP + FN), as shown after the Python example below.
Python Example
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y= dataframe.iloc[:, 8]
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.2, random_state=seed)
model = LogisticRegression()
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
matrix = confusion_matrix(Y_test, predicted)
print(matrix)
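Continuing the idea above with a standalone sketch (illustrative labels only): the four counts can be unpacked from the matrix, and the accuracy recovered from its main diagonal.
from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()   # counts in row-major order
accuracy = (tp + tn) / (tp + tn + fp + fn)                  # diagonal over all samples
print(tn, fp, fn, tp, accuracy)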
Area Under Curve
• Area Under Curve (AUC) is one of the most widely used metrics for evaluation. It is used for binary classification problems. The AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. Before defining AUC, let us understand two basic terms:
• True Positive Rate (Sensitivity) : True Positive Rate is defined as TP/
  (FN+TP). True Positive Rate corresponds to the proportion of positive data
  points that are correctly considered as positive, with respect to all positive
  data points.
Contd..
• False Positive Rate (1-Specificity) : False Positive Rate is defined as FP / (FP+TN). False Positive
  Rate corresponds to the proportion of negative data points that are mistakenly considered as
  positive, with respect to all negative data points.
• Specificity is also known as the True Negative Rate, TN / (TN + FP), and the False Positive Rate = 1 - Specificity.
• So, FPR = FP / (FP + TN) = 1 - Specificity.
Contd..
• False Positive Rate and True Positive Rate both have values in the range [0, 1]. FPR and TPR are both computed at threshold values such as (0.00, 0.02, 0.04, ..., 1.00) and a graph is drawn. AUC is the area under this curve, which plots True Positive Rate against False Positive Rate at the different thresholds.
• As evident, AUC has a range of [0, 1]. The greater the value, the better the performance of our model.
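A minimal sketch of this threshold sweep (illustrative scores only, using scikit-learn's roc_curve and auc): the (FPR, TPR) pairs at each threshold define the curve whose area is the AUC.
from sklearn.metrics import roc_curve, auc
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5]   # predicted probability of the positive class
fpr, tpr, thresholds = roc_curve(y_true, scores)      # one (FPR, TPR) point per threshold
print(fpr, tpr, thresholds)
print(auc(fpr, tpr))                                  # area under the ROC curve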
Python example
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe=pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y= dataframe.iloc[:, 8]
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = LogisticRegression()
scoring = 'roc_auc'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results)
print(results.mean())
print(results.std())
F1 Score
  • F1 Score is used to measure a test’s accuracy
  • F1 Score is the Harmonic Mean between precision and recall. The range
    for F1 Score is [0, 1]. It tells you how precise your classifier is (how many
    instances it classifies correctly), as well as how robust it is (it does not
    miss a significant number of instances).
• High precision but low recall gives you an extremely accurate classifier, but one that misses a large number of instances that are difficult to classify. The greater the F1 Score, the better the performance of our model. Mathematically, it can be expressed as:
  F1 = 2 * (Precision * Recall) / (Precision + Recall)
Contd..
F1 Score tries to find the balance between precision and recall.
• Precision: the number of correct positive results divided by the number of positive results predicted by the classifier, i.e. Precision = TP / (TP + FP).
• Recall: the number of correct positive results divided by the number of all relevant samples (all samples that should have been identified as positive), i.e. Recall = TP / (TP + FN). Both are computed directly in the sketch after the Python example below.
Python example
# Cross Validation Classification Report
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe=pandas.read_csv("pima-indians-diabetes.data.csv", names=names)
X = dataframe.iloc[:, :-1]
Y= dataframe.iloc[:, 8]
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.2, random_state=seed)
model = LogisticRegression()
model.fit(X_train, Y_train)
predicted = model.predict(X_test)
report = classification_report(Y_test, predicted)
print(report)
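A minimal standalone sketch (illustrative labels only) of the individual quantities behind the classification report: precision, recall, and their harmonic mean, the F1 score.
from sklearn.metrics import precision_score, recall_score, f1_score
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
p = precision_score(y_true, y_pred)    # TP / (TP + FP)
r = recall_score(y_true, y_pred)       # TP / (TP + FN)
print(p, r, f1_score(y_true, y_pred))  # F1 = 2 * p * r / (p + r)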
Mean Absolute Error
• Mean Absolute Error is the average of the absolute differences between the original values and the predicted values.
• It gives us a measure of how far the predictions were from the actual output. However, it doesn't give us any idea of the direction of the error, i.e. whether we are under-predicting or over-predicting the data.
• Mathematically, it is represented as:
  MAE = (1/N) * Σ_i |y_i - ŷ_i|   (the mean of the absolute differences over the N samples)
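A quick numeric illustration of the formula (toy values, plain numpy):
import numpy as np
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(np.mean(np.abs(y_true - y_pred)))   # (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5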
Boston Housing dataset
Attribute Information:
  1. CRIM      per capita crime rate by town
  2. ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
  3. INDUS      proportion of non-retail business acres per town
  4. CHAS       Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  5. NOX       nitric oxides concentration (parts per 10 million)
  6. RM       average number of rooms per dwelling
  7. AGE       proportion of owner-occupied units built prior to 1940
  8. DIS      weighted distances to five Boston employment centres
  9. RAD       index of accessibility to radial highways
  10. TAX      full-value property-tax rate per $10,000
  11. PTRATIO pupil-teacher ratio by town
  12. B      1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
  13. LSTAT     % lower status of the population
  14. MEDV      Median value of owner-occupied homes in $1000’s
(The dataset is available on Kaggle as the Boston Housing Dataset.)
Python Example
# Cross Validation Regression MAE
import pandas
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = pandas.read_csv("housing.data", delim_whitespace=True, names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = LinearRegression()
scoring = 'neg_mean_absolute_error'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("MAE: %.3f (%.3f)" % (results.mean(), results.std()))
Mean Squared Error
• Mean Squared Error (MSE) is quite similar to Mean Absolute Error; the only difference is that MSE takes the average of the square of the difference between the original values and the predicted values.
• The advantage of MSE is that it is easier to compute the gradient, whereas Mean Absolute Error requires complicated linear programming tools to compute the gradient.
• As we take the square of the error, the effect of larger errors becomes more pronounced than that of smaller errors, so the model can now focus more on the larger errors, as the sketch below illustrates.
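A quick numeric illustration of that last point (toy values, plain numpy): squaring makes the single large error dominate the MSE, while the MAE treats every error linearly.
import numpy as np
errors = np.array([0.5, 0.5, 0.5, 5.0])   # one error is much larger than the rest
print(np.mean(np.abs(errors)))            # MAE = 1.625
print(np.mean(errors ** 2))               # MSE = 6.4375, dominated by the large error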
Python Example
# Cross Validation Regression MSE
import pandas
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = pandas.read_csv("housing.data", delim_whitespace=True, names=names)
array = dataframe.values
X = array[:,0:13]
Y = array[:,13]
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = LinearRegression()
scoring = 'neg_mean_squared_error'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("MSE: %.3f (%.3f)") % (results.mean(), results.std())
R-Squared Metric
• The R^2 (or R Squared) metric provides an indication of the goodness of fit of a set
  of predictions to the actual values. In statistical literature, this measure is called the
  coefficient of determination.
• This is a value between 0 and 1, for no fit and perfect fit respectively (a cross-validated R^2 can even be negative when the model fits worse than simply predicting the mean).
• R^2 = 1 - (SS_res / SS_tot), where SS_res is the sum of squared residuals of the predictions and SS_tot is the total sum of squares of the actual values about their mean.
Python Example
# Cross Validation Regression R^2
import pandas
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.data"
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = pandas.read_csv(url, delim_whitespace=True, names=names)
array = dataframe.values
X = array[:,0:13]
Y = array[:,13]
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = LinearRegression()
scoring = 'r2'
results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("R^2: %.3f (%.3f)") % (results.mean(), results.std())
          Thank You