Department of Computer Engineering
Experiment No: 05
                                                                                      Date:
                                                                                      Roll No:
   Aim: To Study and Implement the Confusion Matrix
   Theory:
   A confusion matrix is a fundamental tool in machine learning and statistics that provides a
   detailed, comprehensive assessment of a classification model's performance. It shows how well
   the model classifies instances into the various classes and offers valuable insight into the
   strengths and weaknesses of the model's predictions.
   The confusion matrix is typically presented in a tabular form, with rows representing the
   actual or true classes of instances and columns representing the predicted classes made by the
   model. The four main components of the confusion matrix are:
● True Positives (TP): These are instances where the model correctly predicts the positive
  class. In other words, the model accurately identifies instances belonging to the target class as
  positive.
● True Negatives (TN): These instances are correctly predicted as the negative class. The
  model accurately recognizes instances that do not belong to the target class as negative.
● False Positives (FP): These instances are predicted as positive by the model but actually
  belong to the negative class. This is also known as a Type I error, where the model
  incorrectly identifies instances as positive. It is calculated as
                  Type I Error = FP / (TN + FP)
● False Negatives (FN): These are instances that the model predicts as negative but that
  actually belong to the positive class. This is a Type II error, where the model fails to
  identify instances that should have been classified as positive. It is calculated as
                   Type II Error = FN / (TP + FN)
  A short sketch after this list shows how these four counts and both error rates can be
  tallied from a pair of example label lists.
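    As a quick illustration, the four counts can be tallied directly from a pair of label lists.
    The sketch below uses made-up 'Pet'/'Wild' labels (the same two hypothetical classes as the
    program later in this experiment) and treats 'Pet' as the positive class; it is only meant to
    show how each prediction falls into exactly one of the four cells.

# Illustrative sketch only: the labels below are assumed, not real data.
actual    = ['Pet', 'Wild', 'Pet', 'Wild', 'Pet']
predicted = ['Pet', 'Pet', 'Wild', 'Wild', 'Pet']

# Treating 'Pet' as the positive class, count each of the four outcomes.
TP = sum(1 for a, p in zip(actual, predicted) if a == 'Pet' and p == 'Pet')
TN = sum(1 for a, p in zip(actual, predicted) if a == 'Wild' and p == 'Wild')
FP = sum(1 for a, p in zip(actual, predicted) if a == 'Wild' and p == 'Pet')
FN = sum(1 for a, p in zip(actual, predicted) if a == 'Pet' and p == 'Wild')

print(TP, TN, FP, FN)                      # 2 1 1 1
print("Type I Error :", FP / (TN + FP))    # 1 / 2 = 0.5
print("Type II Error:", FN / (TP + FN))    # 1 / 3 = 0.33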
    The confusion matrix allows for the computation of various performance metrics that offer
    deeper insights into the model's performance:
●   Accuracy: This metric calculates the proportion of correctly classified instances (TP and TN)
    among all instances. While accuracy provides an overall measure of correctness, it may not
    be suitable when classes are imbalanced.
●   Precision: Precision is the ratio of true positives to the total number of instances predicted as
    positive (TP + FP). It represents the model's ability to avoid falsely labeling negative
    instances as positive.
●   Recall (Sensitivity or True Positive Rate): Recall is the ratio of true positives to the total
    number of instances that actually belong to the positive class (TP + FN). It indicates the
    model's capability to correctly identify positive instances.
●   F1-Score: The F1-score is the harmonic mean of precision and recall. It provides a balance
    between these two metrics and is particularly useful when there's a trade-off between
    precision and recall.
●   Specificity (True Negative Rate): Specificity measures the model's ability to correctly
    identify negative instances, calculated as TN / (TN + FP).
●   False Positive Rate: This is the ratio of false positives to the total number of actual
    negatives, computed as FP / (TN + FP). Each of these ratios is written out in the short
    sketch after this list.
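    All of these metrics are simple ratios of the four counts. The short sketch below continues
    the illustrative counts from the earlier sketch (not any real dataset) and writes each
    formula out explicitly:

# Metric formulas written out from the four counts (illustrative values only).
TP, TN, FP, FN = 2, 1, 1, 1

accuracy    = (TP + TN) / (TP + TN + FP + FN)
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)            # sensitivity / true positive rate
f1          = 2 * precision * recall / (precision + recall)
specificity = TN / (TN + FP)            # true negative rate
fpr         = FP / (TN + FP)            # false positive rate

print(f"Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}")
print(f"F1: {f1:.2f}, Specificity: {specificity:.2f}, FPR: {fpr:.2f}")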
    By analyzing these metrics derived from the confusion matrix, machine learning practitioners
    can gain a holistic view of the model's performance. This, in turn, helps them make informed
    decisions about model improvements, parameter tuning, and addressing class imbalances.
    The confusion matrix is especially valuable when dealing with multi-class classification
    problems, where it can provide insights into how well the model distinguishes between
    multiple classes.
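    For multi-class problems, scikit-learn's confusion_matrix returns an N x N matrix whose
    off-diagonal cells show which classes are confused with which. A minimal sketch with three
    made-up animal classes (assumed for illustration, not taken from any dataset):

# Hedged multi-class sketch: three assumed classes.
from sklearn.metrics import confusion_matrix

actual    = ['cat', 'dog', 'bird', 'dog', 'cat', 'bird']
predicted = ['cat', 'dog', 'dog', 'dog', 'bird', 'bird']

# Rows are actual classes and columns are predicted classes (sklearn's convention).
cm = confusion_matrix(actual, predicted, labels=['cat', 'dog', 'bird'])
print(cm)
# Each off-diagonal cell counts one kind of mistake, e.g. cm[2, 1] is the number
# of 'bird' instances that were predicted as 'dog'.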
    The confusion matrix is an indispensable tool that offers a comprehensive and nuanced
    evaluation of classification model performance. Its insights enable practitioners to understand
    the trade-offs between different performance metrics and make informed decisions to
    enhance the model's accuracy and reliability. The confusion matrix is not only useful for
    model evaluation but also for understanding the types of errors a model makes. By analyzing
    the matrix's components, one can identify patterns and trends in misclassifications. This
    understanding can lead to more targeted improvements in data preprocessing, feature
    engineering, or model selection. For instance, in medical diagnosis, a false negative
    (missing a positive case) can be far more serious than a false positive (flagging a negative
    case as positive). In such cases, decision-makers can adjust the model's decision threshold
    based on their priorities and their tolerance for different types of errors.
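    The sketch below illustrates this idea of moving the decision threshold; the data and the
    logistic-regression model are purely synthetic assumptions, chosen only to show how a lower
    threshold trades false negatives for false positives:

# Hedged sketch: adjusting the decision threshold on a synthetic binary problem.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
# Noisy synthetic labels so the classes overlap and some errors are unavoidable.
y = (X[:, 0] + X[:, 1] + 0.5 * rng.randn(200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]        # probability of the positive class

for threshold in (0.5, 0.3):                # default cut-off vs. a more cautious one
    pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    print(f"threshold={threshold}: FN={fn}, FP={fp}")
# Lowering the threshold usually reduces false negatives at the cost of more
# false positives, which may be the right trade-off in, e.g., medical screening.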
Program:
import numpy as np
from sklearn.metrics import (confusion_matrix, classification_report, accuracy_score,
                             precision_score, recall_score, f1_score)
import seaborn as sns
import matplotlib.pyplot as plt
# Define actual and predicted values
actual = np.array(['Pet', 'Wild', 'Pet', 'Wild', 'Pet', 'Wild', 'Pet', 'Pet', 'Wild', 'Wild'])
predicted = np.array(['Pet', 'Pet', 'Pet', 'Wild', 'Pet', 'Pet', 'Pet', 'Wild', 'Wild', 'Wild'])
# Compute confusion matrix
cm = confusion_matrix(actual, predicted, labels=['Pet', 'Wild'])
# Plot confusion matrix
sns.heatmap(cm, annot=True, fmt='g', xticklabels=['Pet', 'Wild'], yticklabels=['Pet', 'Wild'])
plt.ylabel('Actual', fontsize=13)
plt.title('Confusion Matrix', fontsize=17, pad=20)
plt.gca().xaxis.set_label_position('top')
plt.xlabel('Prediction', fontsize=13)
plt.gca().xaxis.tick_top()
plt.gca().figure.subplots_adjust(bottom=0.2)
plt.gca().figure.text(0.5, 0.05, 'Prediction', ha='center', fontsize=13)
plt.show()
# Print classification report
print(classification_report(actual, predicted, labels=['Pet', 'Wild']))
# Calculate and print accuracy, precision, recall, f1 score
accuracy = accuracy_score(actual, predicted)
precision = precision_score(actual, predicted, pos_label='Pet')
recall = recall_score(actual, predicted, pos_label='Pet')
f1 = f1_score(actual, predicted, pos_label='Pet')
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
# Compute TN, FP, FN, TP
# With labels=['Pet', 'Wild'] and 'Pet' as the positive class, ravel() returns the
# cells row by row: TP, FN, FP, TN
TP, FN, FP, TN = confusion_matrix(actual, predicted, labels=['Pet', 'Wild']).ravel()
# Calculate specificity, Type I and Type II errors
specificity = TN / (TN + FP)
type1_error = FP / (FP + TN)
type2_error = FN / (FN + TP)
print(f"Specificity: {specificity}")
print(f"Type I Error: {type1_error}")
print(f"Type II Error: {type2_error}")
Output:
Conclusion: Thus, we have studied and implemented the confusion matrix.