
Contents

MAE
MSE
MSLE
RMSE
RMSLE
MAPE
sMAPE
wMAPE
MASE
MSPE
MDA
MAD
MPD
MGD
R2
D2
Explained Variance Score
Confusion Matrix
FPR
FNR
TPR
TNR
Accuracy
Balanced Accuracy
Precision
F1-score
F-beta
ROC AUC
PR AUC
Brier Score Loss
Log Loss
Jaccard Score
D2 Log Loss Score
P4-metric
Cohen’s Kappa
Phi Coefficient
MCC
Mutual Info Score
Adjusted Mutual Info Score
Normalized Mutual Info Score
Rand Score
Adjusted Rand Score
CH Score
Contingency Matrix
Pair Confusion Matrix
Completeness Score
Davies-Bouldin Score
Fowlkes Mallows Score
Homogeneity Score
V Measure
Homogeneity Completeness V Measure
Silhouette Score
Consensus Score
Ranking Sample Metric
Computer Vision Sample Metric
NLP Sample Metric
GenAI Sample Metric
Probabilistic Sample Metric
Bias Sample Metric
Business Sample Metric
Introduction
Regression
Regression – MAE

MAE
Mean Absolute Error

MAE is one of the most popular regression accuracy metrics. It is calculated as the sum of absolute errors divided by the sample size. It is a scale-dependent accuracy measure, which means that it uses the same scale as the data being measured.

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|Y_t - \hat{Y}_t\right|$$

where $Y_t$ is the actual value, $\hat{Y}_t$ the forecast value, and $n$ the number of samples.

The smaller the MAE, the closer the model's predictions are to the actual targets. Theoretically, MAE lies in the range [0, +∞). One of the aspects that makes MAE popular is that it is easy to understand and compute.
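A minimal sketch of this computation in Python (illustrative values; scikit-learn's mean_absolute_error is used only as a cross-check):

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # actual values Y_t
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # forecast values Y-hat_t

# MAE: mean of the absolute errors
mae = np.abs(y_true - y_pred).mean()
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
print(mae)  # 0.5
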
When to use MAE?

Use MAE when you need an interpretable, robust metric that penalizes all errors equally. Avoid using it when larger errors need more significant penalization.

Strengths

• MAE provides an easy-to-understand value since it represents the average error in the same units as the data.
• MAE treats under-predictions and over-predictions equally. Bear in mind that this may not be desirable in all contexts.

Weaknesses

• MAE can be biased when the distribution of errors is skewed, as it does not account for the direction of the error.
• The absolute value function used in MAE is not differentiable at zero, which can pose challenges in optimization and gradient-based learning algorithms.



Figure 2.1 MAE. Top: The rate of change of MAE is linear; each error contributes proportionally to the total error. Right: MAE is always non-negative, symmetrical, and centered around zero. The plot makes it clear that MAE is not differentiable at zero.

Did you know that...


A forecast method that minimizes MAE will lead to forecasts of
the median.
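
This is easy to verify numerically. The sketch below (made-up sample, not from the booklet) scans constant forecasts over a grid and shows that the MAE-minimizing constant coincides with the sample median:

import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])   # skewed sample; median = 3.0
c = np.linspace(0.0, 12.0, 1201)           # candidate constant forecasts
mae = np.abs(y[:, None] - c).mean(axis=0)  # MAE of each constant forecast
print(c[mae.argmin()], np.median(y))       # both print 3.0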

Other related metrics


Other metrics commonly explored alongside MAE are Mean Squared Error
(MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage
Error (MAPE).



Regression – MSE

MSE
Mean Squared Error



Regression – MSLE

MSLE
Mean Squared Log Error



Regression – RMSE

RMSE
Root Mean Squared Error



Regression – RMSLE

RMSLE
Root Mean Squared Log Error



Regression – MAPE

MAPE
Mean Absolute Percentage Error



Regression – sMAPE

sMAPE
Symmetric Mean Absolute Percentage Error



Regression – wMAPE

wMAPE
Weighted Mean Absolute Percentage Error



Regression – MASE

MASE
Mean Absolute Scaled Error



Regression – MSPE

MSPE
Mean Squared Prediction Error



Regression – MDA

MDA
Mean Directional Accuracy



Regression – MAD

MAD
Mean Absolute Deviation



Regression – MPD

MPD
Mean Poisson Deviance



Regression – MGD

MGD
Mean Gamma Deviance



Regression – R2

R2
R-squared

R-squared needs little introduction; it's featured in every statistics book. Also known as the coefficient of determination, it's commonly introduced as a measure that quantifies the amount of variability explained by the regression model.

$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(Y_t - \hat{Y}_t\right)^2}{\sum_{t=1}^{n}\left(Y_t - \bar{Y}\right)^2}$$

where $Y_t$ is the target value, $\hat{Y}_t$ the predicted value, and $\bar{Y}$ the mean of the targets.

However, it may be easier to think of R-squared as a way to scale MSE between a perfect model and one that always predicts the mean. A score of 1.0 means $Y_t$ and $\hat{Y}_t$ are equal. Despite its name, R-squared can be negative if the model performs worse than just predicting the mean.

When to use R-squared?


R-squared can be more intuitive than MAE, MSE, RMSE, and other scale-dependent metrics since it can be expressed as a percentage, whereas the latter have arbitrary ranges.

Strengths

• Easy interpretation, especially when interpreted as a scaled MSE.
• R-squared is widely accepted in statistical analysis and research, making it a standard choice for evaluating model performance.

Weaknesses

• Just like MSE, R-squared can be sensitive to outliers, as large errors have a greater impact.
• Be cautious about which value of $\bar{Y}$ to use. Most implementations default to $\bar{Y}_{test}$, which can lead to information leakage. It is advisable to use $\bar{Y}_{train}$ instead (see the sketch below).
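
The scaled-MSE reading translates directly into code. Below is a minimal sketch (illustrative, with made-up numbers); the baseline_mean parameter is a hypothetical knob added here to make the choice of $\bar{Y}$ explicit, so the training-set mean can be passed in to avoid leakage:

import numpy as np

def r_squared(y_true, y_pred, baseline_mean=None):
    # R^2 as a scaled MSE: 1 - MSE(model) / MSE(mean baseline).
    # Pass the training-set mean as baseline_mean to avoid leaking
    # test-set information; by default the mean of y_true is used.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if baseline_mean is None:
        baseline_mean = y_true.mean()
    mse_model = np.mean((y_true - y_pred) ** 2)
    mse_baseline = np.mean((y_true - baseline_mean) ** 2)
    return 1.0 - mse_model / mse_baseline

y_true = [3.0, 2.0, 4.0, 5.0]
y_pred = [2.8, 2.1, 3.9, 5.3]
print(r_squared(y_true, y_pred))     # 0.97: far better than the mean
print(r_squared(y_true, [3.5] * 4))  # 0.0: no better than the mean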



Figure 2.15 R-squared. Top: The areas of the purple squares represent the MSE of the evaluated model, while the areas of the red squares represent the MSE of a model that always predicts the mean. R-squared can be written as $R^2 = 1 - \frac{MSE_{model}}{MSE_{baseline}}$. Right: R-squared quickly drops into the negative region in cases where the mean is a better predictor than the evaluated model.

Did you know that...


R-squared is more than 100 years old; it was introduced by geneticist Sewall Wright in 1921.

R-squared alternatives and related metrics


Other metrics commonly explored alongside R-squared are Adjusted R-squared, out-of-sample R-squared, Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), etc.



Regression – D2

D2
D2 Absolute Score



Regression – Explained Variance Score

Explained Variance Score


Explained Variance Score

Classification
Classification – Confusion Matrix

Confusion Matrix
Confusion Matrix



Classification – FPR

FPR
False Positive Rate

The False Positive Rate (FPR), also known as the false alarm ratio or fall-out, measures how often negative instances are incorrectly classified as positive in binary classification.

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

where $FP$ is the number of false positives and $TN$ the number of true negatives.

FPR ranges from 0 (no false alarms) to 1 (every negative instance is classified as positive). FPR can also be interpreted as the probability that a negative instance will be incorrectly identified as positive.
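
A minimal sketch of the computation with scikit-learn (labels made up for illustration); for binary labels (0, 1), confusion_matrix(...).ravel() returns TN, FP, FN, TP in that order:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)
print(fpr)  # 2 false alarms out of 6 actual negatives ≈ 0.333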

When to use FPR?


Use FPR when you need to evaluate how well a classifier avoids false positives, especially when false positives have significant costs, like in medical diagnostics or security systems. It's also useful for understanding the trade-off between true positive rate (sensitivity) and false positive rate.

Strengths

• It provides a clear and intuitive measure of a classifier's false positive performance.
• It helps identify scenarios where the classifier is overly sensitive and prone to false alarms.

Weaknesses

• FPR does not consider true positive instances.
• FPR can be sensitive to class imbalance, as it may be easier to achieve a low FPR when the negative class is dominant.
• FPR doesn't exist in isolation; it's often important to show its relationship with another key metric (e.g., TPR, Precision, Recall).



Figure 3.1 False Positive Rate. Top: 3D surface illustrating FPR's non-linear relationship with FP and TN. FPR is lowest (blue) when FP is low; it increases (red) as FP increases. Right: Shows how FPR decreases hyperbolically as total negative cases increase for fixed FP values; lower FP maintains a better FPR.

Did you know that...


In the context of statistical hypothesis testing, the FPR is also known as the "type I error rate" or the probability of rejecting a true null hypothesis.

FPR alternatives and related metrics

Other metrics used alongside or instead of FPR include True Positive Rate (TPR), Precision, F1-Score, the area under the ROC curve (ROC AUC), and Specificity.



Classification – FNR

FNR
False Negative Rate

The False Negative Rate (FNR), also known as the miss rate, measures the
proportion of actual positive instances incorrectly classified as negative in
binary classification.

$$\mathrm{FNR} = \frac{FN}{FN + TP}$$

where $FN$ is the number of false negatives and $TP$ the number of true positives.

FNR ranges from 0 (no false negatives) to 1 (all positive instances misclassified). It represents the probability that a positive instance will be incorrectly identified as negative.
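
Since FNR is the complement of recall (TPR), it can be computed from scikit-learn's recall_score, as in this small sketch (labels made up for illustration):

import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 1])

# FNR = FN / (FN + TP) = 1 - TPR
fnr = 1.0 - recall_score(y_true, y_pred)
print(fnr)  # 2 missed positives out of 7 actual positives ≈ 0.286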

When to use FNR?


Use FNR when the cost of missing positive cases is high (e.g., in medical
diagnostics or fraud detection) or when you must balance false negatives
and false positives.

Strengths

• It directly measures the rate of missed positive cases.
• It is critical in fields where false negatives have severe consequences.
• It complements the True Positive Rate (TPR) in assessing classifier performance.

Weaknesses

• It doesn't account for true negatives or false positives.
• It can be misleading in highly imbalanced datasets.
• It should be considered alongside other metrics for a comprehensive evaluation.



Figure 3.2 False Negative Rate. Top: 3D surface illustrating FNR's non-linear relationship with FN and TP. FNR is lowest (blue) when FN is low; it increases (red) as FN increases. Right: Shows how FNR decreases hyperbolically as total positive cases increase for fixed FN values; lower FN maintains a better FNR.

Did you know that...


In hypothesis testing, reducing the False Negative Rate (β) increases the power of the test (1 − β), but often at the cost of increasing the False Positive Rate (α). This demonstrates the inherent trade-off between Type I and Type II errors in statistical testing.

FNR alternatives and related metrics

Other metrics used alongside or instead of FNR include True Positive Rate (TPR/Recall/Sensitivity), Specificity, Precision, F1-Score, and the ROC curve.





Classification – TPR

TPR
True Positive Rate (Recall/Sensitivity)



Classification – TNR

TNR
True Negative Rate (Specificity)



Classification – Accuracy

Accuracy
Accuracy



Classification – Balanced Accuracy

Balanced Accuracy
Balanced Accuracy

Classification – Precision

Precision
Precision



Classification – F1-score

F1-score
F1-score



Classification – F-beta

F-beta
F-beta





Classification – ROC AUC

ROC AUC
Area Under the Receiver Operating Characteristic Curve



Classification – PR AUC

PR AUC
Area Under the Precision-Recall Curve



Classification – Brier Score Loss

Brier Score Loss


Brier Score Loss



Classification – Log Loss

Log Loss
Log Loss



Classification – Jaccard Score

Jaccard Score
Jaccard Score



Classification – D2 Log Loss Score

D2 Log Loss Score


D2 Log Loss Score

Classification – P4-metric

P4-metric
P4-metric



Classification – Cohen’s Kappa

Cohen’s Kappa
Cohen’s Kappa



Classification – Phi Coefficient

Phi Coefficient
Phi Coefficient



Classification – MCC

MCC
Matthews Correlation Coefficient



Clustering
Clustering – Mutual Info Score

Mutual Info Score


Mutual Info Score



Clustering – Adjusted Mutual Info Score

Adjusted Mutual Info Score

Adjusted Mutual Information Score

Clustering – Normalized Mutual Info Score

Normalized Mutual Info Score


Normalized Mutual Info Score

Clustering – Rand Score

Rand Score
Rand Score



Clustering – Adjusted Rand Score

Adjusted Rand Score


Adjusted Rand Score



Clustering – CH Score

CH Score
Calinski Harabasz Score



Clustering – Contingency Matrix

Contingency Matrix
Contingency Matrix



Clustering – Pair Confusion Matrix

Pair Confusion Matrix


Pair Confusion Matrix

Clustering – Completeness Score

Completeness Score
Completeness Score



Clustering – Davies-Bouldin Score

Davies-Bouldin Score

Davies-Bouldin Score



Clustering – Fowlkes Mallows Score

Fowlkes Mallows Score


Fowlkes Mallows Score

Clustering – Homogeneity Score

Homogeneity Score
Homogeneity Score



Clustering – V Measure

V Measure
V Measure



Clustering – Homogeneity Completeness V Measure

Homogeneity Completeness V Measure


Homogeneity Completeness V Measure

Clustering – Silhouette Score

Silhouette Score
Silhouette Score



Clustering – Consensus Score

Consensus Score
Consensus Score



Ranking
Ranking – Ranking Sample Metric

Ranking Sample Metric


Ranking Sample Metric



Computer Vision
Computer Vision – Computer Vision Sample Metric

Computer Vision Sample Metric


Computer Vision Sample Metric

NLP
NLP – NLP Sample Metric

NLP Sample Metric


NLP Sample Metric



GenAI
GenAI – GenAI Sample Metric

GenAI Sample Metric


GenAI Sample Metric



Probabilistic
Probabilistic – Probabilistic Sample Metric

Probabilistic Sample Metric


Probabilistic Sample Metric

Bias & Fairness
Bias & Fairness – Bias Sample Metric

Bias Sample Metric


Bias Sample Metric

Business
Business – Business Sample Metric

Business Sample Metric

Business Sample Metric
