Classification
CLASSIFICATION TYPES

• Binary classification: the task of classifying the elements of a given set into two groups on the basis of a classification rule.
• Multi-class classification: the task of classifying the elements of a given set into more than two groups on the basis of a classification rule.
Classification

• Can you separate the red class from the blue class?
Linear Boundary
• Straight line for two dimensions.
• Plane for three dimensions.
• Hyperplane for higher dimensions.
Confusion Matrix & Accuracy

                    Predicted Positive    Predicted Negative
Actual Positive     a (TP)                b (FN)
Actual Negative     c (FP)                d (TN)

Accuracy = percentage of correctly classified data points
Accuracy = (TP + TN) / (TP + FN + FP + TN)

Sensitivity = a / (a + b)        Specificity = d / (c + d)

Other error metrics
• Precision
• Recall
• F score
• ROC curve
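A minimal sketch of computing these metrics with scikit-learn; the label arrays here are purely illustrative:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Illustrative labels: y_true are the actual classes, y_pred the model's output.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels (0, 1), confusion_matrix returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + fn + fp + tn)   # same as accuracy_score(y_true, y_pred)
sensitivity = tp / (tp + fn)                    # recall for the positive class
specificity = tn / (tn + fp)

print(accuracy, sensitivity, specificity)
```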
ZeroR Method

Algorithm
Construct a frequency table for the target and select its most frequent value. For classification this is the baseline model.

Disease:     Yes    No
Count:       9      6
Fraction:    0.6    0.4

Here ZeroR always predicts "Yes", giving a baseline accuracy of 0.6.
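ZeroR corresponds to scikit-learn's DummyClassifier with the "most_frequent" strategy; a minimal sketch using the illustrative counts from the table above:

```python
from sklearn.dummy import DummyClassifier

# 9 "Yes" and 6 "No" labels, as in the frequency table above.
X = [[0]] * 15                      # ZeroR ignores the features entirely
y = ["Yes"] * 9 + ["No"] * 6

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)

print(baseline.predict([[0]]))      # ['Yes'] for every input
print(baseline.score(X, y))         # 0.6 baseline accuracy
```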
Classification

The target variable is discrete (Target: 0/1).
(Figure: blood pressure data plotted against a binary 0/1 target.)
Can you fit a linear regression model?

A straight line fit to a 0/1 target predicts values outside [0, 1], so a different approach is needed.
Logistic Regression

Algorithm
Model the log-odds as a linear function of the independent variables, then convert the log-odds to a probability using the sigmoid (logistic) function.

Logistic function:
log(p / (1 - p)) = b0 + b1*x1 + ... + bn*xn
p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))
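A minimal logistic regression sketch with scikit-learn; the 1-D synthetic data is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative 1-D data: low values -> class 0, high values -> class 1.
X = np.array([[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba applies the sigmoid to the fitted log-odds b0 + b1*x.
print(model.predict_proba([[5.0]]))   # probability of each class at x = 5
print(model.predict([[5.0]]))         # hard 0/1 prediction
```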
Naive Bayes

Algorithm
Calculate the posterior probability, P(A|B), from P(A), P(B), and P(B|A). The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors.
Bayes' theorem:

P(D|A) = P(A|D) * P(D) / P(A)

Applying the independence assumption to a disease D with predictors Alcohol (A), Smoking (S), and Age:

P(D | A & S & Age) ∝ P(A|D) * P(S|D) * P(Age|D) * P(D)
Naive Bayes

PROS
• Very easy and fast
• Can be used for multi-class prediction
• Performs well with categorical features
• If the features really are independent, NB gives superior predictions

CONS
• Features are not independent in most real-life examples
• The zero-frequency problem: a category not seen in the training data gets zero probability
• Assumes that numerical features follow a normal distribution

P(Y|X) ∝ P(X1|Y) * P(X2|Y) * ... * P(Xn|Y) * P(Y)
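A minimal sketch with scikit-learn's GaussianNB (the variant that assumes normally distributed numeric features, matching the last CON above); the data is illustrative:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Illustrative numeric features (e.g. two measurements per patient).
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.2],
              [3.0, 4.1], [3.2, 3.9], [2.9, 4.2]])
y = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB()          # fits one Gaussian per class per feature
model.fit(X, y)

# Posterior class probabilities via Bayes' rule with the independence assumption.
print(model.predict_proba([[2.0, 3.0]]))
print(model.predict([[2.0, 3.0]]))
```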
Support Vector Machines (SVM)

Algorithm
SVM performs classification by finding a hyperplane that maximizes the separation margin between the two classes. The vectors that support (define) the hyperplane are the support vectors.

• Plot every data row as a point in N-dimensional space, where the N dimensions are the N features and each feature value is a coordinate.
• Find the hyperplane that separates these points into the different classes in the best way possible in that N-dimensional space.
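A minimal linear-kernel sketch with scikit-learn's SVC; the two clusters are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (illustrative).
X = np.array([[1, 1], [1, 2], [2, 1],
              [4, 4], [4, 5], [5, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear")    # maximum-margin separating hyperplane
model.fit(X, y)

print(model.support_vectors_)   # the points that define the margin
print(model.predict([[3, 3]]))
```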
Decision Boundary

Which one would you select? Why?
► The black line, because it classifies the points accurately with the highest margin.
Which one would you select? Why?
► Again the black line, because it classifies the points accurately with the highest margin.
Maximum Margin Classifier
► Classifies with the maximum margin.
What will you do in this case?
Can we calculate some other feature from X and Y and then try to separate the classes?
How about Z = X^2 + Y^2?
The same idea with features F1 and F2:
How about Z = F1^2 + F2^2? (See the sketch below.)
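A minimal sketch of this feature-map trick on illustrative circular data: adding Z = F1^2 + F2^2 makes the classes linearly separable, which is what kernels (e.g. RBF) do implicitly:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Illustrative data: class 0 inside a circle, class 1 in a ring outside it.
angles = rng.uniform(0, 2 * np.pi, 100)
radii  = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(2, 3, 50)])
F1, F2 = radii * np.cos(angles), radii * np.sin(angles)
X = np.column_stack([F1, F2])
y = np.array([0] * 50 + [1] * 50)

# Explicit feature map: Z = F1^2 + F2^2 makes the classes linearly separable.
X_mapped = np.column_stack([F1, F2, F1**2 + F2**2])
print(SVC(kernel="linear").fit(X_mapped, y).score(X_mapped, y))  # ~1.0

# An RBF kernel does an equivalent mapping implicitly, without computing Z.
print(SVC(kernel="rbf").fit(X, y).score(X, y))
```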
Support Vector Machines

PROS
• Works very well for small, clean datasets
• Works well when there are clear separation margins
• Very effective in high-dimensional spaces
• Kernels give more flexibility

CONS
• Large datasets require a lot of training time, and performance eventually degrades
• Does not do a good job on noisy data (overlapping classes)
Decision Trees

Algorithm
A decision tree uses entropy and information gain to construct the tree: a top-down, greedy search through the space of possible branches, with no backtracking.

Entropy for one attribute: E(S) = -Σ p_i * log2(p_i)
Steps in Decision Trees

Step 1: Calculate the entropy of the target variable.
Step 2: Calculate the entropy for each branch (split by the various features).
Step 3: Calculate the information gain for each of the above splits.
Step 4: Choose the attribute with the largest information gain as the decision node.
Step 5: Check whether the entropy is zero; if not, continue further.
Step 6: Run recursively on all branches until all data is classified (branch entropy == 0).
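A minimal sketch of steps 1-3 in plain Python/NumPy; the tiny dataset is illustrative:

```python
import numpy as np

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Illustrative target and one candidate splitting feature.
target  = np.array(["Yes", "Yes", "Yes", "No", "No", "Yes"])
feature = np.array(["Sunny", "Sunny", "Rain", "Rain", "Rain", "Sunny"])

# Step 1: entropy of the target.
base = entropy(target)

# Step 2: weighted entropy of each branch of the split.
branch = sum((feature == v).mean() * entropy(target[feature == v])
             for v in np.unique(feature))

# Step 3: information gain of splitting on this feature.
print("gain =", base - branch)
```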
Decision Trees

PROS
• Implicitly perform feature selection
• Discover nonlinear relationships
• Not strongly affected by outliers
• Easy to interpret and explain
• Generate rules that can be shared easily

CONS
• Do not work well if the true boundary is smooth
• "Super attributes" (features with many distinct values) yield inflated information gain
• Missing values are ignored
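A minimal scikit-learn sketch on the built-in Iris data; criterion="entropy" matches the information-gain procedure above (sklearn's default is Gini impurity):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="entropy" selects splits by information gain, as described above.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# The fitted tree is a set of human-readable rules, easy to share.
print(export_text(tree, feature_names=load_iris().feature_names))
```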
Bias-Variance Tradeoff

BIAS: how well the model fits the data.
VARIANCE: how much the model changes based on changes in the inputs.

Simpler models: stable (low variance), but they don't get close to the truth (high bias).
Complex models: more prone to being overfit (high variance), but expressive enough to get close to the truth (low bias).
Random Forest

Decision Trees + Bagging

01. If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.
02. If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node.
03. The value of m is held constant while the forest is grown.
BAGGING
• Also called bootstrap aggregating
• Bagging combines the predictions of multiple similar learners, trained on different resampled datasets, by averaging their predictions
• It reduces variance and helps to avoid overfitting
• Mostly used with decision trees
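A minimal bagging sketch with scikit-learn, wrapping a decision tree as the base learner (in scikit-learn versions before 1.2 the parameter is base_estimator rather than estimator):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 50 trees, each trained on a bootstrap resample of the data;
# their predictions are combined by voting/averaging.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50, random_state=0)
bag.fit(X, y)
print(bag.score(X, y))
```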
Features of Random Forests

01. Unexcelled in accuracy among current algorithms.
02. Runs efficiently on large data sets.
03. Can handle thousands of input variables without variable deletion.
04. Implicitly gives estimates of which variables are important in the classification.
05. Has an effective method for estimating missing data, and maintains accuracy when a large proportion of the data is missing.
06. Has methods for balancing error in data sets with unbalanced class populations.
07. Generated forests can be saved for future use on other data.
08. Offers an experimental method for detecting variable interactions.
09. Uses OOB (out-of-bag) samples for error calculation.
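A minimal random forest sketch with scikit-learn, showing the OOB error estimate and variable-importance features listed above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Each tree sees a bootstrap sample; at each node only a random subset of
# m << M features is considered, as in steps 01-03 above.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

print("OOB accuracy:", forest.oob_score_)            # error estimate without a test set
print("feature importances:", forest.feature_importances_)
```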