Classification
Part 1
Outline
◘ What Is Classification?
◘ Classification Examples
◘ Classification Methods
– Decision Trees
– Bayesian Classification
– K-Nearest Neighbor
– Neural Network
– Support Vector Machines (SVM)
– Fuzzy Set Approaches
What Is Classification?
◘ Classification
– Construction of a model to classify data
– The model is built from the training set and the class labels (e.g. yes/no) in the target column

[Figure: Training Set → Model]
Classification Steps
1. Model construction
– Each tuple is assumed to belong to a predefined class
– The set of tuples used for model construction is the training set
– The model is represented as classification rules, trees, or mathematical formulae
2. Test Model
– Using the test set, estimate the accuracy rate of the model
• The accuracy rate is the percentage of test-set samples that are correctly classified by the model
3. Model Usage (classifying future or unknown objects)
– If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
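A minimal sketch of these three steps, assuming scikit-learn is available; the attribute values and class labels below are made-up placeholders:

```python
# A minimal sketch of the three classification steps (scikit-learn assumed).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = [[25, 40], [45, 90], [35, 60], [50, 120],
     [23, 30], [40, 80], [30, 52], [60, 150]]   # placeholder [age, income in K]
y = ["no", "yes", "no", "yes", "no", "yes", "no", "yes"]  # class labels

# 1. Model construction: learn a classifier from the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# 2. Test Model: the accuracy rate is the fraction of test-set
#    samples the model classifies correctly.
print("accuracy rate:", model.score(X_test, y_test))

# 3. Model Usage: if the accuracy is acceptable, classify tuples
#    whose class labels are not known.
print(model.predict([[33, 70]]))
```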
Classification Steps
1. Construct Model: the Training Set below is fed to the learner ("Learn Model") to build the classifier.

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

2. Test Model: the Test Set (labels known, held out from training).

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             No
Yes     Married         50K             Yes
No      Married         150K            Yes
Yes     Divorced        90K             No

3. Use Model: New Data with unknown class labels.

Refund  Marital Status  Taxable Income  Cheat
Yes     Divorced        50K             ?
No      Married         50K             ?
Yes     Single          150K            ?
Classification (A Two-Step Process)
[Figure: the two-step process. Training Data enters the DM Engine, which learns the Mining Model. Test Data plus the Mining Model re-enter the DM Engine to estimate the Accuracy Rate of the Model. The Mining Model is then applied to Data To Predict, producing Predicted Data.]
Classification Example
◘ Given old data about customers and their payments, predict a new
applicant's loan eligibility.
– Good Customers
– Bad Customers
[Figure: records of previous customers (age, salary, profession, location, customer type) are fed to the classifier, which learns rules such as "Salary > 5 L" and "Prof. = Exec" that separate good customers from bad ones; the rules are then applied to the new applicant's data.]
Classification Techniques
1. Decision Trees
2. Bayesian Classification
3. K-Nearest Neighbor
4. Neural Network
5. Support Vector Machines (SVM)
6. Fuzzy Set Approaches

(The decision rule previewed here for Bayesian classification:
c = argmax_{c_j} [ P(c_j) / P(d) ] · ∏_{i=1}^{n} P(a_i | c_j) )
Classification Techniques
Decision Trees
Bayesian Classification
K-Nearest Neighbor
Neural Network
Support Vector Machines (SVM)
Fuzzy Set Approaches
…
Decision Trees
◘ A decision tree is a tree in which
– internal nodes are simple decision rules on one or more attributes
– leaf nodes are predicted class labels
◘ Decision trees are used for deciding between several courses of action
Training data (buys_computer):

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no

[Figure: the tree learned from this data. The root tests "age?". The <=30 branch tests "student?" (no → no, yes → yes); the 31..40 branch is the leaf "yes"; the >40 branch tests "credit rating?" (excellent → no, fair → yes).]
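To make the node/leaf distinction concrete, here is a minimal sketch in plain Python, representing the tree above as nested dicts (a hypothetical encoding, not a standard library API):

```python
# The tree above encoded as nested dicts: internal nodes hold a decision
# rule on one attribute; leaves hold the predicted class label.
tree = {
    "attribute": "age",
    "branches": {
        "<=30":   {"attribute": "student",
                   "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40":    {"attribute": "credit_rating",
                   "branches": {"excellent": "no", "fair": "yes"}},
    },
}

def classify(node, x):
    """Follow the decision rules from the root until a leaf is reached."""
    while isinstance(node, dict):          # internal node: apply its rule
        node = node["branches"][x[node["attribute"]]]
    return node                            # leaf: the predicted label

x = {"age": "<=30", "income": "medium",
     "student": "yes", "credit_rating": "fair"}
print(classify(tree, x))  # -> "yes"
```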
Decision Regions
[Figure: the decision regions that the tree's splits induce in attribute space.]
Decision Tree Applications
◘ Decision trees are used extensively in data mining.
◘ They have been applied to:
– classify medical patients based on their disease,
– classify customers based on past behavior (their interests, loyalty, etc.),
– classify documents,
– ...
[Figure: example trees labeled "House" and "Hiring", splitting on "Salary < 1 M", "Job = teacher", and "Age < 30", with Good/Bad leaves.]
Decision Trees: Advantages and Disadvantages

Positives (+)
+ Reasonable training time
+ Fast application
+ Easy to interpret (can be re-represented as if-then-else rules)
+ Easy to implement
+ Can handle a large number of features
+ Does not require any prior knowledge of the data distribution

Negatives (-)
- Cannot handle complicated relationships between features
- Simple decision boundaries
- Problems with lots of missing data
- Output attribute must be categorical
- Limited to one output attribute
Rules Indicated by Decision Trees
◘ Write a rule for each path in the decision tree from the root to a leaf.
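For the buys_computer tree above, this gives one rule per root-to-leaf path:
– IF age <= 30 AND student = no THEN buys_computer = no
– IF age <= 30 AND student = yes THEN buys_computer = yes
– IF age = 31..40 THEN buys_computer = yes
– IF age > 40 AND credit_rating = excellent THEN buys_computer = no
– IF age > 40 AND credit_rating = fair THEN buys_computer = yes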
Decision Tree Algorithms
◘ ID3
– Quinlan (1981)
– Tries to reduce the expected number of comparisons
◘ C4.5
– Quinlan (1993)
– An extension of ID3
– Just starting to be used in data mining applications
– Also used for rule induction
◘ CART
– Breiman, Friedman, Olshen, and Stone (1984)
– Classification and Regression Trees
◘ CHAID
– Kass (1980)
– Oldest decision tree algorithm
– Well established in database marketing industry
◘ QUEST
– Loh and Shih (1997)
Decision Tree Construction
◘ Which attribute is the best classifier?
– Calculate the information gain G(S,A) for each attribute A.
– Select the attribute with the highest information gain.
Entropy(S) = − Σ_{i=1}^{m} p_i log2(p_i)      (for two classes: Entropy(S) = −p1 log2(p1) − p2 log2(p2))

Gain(S, A) = Entropy(S) − Σ_{i ∈ Values(A)} (|S_i| / |S|) · Entropy(S_i)
[Figure: the entropy of a two-class set as a function of p1; it is 0 for a pure set and peaks at 1.0 when p1 = 0.5.]
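A minimal sketch of both formulas in plain Python (the helper names are made up; the data is the Wind example from the slides below):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):                  # one subset per attribute value
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# The Wind attribute from the example below (days D1..D14).
wind = ["weak", "strong", "weak", "weak", "weak", "strong", "strong",
        "weak", "weak", "weak", "strong", "strong", "weak", "strong"]
tennis = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(round(entropy(tennis), 3))     # -> 0.94
print(round(gain(wind, tennis), 3))  # -> 0.048
```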
Decision Tree Construction
Which attribute first?
Decision Tree Construction
Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
Decision Tree Construction
Values(Wind) = {weak, strong}
S = [9+, 5-]
S_weak = [6+, 2-]
S_strong = [3+, 3-]

Day  Wind    Tennis?
D1   weak    no
D2   strong  no
D3   weak    yes
D4   weak    yes
D5   weak    yes
D6   strong  no
D7   strong  yes
D8   weak    no
D9   weak    yes
D10  weak    yes
D11  strong  yes
D12  strong  yes
D13  weak    yes
D14  strong  no
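Plugging these counts into the formulas (a worked step the slide leaves implicit):

Entropy(S_weak) = −(6/8) log2(6/8) − (2/8) log2(2/8) = 0.811
Entropy(S_strong) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1.0
Gain(S, Wind) = 0.940 − (8/14)·0.811 − (6/14)·1.0 = 0.048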
Decision Tree Construction
Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

Splitting on Outlook:
– sunny: [2+, 3-], E = 0.971
– overcast: [4+, 0-], E = 0.0
– rain: [3+, 2-], E = 0.971

Gain(S, Outlook) = 0.940 − (5/14)·0.971 − (4/14)·0.0 − (5/14)·0.971 = 0.247
Gain(S, Humidity) = Entropy(S) − (|S_High| / |S|)·Entropy(S_High) − (|S_Normal| / |S|)·Entropy(S_Normal)
                  = 0.940 − (7/14)·0.985 − (7/14)·1.0 = 0.151
Decision Tree Construction
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Outlook has the highest information gain, so it becomes the root.

[Figure: partial tree. Root "Outlook"; the overcast branch is the leaf "yes", while the sunny and rain branches are still open ("?").]
Decision Tree Construction
◘ Which attribute is next?
[Figure: the same partial tree; the open sunny and rain branches ("?") remain to be expanded.]
Gain(S_sunny, Wind) = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019
Gain(S_sunny, Humidity) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Gain(S_sunny, Temperature) = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570

Humidity has the highest gain, so it is chosen for the sunny branch.
Decision Tree Construction
[Figure: the final tree.
– Outlook = sunny → Humidity: High → No [D1, D2, D8]; Normal → Yes [D9, D11]
– Outlook = overcast → Yes [D3, D7, D12, D13]
– Outlook = rain → Wind: weak → Yes [D4, D5, D10]; strong → No [D6, D14]]
Another Example
At the weekend you can:
– go shopping,
– watch a movie,
– play tennis, or
– just stay in.
What you do depends on three things:
– the weather (windy, rainy or sunny);
– how much money you have (rich or poor);
– whether your parents are visiting.
Another Example
[Figure: a decision tree for the weekend example.]
Classification Techniques
Decision Trees
Bayesian Classification
K-Nearest Neighbor
Neural Network
Support Vector Machines (SVM)
Fuzzy Set Approaches
…
Classification Techniques
2- Bayesian Classification
◘ A statistical classifier: performs probabilistic prediction, i.e., predicts class
membership probabilities.
◘ Foundation: Based on Bayes’ Theorem.
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows
Bayes' theorem:

P(H|X) = P(X|H) · P(H) / P(X)
Classification Techniques
2- Bayesian Classification
◘ X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Using the buys_computer training data shown earlier:

◘ P(C1): P(buys_computer = “yes”) = 9/14 = 0.643
P(C2): P(buys_computer = “no”) = 5/14 = 0.357

◘ Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

◘ P(X|C1): P(X | buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|C2): P(X | buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

◘ P(X|Ci) · P(Ci):
P(X | buys_computer = “yes”) · P(buys_computer = “yes”) = 0.044 x 0.643 = 0.028
P(X | buys_computer = “no”) · P(buys_computer = “no”) = 0.019 x 0.357 = 0.007

Therefore, X belongs to class buys_computer = “yes”.
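A minimal sketch of this whole computation in plain Python (the training tuples are the buys_computer table from the decision-tree slides; the variable names are made up):

```python
from collections import Counter

data = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")  # the unseen tuple X

class_counts = Counter(row[-1] for row in data)  # yes: 9, no: 5
scores = {}
for c, n_c in class_counts.items():
    score = n_c / len(data)              # prior P(Ci)
    for i, value in enumerate(x):        # P(X|Ci) = product of P(x_i|Ci)
        n_match = sum(1 for row in data if row[-1] == c and row[i] == value)
        score *= n_match / n_c
    scores[c] = score

print(scores)                        # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))   # -> 'yes'
```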
Classification Techniques
Decision Trees
Bayesian Classification
K-Nearest Neighbor
Neural Network
Support Vector Machines (SVM)
Fuzzy Set Approaches
…
K-Nearest Neighbor (k-NN)
◘ An object is classified by a majority vote of its k nearest neighbors (the
k closest members of the training set).
◘ If k = 1, the object is simply assigned to the class of its single nearest
neighbor.
◘ The Euclidean distance measure is used to calculate how close each neighbor is.
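A minimal sketch of k-NN in plain Python (the 2-D points and labels are made up for illustration):

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up 2-D points with class labels.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((3.0, 4.0), "B"),
         ((5.0, 7.0), "B"), ((3.5, 4.5), "B"), ((2.0, 1.5), "A")]
print(knn_predict(train, (2.5, 3.0), k=3))  # -> 'A' (2 of 3 neighbors vote A)
```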