Classification
Part 1
Outline
◘ What Is Classification?
◘ Classification Examples
◘ Classification Methods
– Decision Trees
– Bayesian Classification
– K-Nearest Neighbor
– Neural Network
– Support Vector Machines (SVM)
– Fuzzy Set Approaches
What Is Classification?
◘ Classification
– Construction of a model to classify data
– The model is built from the training set and the class labels (e.g. yes/no) in the target column

[Figure: Training Set → Model]
Classification Steps
1. Model construction
– Each tuple is assumed to belong to a predefined class
– The set of tuples used for model construction is the training set
– The model is represented as classification rules, trees, or mathematical formulae
2. Test Model
– Using the test set, estimate the accuracy rate of the model
• The accuracy rate is the percentage of test-set samples that are correctly classified by the model
3. Model Usage (classifying future or unknown objects)
– If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
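A minimal sketch of these three steps, assuming scikit-learn is available; the attribute values and class labels below are made-up placeholders:

```python
# A minimal sketch of the three classification steps (scikit-learn assumed).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = [[25, 40], [45, 90], [35, 60], [50, 120],
     [23, 30], [40, 80], [30, 52], [60, 150]]   # placeholder [age, income in K]
y = ["no", "yes", "no", "yes", "no", "yes", "no", "yes"]  # class labels

# 1. Model construction: learn a classifier from the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# 2. Test Model: the accuracy rate is the fraction of test-set
#    samples the model classifies correctly.
print("accuracy rate:", model.score(X_test, y_test))

# 3. Model Usage: if the accuracy is acceptable, classify tuples
#    whose class labels are not known.
print(model.predict([[33, 70]]))
```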
Classification Steps
1. Construct Model: the Training Set below is fed to the learner ("Learn Model") to build the classifier.

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

2. Test Model: the Test Set (labels known, held out from training).

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             No
Yes     Married         50K             Yes
No      Married         150K            Yes
Yes     Divorced        90K             No

3. Use Model: New Data with unknown class labels.

Refund  Marital Status  Taxable Income  Cheat
Yes     Divorced        50K             ?
No      Married         50K             ?
Yes     Single          150K            ?
Classification (A Two-Step Process)
[Figure: the two-step process. Training Data enters the DM Engine, which learns the Mining Model. Test Data plus the Mining Model re-enter the DM Engine to estimate the Accuracy Rate of the Model. The Mining Model is then applied to Data To Predict, producing Predicted Data.]
Classification Example
◘ Given old data about customers and their payments, predict a new
applicant's loan eligibility.
– Good Customers
– Bad Customers
[Figure: records of previous customers (age, salary, profession, location, customer type) are fed to the classifier, which learns rules such as "Salary > 5 L" and "Prof. = Exec" that separate good customers from bad ones; the rules are then applied to the new applicant's data.]
Classification Techniques
1. Decision Trees
2. Bayesian Classification
3. K-Nearest Neighbor
4. Neural Network
5. Support Vector Machines (SVM)
6. Fuzzy Set Approaches

(The decision rule previewed here for Bayesian classification:
c = argmax_{c_j} [ P(c_j) / P(d) ] · ∏_{i=1}^{n} P(a_i | c_j) )
Classification Techniques
Decision Trees
Bayesian Classification
K-Nearest Neighbor
Neural Network
Support Vector Machines (SVM)
Fuzzy Set Approaches
…
Decision Trees
◘ A decision tree is a tree in which
– internal nodes are simple decision rules on one or more attributes
– leaf nodes are predicted class labels
◘ Decision trees are used for deciding between several courses of action
Training data (buys_computer):

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no

[Figure: the tree learned from this data. The root tests "age?". The <=30 branch tests "student?" (no → no, yes → yes); the 31..40 branch is the leaf "yes"; the >40 branch tests "credit rating?" (excellent → no, fair → yes).]
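To make the node/leaf distinction concrete, here is a minimal sketch in plain Python, representing the tree above as nested dicts (a hypothetical encoding, not a standard library API):

```python
# The tree above encoded as nested dicts: internal nodes hold a decision
# rule on one attribute; leaves hold the predicted class label.
tree = {
    "attribute": "age",
    "branches": {
        "<=30":   {"attribute": "student",
                   "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40":    {"attribute": "credit_rating",
                   "branches": {"excellent": "no", "fair": "yes"}},
    },
}

def classify(node, x):
    """Follow the decision rules from the root until a leaf is reached."""
    while isinstance(node, dict):          # internal node: apply its rule
        node = node["branches"][x[node["attribute"]]]
    return node                            # leaf: the predicted label

x = {"age": "<=30", "income": "medium",
     "student": "yes", "credit_rating": "fair"}
print(classify(tree, x))  # -> "yes"
```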
Decision Regions
[Figure: the decision regions that the tree's splits induce in attribute space.]
Decision Tree Applications
◘ Decision trees are used extensively in data mining.
◘ They have been applied to:
– classify medical patients based on their disease,
– classify customers based on past behavior (their interests, loyalty, etc.),
– classify documents,
– ...
[Figure: example trees labeled "House" and "Hiring", splitting on "Salary < 1 M", "Job = teacher", and "Age < 30", with Good/Bad leaves.]
Decision Trees: Advantages and Disadvantages

Positives (+)
+ Reasonable training time
+ Fast application
+ Easy to interpret (can be re-represented as if-then-else rules)
+ Easy to implement
+ Can handle a large number of features
+ Does not require any prior knowledge of the data distribution

Negatives (-)
- Cannot handle complicated relationships between features
- Simple decision boundaries
- Problems with lots of missing data
- Output attribute must be categorical
- Limited to one output attribute
Rules Indicated by Decision Trees
◘ Write a rule for each path in the decision tree from the root to a leaf.
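For the buys_computer tree above, this gives one rule per root-to-leaf path:
– IF age <= 30 AND student = no THEN buys_computer = no
– IF age <= 30 AND student = yes THEN buys_computer = yes
– IF age = 31..40 THEN buys_computer = yes
– IF age > 40 AND credit_rating = excellent THEN buys_computer = no
– IF age > 40 AND credit_rating = fair THEN buys_computer = yes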
Decision Tree Algorithms
◘ ID3
– Quinlan (1981)
– Tries to reduce the expected number of comparisons
◘ C4.5
– Quinlan (1993)
– An extension of ID3
– Just starting to be used in data mining applications
– Also used for rule induction
◘ CART
– Breiman, Friedman, Olshen, and Stone (1984)
– Classification and Regression Trees
◘ CHAID
– Kass (1980)
– Oldest decision tree algorithm
– Well established in database marketing industry
◘ QUEST
– Loh and Shih (1997)
Decision Tree Construction
◘ Which attribute is the best classifier?
– Calculate the information gain G(S,A) for each attribute A.
– Select the attribute with the highest information gain.
Entropy(S) = − Σ_{i=1}^{m} p_i log2(p_i)      (for two classes: Entropy(S) = −p1 log2(p1) − p2 log2(p2))

Gain(S, A) = Entropy(S) − Σ_{i ∈ Values(A)} (|S_i| / |S|) · Entropy(S_i)
[Figure: the entropy of a two-class set as a function of p1; it is 0 for a pure set and peaks at 1.0 when p1 = 0.5.]
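A minimal sketch of both formulas in plain Python (the helper names are made up; the data is the Wind example from the slides below):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):                  # one subset per attribute value
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# The Wind attribute from the example below (days D1..D14).
wind = ["weak", "strong", "weak", "weak", "weak", "strong", "strong",
        "weak", "weak", "weak", "strong", "strong", "weak", "strong"]
tennis = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(round(entropy(tennis), 3))     # -> 0.94
print(round(gain(wind, tennis), 3))  # -> 0.048
```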
Decision Tree Construction
Which attribute first?
Decision Tree Construction
Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
Decision Tree Construction
Values(Wind) = {weak, strong}
S = [9+, 5-]
S_weak = [6+, 2-]
S_strong = [3+, 3-]

Day  Wind    Tennis?
D1   weak    no
D2   strong  no
D3   weak    yes
D4   weak    yes
D5   weak    yes
D6   strong  no
D7   strong  yes
D8   weak    no
D9   weak    yes
D10  weak    yes
D11  strong  yes
D12  strong  yes
D13  weak    yes
D14  strong  no
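Plugging these counts into the formulas (a worked step the slide leaves implicit):

Entropy(S_weak) = −(6/8) log2(6/8) − (2/8) log2(2/8) = 0.811
Entropy(S_strong) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1.0
Gain(S, Wind) = 0.940 − (8/14)·0.811 − (6/14)·1.0 = 0.048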
Decision Tree Construction
Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

Splitting on Outlook:
– sunny: [2+, 3-], E = 0.971
– overcast: [4+, 0-], E = 0.0
– rain: [3+, 2-], E = 0.971

Gain(S, Outlook) = 0.940 − (5/14)·0.971 − (4/14)·0.0 − (5/14)·0.971 = 0.247
Gain(S, Humidity) = Entropy(S) − (|S_High| / |S|)·Entropy(S_High) − (|S_Normal| / |S|)·Entropy(S_Normal)
                  = 0.940 − (7/14)·0.985 − (7/14)·1.0 = 0.151
Decision Tree Construction
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Outlook has the highest information gain, so it becomes the root.

[Figure: partial tree. Root "Outlook"; the overcast branch is the leaf "yes", while the sunny and rain branches are still open ("?").]
Decision Tree Construction
◘ Which attribute is next?
[Figure: the same partial tree; the open sunny and rain branches ("?") remain to be expanded.]
Gain(S_sunny, Wind) = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019
Gain(S_sunny, Humidity) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Gain(S_sunny, Temperature) = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570

Humidity has the highest gain, so it is chosen for the sunny branch.
Decision Tree Construction
[Figure: the final tree.
– Outlook = sunny → Humidity: High → No [D1, D2, D8]; Normal → Yes [D9, D11]
– Outlook = overcast → Yes [D3, D7, D12, D13]
– Outlook = rain → Wind: weak → Yes [D4, D5, D10]; strong → No [D6, D14]]
Another Example
At the weekend you can:
– go shopping,
– watch a movie,
– play tennis, or
– just stay in.
What you do depends on three things:
– the weather (windy, rainy or sunny);
– how much money you have (rich or poor);
– whether your parents are visiting.
Another Example
[Figure: a decision tree for the weekend example.]
Classification Techniques
Decision Trees
Bayesian Classification
K-Nearest Neighbor
Neural Network
Support Vector Machines (SVM)
Fuzzy Set Approaches
…
Classification Techniques
2- Bayesian Classification
◘ A statistical classifier: performs probabilistic prediction, i.e., predicts class
membership probabilities.
◘ Foundation: Based on Bayes’ Theorem.
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows
Bayes' theorem:

P(H|X) = P(X|H) · P(H) / P(X)
Classification Techniques
2- Bayesian Classification
◘ X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Using the buys_computer training data shown earlier:

◘ P(C1): P(buys_computer = “yes”) = 9/14 = 0.643
P(C2): P(buys_computer = “no”) = 5/14 = 0.357

◘ Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

◘ P(X|C1): P(X | buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|C2): P(X | buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

◘ P(X|Ci) · P(Ci):
P(X | buys_computer = “yes”) · P(buys_computer = “yes”) = 0.044 x 0.643 = 0.028
P(X | buys_computer = “no”) · P(buys_computer = “no”) = 0.019 x 0.357 = 0.007

Therefore, X belongs to class buys_computer = “yes”.
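A minimal sketch of this whole computation in plain Python (the training tuples are the buys_computer table from the decision-tree slides; the variable names are made up):

```python
from collections import Counter

data = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")  # the unseen tuple X

class_counts = Counter(row[-1] for row in data)  # yes: 9, no: 5
scores = {}
for c, n_c in class_counts.items():
    score = n_c / len(data)              # prior P(Ci)
    for i, value in enumerate(x):        # P(X|Ci) = product of P(x_i|Ci)
        n_match = sum(1 for row in data if row[-1] == c and row[i] == value)
        score *= n_match / n_c
    scores[c] = score

print(scores)                        # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))   # -> 'yes'
```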
Classification Techniques
Decision Trees
Bayesian Classification
K-Nearest Neighbor
Neural Network
Support Vector Machines (SVM)
Fuzzy Set Approaches
…
K-Nearest Neighbor (k-NN)
◘ An object is classified by a majority vote of its k nearest neighbors (the
k closest members of the training set).
◘ If k = 1, the object is simply assigned to the class of its single nearest
neighbor.
◘ The Euclidean distance measure is used to calculate how close each neighbor is.
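A minimal sketch of k-NN in plain Python (the 2-D points and labels are made up for illustration):

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up 2-D points with class labels.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((3.0, 4.0), "B"),
         ((5.0, 7.0), "B"), ((3.5, 4.5), "B"), ((2.0, 1.5), "A")]
print(knn_predict(train, (2.5, 3.0), k=3))  # -> 'A' (2 of 3 neighbors vote A)
```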