Name: Nilay Debnath
ID: CSE 06607735
Section: C
Batch: 66
Submitted to: Tanvir Rahman
Date of submission: 7-10-21
SUPERVISED LEARNING
Introduction:
Supervised learning is the machine learning task of learning a function that maps an input to
an output based on example input-output pairs. It infers a function from labeled training
data consisting of a set of training examples. In supervised learning, each example is
a pair consisting of an input object (typically a vector) and a desired output value (also called
the supervisory signal). A supervised learning algorithm analyzes the training data and
produces an inferred function, which can be used for mapping new examples. An optimal
scenario will allow for the algorithm to correctly determine the class labels for unseen instances.
This requires the learning algorithm to generalize from the training data to unseen situations in a
"reasonable" way. The parallel task in human and animal psychology is often referred to
as concept learning.
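The definition above can be made concrete with a very small supervised learner. The Python sketch below is a 1-nearest-neighbour classifier. It stores labeled (input, output) pairs and maps a new input to the output of the most similar stored example, where similarity is the number of matching attribute values. This is not one of the algorithms tested in this report. The function names and training pairs are hypothetical, made up in the style of the Car attributes.

```python
from collections.abc import Callable, Sequence

Example = tuple[tuple[str, ...], str]  # (input vector, desired output)


def train_1nn(examples: Sequence[Example]) -> Callable[[tuple[str, ...]], str]:
    """Return an inferred function that maps a new input to a label."""
    data = list(examples)

    def mismatches(a: tuple[str, ...], b: tuple[str, ...]) -> int:
        # Hamming distance: number of attributes whose values differ
        return sum(x != y for x, y in zip(a, b))

    def predict(x: tuple[str, ...]) -> str:
        # Label an unseen input with the output of its closest training example
        # (ties go to the earliest training example)
        _, label = min(data, key=lambda ex: mismatches(ex[0], x))
        return label

    return predict


# Hypothetical labeled pairs: (buying, maint, safety) -> Condition
training = [
    (("vhigh", "vhigh", "low"), "unacc"),
    (("low", "low", "high"), "vgood"),
    (("med", "med", "med"), "acc"),
]
f = train_1nn(training)
print(f(("low", "med", "high")))  # vgood
```

The query ("low", "med", "high") does not appear in the training data, but the inferred function still assigns it a label. This is the generalization step described above.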
Abstract:
This supervised learning study uses the Car dataset, whose instances are described by the following attributes:
• Buying
• Maint
• Doors
• Persons
• Lug_Boot
• Safety
• Condition
Based on these attributes, six different algorithms were implemented and tested on the Car dataset: three with maint as the class attribute and three with persons as the class attribute. The results are given in the Analysis and Result section.
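All runs below use 5-fold cross-validation ("Test mode: 5-fold cross-validation"), and the classification runs report it as stratified. The sketch below shows the idea under simple assumptions. The indices of each class are shuffled and dealt round-robin into k folds, so every fold keeps roughly the class proportions of the whole dataset. Each fold is then held out once for testing while the remaining folds form the training set. The function names are hypothetical. Weka uses its own seeded randomization, so its folds will not be identical to these.

```python
import random
from collections import defaultdict
from collections.abc import Hashable, Iterator, Sequence


def stratified_folds(labels: Sequence[Hashable], k: int = 5, seed: int = 1) -> list[list[int]]:
    """Split instance indices into k folds that keep the class proportions."""
    by_class: defaultdict = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    position = 0
    for cls in sorted(by_class, key=str):
        members = by_class[cls]
        rng.shuffle(members)
        for i in members:
            # Deal each class round-robin so every fold gets its share
            folds[position % k].append(i)
            position += 1
    return folds


def cross_validation_splits(
    labels: Sequence[Hashable], k: int = 5, seed: int = 1
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = stratified_folds(labels, k, seed)
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# maint: four classes with 432 instances each (row totals of the confusion matrices)
labels = [c for c in ("vhigh", "high", "med", "low") for _ in range(432)]
for train, test in cross_validation_splits(labels):
    print(len(train), len(test))
```

With these labels, every test fold receives 86 or 87 instances of each maint value and 345 or 346 instances overall.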
Algorithms for the maint attribute (maint as the class)
1. weka.classifiers.rules.ZeroR
=== Run information ===
Scheme: weka.classifiers.rules.ZeroR
Relation: Car
Instances: 1728
Attributes: 7
buying
maint
doors
persons
lug_boot
safety
Condition
Test mode: 5-fold cross-validation
=== Classifier model (full training set) ===
ZeroR predicts class value: vhigh
Time taken to build model: 0 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 430 24.8843 %
Incorrectly Classified Instances 1298 75.1157 %
Kappa statistic -0.0015
Mean absolute error 0.375
Root mean squared error 0.433
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 1728
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
               0.597    0.601    0.249      0.597   0.351      -0.003  0.498     0.249     vhigh
               0.199    0.200    0.249      0.199   0.221      -0.001  0.498     0.249     high
               0.199    0.201    0.249      0.199   0.221      -0.002  0.498     0.249     med
               0.000    0.000    ?          0.000   ?          ?       0.498     0.249     low
Weighted Avg.  0.249    0.250    ?          0.249   ?          ?       0.498     0.249
=== Confusion Matrix ===
a b c d <-- classified as
258 87 87 0 | a = vhigh
259 86 87 0 | b = high
260 86 86 0 | c = med
260 86 86 0 | d = low
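ZeroR ignores every input attribute and predicts the most frequent class of its training data. The four maint values are equally frequent, so the predicted class most likely changes between cross-validation folds. This would explain why the confusion matrix has entries in more than one column. The summary figures follow directly from the matrix. The sketch below recomputes accuracy, the kappa statistic, and the per-class TP rate (recall), precision and F-measure. The function name `confusion_metrics` is hypothetical, not a Weka API. No instance was ever predicted as low (column d), so precision and F-measure are undefined for that class, which is where Weka prints "?". The sketch returns None in that case.

```python
from typing import Optional


def confusion_metrics(cm: list[list[int]]) -> dict:
    """Recompute Weka-style summary figures from a confusion matrix.

    Rows are actual classes and columns are predicted classes,
    matching Weka's "<-- classified as" layout.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    actual = [sum(cm[i]) for i in range(k)]                          # row totals
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column totals
    p_o = sum(cm[i][i] for i in range(k)) / n                        # observed agreement
    p_e = sum(actual[i] * predicted[i] for i in range(k)) / n**2     # chance agreement
    recall = [cm[i][i] / actual[i] for i in range(k)]                # TP rate
    # Precision is undefined ("?" in Weka) when a class is never predicted
    precision: list[Optional[float]] = [
        cm[i][i] / predicted[i] if predicted[i] else None for i in range(k)
    ]
    f_measure: list[Optional[float]] = []
    for p, r in zip(precision, recall):
        if p is None:
            f_measure.append(None)
        else:
            f_measure.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": p_o,
        "kappa": (p_o - p_e) / (1 - p_e),
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


zero_r = [
    [258, 87, 87, 0],
    [259, 86, 87, 0],
    [260, 86, 86, 0],
    [260, 86, 86, 0],
]
m = confusion_metrics(zero_r)
print(round(100 * m["accuracy"], 4), round(m["kappa"], 4))  # 24.8843 -0.0015
```

Every maint class has 432 instances, so the chance agreement used by kappa is exactly 0.25 for all three classifiers in this section. Passing the PART and Logistic confusion matrices instead gives kappa values of -0.0903 and 0.0772, which match their summaries.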
2. weka.classifiers.rules.PART
Scheme: weka.classifiers.rules.PART -C 0.25 -M 2
Relation: Car
Instances: 1728
Attributes: 7
buying
maint
doors
persons
lug_boot
safety
Condition
Test mode: 5-fold cross-validation
=== Classifier model (full training set) ===
PART decision list
------------------
Condition = acc AND
buying = high: high (108.0/72.0)

Condition = unacc: vhigh (1210.0/850.0)

Condition = acc AND
buying = vhigh: med (72.0/36.0)

Condition = acc AND
safety = high: vhigh (89.0/43.0)

Condition = good: low (69.0/23.0)

safety = high: med (65.0/39.0)

lug_boot = big: vhigh (40.0/24.0)

lug_boot = small: med (35.0/21.0)

doors > 3: vhigh (20.0/12.0)

doors <= 2: med (10.0/6.0)

persons <= 4: med (5.0/3.0)

: vhigh (5.0/3.0)
Number of Rules : 12
Time taken to build model: 0.04 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 315 18.2292 %
Incorrectly Classified Instances 1413 81.7708 %
Kappa statistic -0.0903
Mean absolute error 0.3537
Root mean squared error 0.4338
Relative absolute error 94.3215 %
Root relative squared error 100.1908 %
Total Number of Instances 1728
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
               0.308    0.298    0.256      0.308   0.280      0.009   0.638     0.377     vhigh
               0.111    0.299    0.110      0.111   0.111      -0.188  0.485     0.234     high
               0.150    0.270    0.157      0.150   0.153      -0.121  0.525     0.263     med
               0.160    0.223    0.193      0.160   0.175      -0.068  0.584     0.345     low
Weighted Avg.  0.182    0.273    0.179      0.182   0.180      -0.092  0.558     0.305
=== Confusion Matrix ===
a b c d <-- classified as
133 156 85 58 | a = vhigh
179 48 108 97 | b = high
110 123 65 134 | c = med
97 109 157 69 | d = low
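A PART decision list is applied from top to bottom. An instance receives the class of the first rule whose conditions all hold, and the last rule, which has an empty condition (": vhigh"), acts as the default. The sketch below encodes the twelve rules exactly as printed above and applies them in order. It assumes doors and persons are numeric, as the attribute mapping later in this report shows. The rule encoding and the name `apply_decision_list` are illustrative, not Weka's API, and the coverage counts in parentheses are left out.

```python
import operator
from collections.abc import Callable
from typing import Any

# A condition is (attribute, comparison, value); a rule is (conditions, class).
Condition = tuple[str, Callable[[Any, Any], bool], Any]
Rule = tuple[list[Condition], str]

EQ, GT, LE = operator.eq, operator.gt, operator.le

# The PART rules above, in their printed order
PART_RULES: list[Rule] = [
    ([("Condition", EQ, "acc"), ("buying", EQ, "high")], "high"),
    ([("Condition", EQ, "unacc")], "vhigh"),
    ([("Condition", EQ, "acc"), ("buying", EQ, "vhigh")], "med"),
    ([("Condition", EQ, "acc"), ("safety", EQ, "high")], "vhigh"),
    ([("Condition", EQ, "good")], "low"),
    ([("safety", EQ, "high")], "med"),
    ([("lug_boot", EQ, "big")], "vhigh"),
    ([("lug_boot", EQ, "small")], "med"),
    ([("doors", GT, 3)], "vhigh"),
    ([("doors", LE, 2)], "med"),
    ([("persons", LE, 4)], "med"),
    ([], "vhigh"),  # default rule: no conditions, so it always matches
]


def apply_decision_list(rules: list[Rule], instance: dict[str, Any]) -> str:
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(cmp(instance[attr], value) for attr, cmp, value in conditions):
            return label
    raise ValueError("decision list has no default rule")


car = {"buying": "low", "doors": 4, "persons": 5,
       "lug_boot": "med", "safety": "med", "Condition": "vgood"}
print(apply_decision_list(PART_RULES, car))  # vhigh (matched by "doors > 3")
```

Rule order matters: "safety = high: med" only applies to instances that none of the earlier rules, such as "Condition = acc AND safety = high", have already claimed.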
3. weka.classifiers.functions.Logistic
Scheme: weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4
Relation: Car
Instances: 1728
Attributes: 7
buying
maint
doors
persons
lug_boot
safety
Condition
Test mode: 5-fold cross-validation
=== Classifier model (full training set) ===
Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
                    Class
Variable            vhigh      high       med
===============================================
buying=vhigh        -0.4104    -0.2583    -0.0603
buying=high         -0.2765    -0.2119    -0.0781
buying=med          0.2        0.1098     0.0169
buying=low          0.4869     0.3605     0.1215
doors               0.0017     0.0009     0
persons             0.3477     0.1927     0.0074
lug_boot=small      -0.2145    -0.1108    -0.0014
lug_boot=med        0.0479     0.0241     0.0007
lug_boot=big        0.1667     0.0867     0.0007
safety=low          -0.642     -0.3601    -0.012
safety=med          0.1167     0.0771     0.0132
safety=high         0.5253     0.283      -0.0012
Condition=unacc     3.5598     1.7988     0.0034
Condition=acc       2.0261     1.1919     0.1981
Condition=vgood     -14.8556   -0.0593    -0.1045
Condition=good      -14.5985   -15.1667   -0.8131
Intercept           -4.2465    -2.1353    -0.0077
Odds Ratios...
                    Class
Variable            vhigh      high       med
===============================================
buying=vhigh        0.6634     0.7723     0.9415
buying=high         0.7584     0.8091     0.9249
buying=med          1.2214     1.116      1.017
buying=low          1.6273     1.434      1.1292
doors               1.0017     1.0009     1
persons             1.4158     1.2125     1.0074
lug_boot=small      0.8069     0.8951     0.9986
lug_boot=med        1.049      1.0244     1.0007
lug_boot=big        1.1814     1.0906     1.0007
safety=low          0.5263     0.6976     0.9881
safety=med          1.1238     1.0802     1.0133
safety=high         1.6909     1.3271     0.9988
Condition=unacc     35.1578    6.0422     1.0034
Condition=acc       7.5847     3.2932     1.2191
Condition=vgood     0          0.9425     0.9008
Condition=good      0          0          0.4435
Time taken to build model: 0.12 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 532 30.787 %
Incorrectly Classified Instances 1196 69.213 %
Kappa statistic 0.0772
Mean absolute error 0.3611
Root mean squared error 0.4268
Relative absolute error 96.2986 %
Root relative squared error 98.5638 %
Total Number of Instances 1728
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
               0.567    0.382    0.331      0.567   0.418      0.162   0.655     0.369     vhigh
               0.148    0.131    0.274      0.148   0.192      0.021   0.508     0.247     high
               0.271    0.218    0.293      0.271   0.281      0.054   0.580     0.299     med
               0.245    0.191    0.299      0.245   0.270      0.058   0.601     0.366     low
Weighted Avg.  0.308    0.231    0.299      0.308   0.290      0.074   0.586     0.320
=== Confusion Matrix ===
a b c d <-- classified as
245 77 55 55 | a = vhigh
197 64 92 79 | b = high
153 48 117 114 | c = med
145 45 136 106 | d = low
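The two tables in the Logistic model are linked by a simple relation: each odds ratio is e raised to the corresponding coefficient. An odds ratio is the factor by which the odds of that class, relative to low, change per unit of the variable, so positive coefficients give odds ratios above 1. In Weka's multinomial logistic regression, the last class value (low here) has no coefficient column and serves as the reference, with a linear score of 0. Class probabilities are a softmax over the per-class scores. The sketch below copies three rows of the coefficient table and shows both steps. The class scores in its last line are made-up values, not outputs of this model.

```python
import math

# Three rows copied from the coefficient table above (reference class: low)
COEFFICIENTS = {
    "buying=vhigh": {"vhigh": -0.4104, "high": -0.2583, "med": -0.0603},
    "safety=high": {"vhigh": 0.5253, "high": 0.283, "med": -0.0012},
    "Condition=unacc": {"vhigh": 3.5598, "high": 1.7988, "med": 0.0034},
}


def odds_ratio(coefficient: float) -> float:
    """Factor by which the odds of a class (versus low) change per unit of the variable."""
    return math.exp(coefficient)


def class_probabilities(scores: dict[str, float], reference: str = "low") -> dict[str, float]:
    """Softmax over the per-class linear scores; the reference class has score 0."""
    all_scores = {**scores, reference: 0.0}
    top = max(all_scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in all_scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


for variable, row in COEFFICIENTS.items():
    print(variable, {c: round(odds_ratio(b), 4) for c, b in row.items()})

# Hypothetical linear scores for one instance, not computed from this model
print(class_probabilities({"vhigh": 1.2, "high": 0.4, "med": -0.3}))
```

The printed odds ratios agree with the table above up to rounding (for example 0.6634 for buying=vhigh, and about 35.16 for Condition=unacc). Weka computes its odds ratios from unrounded coefficients, which accounts for the small differences.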
Algorithms for the persons attribute (persons as the class)
1. weka.classifiers.functions.GaussianProcesses
Scheme: weka.classifiers.functions.GaussianProcesses -L 1.0 -N 0 -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007" -S 1
Relation: Car
Instances: 1728
Attributes: 7
buying
maint
doors
persons
lug_boot
safety
Condition
Test mode: 5-fold cross-validation
=== Classifier model (full training set) ===
Gaussian Processes
Kernel used:
Linear Kernel: K(x,y) = <x,y>
All values shown based on: Normalize training data
Average Target Value : 0.555555555555551
Inverted Covariance Matrix:
Lowest Value = -0.020975481221554237
Highest Value = 0.993470109156667
Inverted Covariance Matrix * Target-value Vector:
Lowest Value = -0.7812415052997398
Highest Value = 0.8274300015945182
Time taken to build model: 5.4 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.5203
Mean absolute error 0.915
Root mean squared error 1.0651
Relative absolute error 82.2512 %
Root relative squared error 85.2746 %
Total Number of Instances 1728
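With the linear kernel K(x,y) = <x,y> reported above, a Gaussian process regressor predicts a new value as the average target plus k*ᵀ(K + σ²I)⁻¹(y − ȳ). Here K is the kernel matrix of the training inputs and k* holds the kernel values between the new input and each training input. Up to Weka's target scaling, the "Inverted Covariance Matrix * Target-value Vector" in the output corresponds to the (K + σ²I)⁻¹(y − ȳ) part. The pure-Python sketch below illustrates the computation on toy numeric data. It assumes a fixed noise variance σ² and leaves out the attribute normalization that Weka applies (-N 0), so it does not reproduce Weka's numbers. All names are hypothetical.

```python
from collections.abc import Callable


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def linear_kernel(x: list[float], y: list[float]) -> float:
    """K(x, y) = <x, y>, the kernel reported in the Weka output."""
    return sum(a * b for a, b in zip(x, y))


def fit_gp(xs: list[list[float]], ys: list[float], noise_var: float) -> Callable[[list[float]], float]:
    """Return the GP posterior-mean predictor for a linear kernel."""
    mean = sum(ys) / len(ys)
    k = [[linear_kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    # alpha = (K + noise_var * I)^-1 (y - mean)
    alpha = solve(k, [y - mean for y in ys])

    def predict(x: list[float]) -> float:
        return mean + sum(w * linear_kernel(x, xi) for w, xi in zip(alpha, xs))

    return predict


# Toy data (not the Car dataset): y = 2 + 2x on centred inputs
predict = fit_gp([[-1.0], [0.0], [1.0]], [0.0, 2.0, 4.0], noise_var=1.0)
print(predict([2.0]))
```

The prediction at x = 2 is about 4.67 rather than the 6.0 of an exact straight-line fit, because the noise term shrinks the fitted slope toward zero. With a linear kernel, the GP mean is equivalent to ridge regression on the centred targets, which is consistent with how close the GaussianProcesses and LinearRegression results are in this section.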
2. weka.classifiers.functions.LinearRegression
=== Run information ===
Scheme: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
Relation: Car
Instances: 1728
Attributes: 7
buying
maint
doors
persons
lug_boot
safety
Condition
Test mode: 5-fold cross-validation
=== Classifier model (full training set) ===
Linear Regression Model
persons =
-0.1398 * buying=high,med,low +
-0.296 * buying=med,low +
-0.194 * maint=high,med,low +
-0.2147 * maint=med,low +
-0.234 * lug_boot=med,big +
-0.6555 * safety=med,high +
-0.2865 * safety=high +
1.9373 * Condition=good,acc,vgood +
-0.2593 * Condition=acc,vgood +
0.5098 * Condition=vgood +
4.3284
Time taken to build model: 0.03 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.519
Mean absolute error 0.9149
Root mean squared error 1.0663
Relative absolute error 82.235 %
Root relative squared error 85.365 %
Total Number of Instances 1728
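The linear regression model above relies on Weka's binary encoding of nominal attributes. A term such as "buying=high,med,low" equals 1 when buying takes one of the listed values and 0 otherwise. The sketch below evaluates the printed equation for a given instance. The data layout and the name `predict_persons` are illustrative. Attributes that do not appear in the equation, such as doors, have no effect on the prediction.

```python
# Terms copied from the model above:
# (attribute, values that set the indicator to 1, weight)
TERMS = [
    ("buying", {"high", "med", "low"}, -0.1398),
    ("buying", {"med", "low"}, -0.296),
    ("maint", {"high", "med", "low"}, -0.194),
    ("maint", {"med", "low"}, -0.2147),
    ("lug_boot", {"med", "big"}, -0.234),
    ("safety", {"med", "high"}, -0.6555),
    ("safety", {"high"}, -0.2865),
    ("Condition", {"good", "acc", "vgood"}, 1.9373),
    ("Condition", {"acc", "vgood"}, -0.2593),
    ("Condition", {"vgood"}, 0.5098),
]
INTERCEPT = 4.3284


def predict_persons(instance: dict[str, str]) -> float:
    """Evaluate the printed model: intercept plus the weights of all active indicators."""
    return INTERCEPT + sum(w for attr, values, w in TERMS if instance[attr] in values)


car = {"buying": "vhigh", "maint": "vhigh", "lug_boot": "big",
       "safety": "high", "Condition": "vgood"}
print(round(predict_persons(car), 4))  # 5.3402
```

An instance with buying=vhigh, maint=vhigh, lug_boot=small, safety=low and Condition=unacc switches every indicator off, so its prediction is just the intercept, 4.3284.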
3. weka.classifiers.misc.InputMappedClassifier
Scheme: weka.classifiers.misc.InputMappedClassifier -I -trim -W weka.classifiers.rules.ZeroR
Relation: Car
Instances: 1728
Attributes: 7
buying
maint
doors
persons
lug_boot
safety
Condition
Test mode: 5-fold cross-validation
=== Classifier model (full training set) ===
InputMappedClassifier:
ZeroR predicts class value: 3.6666666666666665
Attribute mappings:
Model attributes Incoming attributes
--------------------- ----------------
(nominal) buying --> 1 (nominal) buying
(nominal) maint --> 2 (nominal) maint
(numeric) doors --> 3 (numeric) doors
(numeric) persons --> 4 (numeric) persons
(nominal) lug_boot --> 5 (nominal) lug_boot
(nominal) safety --> 6 (nominal) safety
(nominal) Condition --> 7 (nominal) Condition
Time taken to build model: 0 seconds
=== Cross-validation ===
=== Summary ===
Correlation coefficient -0.0727
Mean absolute error 1.1125
Root mean squared error 1.2491
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 1728
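For the numeric persons class, Weka reports a correlation coefficient and four error measures. The two relative errors divide the model's total error by the error of always predicting the mean target, which is what ZeroR does. This is why InputMappedClassifier wrapping ZeroR scores exactly 100%. Relative absolute error is 100 × Σ|p − a| / Σ|ā − a|, and root relative squared error is 100 × sqrt(Σ(p − a)² / Σ(ā − a)²). The sketch below computes these measures from lists of actual and predicted values. It simplifies one detail: under cross-validation, Weka's baseline comes from the training data of each fold, whereas this sketch uses the mean of the actual values passed in. The values in the usage lines are toy numbers, not the Car data.

```python
import math
from collections.abc import Sequence
from typing import Optional


def regression_summary(actual: Sequence[float], predicted: Sequence[float]) -> dict[str, Optional[float]]:
    """Weka-style measures for a numeric class (actual values must not all be equal)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline: always predict the mean of the actual values, as ZeroR does
    base_abs = sum(abs(mean_a - a) for a in actual)
    base_sq = sum((mean_a - a) ** 2 for a in actual)
    corr: Optional[float] = None  # undefined when every prediction is the same
    if len(set(predicted)) > 1:
        cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
        var_p = sum((p - mean_p) ** 2 for p in predicted)
        corr = cov / math.sqrt(base_sq * var_p)
    return {
        "correlation": corr,
        "mean absolute error": abs_err / n,
        "root mean squared error": math.sqrt(sq_err / n),
        "relative absolute error %": 100 * abs_err / base_abs,
        "root relative squared error %": 100 * math.sqrt(sq_err / base_sq),
    }


actual = [2.0, 4.0, 5.0, 2.0, 4.0, 5.0]                  # toy values
zero_r = [sum(actual) / len(actual)] * len(actual)       # mean predictor
print(regression_summary(actual, zero_r))
```

For the mean predictor, both relative errors come out at exactly 100%, and the correlation is undefined. Weka's ZeroR shows a small negative correlation (-0.0727) instead, because its predicted mean changes slightly from one cross-validation fold to the next.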
Analysis and Result:
All six algorithms were tested on the Car dataset with 5-fold cross-validation for the two given class attributes (maint and persons), and their results are compared below.
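For the maint attribute, three algorithms were used: rules.ZeroR, rules.PART and functions.Logistic. Of these three, functions.Logistic seems to be the best classifier for this case. It classifies the most instances correctly (30.787%, against 24.8843% for ZeroR and 18.2292% for PART). It also has the highest kappa statistic (0.0772) and the lowest root mean squared error (0.4268), although PART has a slightly lower relative absolute error (94.3215%). Each of the four maint values occurs 432 times, so random guessing would be right about 25% of the time, and even Logistic is only slightly better than chance. Its relative errors are:
Relative absolute error 96.2986 %
Root relative squared error 98.5638 %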
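For the persons attribute, three algorithms were used: functions.GaussianProcesses, functions.LinearRegression and misc.InputMappedClassifier (wrapping ZeroR). Of these three, functions.GaussianProcesses seems to be the best model for this case, with the highest correlation coefficient (0.5203) and the lowest root mean squared error (1.0651). Its lead over functions.LinearRegression is very small, however. LinearRegression has a marginally lower relative absolute error (82.235%) and builds its model in 0.03 seconds instead of 5.4 seconds. The relative errors for GaussianProcesses are:
Relative absolute error 82.2512 %
Root relative squared error 85.2746 %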
For 2nd Sheet