with corresponding output unit values. Similarly, the sets of input values and activation values are studied to derive rules describing the relationship between the input and hidden layer units. Finally, the two sets of rules may be combined to form IF-THEN rules. Other algorithms may derive rules of other forms, including M-of-N rules (where M out of a given N conditions in the rule antecedent must be true for the rule consequent to be applied), decision trees with M-of-N tests, fuzzy rules, and finite automata.
Sensitivity analysis is used to assess the impact that a given input variable has on a network output. The input to the variable is varied while the remaining input variables are fixed at some value. Meanwhile, changes in the network output are monitored. The knowledge gained from this form of analysis can be represented in rules such as "IF X decreases 5% THEN Y increases 8%."
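A minimal sketch of the sensitivity-analysis idea just described. The function net() is only a hypothetical stand-in for a trained network's forward pass, and the 5% perturbation size is an arbitrary choice for illustration.

# Sensitivity analysis sketch: vary one input, hold the others fixed,
# and monitor the change in the network output.
def net(x1, x2, x3):
    return 0.4 * x1 - 1.6 * x2 + 0.2 * x3   # placeholder "trained network"

def sensitivity(inputs, index, delta=0.05):
    """Relative change in output when inputs[index] is decreased by delta (5%)."""
    base = net(*inputs)
    perturbed = list(inputs)
    perturbed[index] *= (1.0 - delta)        # e.g., "IF X decreases 5%"
    change = net(*perturbed) - base
    return change / abs(base)                # e.g., "THEN Y increases 8%"

print(sensitivity([2.0, 0.5, 1.0], index=0))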
Support Vector Machines
In this section, we study support vector machines (SVMs), a method for the classification of both linear and nonlinear data. In a nutshell, an SVM is an algorithm that works as follows. It uses a nonlinear mapping to transform the original training data into a higher dimension. Within this new dimension, it searches for the linear optimal separating hyperplane (i.e., a "decision boundary" separating the tuples of one class from another). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors). We will delve more into these new concepts later.
"I've heard that SVMs have attracted a great deal of attention lately. Why?" The first paper on support vector machines was presented in 1992 by Vladimir Vapnik and colleagues Bernhard Boser and Isabelle Guyon, although the groundwork for SVMs has been around since the 1960s (including early work by Vapnik and Alexei Chervonenkis on statistical learning theory). Although the training time of even the fastest SVMs can be extremely slow, they are highly accurate, owing to their ability to model complex nonlinear decision boundaries. They are much less prone to overfitting than other methods. The support vectors found also provide a compact description of the learned model. SVMs can be used for numeric prediction as well as classification. They have been applied to a number of areas, including handwritten digit recognition, object recognition, and speaker identification, as well as benchmark time-series prediction tests.
The Case When the Data Are Linearly Separable
To explain the mystery of SVMs, let's first look at the simplest case—a two-class problem where the classes are linearly separable. A separating hyperplane can be written as W · X + b = 0, where W is a weight vector, X is a training tuple, and b is a scalar (bias). Considering two input attributes and writing the bias b as an additional weight w0, any point that lies above the separating hyperplane satisfies

   w0 + w1x1 + w2x2 > 0,    (9.14)

and any point that lies below the separating hyperplane satisfies

   w0 + w1x1 + w2x2 < 0.    (9.15)

The weights can be adjusted so that the hyperplanes defining the "sides" of the margin can be written as

   H1: w0 + w1x1 + w2x2 >= 1 for yi = +1,    (9.16)
   H2: w0 + w1x1 + w2x2 <= -1 for yi = -1.   (9.17)

That is, any tuple that falls on or above H1 belongs to class +1, and any tuple that falls on or below H2 belongs to class -1. Combining the two inequalities of Eqs. (9.16) and (9.17), we get

   yi(w0 + w1x1 + w2x2) >= 1, for all i.    (9.18)
Any training tuples that fall on hyperplanes H1 or H2 (i.e., the "sides" defining the margin) satisfy Eq. (9.18) and are called support vectors. That is, they are equally close to the (separating) MMH. In Figure 9.9, the support vectors are shown encircled with a thicker border. Essentially, the support vectors are the most difficult tuples to classify and give the most information regarding classification.
From this, we can obtain a formula for the size of the maximal margin. The distance from the separating hyperplane to any point on H1 is 1/||W||, where ||W|| is the Euclidean norm of W, that is, sqrt(W · W). By definition, this is equal to the distance from any point on H2 to the separating hyperplane. Therefore, the maximal margin is 2/||W||.
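As a small numeric illustration of the formulas just given (a sketch, not from the text; the weight values are made up), the distance of a point from the hyperplane and the maximal margin 2/||W|| can be computed directly:

import numpy as np

# Hypothetical weights of a separating hyperplane w0 + w1*x1 + w2*x2 = 0.
w = np.array([3.0, 4.0])    # (w1, w2); made-up values for illustration
w0 = -6.0
x = np.array([2.0, 1.0])    # some tuple

norm_w = np.sqrt(np.dot(w, w))               # Euclidean norm ||W||
distance = abs(w0 + np.dot(w, x)) / norm_w   # distance of x from the hyperplane
margin = 2.0 / norm_w                        # maximal margin, 2 / ||W||
print(distance, margin)                      # 0.8, 0.4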
[Figure 9.9 illustration: linearly separable data with the MMH; legend: class 1, yi = +1 (buys_computer = yes); class 2, yi = -1 (buys_computer = no).]
Figure 9.9 Support vectors. The SVM finds the maximum separating hyperplane, that is, the one with maximum distance between the nearest training tuples. The support vectors are shown with a thicker border.
(Footnote: If W = {w1, w2, ..., wn}, then sqrt(W · W) = sqrt(w1^2 + w2^2 + ... + wn^2).)
"So, how does an SVM find the MMH and the support vectors?" Using some "fancy math tricks," we can rewrite Eq. (9.18) so that it becomes what is known as a constrained (convex) quadratic optimization problem. Such fancy math tricks are beyond the scope of this book. Advanced readers may be interested to note that the tricks involve rewriting Eq. (9.18) using a Lagrangian formulation and then solving for the solution using Karush-Kuhn-Tucker (KKT) conditions. Details can be found in the bibliographic notes at the end of this chapter (Section 9.10).
If the data are small (say, less than 2000 training tuples), any optimization software package for solving constrained convex quadratic problems can then be used to find the support vectors and MMH. For larger data, special and more efficient algorithms for training SVMs can be used instead, the details of which exceed the scope of this book. Once we've found the support vectors and MMH (note that the support vectors define the MMH!), we have a trained support vector machine. The MMH is a linear class boundary, and so the corresponding SVM can be used to classify linearly separable data. We refer to such a trained SVM as a linear SVM.
"Once I've got a trained support vector machine, how do I use it to classify test (i.e., new) tuples?" Based on the Lagrangian formulation mentioned before, the MMH can be rewritten as the decision boundary

   d(X^T) = sum_{i=1}^{l} yi * alpha_i * (Xi · X^T) + b0,    (9.19)

where yi is the class label of support vector Xi; X^T is a test tuple; alpha_i and b0 are numeric parameters that were determined automatically by the optimization or SVM algorithm noted before; and l is the number of support vectors.
Interested readers may note that the alpha_i are Lagrangian multipliers. For linearly separable data, the support vectors are a subset of the actual training tuples (although there will be a slight twist regarding this when dealing with nonlinearly separable data, as we shall see in the following).
Given a test tuple, X^T, we plug it into Eq. (9.19), and then check the sign of the result. This tells us on which side of the hyperplane the test tuple falls. If the sign is positive, then X^T falls on or above the MMH, and so the SVM predicts that X^T belongs to class +1 (representing buys_computer = yes, in our case). If the sign is negative, then X^T falls on or below the MMH and the class prediction is -1 (representing buys_computer = no).
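A minimal sketch of Eq. (9.19) as code. The support vectors, labels, multipliers, and b0 below are made-up placeholders; in practice they come out of the quadratic-optimization step described above.

import numpy as np

def svm_predict(support_vectors, labels, alphas, b0, x_test):
    """Linear SVM decision function d(X^T) = sum_i y_i * alpha_i * (X_i . X^T) + b0.
    Returns +1 if the test tuple falls on or above the MMH, else -1."""
    d = b0
    for x_i, y_i, a_i in zip(support_vectors, labels, alphas):
        d += y_i * a_i * np.dot(x_i, x_test)
    return +1 if d >= 0 else -1

# Placeholder values (not from the text), just to show the call:
sv    = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
y     = [-1, +1]
alpha = [0.25, 0.25]
b0    = -1.0
print(svm_predict(sv, y, alpha, b0, np.array([4.0, 4.0])))   # +1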
Notice that the Lagrangian formulation of our problem (Eq. 9.19) contains a dot product between support vector Xi and test tuple X^T. This will prove very useful for finding the MMH and support vectors for the case when the given data are nonlinearly separable, as described further in the next section.
Before we move on to the nonlinear case, there are two more important things to note. The complexity of the learned classifier is characterized by the number of support vectors rather than the dimensionality of the data. Hence, SVMs tend to be less prone to overfitting than some other methods. The support vectors are the essential or critical training tuples—they lie closest to the decision boundary (MMH). If all other training tuples were removed and training were repeated, the same separating hyperplane would be found. Furthermore, the number of support vectors found can be used to compute an upper bound on the expected error rate of the SVM classifier, which is independent of the data dimensionality. An SVM with a small number of support vectors can have good generalization, even when the dimensionality of the data is high.
Classification Methods
1. Decision Tree
2. Bayes Classification
3. Rule-Based Classification
4. Classification by Backpropagation
5. Support Vector Machines
6. Associative Classification
7. Lazy Learners
8. Other Classification Methods
 
Classification by Decision Tree Induction
Decision tree induction is the learning of decision trees from class-labeled training tuples.
A decision tree is a flowchart-like tree structure, where
- Each internal node denotes a test on an attribute.
- Each branch represents an outcome of the test.
- Each leaf node holds a class label.
- The topmost node in a tree is the root node.
 
[Figure: decision tree for the concept buys_computer, with tests on age, student, and credit_rating (excellent, fair) along its branches.]
Decision tree indicating whether a customer is likely to purchase a computer.
Class label Yes: the customer is likely to buy a computer.
Class label No: the customer is unlikely to buy a computer.
How are decision trees used for classification?
- The attribute values of a tuple are tested against the decision tree.
- A path is traced from the root to a leaf node, which holds the class prediction.
Example
Test on age: <= 30
Test on student: no
Class: the customer is unlikely to buy a computer.
[Table: class-labeled training tuples with attributes age, income, student, credit_rating, and the class buys_computer.]
Algorithm for Decision Tree Induction
- Basic algorithm (adopted by ID3, C4.5, and CART): a greedy algorithm.
- The tree is constructed in a top-down, recursive, divide-and-conquer manner.
Iterations
- At the start, all the training examples are at the root.
- Attributes are categorical (if continuous-valued, they are discretized in advance).
- Examples are partitioned recursively based on selected attributes.
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
Conditions for stopping partitioning
- All samples for a given node belong to the same class.
- There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf.
- There are no samples left.
Input:
- Data partition, D, which is a set of training tuples and their associated class labels;
- attribute_list, the set of candidate attributes;
- Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split-point or a splitting subset.
Output: A decision tree.
Method:
(1) create a node N;
(2) if tuples in D are all of the same class, C, then
(3)     return N as a leaf node labeled with the class C;
(4) if attribute_list is empty then
(5)     return N as a leaf node labeled with the majority class in D; // majority voting
(6) apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
(7) label node N with splitting_criterion;
(8) if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
(9)     attribute_list = attribute_list - splitting_attribute; // remove splitting attribute
(10) for each outcome j of splitting_criterion
        // partition the tuples and grow subtrees for each partition
(11)    let Dj be the set of data tuples in D satisfying outcome j; // a partition
(12)    if Dj is empty then
(13)        attach a leaf labeled with the majority class in D to node N;
(14)    else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
     endfor
(15) return N;
The algorithm is called with three parameters:
- Data partition
- Attribute list
- Attribute selection method
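A compact Python sketch of the recursive procedure above, under simplifying assumptions: all attributes are treated as discrete-valued with multiway splits, and the attribute-selection measure is passed in as a function. It illustrates the control flow only and is not the book's reference implementation.

from collections import Counter

def majority_class(D):
    return Counter(label for _, label in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, select_attribute):
    """D is a list of (attribute_dict, class_label) pairs; attribute_list is a list of
    attribute names; select_attribute(D, attrs) returns the 'best' splitting attribute."""
    labels = {label for _, label in D}
    if len(labels) == 1:                       # all tuples of the same class
        return labels.pop()
    if not attribute_list:                     # no attributes left: majority voting
        return majority_class(D)
    best = select_attribute(D, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    node = {"attribute": best, "branches": {}}
    for value in {x[best] for x, _ in D}:      # one branch per known value of best
        Dj = [(x, y) for x, y in D if x[best] == value]
        node["branches"][value] = (majority_class(D) if not Dj
                                   else generate_decision_tree(Dj, remaining, select_attribute))
    return node

Any of the attribute-selection measures discussed next (information gain, gain ratio, Gini index) could be plugged in as select_attribute.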
Attribute Types
1. A is discrete-valued:
In this case, the outcomes of the test at node N correspond directly to the known values of A. A branch is created for each known value, aj, of A and labeled with that value. A need not be considered in any future partitioning of the tuples.
2. A is continuous-valued:
In this case, the test at node N has two possible outcomes, corresponding to the conditions A <= split_point and A > split_point, respectively, where split_point is the split-point returned by Attribute_selection_method as part of the splitting criterion.
3. A is discrete-valued and a binary tree must be produced:
The test at node N is of the form "A in S_A?", where S_A is the splitting subset for A, returned by Attribute_selection_method as part of the splitting criterion. It is a subset of the known values of A.
[Figure: three partitioning scenarios. (a) If A is discrete-valued, one branch is grown per known value of A. (b) If A is continuous-valued, two branches are grown for A <= split_point and A > split_point. (c) If A is discrete-valued and a binary tree must be produced, two branches are grown for A in S_A and A not in S_A.]
Entropy
- Higher entropy => higher uncertainty.
- Lower entropy => lower uncertainty.
- Conditional entropy: H(Y|X) = sum_x p(x) H(Y|X = x).
Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain.
- Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|.
- Expected information (entropy) needed to classify a tuple in D:
  Info(D) = - sum_{i=1}^{m} p_i log2(p_i)
- Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = sum_{j=1}^{v} (|D_j| / |D|) * Info(D_j)
- Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)
Attribute Selection: Information Gain (example)
Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples).
  Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
  Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694
Here (5/14) I(2,3) means that "age <= 30" has 5 out of 14 samples, with 2 yes'es and 3 no's. Hence
  Gain(age) = Info(D) - Info_age(D) = 0.246
Similarly,
  Gain(income) = 0.029
  Gain(student) = 0.151
  Gain(credit_rating) = 0.048
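A short Python check of the numbers above (a sketch; the per-partition (yes, no) counts for age are the ones given in the example):

from math import log2

def info(counts):
    """Expected information I(c1, c2, ...) = -sum p_i log2 p_i."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

info_D = info([9, 5])                                   # whole data set, ~0.940
age_partitions = [(2, 3), (4, 0), (3, 2)]               # (yes, no) counts per age branch
info_age = sum((sum(p) / 14) * info(p) for p in age_partitions)   # ~0.694
gain_age = info_D - info_age                            # ~0.246
print(round(info_D, 3), round(info_age, 3), round(gain_age, 3))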
 
 
Computing Information Gain for Continuous-Valued Attributes
- Let attribute A be a continuous-valued attribute.
- We must determine the best split point for A:
  - Sort the values of A in increasing order.
  - Typically, the midpoint between each pair of adjacent values is considered as a possible split point: (a_i + a_{i+1}) / 2 is the midpoint between the values of a_i and a_{i+1}.
  - The point with the minimum expected information requirement for A is selected as the split point for A.
- Split: D1 is the set of tuples in D satisfying A <= split_point, and D2 is the set of tuples in D satisfying A > split_point.
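A small sketch of the split-point search just described (the sample values and labels are invented for illustration):

from math import log2

def info(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def best_split_point(values, labels):
    """Pick the midpoint between adjacent sorted values that minimizes Info_A(D)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_point, best_info = None, float("inf")
    for i in range(n - 1):
        midpoint = (pairs[i][0] + pairs[i + 1][0]) / 2.0
        left = [y for v, y in pairs if v <= midpoint]
        right = [y for v, y in pairs if v > midpoint]
        info_a = (len(left) / n) * info([left.count("yes"), left.count("no")]) + \
                 (len(right) / n) * info([right.count("yes"), right.count("no")])
        if info_a < best_info:
            best_point, best_info = midpoint, info_a
    return best_point, best_info

print(best_split_point([25, 32, 40, 46, 58], ["yes", "yes", "no", "no", "no"]))   # (36.0, 0.0)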
Gain Ratio for Attribute Selection (C4.5)
- The information gain measure is biased towards attributes with a large number of values.
- C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):
  SplitInfo_A(D) = - sum_{j=1}^{v} (|D_j| / |D|) * log2(|D_j| / |D|)
  GainRatio(A) = Gain(A) / SplitInfo_A(D)
- Example:
  SplitInfo_income(D) = -(4/14) log2(4/14) - (6/14) log2(6/14) - (4/14) log2(4/14) = 1.557
  gain_ratio(income) = 0.029 / 1.557 = 0.019
- The attribute with the maximum gain ratio is selected as the splitting attribute.
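The SplitInfo value above can be checked in a couple of lines (a sketch; 4, 6, and 4 are the income partition sizes used in the example):

from math import log2

sizes, total = [4, 6, 4], 14
split_info_income = -sum((s / total) * log2(s / total) for s in sizes)
print(round(split_info_income, 3))          # ~1.557
print(round(0.029 / split_info_income, 3))  # gain_ratio(income), ~0.019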
Gini Index (CART, IBM IntelligentMiner)
- If a data set D contains examples from n classes, the gini index, gini(D), is defined as
  gini(D) = 1 - sum_{j=1}^{n} p_j^2,
  where p_j is the relative frequency of class j in D.
- If a data set D is split on A into two subsets D1 and D2, the gini index gini_A(D) is defined as
  gini_A(D) = (|D1| / |D|) gini(D1) + (|D2| / |D|) gini(D2)
- Reduction in impurity:
  delta_gini(A) = gini(D) - gini_A(D)
- The attribute that provides the smallest gini_A(D) (or the largest reduction in impurity) is chosen to split the node (we need to enumerate all the possible splitting points for each attribute).
Computation of Gini Index
- Example: D has 9 tuples in buys_computer = "yes" and 5 in "no":
  gini(D) = 1 - (9/14)^2 - (5/14)^2 = 0.459
- Suppose the attribute income partitions D into 10 tuples in D1: {low, medium} and 4 tuples in D2: {high}:
  gini_{income in {low,medium}}(D) = (10/14) gini(D1) + (4/14) gini(D2)
                                   = (10/14)(1 - (7/10)^2 - (3/10)^2) + (4/14)(1 - (2/4)^2 - (2/4)^2) = 0.443
  Gini for {low, high} is 0.458 and for {medium, high} is 0.450. Thus, split on {low, medium} (and {high}) since it has the lowest Gini index.
- All attributes are assumed continuous-valued.
- May need other tools, e.g., clustering, to get the possible split values.
- Can be modified for categorical attributes.
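A quick check of the Gini numbers above (a sketch; the per-partition class counts, assumed here to be 7 yes/3 no and 2 yes/2 no, are the ones that reproduce the 0.443 figure for the 10-4 income split):

def gini(counts):
    """Gini index 1 - sum_j p_j^2 for a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

gini_D = gini([9, 5])                                               # ~0.459
gini_income = (10 / 14) * gini([7, 3]) + (4 / 14) * gini([2, 2])    # ~0.443
print(round(gini_D, 3), round(gini_income, 3))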
 
Scalability of Decision Tree Induction
- Decision tree construction becomes inefficient due to swapping of the training tuples in and out of main and cache memories.
- Overfitting problem.
- Decision tree induction algorithms for scalability include:
  - SLIQ: builds an index for each attribute; only the class list and the current attribute list reside in memory.
Decision Tree Induction
- It is a supervised learning technique.
- It separates the data into smaller subsets.
- Classification is a form of data analysis that extracts models describing important data classes; such models, called classifiers, predict categorical (discrete, unordered) class labels.
"How does classification work?"
Data classification is a two-step process:
1. Learning step: a classification algorithm builds the classifier by analyzing or "learning from" a training set made up of database tuples and their associated class labels. A tuple X is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n database attributes.
2. Classification step: the model is used to predict class labels for given data.

Classification
Classification is also called supervised learning.
Supervision
- The training data are used to learn a classifier.
- The training data are labeled data.
- New (unlabeled) data are classified using the trained classifier.
Principles
- Construct models (functions) based on some training examples.
- Describe and distinguish classes or concepts for future prediction.
- Predict unknown class labels.
Comparing Classification and Prediction Methods
1. Classification predicts categorical (discrete, unordered) labels; it uses the labels of the training data to classify new data. A predictor predicts a continuous-valued function, or ordered value, i.e., it predicts unknown or missing values; regression analysis is used for prediction, which is also called numeric prediction.
2. Example: a classification model categorizes bank loan applications as either safe or risky; a prediction model predicts the expenditures of potential customers on computer equipment, based on their income and occupation.
Criteria for comparing classification and prediction methods
- Accuracy:
  Classifier accuracy: the ability of a classifier to correctly predict class labels.
  Predictor accuracy: how well a given predictor can guess the value of the predicted attribute for new or previously unseen data.
- Speed:
  Time to construct the model (training time).
  Time to use the model (classification/prediction time).
- Robustness:
  Handling noisy data or data with missing values.
- Scalability:
  The ability to construct the classifier or predictor efficiently given large amounts of data.
- Interpretability:
  The level of understanding and insight that is provided by the classifier or predictor.
Applications
- Credit/loan approval
- Medical diagnosis: whether a tumor is cancerous or benign
- Fraud detection: whether a transaction is fraudulent
- Web page categorization: which category a page belongs to
Classification—A Two-Step Process
- Model construction (learning or training step): construct a classification model based on training data.
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  - The set of tuples used for model construction is the training set.
  - The model is represented as classification rules, decision trees, or mathematical formulae.
- Model usage: classifying future or unknown objects.
  - Estimate the accuracy of the model: to measure the accuracy of a model we need test data.
  - The accuracy rate is the percentage of test set samples that are correctly classified by the model.
  - The test set is independent of the training set (otherwise overfitting).
  - If the accuracy is acceptable, use the model to classify new data.
  Note: If the test set is used to select models, it is called a validation (test) set.
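A tiny sketch of the accuracy estimate described above (the model, test tuples, and labels are placeholders; accuracy is simply the fraction of test tuples the model labels correctly):

def accuracy(model, test_tuples, test_labels):
    """Accuracy rate = percentage of test set samples correctly classified by the model."""
    correct = sum(1 for x, y in zip(test_tuples, test_labels) if model(x) == y)
    return correct / len(test_labels)

# Placeholder model: classify by a single threshold on the first attribute.
model = lambda x: "yes" if x[0] <= 30 else "no"
print(accuracy(model, [(25,), (42,), (31,)], ["yes", "no", "yes"]))   # 2 of 3 correct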
Algorithm: Backpropagation
Neural network learning for classification or numeric prediction, using the backpropagation algorithm.
Input:
- D, a data set consisting of the training tuples and their associated target values;
- l, the learning rate;
- network, a multilayer feed-forward network.
Output: A trained neural network.
Method:
(1) Initialize all weights and biases in network;
(2) while terminating condition is not satisfied {
(3)   for each training tuple X in D {
(4)     // Propagate the inputs forward:
(5)     for each input layer unit j {
(6)       Oj = Ij; } // the output of an input unit is its actual input value
(7)     for each hidden or output layer unit j {
(8)       Ij = sum_i (wij * Oi) + theta_j; // compute the net input of unit j with respect to the previous layer, i
(9)       Oj = 1 / (1 + e^(-Ij)); } // compute the output of each unit j
(10)    // Backpropagate the errors:
(11)    for each unit j in the output layer
(12)      Errj = Oj (1 - Oj)(Tj - Oj); // compute the error, where Tj is the target value
(13)    for each unit j in the hidden layers, from the last to the first hidden layer
(14)      Errj = Oj (1 - Oj) * sum_k (Errk * wjk); // compute the error with respect to the next higher layer, k
(15)    for each weight wij in network {
(16)      delta_wij = (l) * Errj * Oi; // weight increment
(17)      wij = wij + delta_wij; } // weight update
(18)    for each bias theta_j in network {
(19)      delta_theta_j = (l) * Errj; // bias increment
(20)      theta_j = theta_j + delta_theta_j; } // bias update
(21)  } }
Example: sample calculations for learning by the backpropagation algorithm
The first training tuple is X = (1, 0, 1), with a class label of 1. The tuple is fed into the network, and the net input and output of each unit are computed. The initial weight and bias values are small random numbers (e.g., in the range -1.0 to 1.0, or -0.5 to 0.5):

x1   x2   x3   w14   w15   w24   w25   w34   w35   w46   w56   theta4   theta5   theta6
1    0    1    0.2   -0.3  0.4   0.1   -0.5  0.2   -0.3  -0.2  -0.4     0.2      0.1
Net input and output calculations:

Unit j   Net input, Ij                                  Output, Oj
4        0.2 + 0 - 0.5 - 0.4 = -0.7                     1 / (1 + e^0.7) = 0.332
5        -0.3 + 0 + 0.2 + 0.2 = 0.1                     1 / (1 + e^-0.1) = 0.525
6        (-0.3)(0.332) - (0.2)(0.525) + 0.1 = -0.105    1 / (1 + e^0.105) = 0.474

Calculation of the error at each node:

Unit j   Errj
6        (0.474)(1 - 0.474)(1 - 0.474) = 0.1311
5        (0.525)(1 - 0.525)(0.1311)(-0.2) = -0.0065
4        (0.332)(1 - 0.332)(0.1311)(-0.3) = -0.0087
How does backpropagation work?
- Initialize the weights: the weights and biases in the network are initialized to small random numbers.
- Propagate the inputs forward: the net input to each unit j in the hidden and output layers is computed as a weighted sum of its inputs from the previous layer, Ij = sum_i (wij * Oi) + theta_j. Each unit in the hidden and output layers then takes its net input and applies an activation function to it; the logistic (sigmoid) function is used:
  Oj = 1 / (1 + e^(-Ij)),
  where e is the base of natural logarithms (approximately 2.718).
- Backpropagate the error: for a unit j in the output layer, the error is
  Errj = Oj (1 - Oj)(Tj - Oj),
  where Tj is the known target value of the given training tuple. The error of a hidden-layer unit j is the weighted sum of the errors of the units connected to unit j in the next layer:
  Errj = Oj (1 - Oj) * sum_k (Errk * wjk).
- The weights and biases are updated to reflect the propagated errors:
  delta_wij = (l) * Errj * Oi,  wij = wij + delta_wij;
  delta_theta_j = (l) * Errj,  theta_j = theta_j + delta_theta_j,
  where l is the learning rate.
 
Calculations for weight and bias updating (learning rate l = 0.9):

Weight or bias   New value
w46              -0.3 + (0.9)(0.1311)(0.332) = -0.261
w56              -0.2 + (0.9)(0.1311)(0.525) = -0.138
w14              0.2 + (0.9)(-0.0087)(1) = 0.192
w15              -0.3 + (0.9)(-0.0065)(1) = -0.306
w24              0.4 + (0.9)(-0.0087)(0) = 0.4
w25              0.1 + (0.9)(-0.0065)(0) = 0.1
w34              -0.5 + (0.9)(-0.0087)(1) = -0.508
w35              0.2 + (0.9)(-0.0065)(1) = 0.194
theta6           0.1 + (0.9)(0.1311) = 0.218
theta5           0.2 + (0.9)(-0.0065) = 0.194
theta4           -0.4 + (0.9)(-0.0087) = -0.408
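A short Python sketch reproducing one forward pass, error backpropagation, and weight/bias update for the example network above. The topology (inputs 1, 2, 3; hidden units 4, 5; output unit 6) and all numbers are the ones used in the example; the code is only an illustration of the algorithm's steps.

from math import exp

sigmoid = lambda v: 1.0 / (1.0 + exp(-v))

x = {1: 1.0, 2: 0.0, 3: 1.0}          # training tuple X = (1, 0, 1)
target, rate = 1.0, 0.9               # class label and learning rate l

w = {(1, 4): 0.2, (2, 4): 0.4, (3, 4): -0.5,
     (1, 5): -0.3, (2, 5): 0.1, (3, 5): 0.2,
     (4, 6): -0.3, (5, 6): -0.2}
theta = {4: -0.4, 5: 0.2, 6: 0.1}

# Propagate the inputs forward (units 4 and 5 are hidden, unit 6 is the output).
o = dict(x)                            # an input unit's output is its input value
for j in (4, 5, 6):
    net = sum(w[i, j] * o[i] for i in o if (i, j) in w) + theta[j]
    o[j] = sigmoid(net)

# Backpropagate the errors.
err = {6: o[6] * (1 - o[6]) * (target - o[6])}
for j in (4, 5):
    err[j] = o[j] * (1 - o[j]) * err[6] * w[j, 6]

# Update the weights and biases.
for (i, j) in w:
    w[i, j] += rate * err[j] * o[i]
for j in theta:
    theta[j] += rate * err[j]

print(round(o[6], 3), round(err[6], 3), round(w[4, 6], 3))   # 0.474 0.131 -0.261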
Bayes Classification Methods
- Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
- Bayesian classification is based on Bayes' theorem.
- Studies have found naive Bayesian classifiers to have high accuracy and speed when applied to large databases.
- Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class-conditional independence.
- Bayesian belief networks are graphical models which, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes.
- Bayes' theorem is named after Thomas Bayes, a nonconformist English clergyman who worked on probability and decision theory during the 18th century.
Bayes' Theorem
Let X be a data tuple. In Bayesian terms, X is considered "evidence." Let H be some hypothesis, such as that the data tuple X belongs to a specified class C. For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the "evidence" or observed data tuple X.
- Posterior probability: P(H|X) is the posterior probability, or a posteriori probability, of H conditioned on X.
  Example: suppose our world of data tuples is confined to customers described by the attributes age and income, and that X is a 35-year-old customer with an income of $40,000. Suppose H is the hypothesis that the customer will buy a computer. Then P(H|X) reflects the probability that customer X will buy a computer given that we know the customer's age and income.
- Prior probability: P(H) is the prior probability, or a priori probability, of H.
  Example: this is the probability that any given customer will buy a computer, regardless of age, income, or any other information. Similarly, P(X|H) is the posterior probability of X conditioned on H, that is, the probability that a customer X is 35 years old and earns $40,000, given that we know the customer will buy a computer.
- Probability estimation: P(H), P(X|H), and P(X) may be estimated from the given data. Bayes' theorem is useful in that it provides a way of calculating the posterior probability P(H|X) from P(H), P(X|H), and P(X):
  Bayes' theorem:  P(H|X) = P(X|H) P(H) / P(X).
Naive Bayesian Classification
The naive Bayesian model is a simple and well-known method for supervised learning of a classification problem. The naive Bayesian classifier makes the assumption of class-conditional independence: given the class label of a tuple, the values of the attributes are assumed to be conditionally independent of one another. The naive Bayesian classifier, or simple Bayesian classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. Each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes, respectively, A1, A2, ..., An.
2. Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naive Bayesian classifier predicts that tuple X belongs to the class Ci if and only if
   P(Ci|X) > P(Cj|X)  for 1 <= j <= m, j != i.
Thus, we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,
   P(Ci|X) = P(X|Ci) P(Ci) / P(X).
3. As P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized. If the class prior probabilities are not known, then it is commonly assumed that the classes are equally likely, that is, P(C1) = P(C2) = ... = P(Cm), and we would therefore maximize P(X|Ci); otherwise, we maximize P(X|Ci) P(Ci). The class prior probabilities may be estimated by P(Ci) = |Ci,D| / |D|, where |Ci,D| is the number of training tuples of class Ci in D.
4. Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci). To reduce computation in evaluating P(X|Ci), the naive assumption of class-conditional independence is made. This presumes that the attributes' values are conditionally independent of one another, given the class label of the tuple (i.e., that there are no dependence relationships among the attributes). Thus,
   P(X|Ci) = prod_{k=1}^{n} P(xk|Ci) = P(x1|Ci) * P(x2|Ci) * ... * P(xn|Ci).
We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), ..., P(xn|Ci) from the training tuples. Here, xk refers to the value of attribute Ak for tuple X. For each attribute, we look at whether the attribute is categorical or continuous-valued. For instance, to compute P(X|Ci), we consider the following:
follows ;(@ Lf Pe ti Cebegenecl » “Hen ple Jec) t: &?
He nudes of Aepls ef Claw te jo D having
te valac te for es Muicled by Jee, ye waz
nwoke of tpl: of Claw lyin d.
can tinucu_trhued, Heo
D Coodnuoedt Vehsd Oty bale
Gauuieh chifols
Ef Pe a axe neecl to
do a bt mek wvrk,
wi typcally Avemwed 4o have 4
“6 end sHindard ateviallss oF
Ya @
wih @ mean
abefrved & prstocts lily dohibubed that oo |
fae. abt ee i a oy
glijto 2 Liaw pee)
Vare- Ge © lees reat
Seceaan Chak Mol:
so that ae
—_—Q@
plate [ted =f Cte Me? oe)
yo compile tc, and Te _
cue need yo Va c. = ; Cchnctued tures.)
of the vale of adwbele Py tf 720#0°F ay @
Ly pid
clan of C- Then plef Shere fue yuan?
7) 70g cer wth es fo ert jm ole
jofo eye ,
plylecs.
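A compact sketch of the classification rule above for categorical attributes only (the tiny training set and the smoothing-free counting are illustrative assumptions, not the book's worked example):

from collections import defaultdict

def train_naive_bayes(D):
    """D is a list of (attribute_dict, class_label). Returns prior and conditional counts."""
    prior, cond = defaultdict(int), defaultdict(int)
    for x, c in D:
        prior[c] += 1
        for attr, value in x.items():
            cond[(attr, value, c)] += 1
    return prior, cond, len(D)

def predict(prior, cond, total, x):
    """Choose the class Ci maximizing P(Ci) * prod_k P(xk | Ci)."""
    best_class, best_score = None, -1.0
    for c, count_c in prior.items():
        score = count_c / total                        # P(Ci) = |Ci,D| / |D|
        for attr, value in x.items():
            score *= cond[(attr, value, c)] / count_c  # P(xk | Ci)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

D = [({"age": "youth", "student": "yes"}, "yes"),
     ({"age": "youth", "student": "no"}, "no"),
     ({"age": "senior", "student": "no"}, "no"),
     ({"age": "middle_aged", "student": "no"}, "yes")]
prior, cond, total = train_naive_bayes(D)
print(predict(prior, cond, total, {"age": "youth", "student": "yes"}))   # "yes"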
Rule-Based Classification
The learned model is represented as a set of IF-THEN rules. We first examine how such rules are used for classification. Then we study the ways in which they can be generated, either from a decision tree or directly from the training data using a sequential covering algorithm.
Using IF-THEN Rules for Classification
Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form
   IF condition THEN conclusion.
An example is rule R1:
   R1: IF age = youth AND student = yes THEN buys_computer = yes.
The "IF" part (or left side) of a rule is known as the rule antecedent or precondition. The "THEN" part (or right side) is the rule consequent. In the rule antecedent, the condition consists of one or more attribute tests (e.g., age = youth and student = yes) that are logically ANDed. The rule's consequent contains a class prediction.
R1 can also be written as
   R1: (age = youth) AND (student = yes) => (buys_computer = yes).
If the condition (i.e., all the attribute tests) in a rule antecedent holds true for a given tuple, we say that the rule antecedent is satisfied and that the rule covers the tuple.
A rule R can be assessed by its coverage and accuracy. Given a class-labeled data set D, let n_covers be the number of tuples covered by R, n_correct be the number of tuples correctly classified by R, and |D| be the number of tuples in D. We can define the coverage and accuracy of R as
   coverage(R) = n_covers / |D|
   accuracy(R) = n_correct / n_covers.
That is, a rule's coverage is the percentage of tuples that are covered by the rule, and its accuracy is the percentage of the covered tuples that it classifies correctly.
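A few lines of Python for the coverage and accuracy definitions above (rule R1 and the three example tuples are placeholders):

def rule_stats(D, antecedent, predicted_class):
    """coverage(R) = n_covers / |D|; accuracy(R) = n_correct / n_covers."""
    covered = [(x, y) for x, y in D if antecedent(x)]
    n_covers = len(covered)
    n_correct = sum(1 for _, y in covered if y == predicted_class)
    return n_covers / len(D), (n_correct / n_covers) if n_covers else 0.0

# R1: IF age = youth AND student = yes THEN buys_computer = yes
r1 = lambda x: x["age"] == "youth" and x["student"] == "yes"
D = [({"age": "youth", "student": "yes"}, "yes"),
     ({"age": "youth", "student": "no"}, "no"),
     ({"age": "senior", "student": "yes"}, "yes")]
print(rule_stats(D, r1, "yes"))   # coverage 1/3, accuracy 1/1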
Rule Extraction from a Decision Tree
Here we look at how to build a rule-based classifier by extracting IF-THEN rules from a decision tree. In comparison with a decision tree, the IF-THEN rules may be easier for humans to understand, particularly if the decision tree is very large.
To extract rules from a decision tree, one rule is created for each path from the root to a leaf node. Each splitting criterion along a given path is logically ANDed to form the rule antecedent ("IF" part). The leaf node holds the class prediction, forming the rule consequent ("THEN" part).
Example: converting a decision tree to IF-THEN rules.
   R1: IF age = youth AND student = no THEN buys_computer = no
   R2: IF age = youth AND student = yes THEN buys_computer = yes
   R3: IF age = middle_aged THEN buys_computer = yes
   R4: IF age = senior AND credit_rating = excellent THEN buys_computer = yes
   R5: IF age = senior AND credit_rating = fair THEN buys_computer = no
How can we prune the rule set?
For a given rule antecedent, any condition that does not improve the estimated accuracy of the rule can be pruned (i.e., removed), thereby generalizing the rule. The training tuples and their associated class labels are used to estimate rule accuracy. Any rule that does not contribute to the overall accuracy of the entire rule set can also be pruned.
Other problems arise during rule pruning, however, as the rules will no longer be mutually exclusive and exhaustive. For conflict resolution, C4.5 adopts a class-based ordering scheme. It groups together all rules for a single class, and then determines a ranking of these class rule sets. Within a rule set, the rules are not ordered. C4.5 orders the class rule sets so as to minimize the number of false-positive errors. The class rule set with the least number of false positives is examined first. Once pruning is complete, a final check is done to remove any duplicates.
Rule Induction Using a Sequential Covering Algorithm
IF-THEN rules can be extracted directly from the training data (i.e., without having to generate a decision tree first) using a sequential covering algorithm. The name comes from the notion that the rules are learned sequentially (one at a time), where each rule for a given class will ideally cover many of the class's tuples. Sequential covering algorithms are the most widely used approach to mining disjunctive sets of classification rules.
There are many sequential covering algorithms, e.g., AQ, CN2, and RIPPER.
General strategy:
- Rules are learned one at a time.
- Each time a rule is learned, the tuples covered by the rule are removed, and the process repeats on the remaining tuples.
- This sequential learning of rules is in contrast to decision tree induction.
Algorithm: Sequential covering. Learn a set of IF-THEN rules for classification.
Input:
- D, a data set of class-labeled tuples;
- Att_vals, the set of all attributes and their possible values.
Output: A set of IF-THEN rules.
Method:
(1) Rule_set = { }; // initial set of rules learned is empty
(2) for each class c do
(3)   repeat
(4)     Rule = Learn_One_Rule(D, Att_vals, c);
(5)     remove tuples covered by Rule from D;
(6)     Rule_set = Rule_set + Rule; // add new rule to rule set
(7)   until terminating condition;
(8) endfor
(9) return Rule_set;
The Learn_One_Rule procedure finds the "best" rule for the current class, given the current set of training tuples.
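A schematic Python version of the sequential covering loop above. learn_one_rule() is left as a stub, since the rule-growing strategy (general-to-specific search) is described next; the terminating condition used here, stopping when no tuples of the class remain or no acceptable rule is found, is one reasonable choice rather than the only one.

def sequential_covering(D, classes, learn_one_rule):
    """Learn a set of IF-THEN rules one at a time; tuples covered by a new rule
    are removed before the next rule is learned (cf. the algorithm above)."""
    D = list(D)                                       # working copy of the labeled tuples
    rule_set = []
    for c in classes:
        while any(y == c for _, y in D):              # terminating condition (one choice)
            rule = learn_one_rule(D, c)               # returns (antecedent, c) or None
            if rule is None:
                break
            antecedent, _ = rule
            rule_set.append(rule)
            D = [(x, y) for x, y in D if not antecedent(x)]   # remove covered tuples
    return rule_set

# Stub illustrating the expected interface: a "rule" is a (predicate, class) pair.
def learn_one_rule(D, c):
    return ((lambda x: True), c) if D else None       # trivial placeholder rule

print(len(sequential_covering([({"age": "youth"}, "accept")], ["accept"], learn_one_rule)))   # 1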
How are rules learned?
Typically, rules are grown in a general-to-specific manner. We start with an empty rule and then gradually keep appending attribute tests to it. We append by adding the attribute test as a logical conjunct to the existing condition of the rule antecedent.
Suppose our training set, D, consists of loan application data. Attributes regarding each applicant include their age, income, education level, residence, credit rating, and the term of the loan. The classifying attribute is loan_decision, which indicates whether a loan is accepted (considered safe) or rejected (considered risky).
To learn a rule for the class "accept," we start off with the most general rule possible, that is, the condition of the rule antecedent is empty. The rule is
   IF THEN loan_decision = accept.
We then consider each possible attribute test that may be added to the rule and keep the one that most improves rule quality, as shown in the search below.
   
       
 
  
    
[Figure: a general-to-specific search through rule space for the class "accept." Starting from the empty antecedent IF THEN loan_decision = accept, candidate conjuncts such as loan_term, income = high, and age = youth are tried; the best candidate is kept and extended further, e.g., IF income = high THEN loan_decision = accept, then IF income = high AND credit_rating = excellent (or fair) THEN loan_decision = accept.]
Rule Pruning
Learn_One_Rule does not employ a test set when evaluating rules. A rule may perform well on the training data, but less well on subsequent data. To compensate for this, we can prune the rules. A rule is pruned by removing a conjunct (attribute test). We choose to prune a rule, R, if the pruned version of R has greater quality, as assessed on an independent set of tuples. As in decision tree pruning, we refer to this set as a pruning set.
FOIL uses a simple yet effective method. Given a rule, R,
   FOIL_Prune(R) = (pos - neg) / (pos + neg),
where pos and neg are the number of positive and negative tuples covered by R, respectively. This value will increase with the accuracy of R on a pruning set. Therefore, if the FOIL_Prune value is higher for the pruned version of R, then we prune R.
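The FOIL_Prune measure above in a few lines (the pruning-set counts in the example call are invented):

def foil_prune(pos, neg):
    """FOIL_Prune(R) = (pos - neg) / (pos + neg) for the tuples R covers on a pruning set."""
    return (pos - neg) / (pos + neg)

full_rule   = foil_prune(pos=20, neg=10)   # ~0.33
pruned_rule = foil_prune(pos=24, neg=10)   # ~0.41
if pruned_rule > full_rule:
    print("prune R: the pruned version has the higher FOIL_Prune value")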
Classification by Backpropagation
Backpropagation is a neural network learning algorithm. A neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples. Neural network learning is also referred to as connectionist learning due to the connections between units.