Neural Processing Letters (2006) 23:89–101 © Springer 2006
DOI 10.1007/s11063-005-3500-3
Multi-Classification by Using Tri-Class SVM
CECILIO ANGULO¹,*, FRANCISCO J. RUIZ¹, LUIS GONZÁLEZ² and JUAN ANTONIO ORTEGA³
¹ Grup de Recerca en Enginyeria del Coneixement, Universitat Politècnica de Catalunya,
Av. Víctor Balaguer s/n, 08800 Vilanova i la Geltrú, Spain. e-mail: cecilio.angulo@upc.edu
² Departamento de Economía Aplicada I, Universidad de Sevilla, Avenida Ramón y Cajal 1,
41018 Sevilla, Spain
³ Escuela Técnica Superior de Ingeniería Informática, Universidad de Sevilla, Avenida Reina
Mercedes s/n, 41012 Sevilla, Spain
* Corresponding author.
Abstract. The standard form for dealing with multi-class classification problems when bi-
classifiers are used is to consider a two-phase (decomposition, reconstruction) training
scheme. The most popular decomposition procedures are pairwise coupling (one versus one,
1-v-1), which considers a learning machine for each pair of classes, and the one-versus-all
scheme (one versus all, 1-v-r), which takes into consideration each class versus the remain-
ing classes. In this article a 1-v-1 tri-class Support Vector Machine (SVM) is presented. The
expansion of the architecture of this machine into three categories specifically addresses the
decomposition problem of how to prevent the loss of information which occurs in the usual
1-v-1 training procedure. The proposed machine, by means of a third class, allows all the
information to be incorporated into the remaining training patterns when a multi-class prob-
lem is considered in the form of a 1-v-1 decomposition. Three general structures are pre-
sented where each improves some features from the precedent structure. In order to deal
with multi-classification problems, it is demonstrated that the final machine proposed allows
ordinal regression as a form of decomposition procedure. Examples and experimental results
are presented which illustrate the performance of the new tri-class SV machine.
Key words. bi-classifier, multi-classification, ordinal regression, Support Vector Machine
Abbreviations. 1-v-1 – one versus one; all versus all; pairwise coupling; 1-v-r – one ver-
sus the rest; one versus all; s.t. – subject to; SV – Support Vector; SVM – Support Vector
Machine
1. Introduction
Support Vector Machines (SVMs) are learning machines which implement the
structural risk minimization inductive principle to obtain good generalization on
a limited number of learning patterns. This theory was originally developed by
Vapnik on the basis of a separable binary classification problem with signed out-
puts ±1 [21].
The SVM presents good theoretical properties and behaviour in problems of
binary classification [9]. There are several papers which generalize the original
bi-class approach to multi-classification problems [16, 17, 1] through different algo-
rithms, such as 1-v-r SVM or 1-v-1 SVM (see [15] for a comparison of SVM
multi-class methods). In this work, problems with more than two classes are considered;
hence the original bi-class SVM is extended to a
more general tri-class SVM approach. The proposed final tri-class machine is
presented in a three-stage procedure: first the original idea of a third class is
introduced which was developed by Angulo and Català [3, 2]; secondly a more spe-
cific machine, as proposed by Angulo and González [5] is presented; finally, the
proposed novel tri-class SVM is explained, which implies a huge computational
cost reduction with respect to the former proposals, and a meeting point for both
classification and ordinal regression techniques.
The rest of the article is organized as follows: in Section 2, the standard SVM
classification learning paradigm is briefly presented in order to introduce some
notation. Section 3 is devoted to a short introduction about SVMs for multi-
classification. In Section 4, the 1-v-1 tri-class SV Machine is presented, and its
faster computational counterpart is derived in Section 5. Examples and experi-
mental results are displayed in Section 6 to illustrate its behaviour and strengths.
Finally, some conclusions are drawn and future research suggested.
2. Bi-Class SV Machine Learning
The SV Machine is an implementation of a more general regularization principle
known as the large margin principle. Let
Z = {(x1 , y1 ), . . . , (xn , yn )} = {z1 , . . . , zn } ∈ (X × Y)n (1)
be a training set, where X is the input space and
Y = {θ1 , θ2 } = {−1, +1} (2)
the output space. Let
φ : X → F ⊆ Rd (3)
be a feature mapping, with φ = (φ1 , . . . , φd ), for the usual ‘kernel trick’. F is
named feature space. Let
x := φ(x) ∈ F (4)
be the representation of x ∈ X. A binary linear classifier,
f_w(x) = ⟨φ(x), w⟩ + b = ⟨x, w⟩ + b (5)
is sought in the space F, with fw : X → F → R, b ∈ R, and where outputs are
obtained by thresholding the classifier, hw (x) = sign(fw (x)). According to [12], the
classifier w with the largest geometrical margin on a given training sample Z can
be written as
w_SVM := arg max_{w∈F} (1/‖w‖) · min_{z_i∈Z} y_i ⟨x_i, w⟩. (6)
One practical method of dealing with the problem is to minimize the norm w in
(6) with the geometrical margin fixed to unity
min_{w∈F} (1/2)‖w‖²
s.t. y_i ⟨x_i, w⟩ ≥ 1, z_i ∈ Z. (7)
The solution can be expressed in the form
w_SVM = Σ_i α_i y_i x_i;   f_{w_SVM}(x) = Σ_i α_i y_i k(x_i, x), (8)
where k(x, x′) = ⟨φ(x), φ(x′)⟩ = ⟨x, x′⟩ is the kernel function, and only a few α_i are
non-zero; those associated with the so-called support vectors.
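As a minimal illustration of (8), the following Python sketch evaluates the dual-form decision function with a Gaussian kernel; the support vectors, multipliers and bias below are placeholder values, not the solution of a trained machine.

    import numpy as np

    # Sketch of the bi-class SV decision function in dual form,
    # f(x) = sum_i alpha_i * y_i * k(x_i, x) + b, with a Gaussian kernel.
    # The alphas below are illustrative placeholders, not a trained solution.

    def gaussian_kernel(xi, xj, sigma=1.0):
        return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

    def decision_function(x, support_vectors, alphas, labels, b=0.0, sigma=1.0):
        return sum(a * y * gaussian_kernel(xi, x, sigma)
                   for a, y, xi in zip(alphas, labels, support_vectors)) + b

    support_vectors = np.array([[0.0, 1.0], [1.0, 0.0]])
    alphas = np.array([0.5, 0.5])          # placeholder multipliers
    labels = np.array([+1, -1])

    x_new = np.array([0.2, 0.8])
    print(np.sign(decision_function(x_new, support_vectors, alphas, labels)))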
3. SV Machine for Multi-Classification
Let Z be a training set. Now, a set of possible labels {θ_1, . . . , θ_ℓ}, with ℓ > 2, will
be considered. Subsets Z_k ⊆ Z, defined as
Z_k = {z_i = (x_i, y_i): y_i = θ_k} (9)
generate a partition of Z, and n_k = #Z_k, hence n = n_1 + n_2 + · · · + n_ℓ. If I_k is defined
as the set of indexes i where z_i ∈ Z_k, it follows that,
∪_{i∈I_k} {(x_i, y_i)} = Z_k. (10)
A very common decomposition procedure for multi-classification when SVMs
are considered is 1-v-1 SVM: a first decomposition phase generates several learning
machines in parallel, whereby each machine takes only two classes into consid-
eration. A reconstruction scheme then allows the calculation of the overall out-
put by merging outputs from the decomposition phase. In this approach, ℓ(ℓ−1)/2
binary classifiers are trained to generate hyperplanes f_kh, 1 ≤ k < h ≤ ℓ, by separat-
ing training vectors Z_k with label θ_k from training vectors in class θ_h, Z_h. If f_kh
discriminates without error then sign(f_kh(x_i)) = 1 for z_i ∈ Z_k, and sign(f_kh(x_i)) =
−1 for z_i ∈ Z_h. Remaining training vectors Z \ {Z_k ∪ Z_h} are not considered in the
optimization problem. Hence, for a new entry x, the numeric output from each
machine f_kh(x) is interpreted as,

(f_kh(x)) = θ_k if sign(f_kh(x)) = 1;   θ_h if sign(f_kh(x)) = −1. (11)
In the reconstruction phase, the label distribution generated by the trained
machines in the parallel decomposition is considered through a merging scheme.
The 1-v-1 multi-classification approach is usually preferred to the 1-v-r scheme
[16] because it takes less training time, despite studies such as [19]. Moreover,
according to [15] it would be difficult to say which one gives better accuracy. The
main drawback for this approach is that only data from two classes is consid-
ered for the training of each machine, therefore output variance is high and any
information from the rest of the classes is ignored.
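As an illustration of this decomposition–reconstruction scheme, the following Python sketch trains one binary classifier per pair of classes and merges their outputs by majority voting; scikit-learn's binary SVC stands in for the generic bi-classifier, and the synthetic data and names are illustrative only.

    import numpy as np
    from itertools import combinations
    from sklearn.svm import SVC

    # 1-v-1 decomposition: one binary SVM per pair of classes, trained only on
    # the patterns of those two classes; reconstruction by majority voting.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in range(3)])
    y = np.repeat([0, 1, 2], 20)

    machines = {}
    for k, h in combinations(np.unique(y), 2):
        mask = (y == k) | (y == h)              # remaining classes are discarded
        machines[(k, h)] = SVC(kernel='rbf').fit(X[mask], y[mask])

    def predict(x):
        votes = [clf.predict(x.reshape(1, -1))[0] for clf in machines.values()]
        return np.bincount(votes).argmax()      # majority vote

    print(predict(np.array([1.0, 1.0])))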
If a hyperplane f_kh must classify an input x_i with i ∉ I_k ∪ I_h, only the output
f_kh(x_i) = 0 will be translated into a correct interpretation. The natural improve-
ment to be analysed is to force every training input belonging to a class other
than θ_k and θ_h to be contained in the separating hyperplane f_kh(x) = 0.
3.1. the k-svcr machine. a first approach
In [2], the first tri-class procedure, the K-SVCR machine, was presented where
remaining training vectors are forced to be encapsulated in a δ-tube, 0 δ < 1,
along the separation hyperplane. Parameter δ allows the creation of a slack zone
(a ‘tube’) around the hyperplane where remaining training vectors are covered. The
separating hyperplane must solve the optimization problem,
min_{w∈F} (1/2)‖w‖² + C_1 · Σ_i ξ_i + C_2 · Σ_j (ϕ_j + ϕ_j^*)

s.t. y_i ⟨w, x_i⟩ ≥ 1 − ξ_i, z_i ∈ Z_{1,3}
     −δ − ϕ_j^* ≤ ⟨w, x_j⟩ ≤ δ + ϕ_j, z_j ∈ Z_2        (12)
     ξ_i ≥ 0, z_i ∈ Z_{1,3}
     ϕ_j, ϕ_j^* ≥ 0, z_j ∈ Z_2,
where Z_{1,3} are the patterns belonging to the classes labelled as {−1, +1} and Z_2
are those labelled with 0. The solution has a similar form to (8), where the α_i are the
multipliers associated with the problem, such that Σ_i α_i = 0. For a new entry x, the
numeric output from the machine f_w(x) is interpreted as

(f_w(x)) = 1 if f_w(x) > δ;   −1 if f_w(x) < −δ;   0 if |f_w(x)| ≤ δ. (13)
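A minimal sketch of the three-way interpretation (13), assuming the numeric output f_w(x) and the parameter δ are already available:

    def ksvcr_output(f_value, delta):
        """Three-way interpretation of the K-SVCR numeric output, following (13)."""
        if f_value > delta:
            return +1
        elif f_value < -delta:
            return -1
        return 0     # pattern falls inside the delta-tube: 'remaining' class

    for f in (0.9, -0.4, 0.05):
        print(ksvcr_output(f, delta=0.1))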
This approach has demonstrated good results on standard 'benchmarks' [2];
however, for the general case, many parameters must be selected,¹ such
as: (i) k, the kernel function; (ii) C_1, the weight associated with the sum of errors in the
two discriminated classes; (iii) C_2, the weight associated with the sum of errors in the
remaining classes; (iv) δ, the insensitivity parameter.
1 An extended study can be found in [10].
3.2. robust decomposition – reconstruction procedure
The K-SVCR machine improves on standard algorithms that treat 2-class classification
problems during the decomposition phase of a general multi-class scheme: the learning is
focused on 2 classes, but all the available information in the patterns is used.
Now, a second theoretical advantage of the 'third-class approach' will be
stated: the robustness of the reconstruction procedure [6]. To make this assertion
evident, a definition is needed.
DEFINITION 1. Let x ∈ X be an entry having a known output, θ_m. Let

ε_rob(x, F) = #f_m^err / L_m

be the rate between the number of classifiers concerning class θ_m that produce a
wrong output, #f_m^err, and the total number of classifiers concerned with class
θ_m, L_m, with the final multi-class architecture output still being correct, F(x) = θ_m. The
robustness parameter

ε_rob(F) = min_{x∈X} ε_rob(x, F)
determines that a general decomposition and reconstruction multi-class architec-
ture A1 is more robust than A2 if
ε_rob^1 = min_{F∈A_1} ε_rob^1(F) > min_{F∈A_2} ε_rob^2(F) = ε_rob^2, (14)
where superscripts refer to the global architecture being considered.
Basically, the robustness parameter specifies, for the worst case, how many clas-
sifiers concerned with the class of the entry could be wrong while the multi-class
architecture output is still correct.
The following Proposition can now be stated [6].
PROPOSITION 2. If K is the number of classes in consideration, the multi-class
architecture based on a three-class machine, like the K-SVCR machine, with a voting
reconstruction scheme F has a robustness parameter

ε_rob = 2(K − 2) / (K(K − 1)).
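A quick numeric illustration of this robustness parameter (the values of K below are arbitrary):

    # Numeric check of the robustness parameter of Proposition 2,
    # eps_rob = 2(K - 2) / (K(K - 1)), for a few class counts K.
    for K in (3, 4, 5, 10):
        eps = 2 * (K - 2) / (K * (K - 1))
        print(f"K = {K:2d}  ->  eps_rob = {eps:.3f}")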
In a similar way the following Proposition can be demonstrated [6].
PROPOSITION 3. A standard multi-class architecture based on 1-v-r 2-class classi-
fiers decomposition and a voting reconstruction scheme has a robustness parameter
εrob = 0.
A standard multi-class architecture based on 1-v-1 2-class classifiers decomposition
and voting reconstruction scheme has a robustness parameter
εrob = 0.
A ‘pairwise’ multi-class architecture [16] based on 1-v-1 2-class classifiers decom-
position and ‘pairwise’ voting reconstruction scheme has a robustness parameter
εrob = 0.
A DAGSVM architecture [18] has a robustness parameter
εrob = 0.
4. 1-v-1 Tri-Class SVM. A Second Approach
The number of tuning parameters can be reduced if the margin to be maximized
in (7) is that defined between the patterns assigned with output {−1, +1}, and the
entries labelled with 0, which are the remaining patterns. In this case, the width
of the ‘decision tube’ along the decision hyperplane where 0-labelled patterns are
allocated is not considered ‘a priori’ and the δ parameter is eliminated. A classifier
with this characteristic must accomplish
w_SV3 := arg max_{w∈F} (1/‖w‖) · [ min_{z_i∈Z_{1,3}} y_i ⟨x_i, w⟩ − max_{z_i∈Z_2} |⟨x_i, w⟩| ]. (15)
When ‖w‖ is minimized while the rest of the product is fixed to the unitary dis-
tance, (15) can be translated into the more manageable²

min_{w∈F} (1/2)‖w‖²
s.t. y_i ⟨x_i, w⟩ ≥ 1 + ⟨x_j, w⟩, z_i ∈ Z_{1,3}; z_j ∈ Z_2. (16)
This optimization problem is consistent with the standard formulation since if
all the 0-labelled training patterns are exactly on the decision hyperplane, (i.e. no
incorrect interpretation is possible), or these patterns are not considered in the
problem, then the novel machine would be similar to the 1-v-1 SVM machine.
Restrictions can be relaxed to allow some degree of noise on the ±1-labelled
training patterns by using ‘slack’ variables
ξ_i = 1 + max_{z_j∈Z_2} ⟨x_j, w⟩ − y_i ⟨x_i, w⟩ ≥ 0, z_i ∈ Z_{1,3} (17)
2 Constraints are slightly stricter than (15).
and restrictions in (16) can be manipulated to obtain the optimization problem [5]
min_{w∈F} (1/2)‖w‖² + C Σ_i ξ_i

s.t. y_i ⟨x_i − x_j, w⟩ − 1 + ξ_i ≥ 0, z_i ∈ Z_{1,3}; z_j ∈ Z_2        (18)
     ξ_i ≥ 0, z_i ∈ Z_{1,3}.
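To make the structure of (18) explicit, the following sketch only builds the constraint coefficients for a toy linear case: one row is generated for every pair (z_i ∈ Z_{1,3}, z_j ∈ Z_2). Data and sizes are illustrative; no solver is invoked.

    import numpy as np

    # Constraints of (18) are indexed by pairs: y_i <x_i - x_j, w> >= 1 - xi_i,
    # one inequality per (z_i in Z_13, z_j in Z_2). Only the coefficient rows of
    # w are assembled here, for a linear kernel.
    X13 = np.array([[2.0, 0.0], [0.0, 2.0]]);  y13 = np.array([+1, -1])   # classes +/-1
    X2  = np.array([[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])                  # 'third' class

    rows = []
    for xi, yi in zip(X13, y13):
        for xj in X2:
            rows.append(yi * (xi - xj))       # coefficient of w for the pair (i, j)
    A = np.array(rows)                        # shape: (len(X13) * len(X2), d)
    print(A.shape)                            # number of rows grows with the product of class sizes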
Applying Lagrange multipliers to the original optimization problem yields

L = (1/2)‖w‖² + C Σ_i ξ_i + Σ_{ij} α_{ij} (1 − ξ_i − y_i ⟨x_i − x_j, w⟩) − Σ_i μ_i ξ_i (19)

with

0 ≤ Σ_j α_{ij} ≤ C, z_i ∈ Z_{1,3};   w = Σ_{ij} y_i α_{ij} (x_i − x_j). (20)
The dual problem is therefore,
max_α Σ_{ij} α_{ij} − (1/2) Σ_{ij} Σ_{kl} y_i y_k α_{ij} α_{kl} ⟨x_i − x_j, x_k − x_l⟩

s.t. 0 ≤ Σ_j α_{ij} ≤ C        (21)
     α_{ij}, α_{kl} ≥ 0, z_i, z_k ∈ Z_{1,3}; z_j, z_l ∈ Z_2
and the solution function can be written,
f_w(x) = Σ_{ij} α_{ij} y_i [ k(x_i, x) − k(x_j, x) ]. (22)

For a new entry x, the output is interpreted in accordance with (13), where

δ = max_{z_j∈Z_2} f_w(x_j) = max_{z_j∈Z_2} ⟨w, x_j⟩. (23)
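A minimal sketch of the decision rule (22)–(23), with placeholder multipliers rather than a trained solution; the pairs and data below are purely illustrative.

    import numpy as np

    # Evaluate f_w(x) as in (22), obtain delta from the 0-labelled patterns as
    # in (23), and interpret the output following (13).
    def k(a, b, sigma=1.0):
        return np.exp(-np.linalg.norm(a - b) ** 2 / (2 * sigma ** 2))

    def f(x, pairs, alphas):
        # pairs: list of (x_i, y_i, x_j) with z_i in Z_13 and z_j in Z_2
        return sum(a * yi * (k(xi, x) - k(xj, x))
                   for a, (xi, yi, xj) in zip(alphas, pairs))

    pairs  = [(np.array([2.0, 0.0]), +1, np.array([1.0, 1.0])),
              (np.array([0.0, 2.0]), -1, np.array([1.0, 1.0]))]
    alphas = [0.7, 0.7]                                  # placeholder values
    Z2     = [np.array([1.0, 1.0]), np.array([0.9, 1.1])]

    delta = max(f(xj, pairs, alphas) for xj in Z2)       # delta from (23)
    fx = f(np.array([1.8, 0.2]), pairs, alphas)
    print(+1 if fx > delta else -1 if fx < -delta else 0)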
In Figure 1, the behaviour of the 1-v-1 tri-class machine is illustrated by using
a simple linearly separable problem with a Gaussian kernel. Support vectors (SVs)
are those patterns with associated non-null parameters, i.e. a non-null row or column in the
parameter matrix. As expected, the number of support vectors is limited and they
lie on the margin. Solid lines indicate the δ-tube for the 'remaining vectors' belong-
ing to the 'third class' and the dotted line represents the separating hyperplane. It
must be noted that the obtained values of δ (0 < δ < 1) are very low: in this example 0.1126, 0.1750
and 0.2159.
Figure 1. Results of the 1-v-1 tri-class machine applied to a simple separable problem with 45 patterns: (a) training data; (b) class 1 vs 2, 9 SVs; (c) class 1 vs 3, 9 SVs; (d) class 2 vs 3, 8 SVs.
5. 1-v-1 Tri-Class SVM Revised. A Third Approach
By means of a tri-class scheme, both the K-SVCR and the 1-v-1 tri-class SVM
allow the incorporation of all the information contained in the training patterns
when a multi-class problem is considered. For the 1-v-1 tri-class SVM, information
from ‘remaining patterns’ is captured in a δ-tube, where δ is an optimal param-
eter which is automatically obtained by maximizing the margin between classes.
However, this automatic tuning of the parameter leads to a computationally more
expensive optimization problem. This computational effort must be reduced.
By observing the nature of the constraints in the optimization problem (18), an
almost direct relation with respect to ordinal regression problems could be investi-
gated. In this sense, Shashua and Levin [20] have recently developed a fixed mar-
gin strategy to deal with ordinal regression problems by means of large margin
algorithms such as SVMs. This strategy considers all the classes at once, but with-
out squaring the size of the training data. Hence, the procedure seeks parallel
hyperplanes by separating consecutive classes through the optimization problem
min_{w∈F; b_j∈R} (1/2)‖w‖² + C Σ_j Σ_i (ξ_i^j + ξ_i^{*j+1})

s.t. ⟨x_i, w⟩ − b_j ≤ −1 + ξ_i^j, z_i ∈ Z_j
     ⟨x_i, w⟩ − b_j ≥ 1 − ξ_i^{*j+1}, z_i ∈ Z_{j+1}        (24)
     ξ_i^j, ξ_i^{*j+1} ≥ 0

where j = 1, . . . , ℓ − 1.
When comparing the 1-v-1 tri-class approach (18) and the formulation in (24),
it follows that (18) can be obtained from (24) when the number of categories to
be considered is three, ℓ = 3, if the constraints which share the same bias b_j are sub-
tracted, and a double value for the margin is considered. Hence,
min_{w∈F; b_1,b_2∈R} (1/2)‖w‖² + C Σ_i (ξ_i^1 + ξ_i^2 + ξ_i^{*2} + ξ_i^{*3})

s.t. ⟨x_i, w⟩ − b_1 ≤ −1 + ξ_i^1, z_i ∈ Z_1
     ⟨x_i, w⟩ − b_1 ≥ 1 − ξ_i^{*2}, z_i ∈ Z_2
     ⟨x_i, w⟩ − b_2 ≤ −1 + ξ_i^2, z_i ∈ Z_2        (25)
     ⟨x_i, w⟩ − b_2 ≥ 1 − ξ_i^{*3}, z_i ∈ Z_3
     ξ_i^1, ξ_i^2, ξ_i^{*2}, ξ_i^{*3} ≥ 0
leads to

min_{w∈F; b_1,b_2∈R} (1/2)‖w‖² + C Σ_i (ξ_i^1 + ξ_i^2 + ξ_i^{*2} + ξ_i^{*3})

s.t. ⟨x_j − x_i, w⟩ ≥ 2 − ξ_i^1 − ξ_j^{*2}, z_i ∈ Z_1; z_j ∈ Z_2        (26)
     ⟨x_i − x_j, w⟩ ≥ 2 − ξ_j^2 − ξ_i^{*3}, z_i ∈ Z_3; z_j ∈ Z_2
     ξ_i^1 + ξ_j^{*2} ≥ 0,  ξ_j^2 + ξ_i^{*3} ≥ 0

which is the same problem as (18) but with a double margin.
Indeed, it has been demonstrated that this ordinal regression approach can be
used in a similar way to the tri-class SVM in the decomposition – reconstruction
multi-classification procedure established in previous sections, by separating all the
patterns into three ensembles, labelled {−1, 0, 1}.
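A sketch of this relabelling step, with an illustrative helper function and toy labels:

    import numpy as np

    # For each pair of classes (k, h), patterns of class k are relabelled +1,
    # those of class h -> -1, and all remaining patterns -> 0, so that no
    # training information is discarded by the tri-class machine.
    def tri_class_labels(y, k, h):
        out = np.zeros_like(y)
        out[y == k] = +1
        out[y == h] = -1
        return out

    y = np.array([0, 0, 1, 1, 2, 2])
    print(tri_class_labels(y, k=0, h=1))     # -> [ 1  1 -1 -1  0  0]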
The size of the optimization problem associated with the 1-v-1 tri-class machine
has been drastically reduced. Hence, if a multi-classification problem of ℓ classes
is considered, where each class has the same number of patterns, (i.e. n patterns
for the classes labelled ±1 and (ℓ−2)n patterns for the 0-labelled class), the first opti-
mization problem has to fulfil a number of restrictions of O(n²), while the new
version has an order of O(n). When all the necessary 1-v-1 tri-class machines are
considered in the multi-classification scheme, ℓ(ℓ−1)/2, then the global number of
constraints is O(ℓ²n).
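An illustrative count of the inequality constraints of one tri-class machine (non-negativity rows ignored) under the equal-class-size assumption above:

    # Rough counts for one tri-class machine when the +/-1 classes hold n
    # patterns each and the 0-labelled class holds the (l - 2)n remaining ones.
    def constraints_pairwise(n, l):
        return 2 * n * (l - 2) * n        # formulation (18): one row per (i, j) pair

    def constraints_ordinal(n, l):
        return 2 * n + 2 * (l - 2) * n    # formulation (25): one or two rows per pattern

    for n in (10, 100, 1000):
        print(n, constraints_pairwise(n, l=5), constraints_ordinal(n, l=5))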
In Figure 2, the performance of the novel machine is shown when it is applied
with a Gaussian kernel to a non-linearly separable multi-class problem. Classifiers
are combined by a majority voting scheme to produce the final multi-class classi-
fication. It can be observed that a small band between classes remains unclassified,
since the outputs from the parallel decomposition phase assign this zone to differ-
ent classes.
6. Experimental Results
In this section, experimental results are presented for several problems from the
usual UCI Repository of machine learning databases [7]. A summary of the
characteristics of the selected datasets (Iris, Wine, Glass, Vowel, Vehicle and DNA)
is given in Table 1. The DNA dataset contains separate training and testing data.

Figure 2. Results of the 1-v-1 tri-class machine applied to a simple separable problem with 45 patterns.

Table 1. Characteristics of the selected datasets from the UCI repository.

Dataset    Patterns      Classes   Features
Iris       150           3         4
Wine       178           3         13
Glass      214           6         9
Vowel      528           11        10
Vehicle    846           4         18
DNA        2000 (1186)   3         180
The results have been obtained by following the experimental framework which
was proposed by [15] and was continued in [1], but with some modifications intro-
duced to incorporate the suggestions in [14] and [22]. Hence, training data have
not been scaled for their inclusion in [−1, +1], but have been normalized, (that is,
mean zero and standard deviation one), in order to avoid problems with outliers.
Test data are normalized accordingly.
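A minimal sketch of this normalization step with toy values; the training statistics are reused on the test data:

    import numpy as np

    # Standardize training data to zero mean and unit standard deviation per
    # feature, then apply the same (training) transform to the test data.
    X_train = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]])
    X_test  = np.array([[5.8, 2.7]])

    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    X_train_n = (X_train - mu) / sd
    X_test_n  = (X_test - mu) / sd
    print(X_test_n)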
The algorithms considered are the standard 1-v-1 and 1-v-r formulation and the
1-v-1 Tri-Class SVM in its final revised form for multi-classification. Their perfor-
mance, (in the form of accuracy rate), has been evaluated on models using the
Gaussian kernel,

k(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) ) (27)

therefore two hyperparameters must be set: the regularization term C and the
width of the kernel σ. This space is explored on a two-dimensional grid with the
following values: C = {2^12, 2^11, . . . , 2^−2} and γ = {2^4, 2^3, . . . , 2^−10}, where γ = 1/(2σ²).
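The following sketch reproduces only this model-selection protocol, using scikit-learn's standard 1-v-1 SVC on the Iris data as a stand-in for the machines compared here; the grid matches the ranges above, with γ = 1/(2σ²).

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV
    from sklearn.datasets import load_iris

    # Explore the (C, gamma) grid with the Gaussian (RBF) kernel and ten-fold
    # cross-validation on normalized data, as described in the text.
    X, y = load_iris(return_X_y=True)
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    grid = {'C': [2.0 ** e for e in range(-2, 13)],
            'gamma': [2.0 ** e for e in range(-10, 5)]}
    search = GridSearchCV(SVC(kernel='rbf'), grid, cv=10).fit(X, y)
    print(search.best_params_, round(search.best_score_, 4))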
Table 2. A comparison of the best accuracy rates using the RBF
kernel.
Dataset CV 1-v-1 (C,γ ) 1-v-r (C,γ ) Tri-class (C,γ )
Iris 30 96.73 (20 , 23 ) 96.00 (26 , 24 ) 95.49 (28 , 2−2 )
Wine 25 98.39 (211 , 23 ) 97.86 (22 , 23 ) 97.06 (27 , 23 )
Glass 10 70.91 (23 , 21 ) 71.11 (29 , 24 ) 71.81 (2−1 , 2−7 )
Vowel 10 98.95 (23 , 20 ) 98.48 (23 , 2−1 ) 99.36 (23 , 20 )
Vehicle 3 84.17 (28 , 24 ) 86.21 (28 , 24 ) 88.18 (26 , 22 )
DNA – 95.45 (23 , 2−5 ) 95.78 (21 , 2−6 ) 95.86 (22 , 2−7 )
The criterion used to estimate the generalization accuracy is a ten-fold cross-validation
on the whole training data, except for the DNA dataset. This procedure is repeated
between 3 and 30 times, according to the size of the dataset, in order to ensure
good statistical behaviour. The optimization algorithm used is the exact quadratic
program solver provided by Matlab, except for the Vowel and DNA datasets, for which an
iterative solver has been employed [8]. The best cross-validation mean rate among
the several pairs (C, γ) is reported in Table 2.
It can be observed that similar performance results are obtained by all three
approaches, although slight differences can be appreciated.
7. Conclusions and Future Work
In this paper, a new kernel machine has been designed to solve multi-
classification problems. Initially, it has been proved that, by means of a tri-class
scheme, the machine allows the incorporation of all the information contained in
the training patterns when a multi-class problem is considered. Information from
‘remaining patterns’ is captured in a δ-tube, where δ is an optimal parameter which
can be automatically obtained by maximizing the margin between classes.
The new formulation with automatic tuning of the parameter δ is very time-consuming,
since comparisons between patterns of different classes must be made, similar to an ordinal regres-
sion procedure. An algorithm in [20] avoids making comparisons between classes
when a preference learning task is performed, which speeds up the computation
time considerably. However, all the hyperplanes considered must be parallel, hence
the explanation power of the machine is reduced, and the use of the machine is
restricted to ordinal regression. Our approach is an improvement on the machine
in [20], since the hyperplanes need not be parallel, which improves their
explanation power, and our approach can therefore be used for multi-classification
tasks.
By observing the constraints in the optimization problem, a more direct exten-
sion to ordinal regression problems is under investigation. A first natural choice
would be to use a 1-v-1 tri-class SVM to solve preference learning problems, in
the same way as the K-SVCR machine was adapted for this purpose in [4],
in accordance with the approach presented in [13]. However, it is still necessary to
use constraints on the differences between patterns of different classes.
When hyperplanes are merged to obtain the final multi-class solution, only
signed outputs are considered in the voting scheme, so ties between classes are con-
sidered as errors. A research line already initiated is the probabilistic interpretation of the
outputs in accordance with their value [11].
Acknowledgements
This study was partially supported by Junta de Andalucía grant ACPAI-2003/014,
and Spanish MCyT grant TIC2002-04371-C02-01.
References
1. Anguita, D., Ridella, S. and Sterpi, D.: A New Method for Multiclass Support Vector
Machines. In: Proceedings of the IEEE IJCNN2004. Budapest (Hungary), 2004.
2. Angulo, C.: Learning with Kernel Machines into a Multi-Class Environment. Doctoral
thesis, Technical University of Catalonia. In Spanish, 2001.
3. Angulo, C. and Català, A.: A Multi-class Support Vector Machine. Lecture Notes in
Computer Science, 1810 (2000), 55–64.
4. Angulo, C. and Català, A.: Ordinal regression with K-SVCR machines. In: J. Mira and
A. Prieto (eds.), Proceedings of IWANN 2001, Part I, Vol. 2084 of Lecture Notes in
Computer Science. pp. 661–668, 2001.
5. Angulo, C. and González, L.: 1-v-1 tri-class SV machine. In: Proceedings of the 11th
European Symposium on Artificial Neural Networks. Bruges (Belgium), pp. 355–360, 2003.
6. Angulo, C., Parra, X. and Català, A.: K-SVCR. A support vector machine for multi-
class classification. Neurocomputing, 55(1–2), (2003) 57–77.
7. Blake, C. and Merz, C.: UCI Repository of Machine Learning Databases, 1998.
8. Canu, S., Grandvalet, Y. and Rakotomamonjy, A.: SVM and Kernel Methods Matlab
Toolbox. Perception Systèmes et Information. INSA de Rouen, Rouen, France, 2003.
9. Cristianini, N. and Shawe-Taylor, J.: An Introduction to Support Vector Machines and
other Kernel-based Learning Methods. Cambridge University press, 2000.
10. González, L.: Discriminative analysis using kernel vector machines support. The similar-
ity kernel function. Doctoral thesis, University of Seville. In Spanish, 2002.
11. González, L., Angulo, C., Velasco, F. and Vílchez, M.: Máquina K-SVCR con salidas
probabilísticas (K-SVCR machine with probabilistic outputs). Inteligencia Artificial. Revi-
sta Iberoamericana de IA, (17) (2002), 72–82. In Spanish.
12. Herbrich, R.: Learning Kernel Classifiers. Theory and Algorithms. The MIT Press, 2002.
13. Herbrich, R., Graepel, T. and Obermayer, K.: Advances in Large Margin Classifiers,
Chapt. Large Margin Rank Boundaries for Ordinal Regression, pp. 115–132. Cambridge,
MA: MIT Press, 2000.
14. Hsu, C.-W., Chang, C.-C. and Lin, C.-J.: A practical guide to support vector classifica-
tion. Technical report, Department of Computer Science and Information Engineering,
National Taiwan University, 2003.
15. Hsu, C.-W. and Lin, C.-J.: A Comparison of methods for multiclass support vector
machine. IEEE Transactions on Neural Networks, 13(2), (2002) 415–425.
16. Kressel, U.: Pairwise classification and support vector machine. In B. Schölkopf, C.
Burgues and A. Smola (eds.) Advances in Kernel Methods: Support Vector Learning,
pp. 255–268, Cambridge, MA: MIT Press, 1999.
17. Mayoraz, E. and Alpaydin, E.: Support vector machines for multi-class classification. In:
J. Mira and J. V. Sánchez-Andrés (eds.), Proceedings of IWANN 1999, Part II, Vol. 1607
of Lecture Notes in Computer Science, 1999.
18. Platt, J., Cristianini, N. and Shawe-Taylor, J.: Large margin DAGs for multiclass classifi-
cation. Neural Information Processing Systems, 12 (2000).
19. Rifkin, R. and Klautau, A.: In defense of one-vs-all classification. Journal of Machine
Learning Research, 5 (2004), 101–141.
20. Shashua, A. and Levin, A.: Taxonomy of large margin principle algorithms for ordinal
regression problems. Neural Information Processing Systems, 16 (2002).
21. Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, Inc., 1998.
22. Vert, J.-P., Tsuda, K. and Schölkopf, B.: Kernel Methods in Computational Biology,
Chapt. A Primer on Kernel Methods, pp. 35–70. The MIT Press, 2004.