Article
A Multiclass Nonparallel Parametric-Margin Support
Vector Machine
Shu-Wang Du 1, Ming-Chuan Zhang 2, Pei Chen 2, Hui-Feng Sun 2, Wei-Jie Chen 1,2,* and Yuan-Hai Shao 3
Abstract: The twin parametric-margin support vector machine (TPMSVM) is an excellent kernel-based
nonparallel classifier. However, TPMSVM was originally designed for binary classification, which
is unsuitable for real-world multiclass applications. Therefore, this paper extends TPMSVM for
multiclass classification and proposes a novel K multiclass nonparallel parametric-margin support
vector machine (MNP-KSVC). Specifically, our MNP-KSVC enjoys the following characteristics. (1)
Under the “one-versus-one-versus-rest” multiclass framework, MNP-KSVC encodes the complicated
multiclass learning task into a series of subproblems with the ternary output {−1, 0, +1}. In contrast
to the “one-versus-one” or “one-versus-rest” strategy, each subproblem not only focuses on separating
the two selected class instances but also considers the side information of the remaining class
instances. (2) MNP-KSVC aims to find a pair of nonparallel parametric-margin hyperplanes for each
subproblem. As a result, these hyperplanes are closer to their corresponding class and at least one
distance away from the other class. At the same time, they attempt to bound the remaining class
instances into an insensitive region. (3) MNP-KSVC utilizes a hybrid classification and regression
loss joined with the regularization to formulate its optimization model. Then, the optimal solutions
are derived from the corresponding dual problems. Finally, we conduct numerical experiments to
compare the proposed method with four state-of-the-art multiclass models: Multi-SVM, MBSVM,
MTPMSVM, and Twin-KSVC. Experimental results demonstrate the feasibility and effectiveness of
MNP-KSVC in terms of multiclass accuracy and learning time.
EEG classification [7,8], pathological diagnosis [9,10], and bioinformatics [8] to power
applications [11].
One limitation in the classical SVM is the strictly parallel requirements for support
vector hyperplanes. Namely, parallel hyperplanes are challenging to use to capture a data
structure with a cross-plane distribution [12,13], such as “XOR” problems. To alleviate
the above issue, nonparallel SVM models have been proposed in the literature [13–17]
over the past years. These models relax the parallel requirement of SVM and seek
nonparallel hyperplanes for different classes. The pioneering work is the generalized
eigenvalue proximal SVM (GEPSVM) proposed by Mangasarian and Wild [12], which
attempts to find a pair of nonparallel hyperplanes via solving eigenvalue problems
(EPs). Subsequently, Jayadeva et al. [13] proposed a novel QPP-type nonparallel model
for classification, named TWSVM. The idea of TWSVM is to generate two nonparallel
hyperplanes such that each hyperplane is closer to one of the two classes and lies at a distance of at
least one from the other class. Compared with the classical SVM, the nonparallel SVM models
(GEPSVM and TWSVM) have lower computational complexity and better generalization
ability. Therefore, in the last few years, they have been studied extensively and developed
rapidly, including a least squares version of TWSVM (LSTSVM) [16], structural risk
minimization version of TWSVM (TBSVM) [17], ν-PTSVM [18], nonparallel SVM (NPSVM)
[19,20], nonparallel projection SVM (NPrSVM) [21], and so on [21–28].
The above nonparallel models were mainly proposed for binary classification problems.
However, most real-world applications [29–32] involve multiclass classification, such
as disease diagnosis, fault detection, image recognition, and text categorization. Therefore,
many researchers are interested in extending SVM models from binary to multiclass
classification. Generally, the decomposition procedure has been considered to be an
effective way to achieve multiclass extensions. Yang et al. [33] proposed a multiple birth
SVM (MBSVM) for multiclass classification based on the “one-versus-rest” strategy, which
is the first multiclass extension of the nonparallel SVM model. Angulo et al. [34] proposed
a "one-versus-one-versus-rest" multiclass framework. In contrast to the "one-versus-one"
strategy, it constructs K(K − 1)/2 sub-classifiers with all data points, which can avoid
the risk of information loss and class distortion problems. Following this framework,
Xu et al. [35] proposed a multiclass extension of TWSVM, termed Twin-KSVC. Results
show that Twin-KSVC has a better generalization ability than MBSVM in most cases.
Nasiri et al. [36] formulated Twin-KSVC in the least-squared sense to boost learning
efficiency and further presented the LST-KSVC model. Lima et al. [32] proposed an
improvement on LST-KSVC (ILST-KSVC) with regularization to implement the structural
risk minimization principle.
As a successful extension of SVM, the twin parametric-margin support vector machine
(TPMSVM) [15] was proposed to pursue a pair of parametric-margin nonparallel hyperplanes.
Unlike GEPSVM and TWSVM, each hyperplane in TPMSVM aims to be closer to its
class and far away from the other class. The parametric-margin mechanism enables
TPMSVM to be suitable for many cases and results in better generalization performance.
However, TPMSVM can only deal with binary classification learning tasks. The above
motivates us to propose a novel K multiclass nonparallel parametric-margin support vector
machine, termed MNP-KSVC. The proposed MNP-KSVC is endowed with the following
attractive advantages:
• The proposed MNP-KSVC encodes the K multiclass learning task into a series of
“one-versus-one-versus-rest” subproblems with all the training instances. Then, it
encodes the outputs of subproblems with the ternary output {−1, 0, +1}, which helps
to deal with imbalanced cases.
• For each subproblem, MNP-KSVC aims to find a pair of nonparallel parametric-margin
hyperplanes to separate the two selected classes together with the remaining classes. Unlike
TPMSVM, each parametric-margin hyperplane is closer to its class and at least one
distance away from the other class, while mapping the remaining class instances into
an insensitive region.
• To measure the empirical risk, MNP-KSVC considers a hybrid classification and
regression loss. The hinge loss is utilized to penalize the errors of the two focused
class instances, and the ε-insensitive loss is used for the remaining class instances.
• Extensive numerical experiments are performed on several multiclass UCI benchmark
datasets, and their results are compared with four models (Multi-SVM, MBSVM,
MTPMSVM, and Twin-KSVC). The comparative results indicate the effectiveness and
feasibility of the proposed MNP-KSVC for multiclass classification.
The remainder of this paper is organized as follows. Section 2 briefly introduces
notations and related works. Section 3 proposes our MNP-KSVC. The model optimization
is also discussed in Section 3. The nonlinear version of MNP-KSVC is extended in
Section 4. Experimental results are described in Section 5, and Section 6 presents a discussion and
future work.
2. Preliminaries
In this section, we first describe the notations used throughout the paper. Then, we
briefly revisit the nonparallel classifier PTSVM and its variants.
2.1. Notations
In this paper, scalars are denoted by lowercase italic letters, vectors by lowercase
bold face letters, and matrices by uppercase letters. All vectors are column vectors unless
transposed into row vectors by a prime superscript (·)′. A vector of zeros of arbitrary
dimension is represented by 0. In addition, we denote e as a vector of ones and I as an
identity matrix of arbitrary dimensions. Moreover, let ‖·‖ stand for the L2-norm.
2.2. TPMSVM
The TPMSVM [15] was originally proposed for binary classification with a heteroscedastic
noise structure. It attempts to seek two nonparallel parametric-margin hyperplanes via the
following QPPs:
$$
\begin{aligned}
\min_{w_1, b_1}\quad & \frac{1}{2}\|w_1\|^2 + \nu_1 \sum_{j \in \mathcal{I}_-} \eta_{1j} + c_1 \sum_{i \in \mathcal{I}_+} \xi_{1i}, \\
\text{s.t.}\quad & w_1' x_i + b_1 \ge -\xi_{1i}, \quad \xi_{1i} \ge 0, \quad i \in \mathcal{I}_+, \\
& w_1' x_j + b_1 = \eta_{1j}, \quad j \in \mathcal{I}_-,
\end{aligned} \tag{1}
$$
and
$$
\begin{aligned}
\min_{w_2, b_2}\quad & \frac{1}{2}\|w_2\|^2 - \nu_2 \sum_{j \in \mathcal{I}_+} \eta_{2j} + c_2 \sum_{i \in \mathcal{I}_-} \xi_{2i}, \\
\text{s.t.}\quad & w_2' x_i + b_2 \le \xi_{2i}, \quad \xi_{2i} \ge 0, \quad i \in \mathcal{I}_-, \\
& w_2' x_j + b_2 = \eta_{2j}, \quad j \in \mathcal{I}_+,
\end{aligned} \tag{2}
$$
where ν1 , ν2 , c1 , c2 are positive parameters, and the decision hyperplane is half of the sum
of the two hyperplanes. To obtain the solutions of problems (1) and (2), one needs to resort
to the dual problems
$$
\begin{aligned}
\min_{\alpha_1}\quad & \frac{1}{2}\sum_{i \in \mathcal{I}_+}\sum_{j \in \mathcal{I}_+} \alpha_{1i}\alpha_{1j}\, x_i' x_j - \nu_1 \sum_{i \in \mathcal{I}_+}\sum_{j \in \mathcal{I}_-} \alpha_{1i}\, x_i' x_j, \\
\text{s.t.}\quad & \sum_{i \in \mathcal{I}_+} \alpha_{1i} = \nu_1, \quad 0 \le \alpha_{1i} \le c_1,
\end{aligned} \tag{3}
$$
and
$$
\begin{aligned}
\min_{\alpha_2}\quad & \frac{1}{2}\sum_{i \in \mathcal{I}_-}\sum_{j \in \mathcal{I}_-} \alpha_{2i}\alpha_{2j}\, x_i' x_j - \nu_2 \sum_{i \in \mathcal{I}_-}\sum_{j \in \mathcal{I}_+} \alpha_{2i}\, x_i' x_j, \\
\text{s.t.}\quad & \sum_{i \in \mathcal{I}_-} \alpha_{2i} = \nu_2, \quad 0 \le \alpha_{2i} \le c_2.
\end{aligned} \tag{4}
$$
Then, the solutions (w1, b1) and (w2, b2) of the primal problems (1) and (2) can be recovered
from the solutions of the dual problems (3) and (4) according to the Karush–Kuhn–Tucker (KKT)
conditions [3], where ISV1 and ISV2 denote the index sets of the support vectors used to determine
the bias terms b1 and b2.
Note that, compared with TWSVM [13], TPMSVM can capture more complex heteroscedastic
error structures via its parametric-margin hyperplanes. However, the objective of TPMSVM is not
strictly convex with respect to the bias variable b, so the solution may not be unique. Moreover,
the optimization problems (1) and (2) are only designed for binary classification tasks, and
thus are unsuitable for many real-world multiclass learning applications.
such that each one is proximal to its corresponding class while lying as far as possible from
the other class on one side. Moreover, the remaining class instances are restricted to a region
between these two hyperplanes.
Formally, we use Ik to denote the index set of instances with label k in the subproblem,
where k ∈ {−1, 0, +1}. Inspired by the TPMSVM [15], our MNP-KSVC
considers the following two loss functions for the above ternary output learning problem:
$$
R_1^{\mathrm{emp}} = \nu_1 \sum_{j \in \mathcal{I}_-} \bigl(w_1' x_j + b_1\bigr)
+ c_1 \sum_{i \in \mathcal{I}_+} \max\bigl(0,\, 1 - (w_1' x_i + b_1)\bigr)
+ c_3 \sum_{l \in \mathcal{I}_0} \max\bigl(0,\, \varepsilon - 1 + (w_1' x_l + b_1)\bigr) \tag{8}
$$
and
$$
R_2^{\mathrm{emp}} = -\nu_2 \sum_{i \in \mathcal{I}_+} \bigl(w_2' x_i + b_2\bigr)
+ c_2 \sum_{j \in \mathcal{I}_-} \max\bigl(0,\, 1 + (w_2' x_j + b_2)\bigr)
+ c_4 \sum_{l \in \mathcal{I}_0} \max\bigl(0,\, \varepsilon - 1 - (w_2' x_l + b_2)\bigr). \tag{9}
$$
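As a small illustration of the hybrid losses (8) and (9), the following sketch evaluates both empirical risks for given hyperplane parameters. The array names (Xp, Xm, Xr for the instances with labels +1, −1, and 0) and the coefficient pairing (c1, c3) and (c2, c4) follow our reading of the formulas above and are assumptions rather than the authors' code.

```python
import numpy as np

def r1_emp(w1, b1, Xp, Xm, Xr, nu1, c1, c3, eps):
    """Hybrid risk (8): projection term on I_-, hinge on I_+, eps-insensitive on I_0."""
    f = lambda X: X @ w1 + b1
    return (nu1 * np.sum(f(Xm))
            + c1 * np.sum(np.maximum(0.0, 1.0 - f(Xp)))
            + c3 * np.sum(np.maximum(0.0, eps - 1.0 + f(Xr))))

def r2_emp(w2, b2, Xp, Xm, Xr, nu2, c2, c4, eps):
    """Hybrid risk (9): projection term on I_+, hinge on I_-, eps-insensitive on I_0."""
    f = lambda X: X @ w2 + b2
    return (-nu2 * np.sum(f(Xp))
            + c2 * np.sum(np.maximum(0.0, 1.0 + f(Xm)))
            + c4 * np.sum(np.maximum(0.0, eps - 1.0 - f(Xr))))
```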
and
$$
\begin{aligned}
\min_{w_2, \xi_2, \eta_2}\quad & \frac{1}{2}\bigl(\|w_2\|^2 + b_2^2\bigr) - \nu_2 \sum_{i \in \mathcal{I}_+} \bigl(w_2' x_i + b_2\bigr) + c_2 \sum_{j \in \mathcal{I}_-} \xi_{2j} + c_4 \sum_{l \in \mathcal{I}_0} \eta_{2l}, \\
\text{s.t.}\quad & -\bigl(w_2' x_j + b_2\bigr) \ge 1 - \xi_{2j}, \quad \xi_{2j} \ge 0, \quad j \in \mathcal{I}_-, \\
& \bigl(w_2' x_l + b_2\bigr) \ge \varepsilon - 1 - \eta_{2l}, \quad \eta_{2l} \ge 0, \quad l \in \mathcal{I}_0.
\end{aligned} \tag{11}
$$
$$
\begin{aligned}
\min_{u_1, \xi_1, \eta_1}\quad & \frac{1}{2}\|u_1\|^2 + \nu_1 e_-' B u_1 + c_1 e_+' \xi_1 + c_3 e_0' \eta_1, \\
\text{s.t.}\quad & A u_1 \ge e_+ - \xi_1, \quad \xi_1 \ge 0, \\
& -C u_1 \ge (\varepsilon - 1) e_0 - \eta_1, \quad \eta_1 \ge 0
\end{aligned} \tag{12}
$$
and
$$
\begin{aligned}
\min_{u_2, \xi_2, \eta_2}\quad & \frac{1}{2}\|u_2\|^2 - \nu_2 e_+' A u_2 + c_2 e_-' \xi_2 + c_4 e_0' \eta_2, \\
\text{s.t.}\quad & -B u_2 \ge e_- - \xi_2, \quad \xi_2 \ge 0, \\
& C u_2 \ge (\varepsilon - 1) e_0 - \eta_2, \quad \eta_2 \ge 0.
\end{aligned} \tag{13}
$$
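For clarity, a brief sketch of how the matrices A, B, and C used in (12) and (13) can be assembled for one (ki, kj) subproblem is given below. The augmented-row layout (x̃ = [x; 1], so that u = [w; b] absorbs the bias), the helper name, and the data layout (X: n × d array, y: length-n label vector) are assumptions made for illustration.

```python
import numpy as np

def build_submatrices(X, y, ki, kj):
    """Split the data for one 'one-versus-one-versus-rest' subproblem."""
    aug = lambda Z: np.hstack([Z, np.ones((Z.shape[0], 1))])  # append bias column
    A = aug(X[y == ki])                     # focused "+1" class
    B = aug(X[y == kj])                     # focused "-1" class
    C = aug(X[(y != ki) & (y != kj)])       # remaining classes (label 0)
    return A, B, C
```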
Theorem 1. The QPPs

$$
\begin{aligned}
\min_{\alpha_1, \beta_1}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_1' & \beta_1' \end{bmatrix}
\begin{bmatrix} AA' & -AC' \\ -CA' & CC' \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}
- \Bigl( \nu_1 e_-' \begin{bmatrix} BA' & -BC' \end{bmatrix} + \begin{bmatrix} e_+' & (\varepsilon - 1) e_0' \end{bmatrix} \Bigr)
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}, \\
\text{s.t.}\quad & 0 \le \alpha_1 \le c_1 e_+, \quad 0 \le \beta_1 \le c_3 e_0
\end{aligned} \tag{14}
$$
and
$$
\begin{aligned}
\min_{\alpha_2, \beta_2}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_2' & \beta_2' \end{bmatrix}
\begin{bmatrix} BB' & -BC' \\ -CB' & CC' \end{bmatrix}
\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix}
- \Bigl( \nu_2 e_+' \begin{bmatrix} AB' & -AC' \end{bmatrix} + \begin{bmatrix} e_-' & (\varepsilon - 1) e_0' \end{bmatrix} \Bigr)
\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix}, \\
\text{s.t.}\quad & 0 \le \alpha_2 \le c_2 e_-, \quad 0 \le \beta_2 \le c_4 e_0
\end{aligned} \tag{15}
$$
are the dual problems of (12) and (13), respectively.
Proof of Theorem 1. Taking problem (12) as an example, we first construct its Lagrangian
function as

$$
\begin{aligned}
L(\Xi_1) = {} & \frac{1}{2}\|u_1\|^2 + \nu_1 e_-' B u_1 + c_1 e_+' \xi_1 + c_3 e_0' \eta_1 \\
& - \alpha_1' (A u_1 + \xi_1 - e_+) - \beta_1' \bigl(-C u_1 + \eta_1 - (\varepsilon - 1) e_0\bigr) - \varphi_1' \xi_1 - \gamma_1' \eta_1,
\end{aligned} \tag{16}
$$

where α1, β1, φ1, and γ1 are the vectors of nonnegative Lagrange multipliers. The corresponding
KKT conditions are
$$
\nabla_{u_1} L = u_1 + \nu_1 B' e_- - A' \alpha_1 + C' \beta_1 = 0, \tag{17}
$$
$$
\nabla_{\xi_1} L = c_1 e_+ - \alpha_1 - \varphi_1 = 0, \tag{18}
$$
$$
\nabla_{\eta_1} L = c_3 e_0 - \beta_1 - \gamma_1 = 0, \tag{19}
$$
$$
\alpha_1' (A u_1 + \xi_1 - e_+) = 0, \tag{20}
$$
$$
\beta_1' \bigl(-C u_1 + \eta_1 - (\varepsilon - 1) e_0\bigr) = 0, \tag{21}
$$
$$
\varphi_1' \xi_1 = 0, \tag{22}
$$
$$
\gamma_1' \eta_1 = 0. \tag{23}
$$

From (17), we obtain

$$
u_1 = A' \alpha_1 - C' \beta_1 - \nu_1 B' e_-. \tag{24}
$$
Since φ1, γ1 ≥ 0, from (18) and (19), we derive

$$
0 \le \alpha_1 \le c_1 e_+ \quad \text{and} \quad 0 \le \beta_1 \le c_3 e_0. \tag{25}
$$
Finally, substituting (24) into the Lagrangian function (16) and using the KKT conditions
(17)–(23), the dual problem of (12) can be formulated as

$$
\begin{aligned}
\min_{\alpha_1, \beta_1}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_1' & \beta_1' \end{bmatrix}
\begin{bmatrix} AA' & -AC' \\ -CA' & CC' \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}
- \Bigl( \nu_1 e_-' \begin{bmatrix} BA' & -BC' \end{bmatrix} + \begin{bmatrix} e_+' & (\varepsilon - 1) e_0' \end{bmatrix} \Bigr)
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}, \\
\text{s.t.}\quad & 0 \le \alpha_1 \le c_1 e_+, \quad 0 \le \beta_1 \le c_3 e_0.
\end{aligned} \tag{26}
$$
For brevity, define q1 = [α1; β1], N1 = [A; −C], h1 = [e+; (ε − 1)e0], and eq1 = [c1 e+; c3 e0].
Then, problem (26) can be rewritten compactly as

$$
\begin{aligned}
\min_{q_1}\quad & \frac{1}{2} q_1' H_1 q_1 - d_1 q_1, \\
\text{s.t.}\quad & 0 \le q_1 \le e_{q_1},
\end{aligned} \tag{27}
$$
where

$$
H_1 = N_1 N_1' = \begin{bmatrix} AA' & -AC' \\ -CA' & CC' \end{bmatrix} \tag{28}
$$

and

$$
d_1 = \nu_1 e_-' B N_1' + h_1' = \nu_1 e_-' \begin{bmatrix} BA' & -BC' \end{bmatrix} + \begin{bmatrix} e_+' & (\varepsilon - 1) e_0' \end{bmatrix}. \tag{29}
$$
Similarly, define q2 = [α2; β2], N2 = [B; −C], h2 = [e−; (ε − 1)e0], and eq2 = [c2 e−; c4 e0]
for problem (15). Then, it can be reformulated as

$$
\begin{aligned}
\min_{q_2}\quad & \frac{1}{2} q_2' H_2 q_2 - d_2 q_2, \\
\text{s.t.}\quad & 0 \le q_2 \le e_{q_2},
\end{aligned} \tag{30}
$$
where

$$
H_2 = N_2 N_2' = \begin{bmatrix} BB' & -BC' \\ -CB' & CC' \end{bmatrix} \tag{31}
$$

and

$$
d_2 = \nu_2 e_+' A N_2' + h_2' = \nu_2 e_+' \begin{bmatrix} AB' & -AC' \end{bmatrix} + \begin{bmatrix} e_-' & (\varepsilon - 1) e_0' \end{bmatrix}. \tag{32}
$$
After solving dual problems (27) and (30) with a standard QPP solver, we can obtain the solutions
to primal problems (12) and (13) via Proposition 1, which follows directly from the KKT conditions
and is therefore stated without proof.
Proposition 1. Suppose that q1 = [α1; β1] and q2 = [α2; β2] are solutions to dual problems (27) and (30),
respectively. Then, the solutions u1 and u2 to primal problems (12) and (13) can be formulated as

$$
u_1 = N_1' q_1 - \nu_1 B' e_- = A' \alpha_1 - C' \beta_1 - \nu_1 B' e_- \tag{33}
$$

and

$$
u_2 = -N_2' q_2 + \nu_2 A' e_+ = -B' \alpha_2 + C' \beta_2 + \nu_2 A' e_+. \tag{34}
$$
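The sketch below puts Theorem 1 and Proposition 1 together for one subproblem: it builds H1, H2, d1, d2 as in (28)–(32), solves the box-constrained duals (27) and (30), and recovers u1, u2 via (33) and (34). A plain projected-gradient loop stands in for the MATLAB "quadprog" call used in the paper; the step size, iteration count, and function names are our own choices, and A, B, C are the augmented matrices assumed in the earlier sketch.

```python
import numpy as np

def solve_box_qp(H, d, ub, n_iter=2000):
    """min 0.5 q'Hq - d'q  s.t. 0 <= q <= ub, via projected gradient descent."""
    q = np.zeros_like(d)
    step = 1.0 / (np.linalg.norm(H, 2) + 1e-12)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        q = np.clip(q - step * (H @ q - d), 0.0, ub)
    return q

def solve_pair(A, B, C, nu1, nu2, c1, c2, c3, c4, eps):
    """Solve duals (27)/(30) for one (ki, kj) subproblem and recover u1, u2."""
    ep, em, e0 = np.ones(len(A)), np.ones(len(B)), np.ones(len(C))
    N1, N2 = np.vstack([A, -C]), np.vstack([B, -C])
    H1, H2 = N1 @ N1.T, N2 @ N2.T                           # Equations (28) and (31)
    d1 = nu1 * (B.T @ em) @ N1.T + np.concatenate([ep, (eps - 1.0) * e0])   # (29)
    d2 = nu2 * (A.T @ ep) @ N2.T + np.concatenate([em, (eps - 1.0) * e0])   # (32)
    q1 = solve_box_qp(H1, d1, np.concatenate([c1 * ep, c3 * e0]))
    q2 = solve_box_qp(H2, d2, np.concatenate([c2 * em, c4 * e0]))
    u1 = N1.T @ q1 - nu1 * (B.T @ em)      # Proposition 1, Equation (33)
    u2 = -N2.T @ q2 + nu2 * (A.T @ ep)     # Proposition 1, Equation (34)
    return u1, u2
```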
an example, firstly, we reformulate the parametric-margin hyperplanes (7) with respect to u = [w; b] and
x̃ = [x; 1] as

$$
f_1(x) = w_1' x + b_1 = u_1' \tilde{x} \quad \text{and} \quad f_2(x) = w_2' x + b_2 = u_2' \tilde{x}. \tag{35}
$$
If x is located above the "+" hyperplane with f1(x) > 1 − ε, i.e., satisfying the condition u1'x̃ >
1 − ε, we vote it to the ki class. On the other hand, if x is located below the "−" hyperplane with
f2(x) < −1 + ε, we vote it to the kj class. Otherwise, it belongs to the remaining classes. In summary,
the decision function for the (ki, kj)-pair classifier can be expressed as

$$
g(x) = \begin{cases} +1, & f_1(x) > 1 - \varepsilon, \\ -1, & f_2(x) < -1 + \varepsilon, \\ 0, & \text{otherwise}. \end{cases} \tag{36}
$$

Finally, the given unseen instance x is assigned to the class label that receives the most votes.
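A minimal sketch of the decision rule (36) and the voting step might look as follows; the dictionary models[(ki, kj)] = (u1, u2), holding the solutions of all pairwise subproblems, is an assumed data structure rather than something defined in the paper.

```python
import numpy as np

def pair_decision(u1, u2, x, eps):
    """Ternary decision rule (36) for one (ki, kj) subproblem."""
    x_tilde = np.append(x, 1.0)                 # augmented instance [x; 1]
    f1, f2 = u1 @ x_tilde, u2 @ x_tilde         # the two parametric-margin scores
    if f1 > 1.0 - eps:
        return +1
    if f2 < -1.0 + eps:
        return -1
    return 0

def predict(models, classes, x, eps):
    """Vote over all pairwise classifiers and return the winning class."""
    votes = {k: 0 for k in classes}
    for (ki, kj), (u1, u2) in models.items():
        g = pair_decision(u1, u2, x, eps)
        if g == +1:
            votes[ki] += 1
        elif g == -1:
            votes[kj] += 1
    return max(votes, key=votes.get)            # class with the most votes
```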
The whole procedure of MNP-KSVC is summarized in Algorithm 1 and illustrated in Figure 1.
and

$$
k_j\text{-class}: \quad f_2(x) = w_2' x + b_2 = -q_2' N_2 \tilde{x} + \nu_2 e_+' A \tilde{x} \tag{38}
$$
9: end for
10: end for
Predicting Procedure:
11: For an unseen instance x, assign it to class y via the following vote strategy.
12: Initialize the vote vector vote = 0 for K classes.
13: for k i in range(1, · · · , K ) do
14: for k j in range(k i + 1, · · · , K ) do
15: Compute the decision function g( x) for the (k i , k j )-pair classifier w.r.t. (36).
16: if g( x) == 1 then
17: Update vote(k i ) + = 1
18: else if g( x) == −1 then
19: Update vote(k j ) + = 1
20: end if
21: end for
22: end for
23: Finally, assign x to the class y with the most votes, i.e., y = arg max_k vote(k).
Figure 1. The flowchart for the training and predicting procedures of the proposed MNP-KSVC model.
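Combining the sketches above, the training loop of Algorithm 1 can be outlined as follows. It simply reuses the build_submatrices and solve_pair helpers assumed earlier; sharing one flat set of hyper-parameters across all pair subproblems is our own simplification, in line with the experimental setting described later.

```python
def train_mnp_ksvc(X, y, classes, nu1, nu2, c1, c2, c3, c4, eps):
    """Train one linear MNP-KSVC sub-classifier for every (ki, kj) pair."""
    models = {}
    for i, ki in enumerate(classes):
        for kj in classes[i + 1:]:
            A, B, C = build_submatrices(X, y, ki, kj)   # "1 vs 1 vs rest" split
            models[(ki, kj)] = solve_pair(A, B, C, nu1, nu2,
                                          c1, c2, c3, c4, eps)
    return models
```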
Define Aφ = {φ(x̃i)}i∈I+, Bφ = {φ(x̃j)}j∈I−, and Cφ = {φ(x̃l)}l∈I0. Then, the nonlinear
MNP-KSVC model optimizes the following two primal QPPs:
$$
\begin{aligned}
\min_{u_1, \xi_1, \eta_1}\quad & \frac{1}{2}\|u_1\|^2 + \nu_1 e_-' B_\phi u_1 + c_1 e_+' \xi_1 + c_3 e_0' \eta_1, \\
\text{s.t.}\quad & A_\phi u_1 \ge e_+ - \xi_1, \quad \xi_1 \ge 0, \\
& -C_\phi u_1 \ge (\varepsilon - 1) e_0 - \eta_1, \quad \eta_1 \ge 0
\end{aligned} \tag{40}
$$
and
$$
\begin{aligned}
\min_{u_2, \xi_2, \eta_2}\quad & \frac{1}{2}\|u_2\|^2 - \nu_2 e_+' A_\phi u_2 + c_2 e_-' \xi_2 + c_4 e_0' \eta_2, \\
\text{s.t.}\quad & -B_\phi u_2 \ge e_- - \xi_2, \quad \xi_2 \ge 0, \\
& C_\phi u_2 \ge (\varepsilon - 1) e_0 - \eta_2, \quad \eta_2 \ge 0.
\end{aligned} \tag{41}
$$
Definition 1. Suppose that K(·, ·) is an appropriate kernel function; then, the kernel operation in
matrix form is defined as

$$
K(A, B) = \bigl\langle A_\phi, B_\phi \bigr\rangle, \tag{42}
$$

whose ij-th element can be computed by

$$
K(A, B)_{ij} = \phi(\tilde{x}_i)' \phi(\tilde{x}_j) = K(\tilde{x}_i, \tilde{x}_j). \tag{43}
$$
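As an illustration of Definition 1, the sketch below computes the kernel matrix K(A, B) entry-wise for a Gaussian (RBF) kernel. The kernel choice and the bandwidth parameter gamma are assumptions made for illustration, and A, B are taken to be the augmented data matrices used in the earlier sketches.

```python
import numpy as np

def rbf_kernel_matrix(A, B, gamma=1.0):
    """K(A, B)_ij = exp(-gamma * ||a_i - b_j||^2), one entry per row pair."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))
```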
Then, we can derive the dual problems of (40) and (41) by Theorem 2.
Theorem 2. The QPPs

$$
\begin{aligned}
\min_{\alpha_1, \beta_1}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_1' & \beta_1' \end{bmatrix}
\begin{bmatrix} K(A, A) & -K(A, C) \\ -K(C, A) & K(C, C) \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}
- \Bigl( \nu_1 e_-' \begin{bmatrix} K(B, A) & -K(B, C) \end{bmatrix} + \begin{bmatrix} e_+' & (\varepsilon - 1) e_0' \end{bmatrix} \Bigr)
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}, \\
\text{s.t.}\quad & 0 \le \alpha_1 \le c_1 e_+, \quad 0 \le \beta_1 \le c_3 e_0
\end{aligned} \tag{44}
$$

and

$$
\begin{aligned}
\min_{\alpha_2, \beta_2}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_2' & \beta_2' \end{bmatrix}
\begin{bmatrix} K(B, B) & -K(B, C) \\ -K(C, B) & K(C, C) \end{bmatrix}
\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix}
- \Bigl( \nu_2 e_+' \begin{bmatrix} K(A, B) & -K(A, C) \end{bmatrix} + \begin{bmatrix} e_-' & (\varepsilon - 1) e_0' \end{bmatrix} \Bigr)
\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix}, \\
\text{s.t.}\quad & 0 \le \alpha_2 \le c_2 e_-, \quad 0 \le \beta_2 \le c_4 e_0
\end{aligned} \tag{45}
$$
are the dual problems of (40) and (41), respectively.
$$
\begin{aligned}
L(\Xi_1) = {} & \frac{1}{2}\|u_1\|^2 + \nu_1 e_-' B_\phi u_1 + c_1 e_+' \xi_1 + c_3 e_0' \eta_1 \\
& - \alpha_1' (A_\phi u_1 + \xi_1 - e_+) - \beta_1' \bigl(-C_\phi u_1 + \eta_1 - (\varepsilon - 1) e_0\bigr) - \varphi_1' \xi_1 - \gamma_1' \eta_1,
\end{aligned} \tag{46}
$$
$$
0 \le \alpha_1 \le c_1 e_+ \quad \text{and} \quad 0 \le \beta_1 \le c_3 e_0. \tag{55}
$$
Finally, substituting (54) into the Lagrangian function (46) and using the KKT conditions
(47)–(53), the dual problem of (40) can be formulated as (44). Similarly, we can derive the
dual problem of (41) as problem (45).
The procedure of the nonlinear MNP-KSVC is similar to that of the linear one, but with
the following minor modifications in Algorithm 1:
• In contrast to some existing nonparallel SVMs, we do not need to consider an extra
kernel-generated technique, since only inner products appear in the dual problems (14)
and (15) of the linear MNP-KSVC. These dual formulations enable MNP-KSVC to
behave consistently in the linear and nonlinear cases. Thus, by substituting appropriate kernel
functions for the inner products in the Hessian matrices of dual problems (14) and
(15), i.e.,

$$
H_{\phi 1} = \begin{bmatrix} K(A, A) & -K(A, C) \\ -K(C, A) & K(C, C) \end{bmatrix} \quad \text{and} \quad
H_{\phi 2} = \begin{bmatrix} K(B, B) & -K(B, C) \\ -K(C, B) & K(C, C) \end{bmatrix}, \tag{56}
$$

we obtain the dual formulation of the nonlinear MNP-KSVC in (44) and (45).
• Once we obtain the solutions q1 = [α1; β1] and q2 = [α2; β2] to problems (44) and (45),
respectively, the corresponding primal solutions u1 and u2 in the feature space can be
formulated, analogously to (33) and (34), as

$$
u_1 = A_\phi' \alpha_1 - C_\phi' \beta_1 - \nu_1 B_\phi' e_- \tag{57}
$$

and

$$
u_2 = -B_\phi' \alpha_2 + C_\phi' \beta_2 + \nu_2 A_\phi' e_+. \tag{58}
$$
• For an unseen instance x, construct the following decision function for the (ki, kj)-pair
nonlinear MNP-KSVC classifier as

$$
g_\phi(x) = \begin{cases} +1, & f_{\phi 1}(\tilde{x}) > 1 - \varepsilon, \\ -1, & f_{\phi 2}(\tilde{x}) < -1 + \varepsilon, \\ 0, & \text{otherwise}, \end{cases} \tag{59}
$$
where

$$
f_{\phi 1}(\tilde{x}) = K(u_1, \tilde{x}) = \alpha_1' K(A, \tilde{x}) - \beta_1' K(C, \tilde{x}) - \nu_1 e_-' K(B, \tilde{x}) \tag{60}
$$

and

$$
f_{\phi 2}(\tilde{x}) = K(u_2, \tilde{x}) = -\alpha_2' K(B, \tilde{x}) + \beta_2' K(C, \tilde{x}) + \nu_2 e_+' K(A, \tilde{x}). \tag{61}
$$
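A compact sketch of the kernelized decision functions (60) and (61) together with rule (59) is given below. It reuses the rbf_kernel_matrix helper assumed earlier, takes the dual solutions (alpha1, beta1, alpha2, beta2) as inputs, and evaluates the kernel on the augmented instances, all of which are our own illustrative choices.

```python
import numpy as np

def g_phi(x_tilde, A, B, C, alpha1, beta1, alpha2, beta2, nu1, nu2, eps, gamma):
    """Ternary nonlinear decision rule (59) built from (60) and (61)."""
    kx = lambda M: rbf_kernel_matrix(M, x_tilde[None, :], gamma).ravel()
    f1 = alpha1 @ kx(A) - beta1 @ kx(C) - nu1 * np.sum(kx(B))   # Equation (60)
    f2 = -alpha2 @ kx(B) + beta2 @ kx(C) + nu2 * np.sum(kx(A))  # Equation (61)
    if f1 > 1.0 - eps:
        return +1
    if f2 < -1.0 + eps:
        return -1
    return 0
```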
5. Numerical Experiments
5.1. Experimental Setting
To demonstrate the validity of MNP-KSVC, we perform extensive experiments on
several benchmark datasets that are commonly used for testing machine learning algorithms.
In the experiments, we focus on comparing MNP-KSVC with four state-of-the-art multiclass
models—Multi-SVM, MBSVM, MTPMSVM, and Twin-KSVC—detailed as follows:
• Multi-SVM [38]: The idea is similar to the "one-versus-all" SVM [3]. However, it
generates K binary SVM classifiers by solving one large dual QPP. That is, the k-th
classifier is trained with the k-th class instances encoded with positive labels and the
remaining class instances with negative labels. Then, the label of an unseen instance
is assigned by the “voting” scheme. The penalty parameter for each classifier in
Multi-SVM is c.
• MBSVM [33]: It is the multiclass extension of the binary TWSVM, which is based
on the “one-versus-all” strategy. MBSVM aims to find K nonparallel hyperplanes
by solving K QPPs simultaneously. Specifically, the k-th class instances are kept as far
away as possible from the k-th hyperplane, while the remaining instances are proximal to the k-th
hyperplane. An unseen instance is assigned to the label of the hyperplane from which
it lies farthest away. The penalty parameter for each classifier in
MBSVM is c.
• MTPMSVM: Inspired by MBSVM [33], we use the “one-versus-all” strategy to implement
the multiclass version of TPMSVM [15] as a baseline. In contrast to MBSVM, it aims
to find K parametric-margin hyperplanes, such that each hyperplane is closer to its
corresponding class instances and as far away as possible from the remaining class instances.
The penalty parameters for each classifier in MTPMSVM are (ν, c).
• Twin-KSVC [35]: It is another novel multiclass extension of TWSVM. Twin-KSVC
evaluates all the training instances in a “one-versus-one-versus-rest” structure with
the ternary output {−1, 0, +1}. It aims to find a pair of nonparallel hyperplanes for
each pair of classes selected from the K classes. The remaining class instances
are mapped into a region within these two nonparallel hyperplanes. The penalty
parameters for each classifier in Twin-KSVC are (c1 , c2 , c3 , c4 , ε).
All methods are implemented in MATLAB on a PC with an Intel Core i7 processor
and 32 GB RAM. The quadratic programming problems (QPPs) of all the classifiers are
solved by the “quadprog” function in MATLAB. Now, we describe the setting of our
experiments:
• Similar to [35,38], we use the multiclass accuracy to measure each classifier, defined as

$$
\text{accuracy} = \frac{1}{N} \sum_{k=1}^{K} \sum_{x \in \mathcal{I}_k} I\bigl(\hat{g}(x) = k\bigr), \tag{62}
$$

where N is the number of instances in the dataset, K is the total number of classes, ĝ(x) is the prediction of the
classifier, and I(·) is an indicator function, which returns 1 if the class matches and 0
otherwise. Moreover, we adopt the training time to represent the learning efficiency;
a one-line sketch of this accuracy computation is given after this list.
• To reduce the complexity of parameter selection for the multiclass classifiers, we use
the same parameter setting for each learning subproblem. Specifically, we set the
same c for all subproblems in Multi-SVM and MBSVM, the same (ν, c) in MTPMSVM, and c1 = c2 , c3 = c4
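As mentioned above, the multiclass accuracy (62) reduces to the fraction of correctly predicted labels. A minimal sketch (with assumed label vectors y_true and y_pred) is:

```python
import numpy as np

def multiclass_accuracy(y_true, y_pred):
    """Accuracy (62): fraction of instances whose predicted class matches the true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sum(y_true == y_pred) / len(y_true)
```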
Table 1. Statistics for benchmark datasets used in experiments. # denotes the corresponding quantity.
Tables 2 and 3 summarize the learning results of the proposed MNP-KSVC model
and the other compared methods using linear and nonlinear kernels, respectively.
The results on 10 benchmark datasets include the mean and standard deviation of the testing
multiclass accuracy (%), with the best performance highlighted in bold. The comparison
results reveal the following:
• MNP-KSVC yields better performance than other classifiers in terms of accuracy on
almost all datasets. This confirms the efficacy of the proposed MNP-KSVC on the
multiclass learning tasks.
• Nonparallel-based models (MBSVM, MTPMSVM, Twin-KSVC, and MNP-KSVC)
outperform the traditional SVM model. The reason is that SVM simply utilizes
parallel hyperplanes to learn the decision function, limiting its capability to capture
the underlying multiclass distributions.
• MNP-KSVC has a better generalization ability than MBSVM and Twin-KSVC in most
cases. For instance, MNP-KSVC obtains a higher accuracy (84.59%) than MBSVM
(80.82%) and Twin-KSVC (80.12%) on the Ecoli dataset in the linear case. Similar
results can be obtained from the other datasets. Since MBSVM and Twin-KSVC only
implement empirical risk minimization, they are prone to overfitting. In contrast, regularization
terms are considered in our MNP-KSVC, which regulates the model complexity to
avoid overfitting.
• MTPMSVM is another multiclass extension, which is based on the "one-versus-rest"
strategy. In contrast, with the help of the "hybrid classification and regression" learning paradigm,
our MNP-KSVC can learn more multiclass discriminative information.
• Furthermore, we count the number of datasets on which MNP-KSVC is superior/inferior (W/L)
to each compared classifier for both the linear and nonlinear cases, listed at the
bottom of Tables 2 and 3. The results indicate that MNP-KSVC achieves the best
results against the others in terms of both W/L and average accuracy.
Table 2. Performance comparison on benchmark datasets for linear classifiers, in terms of mean ± std
of the testing multiclass accuracy (%). The Win/Loss (W/L) denotes the number of datasets for
which MNP-KSVC is superior/inferior to the compared classifiers. Ave. Acc and rank denote each
classifier’s average accuracy and rank over all datasets.
To provide more statistical evidence [39–41], we have further performed the non-
parametric Friedman test to check whether there are significant differences between
MNP-KSVC and other compared classifiers. The bottom lines of Tables 2 and 3 list the
average ranks of each classifier calculated by the Friedman test according to multiclass
accuracy. Results show that MNP-KSVC obtains the first (best) average rank in both the linear and
nonlinear cases. Now, we calculate the $\chi_F^2$ value for the Friedman test as

$$
\chi_F^2 = \frac{12N}{k(k+1)} \left[ \sum_{i=1}^{k} r_i^2 - \frac{k(k+1)^2}{4} \right], \tag{63}
$$
where N is the number of datasets, k is the number of classifiers, and $r_i$ is the average rank
of the i-th model over the N datasets. For the linear case, we compute the term $\sum_{i=1}^{k} r_i^2$
from Table 2 as

$$
\sum_{i=1}^{k} r_i^2 = 4.6^2 + 2.9^2 + 3.1^2 + 2.7^2 + 1.7^2 \approx 49.1989. \tag{64}
$$
Table 3. Performance comparison on benchmark datasets for nonlinear classifiers, in terms of the
mean ± std of the testing multiclass accuracy (%). The Win/Loss (W/L) denotes the number of
datasets for which MNP-KSVC is superior/inferior to the compared classifiers. Ave. Acc and rank
denote each classifier’s average accuracy and rank over all datasets.
Then, we obtain

$$
\chi_F^2 = \frac{12 \times 10}{5(5+1)} \left[ 49.1989 - \frac{5(5+1)^2}{4} \right] \approx 16.7956 \tag{65}
$$

and

$$
F_F = \frac{(N-1)\,\chi_F^2}{N(k-1) - \chi_F^2} = \frac{9 \times 16.7956}{10 \times 4 - 16.7956} \approx 6.5143. \tag{66}
$$
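The arithmetic in (63)–(66) can be reproduced with a few lines of code. The sketch below uses the rounded linear-case ranks read from Table 2, so its output differs slightly from the values reported in the text, which are based on unrounded ranks.

```python
import numpy as np

def friedman_stats(ranks, n_datasets):
    """Compute chi_F^2 as in (63) and F_F as in (66) from average ranks."""
    k = len(ranks)
    chi2_f = 12.0 * n_datasets / (k * (k + 1)) * (np.sum(np.square(ranks))
                                                  - k * (k + 1) ** 2 / 4.0)
    f_f = (n_datasets - 1) * chi2_f / (n_datasets * (k - 1) - chi2_f)
    return chi2_f, f_f

chi2_f, f_f = friedman_stats([4.6, 2.9, 3.1, 2.7, 1.7], n_datasets=10)
print(round(chi2_f, 4), round(f_f, 4))   # rounded ranks; the text reports ~16.80 and ~6.51
```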
Moreover, we compute the p-value and reject the null hypothesis at the significance level
α = 0.05. Similarly, we calculate the statistics for the nonlinear case, as
summarized in Table 4. The results reject the null hypothesis for both the linear and nonlinear
cases and reveal the existence of significant differences in the performances of the classifiers.
Furthermore, we record the average learning time of each classifier in the above
UCI dataset experiments, as shown in Figures 2 and 3. The results show that our
MNP-KSVC is faster than Multi-SVM and Twin-KSVC, while slightly slower than MBSVM
and MTPMSVM in both the linear and nonlinear cases. Multi-SVM performs the slowest of all
the classifiers because it needs to solve larger problems than the nonparallel-based
classifiers. Moreover, the Hessian matrix of the dual QPPs in MNP-KSVC avoids the time-costly
matrix inversion, leading to greater efficiency than Twin-KSVC. Overall, the above
results confirm the feasibility of MNP-KSVC.
[Figure 2. Average learning time (Time (s)) of Multi-SVM, MBSVM, MTPMSVM, Twin-KSVC, and MNP-KSVC on the benchmark datasets (Balance, Ecoli, Iris, Glass, Wine, Thyroid, Dermatology, Shuttle, Contraceptive, and Pen Based) in the linear case.]
[Figure 3. Average learning time (Time (s)) of Multi-SVM, MBSVM, MTPMSVM, Twin-KSVC, and MNP-KSVC on the same benchmark datasets in the nonlinear case.]
Author Contributions: Conceptualization, S.-W.D. and W.-J.C.; Funding acquisition, W.-J.C. and Y.-H.S.;
Investigation, M.-C.Z. and P.C.; Methodology, S.-W.D., M.-C.Z. and Y.-H.S.; Project administration,
S.-W.D. and W.-J.C.; Supervision, W.-J.C. and Y.-H.S.; Validation, M.-C.Z. and H.-F.S.; Visualization, P.C.
and H.-F.S.; Writing—original draft, S.-W.D., P.C. and H.-F.S.; Writing—review and editing, S.-W.D. and
W.-J.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, Grant
number: 61603338, 11871183, 61866010, and 11426202; the Natural Science Foundation of Zhejiang
Province of China, Grant number: LY21F030013; the Natural Science Foundation of Hainan Province
of China, Grant number: 120RC449; the Scientific Research Foundation of Hainan University, Grant
number: kyqd(sk)1804.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing not applicable. No new data were created or analyzed in
this study.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques; Morgan Kaufmann: San Francisco, CA, USA, 2012.
2. Vapnik, V. Statistical Learning Theory; Wiley: New York, NY, USA, 1998.
3. Deng, N.; Tian, Y.; Zhang, C. Support Vector Machines: Theory, Algorithms and Extensions; CRC Press: Philadelphia, PA, USA, 2013.
4. Sitaula, C.; Aryal, S.; Xiang, Y.; Basnet, A.; Lu, X. Content and context features for scene image representation. Knowl.-Based Syst.
2021, 232, 107470. [CrossRef]
5. Ma, S.; Cheng, B.; Shang, Z.; Liu, G. Scattering transform and LSPTSVM based fault diagnosis of rotating machinery. Mech. Syst.
Signal Process. 2018, 104, 155–170. [CrossRef]
6. Liu, T.; Yan, D.; Wang, R.; Yan, N.; Chen, G. Identification of Fake Stereo Audio Using SVM and CNN. Information 2021, 12, 263.
[CrossRef]
7. You, S.D. Classification of Relaxation and Concentration Mental States with EEG. Information 2021, 12, 187. [CrossRef]
8. Kang, J.; Han, X.; Song, J.; Niu, Z.; Li, X. The identification of children with autism spectrum disorder by SVM approach on EEG
and eye-tracking data. Comput. Biol. Med. 2020, 120, 103722. [CrossRef]
9. Lazcano, R.; Salvador, R.; Marrero-Martin, M.; Leporati, F.; Juarez, E.; Callico, G.M.; Sanz, C.; Madronal, D.; Florimbi, G.;
Sancho, J.; et al. Parallel Implementations Assessment of a Spatial-Spectral Classifier for Hyperspectral Clinical Applications.
IEEE Access 2019, 7, 152316–152333. [CrossRef]
10. Fabelo, H.; Ortega, S.; Szolna, A.; Bulters, D.; Pineiro, J.F.; Kabwama, S.; J-O’Shanahan, A.; Bulstrode, H.; Bisshopp, S.;
Kiran, B.R.; et al. In-Vivo Hyperspectral Human Brain Image Database for Brain Cancer Detection. IEEE Access 2019,
7, 39098–39116. [CrossRef]
11. Roy, S.D.; Debbarma, S. A novel OC-SVM based ensemble learning framework for attack detection in AGC loop of power
systems. Electr. Power Syst. Res. 2022, 202, 107625. [CrossRef]
12. Mangasarian, O.L.; Wild, E.W. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE
Trans. Pattern Anal. Mach. Intell. 2006, 28, 69–74. [CrossRef] [PubMed]
13. Jayadeva.; Khemchandani, R.; Chandra, S. Twin Support Vector Machines for Pattern Classification. IEEE Trans. Pattern Anal.
Mach. Intell. 2007, 29, 905–910. [CrossRef] [PubMed]
14. Ding, S.; Hua, X. An overview on nonparallel hyperplane support vector machine algorithms. Neural Comput. Appl. 2014,
25, 975–982. [CrossRef]
15. Peng, X. TPMSVM: A novel twin parametric-margin support vector machine for pattern recognition. Pattern Recogn. 2011,
44, 2678–2692. [CrossRef]
16. Arun Kumar, M.; Gopal, M. Least squares twin support vector machines for pattern classification. Expert Syst. Appl. 2009,
36, 7535–7543. [CrossRef]
17. Shao, Y.; Zhang, C.; Wang, X.; Deng, N. Improvements on Twin Support Vector Machines. IEEE Trans. Neural Netw. 2011,
22, 962–968. [CrossRef]
18. Chen, W.J.; Shao, Y.H.; Li, C.N.; Liu, M.Z.; Wang, Z.; Deng, N.Y. ν-projection twin support vector machine for pattern classification.
Neurocomputing 2020, 376, 10–24. [CrossRef]
19. Tian, Y.; Qi, Z.; Ju, X.; Shi, Y.; Liu, X. Nonparallel Support Vector Machines for Pattern Classification. IEEE Trans. Cybern. 2014,
44, 1067–1079. [CrossRef]
20. Tian, Y.; Ping, Y. Large-scale linear nonparallel support vector machine solver. Neural Netw. 2014, 50, 166–174. [CrossRef]
[PubMed]
21. Chen, W.; Shao, Y.; Li, C.; Wang, Y.; Liu, M.; Wang, Z. NPrSVM: Nonparallel sparse projection support vector machine with
efficient algorithm. Appl. Soft Comput. 2020, 90, 106142. [CrossRef]
22. Chen, W.; Shao, Y.; Li, C.; Deng, N. MLTSVM: A novel twin support vector machine to multi-label learning. Pattern Recogn. 2016,
52, 61–74. [CrossRef]
23. Bai, L.; Shao, Y.H.; Wang, Z.; Chen, W.J.; Deng, N.Y. Multiple Flat Projections for Cross-Manifold Clustering. IEEE Trans. Cybern.
2021, 1–15. Available online: https://ieeexplore.ieee.org/document/9343292 (accessed on 20 October 2021).
24. Hou, Q.; Liu, L.; Zhen, L.; Jing, L. A novel projection nonparallel support vector machine for pattern classification. Eng. Appl.
Artif. Intel. 2018, 75, 64–75. [CrossRef]
25. Liu, L.; Chu, M.; Gong, R.; Zhang, L. An Improved Nonparallel Support Vector Machine. IEEE Trans. Neur. Net. Lear. 2021,
32, 5129–5143. [CrossRef]
26. Chen, W.; Shao, Y.; Deng, N.; Feng, Z. Laplacian least squares twin support vector machine for semi-supervised classification.
Neurocomputing 2014, 145, 465–476. [CrossRef]
27. Shao, Y.; Chen, W.; Zhang, J.; Wang, Z.; Deng, N. An efficient weighted Lagrangian twin support vector machine for imbalanced
data classification. Pattern Recogn. 2014, 47, 3158–3167. [CrossRef]
28. Chen, W.; Shao, Y.; Ning, H. Laplacian smooth twin support vector machine for semi-supervised classification. Int. J. Mach. Learn.
Cyber. 2014, 5, 459–468. [CrossRef]
29. Gao, Z.; Fang, S.C.; Gao, X.; Luo, J.; Medhin, N. A novel kernel-free least squares twin support vector machine for fast and
accurate multi-class classification. Knowl.-Based Syst. 2021, 226, 107123. [CrossRef]
30. Ding, S.; Zhao, X.; Zhang, J.; Zhang, X.; Xue, Y. A review on multi-class TWSVM. Artif. Intell. Rev. 2017, 52, 775–801. [CrossRef]
31. Qiang, W.; Zhang, J.; Zhen, L.; Jing, L. Robust weighted linear loss twin multi-class support vector regression for large-scale
classification. Signal Process. 2020, 170, 107449. [CrossRef]
32. de Lima, M.D.; Costa, N.L.; Barbosa, R. Improvements on least squares twin multi-class classification support vector machine.
Neurocomputing 2018, 313, 196–205. [CrossRef]
33. Yang, Z.; Shao, Y.; Zhang, X. Multiple birth support vector machine for multi-class classification. Neural Comput. Appl. 2013,
22, 153–161. [CrossRef]
34. Angulo, C.; Parra, X.; Català, A. K-SVCR. A support vector machine for multi-class classification. Neurocomputing 2003, 55, 57–77.
[CrossRef]
35. Xu, Y.; Guo, R.; Wang, L. A Twin Multi-Class Classification Support Vector Machine. Cogn. Comput. 2013, 5, 580–588. [CrossRef]
36. Nasiri, J.A.; Moghadam Charkari, N.; Jalili, S. Least squares twin multi-class classification support vector machine. Pattern
Recogn. 2015, 48, 984–992. [CrossRef]
37. Mangasarian, O.L. Nonlinear Programming; SIAM Press: Philadelphia, PA, USA, 1993.
38. Tomar, D.; Agarwal, S. A comparison on multi-class classification methods based on least squares twin support vector machine.
Knowl.-Based Syst. 2015, 81, 131–147. [CrossRef]
39. Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 1937,
200, 675–701. [CrossRef]
40. Yu, Z.; Wang, Z.; You, J.; Zhang, J.; Liu, J.; Wong, H.S.; Han, G. A New Kind of Nonparametric Test for Statistical Comparison of
Multiple Classifiers Over Multiple Datasets. IEEE Trans. Cybern. 2017, 47, 4418–4431. [CrossRef] [PubMed]
41. Hatamlou, A. Black hole: A new heuristic optimization approach for data clustering. Inform. Sci. 2013, 222, 175–184. [CrossRef]
42. Chen, W.; Shao, Y.; Xu, D.; Fu, Y. Manifold proximal support vector machine for semi-supervised classification. Appl. Intell. 2014,
40, 623–638. [CrossRef]
43. Li, Y.; Sun, H.; Yan, W.; Cui, Q. R-CTSVM+: Robust capped L1-norm twin support vector machine with privileged information.
Inform. Sci. 2021, 574, 12–32. [CrossRef]