Article
A Multiclass Nonparallel Parametric-Margin Support
Vector Machine
Shu-Wang Du 1 , Ming-Chuan Zhang 2 , Pei Chen 2 , Hui-Feng Sun 2 , Wei-Jie Chen 1,2, * and Yuan-Hai Shao 3

1 Zhijiang College, Zhejiang University of Technology, Shaoxing 312030, China; dusw@zjut.edu.cn


2 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310024, China;
2112012210@zjut.edu.cn (M.-C.Z.); 2112112292@zjut.edu.cn (P.C.); 2112112244@zjut.edu.cn (H.-F.S.)
3 Management School, Hainan University, Haikou 570228, China; shaoyuanhai@hainanu.edu.cn
* Correspondence: wjchen@zjut.edu.cn

Abstract: The twin parametric-margin support vector machine (TPMSVM) is an excellent kernel-based nonparallel classifier. However, TPMSVM was originally designed for binary classification, which is unsuitable for real-world multiclass applications. Therefore, this paper extends TPMSVM for multiclass classification and proposes a novel K multiclass nonparallel parametric-margin support vector machine (MNP-KSVC). Specifically, our MNP-KSVC enjoys the following characteristics. (1) Under the "one-versus-one-versus-rest" multiclass framework, MNP-KSVC encodes the complicated multiclass learning task into a series of subproblems with the ternary output {−1, 0, +1}. In contrast to the "one-versus-one" or "one-versus-rest" strategy, each subproblem not only focuses on separating the two selected class instances but also considers the side information of the remaining class instances. (2) MNP-KSVC aims to find a pair of nonparallel parametric-margin hyperplanes for each subproblem. As a result, these hyperplanes are closer to their corresponding class and at least one distance away from the other class. At the same time, they attempt to bound the remaining class instances into an insensitive region. (3) MNP-KSVC utilizes a hybrid classification and regression loss joined with the regularization to formulate its optimization model. Then, the optimal solutions are derived from the corresponding dual problems. Finally, we conduct numerical experiments to compare the proposed method with four state-of-the-art multiclass models: Multi-SVM, MBSVM, MTPMSVM, and Twin-KSVC. Experimental results demonstrate the feasibility and effectiveness of MNP-KSVC in terms of multiclass accuracy and learning time.

Keywords: multiclass classification; twin support vector machine; nonparallel hyperplane; hybrid classification and regression; MNP-KSVC

1. Introduction

Data mining has become an essential tool for integrating information technology and industrialization due to the growing size of available databases [1]. One of the main applications in data mining is supervised classification, which aims to assign to an unseen instance a label that is as correct as possible from a given set of classes based on the learned model. Recently, the support vector machine (SVM) [2,3] has been a preeminent maximum-margin learning paradigm for data classification. The basic idea of SVM is to find an optimal decision boundary via maximizing the margin between two parallel support hyperplanes. Compared with neural networks, SVM has the following attractive features [3]: (1) the structural risk minimization principle is implemented in SVM to control the upper bound of the generalization error, leading to an excellent generalization ability; (2) the global optimum can be achieved by optimizing a quadratic programming problem (QPP). Furthermore, kernel techniques enable SVM to handle complicated nonlinear learning tasks effectively. During recent decades, SVM has been successfully applied in a wide variety of fields ranging from scene classification [4], fault diagnosis [5,6],



EEG classification [7,8], pathological diagnosis [9,10], and bioinformatics [8] to power
applications [11].
One limitation of the classical SVM is the strict parallelism requirement on the support hyperplanes. Namely, parallel hyperplanes struggle to capture a data structure with a cross-plane distribution [12,13], such as in "XOR" problems. To alleviate
the above issue, the nonparallel SVM models have been proposed in the literature [13–17]
during the past years. This approach relaxes the parallel requirement in SVM and seeks
nonparallel hyperplanes for different classes. The pioneering work is the generalized
eigenvalue proximal SVM (GEPSVM) proposed by Mangasarian and Wild [12], which
attempts to find a pair of nonparallel hyperplanes via solving eigenvalue problems
(EPs). Subsequently, Jayadeva et al. [13] proposed a novel QPP-type nonparallel model
for classification, named TWSVM. The idea of TWSVM is to generate two nonparallel
hyperplanes such that each hyperplane is closer to one of the two classes and lies at a distance of at least one from the other class. Compared with the classical SVM, the nonparallel SVM models
(GEPSVM and TWSVM) have lower computational complexity and better generalization
ability. Therefore, in the last few years, they have been studied extensively and developed
rapidly, including a least squares version of TWSVM (LSTSVM) [16], structural risk
minimization version of TWSVM (TBSVM) [17], ν-PTSVM [18], nonparallel SVM (NPSVM)
[19,20], nonparallel projection SVM (NPrSVM) [21], and so on [21–28].
The above nonparallel models were mainly proposed for binary classification problems.
However, most real-world applications [29–32] are related to multiclass classifications such
as disease diagnosis, fault detection, image recognition, and text categorization. Therefore,
many researchers are interested in extending SVM models from binary to multiclass
classification. Generally, the decomposition procedure has been considered to be an
effective way to achieve multiclass extensions. Yang et al. [33] proposed a multiple birth
SVM (MBSVM) for multiclass classification based on the “one-versus-rest” strategy, which
is the first multiclass extension of the nonparallel SVM model. Angulo et al. [34] proposed
an “one-versus-one-versus-rest” multiclass framework. In contrast to the “one-versus-one”
strategy, it constructs k(k − 1)/2 binary classifiers with all data points, which can avoid
the risk of information loss and class distortion problems. Following this framework,
Xu et al. [35] proposed a multiclass extension of TWSVM, termed Twin-KSVC. Results
show that Twin-KSVC has a better generalization ability than MBSVM in most cases.
Nasiri et al. [36] formulated Twin-KSVC in the least-squared sense to boost learning
efficiency and further presented the LST-KSVC model. Lima et al. [32] proposed an
improvement on LST-KSVC (ILST-KSVC) with regularization to implement the structural
risk minimization principle.
As a successful extension of SVM, the twin parametric-margin support vector machine
(TPMSVM) [15] was proposed to pursue a pair of parametric-margin nonparallel hyperplanes.
Unlike GEPSVM and TWSVM, each hyperplane in TPMSVM aims to be closer to its
class and far away from the other class. The parametric-margin mechanism enables
TPMSVM to be suitable for many cases and results in better generalization performance.
However, TPMSVM can only deal with binary classification learning tasks. The above
motivates us to propose a novel K multiclass nonparallel parametric-margin support vector
machine, termed MNP-KSVC. The proposed MNP-KSVC is endowed with the following
attractive advantages:
• The proposed MNP-KSVC encodes the K multiclass learning task into a series of
“one-versus-one-versus-rest” subproblems with all the training instances. Then, it
encodes the outputs of subproblems with the ternary output {−1, 0, +1}, which helps
to deal with imbalanced cases.
• For each subproblem, MNP-KSVC aims to find a pair of nonparallel parametric-margin hyperplanes to separate the two selected classes together with the remaining classes. Unlike TPMSVM, each parametric-margin hyperplane is closer to its class and at least one distance away from the other class, meanwhile mapping the remaining instances into an insensitive region.

• To implement the empirical risks, MNP-KSVC considers a hybrid classification and regression loss. The hinge loss is utilized to penalize the errors of the two focused classes of instances, and the ε-insensitive loss is used for the remaining class instances.
• Extensive numerical experiments are performed on several multiclass UCI benchmark
datasets, and their results are compared with four models (Multi-SVM, MBSVM,
MTPMSVM, and Twin-KSVC). The comparative results indicate the effectiveness and
feasibility of the proposed MNP-KSVC for multiclass classification.
The remainder of this paper is organized as follows. Section 2 briefly introduces
notations and related works. Section 3 proposes our MNP-KSVC. The model optimization
is also discussed in Section 3. The nonlinear version of MNP-KSVC is extended in
Section 4. Experimental results are described in Section 5, and Section 6 presents a discussion and future work.

2. Preliminaries
In this section, we first describe the notations used throughout the paper. Then, we
briefly revisit the nonparallel classifier PTSVM and its variants.

2.1. Notations
In this paper, scalars are denoted by lowercase italic letters, vectors by lowercase
bold face letters, and matrices by uppercase letters. All vectors are column vectors unless transformed to row vectors by a prime superscript (·)′. A vector of zeros of arbitrary dimensions is represented by 0. In addition, we denote e as a vector of ones and I as an identity matrix of arbitrary dimensions. Moreover, let ‖·‖ stand for the L2-norm.

2.2. TPMSVM
The TPMSVM [15] was originally proposed for binary classification with a heteroscedastic
noise structure. It attempts to seek two nonparallel parametric-margin hyperplanes via the
following QPPs:

1
min kw1 k2 + ν1 ∑ ξ 1i + c1 ∑ η1j ,
w1 ,b1 2 i ∈I j∈I
+ −
(1)
s.t. w10 xi + b1 ≥ −ξ 1i , ξ 1i ≥ 0,
w10 x j + b1 = η1j ,

and
1
min kw2 k2 + ν2 ∑ ξ 2i − c2 ∑ η2j ,
w2 ,b2 2 i ∈I j∈I
− +
(2)
s.t. w20 xi + b2 ≤ ξ 1i , ξ 2i ≥ 0,
w20 x j + b2 = η2j ,

where ν1 , ν2 , c1 , c2 are positive parameters, and the decision hyperplane is half of the sum
of the two hyperplanes. To obtain the solutions of problems (1) and (2), one needs to resort
to the dual problems

\[
\begin{aligned}
\min_{\alpha_1}\quad & \frac{1}{2} \sum_{i \in I_+} \sum_{j \in I_+} \alpha_{1i}\alpha_{1j}\, x_i' x_j - \nu_1 \sum_{i \in I_+} \sum_{j \in I_-} \alpha_{1i}\, x_i' x_j, \\
\text{s.t.}\quad & \sum_{i \in I_+} \alpha_{1i} = \nu_1, \\
& 0 \le \alpha_{1i} \le c_1
\end{aligned}
\tag{3}
\]

and

\[
\begin{aligned}
\min_{\alpha_2}\quad & \frac{1}{2} \sum_{i \in I_-} \sum_{j \in I_-} \alpha_{2i}\alpha_{2j}\, x_i' x_j - \nu_2 \sum_{i \in I_-} \sum_{j \in I_+} \alpha_{2i}\, x_i' x_j, \\
\text{s.t.}\quad & \sum_{i \in I_-} \alpha_{2i} = \nu_2, \\
& 0 \le \alpha_{2i} \le c_2.
\end{aligned}
\tag{4}
\]

Then, the solutions (w1 , b1 ) and (w2 , b2 ) of dual problems (3) and (4) can be calculated
according to the Karush–Kuhn–Tucker (KKT) conditions [3],

\[
w_1 = \sum_{i \in I_+} \alpha_{1i} x_i - \nu_1 \sum_{j \in I_-} x_j
\quad\text{and}\quad
w_2 = -\sum_{i \in I_-} \alpha_{2i} x_i + \nu_2 \sum_{j \in I_+} x_j,
\tag{5}
\]

\[
b_1 = -\frac{1}{|I_{SV_1}|} \sum_{i \in I_{SV_1}} w_1' x_i
\quad\text{and}\quad
b_2 = -\frac{1}{|I_{SV_2}|} \sum_{i \in I_{SV_2}} w_2' x_i,
\tag{6}
\]

where ISV1 and ISV2 are indices of the support vector set.
Note that TPMSVM can capture more complex heteroscedastic error structures via
parametric-margin hyperplanes compared with TWSVM [13]. However, the objective of TPMSVM is not strictly convex with respect to the variable b, which leads to a lack of a unique solution. Moreover,
the optimization problems (1) and (2) are only designed for binary classification tasks, and
thus are unsuitable for many real-world multiclass learning applications.

3. The Proposed MNP-KSVC


3.1. Model Formulation
To address the above issues in TPMSVM, this subsection proposes a novel K multiclass
nonparallel parametric-margin support vector machine, termed MNP-KSVC. Inspired by
the “hybrid classification and regression” learning paradigm [35], MNP-KSVC decomposes
the complicated K multiclass learning task into a series of “one-versus-one-versus-rest”
subproblems. Each subproblem focuses on the separation of the two selected classes
together with the remaining classes. Here, we utilize {−1, 1} to represent the label of the
two selected classes and 0 to label the rest. Namely, the subproblem is encoded with the
ternary output {−1, 0, 1}. The main idea of MNP-KSVC is to find a pair of nonparallel
parametric-margin hyperplanes for each subproblem,

\[
f_1(x) = w_1' x + b_1 = 0
\quad\text{and}\quad
f_2(x) = w_2' x + b_2 = 0,
\tag{7}
\]

such that each one is approximate to the corresponding class while as far as possible from
the other class on one side. Moreover, the remaining classes are restricted in a region
between these hyperplanes.
Formally, we use Ik to express the set of indices for instances belonging to the k label
in the subproblem, where k is in {−1, 0, 1}. Inspired by the TPMSVM [15], our MNP-KSVC
considers the following two loss functions for the above ternary output learning problem:

\[
\begin{aligned}
R_1^{emp} = {} & \nu_1 \sum_{j \in I_-} (w_1' x_j + b_1) + c_1 \sum_{i \in I_+} \max\big(0,\, 1 - (w_1' x_i + b_1)\big) \\
& + c_3 \sum_{l \in I_0} \max\big(0,\, \varepsilon - 1 + (w_1' x_l + b_1)\big)
\end{aligned}
\tag{8}
\]

and

\[
\begin{aligned}
R_2^{emp} = {} & -\nu_2 \sum_{i \in I_+} (w_2' x_i + b_2) + c_2 \sum_{j \in I_-} \max\big(0,\, 1 + (w_2' x_j + b_2)\big) \\
& + c_4 \sum_{l \in I_0} \max\big(0,\, \varepsilon - 1 - (w_2' x_l + b_2)\big),
\end{aligned}
\tag{9}
\]

where ν1 , ν2 , c1 , c2 , c3 , c4 > 0 are penalty parameters and ε ∈ (0, 1] is a margin parameter.


Introducing the regularization term ‖w‖² + b² yields the primal problems of MNP-KSVC:

\[
\begin{aligned}
\min_{w_1, b_1, \xi_1, \eta_1}\quad & \frac{1}{2}\big(\|w_1\|^2 + b_1^2\big) + \nu_1 \sum_{j \in I_-} (w_1' x_j + b_1) + c_1 \sum_{i \in I_+} \xi_{1i} + c_3 \sum_{l \in I_0} \eta_{1l} \\
\text{s.t.}\quad & (w_1' x_i + b_1) \ge 1 - \xi_{1i},\quad \xi_{1i} \ge 0, \\
& -(w_1' x_l + b_1) \ge \varepsilon - 1 - \eta_{1l},\quad \eta_{1l} \ge 0
\end{aligned}
\tag{10}
\]

and

\[
\begin{aligned}
\min_{w_2, b_2, \xi_2, \eta_2}\quad & \frac{1}{2}\big(\|w_2\|^2 + b_2^2\big) - \nu_2 \sum_{i \in I_+} (w_2' x_i + b_2) + c_2 \sum_{j \in I_-} \xi_{2j} + c_4 \sum_{l \in I_0} \eta_{2l} \\
\text{s.t.}\quad & -(w_2' x_j + b_2) \ge 1 - \xi_{2j},\quad \xi_{2j} \ge 0, \\
& (w_2' x_l + b_2) \ge \varepsilon - 1 - \eta_{2l},\quad \eta_{2l} \ge 0,
\end{aligned}
\tag{11}
\]

where ξ 1 , η1 , ξ 2 , η2 are non-negative slack vectors.


To deliver the mechanism of MNP-KSVC, we now give the following analysis and
geometrical explanation for problem (10):
• The first term is the L2 -norm of w1 and b1 . Minimizing this is done with the aim of
regulating the model complexity of MNP-KSVC and avoiding over-fitting. Furthermore,
this regularization term makes the QPPs strictly convex, leading to a unique solution.
• The second term is the sum of the projection value of the −1 labeled instances x j∈I−
on f 1 ( x j ). Optimizing this term leads instances x j∈I− to be as far as possible from the
+1 labeled parametric-margin hyperplane f 1 ( x j ).
• The third term with the first constraint requires the projection values of the +1 labeled
instances xi∈I+ on hyperplane f 1 ( x j ) to be not less than 1. Otherwise, a slack vector
ξ 1i is introduced to measure its error when the constraint is violated.
• The last term with the second constraint aims for the projection values of the remaining
0 labeled instances xl ∈I0 on hyperplane f 1 ( x j ) to be not more than 1 − ε. Otherwise,
a slack variable η1l is utilized to measure the corresponding error. Optimizing this is
done with the aim of keeping instances xl ∈I0 at least ε distance from the +1 labeled
instances. Moreover, ε controls the margin between “+” and “0” labeled instances.
The geometrical explanation for problem (11) is similar. Let u1 = [w1; b1], u2 = [w2; b2],
x̃ = [ x 1]. For the sake of simplicity, denote A = { x̃}i∈I+ , B = { x̃} j∈I− and C = { x̃}l ∈I0 as
the instances belonging to +1, −1, and 0 labels, respectively. Then, the matrix formulations of
problems (10) and (11) can be expressed as

\[
\begin{aligned}
\min_{u_1, \xi_1, \eta_1}\quad & \frac{1}{2}\|u_1\|^2 + \nu_1 e_-' B u_1 + c_1 e_+' \xi_1 + c_3 e_0' \eta_1 \\
\text{s.t.}\quad & A u_1 \ge e_+ - \xi_1,\quad \xi_1 \ge 0, \\
& -C u_1 \ge (\varepsilon - 1) e_0 - \eta_1,\quad \eta_1 \ge 0
\end{aligned}
\tag{12}
\]

and

\[
\begin{aligned}
\min_{u_2, \xi_2, \eta_2}\quad & \frac{1}{2}\|u_2\|^2 - \nu_2 e_+' A u_2 + c_2 e_-' \xi_2 + c_4 e_0' \eta_2 \\
\text{s.t.}\quad & -B u_2 \ge e_- - \xi_2,\quad \xi_2 \ge 0, \\
& C u_2 \ge (\varepsilon - 1) e_0 - \eta_2,\quad \eta_2 \ge 0.
\end{aligned}
\tag{13}
\]

In what follows, we discuss the solutions of problems (12) and (13).



3.2. Model Optimization


To obtain solutions to problems (12) and (13), we first derive their dual problems by
Theorem 1.

Theorem 1. Optimization problems

\[
\begin{aligned}
\min_{\alpha_1, \beta_1}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_1' & \beta_1' \end{bmatrix}
\begin{bmatrix} AA' & -AC' \\ -CA' & CC' \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}
- \Big( \nu_1 e_-' \begin{bmatrix} BA' & -BC' \end{bmatrix} + \begin{bmatrix} e_+' & (\varepsilon - 1) e_0' \end{bmatrix} \Big)
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix} \\
\text{s.t.}\quad & 0 \le \alpha_1 \le c_1 e_+,\quad 0 \le \beta_1 \le c_3 e_0
\end{aligned}
\tag{14}
\]

and

\[
\begin{aligned}
\min_{\alpha_2, \beta_2}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_2' & \beta_2' \end{bmatrix}
\begin{bmatrix} BB' & -BC' \\ -CB' & CC' \end{bmatrix}
\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix}
- \Big( \nu_2 e_+' \begin{bmatrix} AB' & -AC' \end{bmatrix} + \begin{bmatrix} e_-' & (\varepsilon - 1) e_0' \end{bmatrix} \Big)
\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix} \\
\text{s.t.}\quad & 0 \le \alpha_2 \le c_2 e_-,\quad 0 \le \beta_2 \le c_4 e_0
\end{aligned}
\tag{15}
\]

are the dual problems of (12) and (13), respectively.

Proof of Theorem 1. Taking problem (12) as an example, firstly, we construct its Lagrangian function as

\[
\begin{aligned}
L(\Xi_1) = {} & \frac{1}{2}\|u_1\|^2 + \nu_1 e_-' B u_1 + c_1 e_+' \xi_1 + c_3 e_0' \eta_1 \\
& - \alpha_1' (A u_1 + \xi_1 - e_+) - \beta_1' \big( -C u_1 + \eta_1 - (\varepsilon - 1) e_0 \big) \\
& - \varphi_1' \xi_1 - \gamma_1' \eta_1,
\end{aligned}
\tag{16}
\]

where α1 , β 1 , ϕ1 , γ1 are the non-negative Lagrange multipliers to constraints of problem (12),


and Ξ1 = {u1 , ξ 1 , η1 , α1 , β 1 , ϕ1 , γ1 }. According to KKT conditions [3,37], the Lagrangian
function (16) has to be maximized with its dual variables α1 , β 1 , ϕ1 , γ1 while being minimized
with its primal variables u1 , ξ 1 , η1 . Differentiate L(Ξ1 ) with respect to u1 , η1 , ξ 1 ; then,
optimal conditions of problem (12) are obtained by

\[
\begin{aligned}
& \nabla_{u_1} L = u_1 + \nu_1 B' e_- - A' \alpha_1 + C' \beta_1 = 0, && (17)\\
& \nabla_{\xi_1} L = c_1 e_+ - \alpha_1 - \varphi_1 = 0, && (18)\\
& \nabla_{\eta_1} L = c_3 e_0 - \beta_1 - \gamma_1 = 0, && (19)\\
& \alpha_1' (A u_1 + \xi_1 - e_+) = 0, && (20)\\
& \beta_1' \big(-C u_1 + \eta_1 - (\varepsilon - 1) e_0\big) = 0, && (21)\\
& \varphi_1' \xi_1 = 0, && (22)\\
& \gamma_1' \eta_1 = 0. && (23)
\end{aligned}
\]

From (17), we have

\[
u_1 = A' \alpha_1 - C' \beta_1 - \nu_1 B' e_-.
\tag{24}
\]
Since ϕ1 , γ1 ≥ 0, from (18) and (19), we derive

\[
0 \le \alpha_1 \le c_1 e_+ \quad\text{and}\quad 0 \le \beta_1 \le c_3 e_0.
\tag{25}
\]

Finally, substituting (24) into the Lagrangian function (16) and using KKT conditions
(17)–(23), the dual problem of (12) can be formulated as

\[
\begin{aligned}
\min_{\alpha_1, \beta_1}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_1' & \beta_1' \end{bmatrix}
\begin{bmatrix} AA' & -AC' \\ -CA' & CC' \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}
- \Big( \nu_1 e_-' \begin{bmatrix} BA' & -BC' \end{bmatrix} + \begin{bmatrix} e_+' & (\varepsilon - 1) e_0' \end{bmatrix} \Big)
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix} \\
\text{s.t.}\quad & 0 \le \alpha_1 \le c_1 e_+,\quad 0 \le \beta_1 \le c_3 e_0.
\end{aligned}
\tag{26}
\]

Similarly, we can derive the dual problem of (13) as problem (15).

For ease of notation, define q_1 = [α_1; β_1], N_1 = [A; −C], h_1 = [e_+; (ε − 1)e_0], and eq_1 = [c_1 e_+; c_3 e_0]. Then, problem (14) can be succinctly reformulated as

\[
\begin{aligned}
\min_{q_1}\quad & \frac{1}{2} q_1' H_1 q_1 - d_1 q_1 \\
\text{s.t.}\quad & 0 \le q_1 \le eq_1,
\end{aligned}
\tag{27}
\]

where

\[
H_1 = N_1 N_1' =
\begin{bmatrix} AA' & -AC' \\ -CA' & CC' \end{bmatrix}
\tag{28}
\]

and

\[
d_1 = \nu_1 e_-' B N_1' + h_1' = \nu_1 e_-' \begin{bmatrix} BA' & -BC' \end{bmatrix} + \begin{bmatrix} e_+' & (\varepsilon - 1) e_0' \end{bmatrix}.
\tag{29}
\]
Similarly, define q_2 = [α_2; β_2], N_2 = [B; −C], h_2 = [e_−; (ε − 1)e_0], and eq_2 = [c_2 e_−; c_4 e_0] for problem (15). Then, it can be reformulated as

\[
\begin{aligned}
\min_{q_2}\quad & \frac{1}{2} q_2' H_2 q_2 - d_2 q_2 \\
\text{s.t.}\quad & 0 \le q_2 \le eq_2,
\end{aligned}
\tag{30}
\]

where

\[
H_2 = N_2 N_2' =
\begin{bmatrix} BB' & -BC' \\ -CB' & CC' \end{bmatrix}
\tag{31}
\]

and

\[
d_2 = \nu_2 e_+' A N_2' + h_2' = \nu_2 e_+' \begin{bmatrix} AB' & -AC' \end{bmatrix} + \begin{bmatrix} e_-' & (\varepsilon - 1) e_0' \end{bmatrix}.
\tag{32}
\]
After solving dual problems (27) and (30) with a standard QPP solver, we can obtain the solutions to primal problems (12) and (13) by Proposition 1, which follows directly from the KKT conditions and is stated without proof.

Proposition 1. Suppose that q_1 = [α_1; β_1] and q_2 = [α_2; β_2] are solutions to dual problems (27) and (30), respectively. Then, solutions u_1 and u_2 to primal problems (12) and (13) can be formulated by

\[
u_1 = N_1' q_1 - \nu_1 B' e_- = A' \alpha_1 - C' \beta_1 - \nu_1 B' e_-
\tag{33}
\]

and

\[
u_2 = -N_2' q_2 + \nu_2 A' e_+ = -B' \alpha_2 + C' \beta_2 + \nu_2 A' e_+.
\tag{34}
\]
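To make this step concrete, the following MATLAB sketch (our own illustration, not the authors' released code) solves the box-constrained dual (27) with the quadprog routine and then recovers u1 via (33). The matrices A, B, and C are assumed to already hold the augmented instances x̃ = [x 1] of the "+1", "−1", and "0" labeled classes, and the parameter values are only illustrative.

% Minimal sketch: solve dual (27) with quadprog and recover u1 via (33).
% A, B, C: matrices whose rows are the augmented instances [x 1] of the
% "+1", "-1" and "0" labeled classes (assumed to be given).
nu1 = 0.5; c1 = 1; c3 = 1; eps0 = 0.3;            % illustrative parameters
N1  = [A; -C];                                    % cf. the definition of N1
H1  = N1 * N1';                                   % Hessian of (27), cf. (28)
h1  = [ones(size(A,1),1); (eps0 - 1)*ones(size(C,1),1)];
d1  = (nu1 * sum(B,1) * N1' + h1')';              % linear term, cf. (29)
lb  = zeros(size(A,1) + size(C,1), 1);
ub  = [c1*ones(size(A,1),1); c3*ones(size(C,1),1)];
% quadprog (Optimization Toolbox) minimizes 0.5*q'*H*q + f'*q over lb <= q <= ub
q1  = quadprog(H1, -d1, [], [], [], [], lb, ub);
u1  = N1' * q1 - nu1 * B' * ones(size(B,1),1);    % primal solution, cf. (33)

The dual (30) and u2 in (34) can be handled in exactly the same way after swapping the roles of A and B and negating the sign of the projection term.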

3.3. Decision Rule


As mentioned in Section 3.1, our MNP-KSVC decomposes the multiclass learning task into a series
of subproblems with the “one-versus-one-versus-rest” strategy. Specifically, we construct K (K − 1)/2
MNP-KSVC classifiers for K-class classification. For each (ki, kj)-pair classifier, we relabel the dataset with ternary outputs {−1, 0, +1} according to the two selected classes and the remaining class instances. Namely, we label "+1", "−1", and "0" for instances belonging to the ki class, the kj class, and all the remaining classes, respectively. Then, we train it on the newly relabeled dataset by solving problems (12) and (13).
As for the decision, we predict the label of an unseen instance x based on the voting strategy over the ensemble results of the K(K − 1)/2 MNP-KSVC classifiers. Namely, we determine the vote of each classifier according to the region in which x is located. Taking the (ki, kj)-pair classifier as an example, firstly, we reformulate the parametric-margin hyperplanes (7) w.r.t. u = [w; b] and x̃ = [x 1] as

\[
f_1(x) = w_1' x + b_1 = u_1' \tilde{x}
\quad\text{and}\quad
f_2(x) = w_2' x + b_2 = u_2' \tilde{x}.
\tag{35}
\]
If x is located above the "+" hyperplane, i.e., f_1(x) > 1 − ε (equivalently, u_1'x̃ > 1 − ε), we vote it to the ki class. On the other hand, if x is located below the "−" hyperplane, i.e., f_2(x) < −1 + ε, we vote it to the kj class. Otherwise, it belongs to the remaining classes. In summary, the decision function for the (ki, kj)-pair classifier can be expressed as

\[
g(x) =
\begin{cases}
+1 & f_1(x) > 1 - \varepsilon, \\
-1 & f_2(x) < -1 + \varepsilon, \\
0 & \text{otherwise}.
\end{cases}
\tag{36}
\]

Finally, the given unseen instance x is assigned to the class label that gets the most votes.
Overall, the whole procedure of MNP-KSVC is presented in Algorithm 1 and illustrated in Figure 1.

Algorithm 1 The procedure of MNP-KSVC


1: Input dataset T = {(xi, yi) | 1 ≤ i ≤ m}, where xi ∈ Rn and yi ∈ {1, · · · , K}.
2: Choose parameters ν1 , ν2 , c1 , c2 , c3 , c4 > 0 and ε ∈ (0, 1].
Training Procedure:
3: for k i in range(1, · · · , K ) do
4: for k j in range(k i + 1, · · · , K ) do
5: Relabel instances belonging to the ki class, the kj class, and the remaining classes as "+1", "−1", and "0", respectively.
6: Construct A, B, and C for problems (12) and (13) of the (ki, kj)-pair classifier.
7: Solve the corresponding dual problems (27) and (30) with a QPP solver. Then, obtain their solutions q1 = [α1; β1] and q2 = [α2; β2].
8: Build the auxiliary functions for the classifier w.r.t. Proposition 1:

\[
k_i\text{-class}: \quad f_1(x) = w_1' x + b_1 = q_1' N_1 \tilde{x} - \nu_1 e_-' B \tilde{x}
\tag{37}
\]

and

\[
k_j\text{-class}: \quad f_2(x) = w_2' x + b_2 = -q_2' N_2 \tilde{x} + \nu_2 e_+' A \tilde{x}.
\tag{38}
\]
9: end for
10: end for
Predicting Procedure:
11: For an unseen instance x, assign it to class y via the following vote strategy.
12: Initialize the vote vector vote = 0 for K classes.
13: for k i in range(1, · · · , K ) do
14: for k j in range(k i + 1, · · · , K ) do
15: Compute the decision function g(x) for the (ki, kj)-pair classifier w.r.t. (36).
16: if g( x) == 1 then
17: Update vote(k i ) + = 1
18: else if g( x) == −1 then
19: Update vote(k j ) + = 1
20: end if
21: end for
22: end for
23: Finally, assign to x the class with the most votes via

\[
\text{label}(x) \leftarrow \arg\max_{k \in \{1, \cdots, K\}} \text{vote}(k).
\tag{39}
\]

Figure 1. The flowchart for the training and predicting procedures of the proposed MNP-KSVC model.
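As a minimal illustration of the training loop in Algorithm 1 (with our own variable names; X denotes the m × n data matrix and y the label vector in {1, ..., K}), the relabeled blocks A, B, and C for each (ki, kj)-pair classifier could be assembled as follows:

% Minimal sketch of the pairwise decomposition in Algorithm 1.
% Assumed inputs: data matrix X (m x n) and labels y in {1, ..., K}.
Xtil = [X, ones(size(X,1), 1)];           % augmented instances [x 1]
K = max(y);
for ki = 1:K
    for kj = ki+1:K
        A = Xtil(y == ki, :);             % "+1" labeled instances
        B = Xtil(y == kj, :);             % "-1" labeled instances
        C = Xtil(y ~= ki & y ~= kj, :);   % "0" labeled (remaining) instances
        % ... solve the duals (27) and (30) for this pair (e.g., as in the
        % quadprog sketch above) and store q1, q2 for the decision rule (36).
    end
end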

4. Model Extension to the Nonlinear Case


In practice, the linear classifier is sometimes not suitable for many real-world nonlinear
learning tasks [18,21,30]. One of the effective solutions is to map linearly non-separable
instances into the feature space. Thus, in this section, we focus on the nonlinear extension
of MNP-KSVC.
To construct our nonlinear MNP-KSVC, we consider the feature mapping x_φ = φ(x): R^n → H (a reproducing kernel Hilbert space, RKHS) together with the kernel trick [3].

Define A_φ = {φ(x̃_i)}_{i∈I+}, B_φ = {φ(x̃_j)}_{j∈I−}, and C_φ = {φ(x̃_l)}_{l∈I0}. Then, the nonlinear
MNP-KSVC model optimizes the following two primal QPPs:

\[
\begin{aligned}
\min_{u_1, \xi_1, \eta_1}\quad & \frac{1}{2}\|u_1\|^2 + \nu_1 e_-' B_\phi u_1 + c_1 e_+' \xi_1 + c_3 e_0' \eta_1 \\
\text{s.t.}\quad & A_\phi u_1 \ge e_+ - \xi_1,\quad \xi_1 \ge 0, \\
& -C_\phi u_1 \ge (\varepsilon - 1) e_0 - \eta_1,\quad \eta_1 \ge 0
\end{aligned}
\tag{40}
\]

and

\[
\begin{aligned}
\min_{u_2, \xi_2, \eta_2}\quad & \frac{1}{2}\|u_2\|^2 - \nu_2 e_+' A_\phi u_2 + c_2 e_-' \xi_2 + c_4 e_0' \eta_2 \\
\text{s.t.}\quad & -B_\phi u_2 \ge e_- - \xi_2,\quad \xi_2 \ge 0, \\
& C_\phi u_2 \ge (\varepsilon - 1) e_0 - \eta_2,\quad \eta_2 \ge 0,
\end{aligned}
\tag{41}
\]

where ξ 1 , η1 , ξ 2 , η2 are non-negative slack vectors. Because the formulations of nonlinear


problems (40) and (41) are similar to the linear cases (12) and (13), we can obtain their
solutions in a similar manner.
In what follows, we define the kernel operation for MNP-KSVC.

Definition 1. Suppose that K(·, ·) is an appropriate kernel function; then, the kernel operation in matrix form is defined as

\[
K(A, B) = \langle A_\phi, B_\phi \rangle,
\tag{42}
\]

whose ij-th element can be computed by

\[
K(A, B)_{ij} = \phi(x_i)' \phi(x_j) = K(x_i, x_j).
\tag{43}
\]
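For illustration, a kernel matrix such as (42) could be computed with the short MATLAB function below; it assumes the RBF kernel K(xi, xj) = exp(−‖xi − xj‖²/γ) that is also used in the experiments of Section 5, and the function name and vectorized style are our own.

function Kmat = kernel_matrix(A, B, gamma)
% Kernel matrix whose ij-th entry is K(a_i, b_j), cf. (42)-(43),
% computed here for the RBF kernel exp(-||a_i - b_j||^2 / gamma).
% Rows of A and B are instances; implicit expansion needs MATLAB R2016b+.
sqdist = sum(A.^2, 2) + sum(B.^2, 2)' - 2 * (A * B');
Kmat = exp(-sqdist / gamma);
end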

Then, we can derive the dual problems of (40) and (41) by Theorem 2.

Theorem 2. Optimization problems

\[
\begin{aligned}
\min_{\alpha_1, \beta_1}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_1' & \beta_1' \end{bmatrix}
\begin{bmatrix} K(A, A) & -K(A, C) \\ -K(C, A) & K(C, C) \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix}
- \Big( \nu_1 e_-' \begin{bmatrix} K(B, A) & -K(B, C) \end{bmatrix} + \begin{bmatrix} e_+' & (\varepsilon - 1) e_0' \end{bmatrix} \Big)
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix} \\
\text{s.t.}\quad & 0 \le \alpha_1 \le c_1 e_+,\quad 0 \le \beta_1 \le c_3 e_0
\end{aligned}
\tag{44}
\]

and

\[
\begin{aligned}
\min_{\alpha_2, \beta_2}\quad & \frac{1}{2}
\begin{bmatrix} \alpha_2' & \beta_2' \end{bmatrix}
\begin{bmatrix} K(B, B) & -K(B, C) \\ -K(C, B) & K(C, C) \end{bmatrix}
\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix}
- \Big( \nu_2 e_+' \begin{bmatrix} K(A, B) & -K(A, C) \end{bmatrix} + \begin{bmatrix} e_-' & (\varepsilon - 1) e_0' \end{bmatrix} \Big)
\begin{bmatrix} \alpha_2 \\ \beta_2 \end{bmatrix} \\
\text{s.t.}\quad & 0 \le \alpha_2 \le c_2 e_-,\quad 0 \le \beta_2 \le c_4 e_0
\end{aligned}
\tag{45}
\]

are the dual problems of (40) and (41), respectively.

Proof of Theorem 2. By introducing non-negative Lagrange multipliers α_1, β_1, φ_1, and γ_1 to the constraints of problem (40), its Lagrangian function is built as

\[
\begin{aligned}
L(\Xi_1) = {} & \frac{1}{2}\|u_1\|^2 + \nu_1 e_-' B_\phi u_1 + c_1 e_+' \xi_1 + c_3 e_0' \eta_1 \\
& - \alpha_1' (A_\phi u_1 + \xi_1 - e_+) - \beta_1' \big( -C_\phi u_1 + \eta_1 - (\varepsilon - 1) e_0 \big) \\
& - \varphi_1' \xi_1 - \gamma_1' \eta_1,
\end{aligned}
\tag{46}
\]

where Ξ_1 = {u_1, ξ_1, η_1, α_1, β_1, φ_1, γ_1}. Differentiating L(Ξ_1) with respect to u_1, ξ_1, and η_1, the optimality conditions of problem (40) are obtained as

\[
\begin{aligned}
& \nabla_{u_1} L = u_1 + \nu_1 B_\phi' e_- - A_\phi' \alpha_1 + C_\phi' \beta_1 = 0, && (47)\\
& \nabla_{\xi_1} L = c_1 e_+ - \alpha_1 - \varphi_1 = 0, && (48)\\
& \nabla_{\eta_1} L = c_3 e_0 - \beta_1 - \gamma_1 = 0, && (49)\\
& \alpha_1' (A_\phi u_1 + \xi_1 - e_+) = 0, && (50)\\
& \beta_1' \big(-C_\phi u_1 + \eta_1 - (\varepsilon - 1) e_0\big) = 0, && (51)\\
& \varphi_1' \xi_1 = 0, && (52)\\
& \gamma_1' \eta_1 = 0. && (53)
\end{aligned}
\]

From (47), we have

\[
u_1 = A_\phi' \alpha_1 - C_\phi' \beta_1 - \nu_1 B_\phi' e_-.
\tag{54}
\]

Since φ_1, γ_1 ≥ 0, from (48) and (49), we derive

\[
0 \le \alpha_1 \le c_1 e_+ \quad\text{and}\quad 0 \le \beta_1 \le c_3 e_0.
\tag{55}
\]

Finally, substituting (54) into the Lagrangian function (46) and using KKT conditions
(47)–(53), the dual problem of (40) can be formulated as (44). Similarly, we can derive the
dual problem of (41) as problem (45).

The procedure of the nonlinear MNP-KSVC is similar to that of the linear one, but with
the following minor modifications in Algorithm 1:
• In contrast to some existing nonparallel SVMs, we do not need to resort to an extra kernel-generated surface technique, since only inner products appear in dual problems (14) and (15) of the linear MNP-KSVC. These dual formulations enable MNP-KSVC to behave consistently in the linear and nonlinear cases. Thus, by taking appropriate kernel functions instead of inner products in the Hessian matrices of dual problems (14) and (15), i.e.,

\[
H_{\phi 1} =
\begin{bmatrix} K(A, A) & -K(A, C) \\ -K(C, A) & K(C, C) \end{bmatrix}
\quad\text{and}\quad
H_{\phi 2} =
\begin{bmatrix} K(B, B) & -K(B, C) \\ -K(C, B) & K(C, C) \end{bmatrix},
\tag{56}
\]

we can obtain the dual formulation of the nonlinear MNP-KSVC in (44) and (45).
• Once we obtain the solutions q_1 = [α_1; β_1] and q_2 = [α_2; β_2] to problems (44) and (45), respectively, the corresponding primal solutions u_1 and u_2 in the feature space can be formulated by

\[
u_1 = A_\phi' \alpha_1 - C_\phi' \beta_1 - \nu_1 B_\phi' e_-
\tag{57}
\]

and

\[
u_2 = -B_\phi' \alpha_2 + C_\phi' \beta_2 + \nu_2 A_\phi' e_+.
\tag{58}
\]

• For an unseen instance x, construct the following decision function for the (ki, kj)-pair nonlinear MNP-KSVC classifier as

\[
g_\phi(x) =
\begin{cases}
+1 & f_{\phi 1}(\tilde{x}) > 1 - \varepsilon, \\
-1 & f_{\phi 2}(\tilde{x}) < -1 + \varepsilon, \\
0 & \text{otherwise},
\end{cases}
\tag{59}
\]

where x̃ = [x 1], and the auxiliary functions f_{φ1}(x̃) and f_{φ2}(x̃) in the feature space can be expressed as

\[
f_{\phi 1}(\tilde{x}) = K(u_1, \tilde{x}) = \alpha_1' K(A, \tilde{x}) - \beta_1' K(C, \tilde{x}) - \nu_1 e_-' K(B, \tilde{x})
\tag{60}
\]

and

\[
f_{\phi 2}(\tilde{x}) = K(u_2, \tilde{x}) = -\alpha_2' K(B, \tilde{x}) + \beta_2' K(C, \tilde{x}) + \nu_2 e_+' K(A, \tilde{x}).
\tag{61}
\]
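As a minimal sketch (assuming the kernel_matrix helper above, a solved (ki, kj)-pair classifier with dual blocks alpha1, beta1, alpha2, beta2, the training blocks A, B, C, and illustrative parameters nu1, nu2, eps0, gamma), the auxiliary functions (60) and (61) and the ternary decision (59) could be evaluated for one test instance as follows:

% Evaluate (60), (61) and the ternary decision (59) for one test instance.
% x is assumed to be a 1 x n row vector of raw features.
xt  = [x, 1];                                   % augmented instance [x 1]
Kax = kernel_matrix(A, xt, gamma);              % K(A, x~), a column vector
Kbx = kernel_matrix(B, xt, gamma);              % K(B, x~)
Kcx = kernel_matrix(C, xt, gamma);              % K(C, x~)
f1  = alpha1'*Kax - beta1'*Kcx - nu1*sum(Kbx);  % cf. (60)
f2  = -alpha2'*Kbx + beta2'*Kcx + nu2*sum(Kax); % cf. (61)
if f1 > 1 - eps0
    g = +1;                                     % vote for the ki class
elseif f2 < -1 + eps0
    g = -1;                                     % vote for the kj class
else
    g = 0;                                      % remaining classes
end

Collecting one such vote per (ki, kj)-pair classifier and applying (39) yields the final predicted label.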

5. Numerical Experiments
5.1. Experimental Setting
To demonstrate the validity of MNP-KSVC, we perform extensive experiments on
several benchmark datasets that are commonly used for testing machine learning algorithms.
In experiments, we focus on comparing MNP-KSVC and four state-of-the-art multiclass
models—Multi-SVM, MBSVM, MTPMSVM, and Twin-KSVC—detailed as follows:
• Multi-SVM [38]: The idea is similar to the “one-versus-all” SVM [3]. However, it
generates K binary SVM classifiers by solving one large dual QPP. That is, the k-th
classifier is trained with the k-th class instances encoded with positive labels and the
remaining class instances with negative labels. Then, the label of an unseen instance
is assigned by the “voting” scheme. The penalty parameter for each classifier in
Multi-SVM is c.
• MBSVM [33]: It is the multiclass extension of the binary TWSVM, which is based
on the “one-versus-all” strategy. MBSVM aims to find K nonparallel hyperplanes
by solving K QPPs simultaneously. Specifically, the k-th class instances are kept as far away as possible from the k-th hyperplane, while the remaining instances are proximal to it. An unseen instance is assigned the label of the hyperplane from which it lies farthest away. The penalty parameter for each classifier in
MBSVM is c.
• MTPMSVM: Inspired by MBSVM [33], we use the “one-versus-all” strategy to implement
the multiclass version of TPMSVM [15] as a baseline. In contrast to MBSVM, it aims
to find K parametric-margin hyperplanes, such that each hyperplane is closer to its
corresponding class instances and as far away from the remaining class instances.
The penalty parameters for each classifier in MTPMSVM are (ν, c).
• Twin-KSVC [35]: It is another novel multiclass extension of TWSVM. Twin-KSVC
evaluates all the training instances in a “one-versus-one-versus-rest” structure with
the ternary output {−1, 0, +1}. It aims to find a pair of nonparallel hyperplanes for
each of two kinds of samples selected from K classes. The remaining class instances
are mapped into a region within these two nonparallel hyperplanes. The penalty
parameters for each classifier in Twin-KSVC are (c1 , c2 , c3 , c4 , ε).
All methods are implemented in MATLAB on a PC with an Intel Core i7 processor
with 32 GB RAM. The quadratic programming problems (QPPs) of all the classifiers are
solved by the “quadprog” function in MATLAB. Now, we describe the setting of our
experiments:
• Similar to [35,38], we use the multiclass accuracy to evaluate each classifier, defined as

\[
\text{accuracy} = \frac{1}{N} \sum_{k=1}^{K} \sum_{x \in I_k} I\big(\hat{g}(x) = k\big),
\tag{62}
\]

where N is the size of the dataset, K is the total number of classes, ĝ(x) is the prediction of the classifier, and I(·) is an indicator function, which returns 1 if the class matches and 0 otherwise. Moreover, we adopt the training time to represent the learning efficiency.
• To reduce the complexity of parameter selection for multiclass classifiers, we use the same parameter setting for each learning subproblem. Specifically, we set the same c for all classifiers in Multi-SVM and MBSVM, the same (ν, c) in MTPMSVM, c1 = c2 and c3 = c4 in Twin-KSVC, and ν1 = ν2, c1 = c2, c3 = c4 in MNP-KSVC. For the nonlinear case, the RBF kernel K(x_i, x_j) = exp(−‖x_i − x_j‖²/γ) is considered, where γ > 0 is the kernel parameter.
• It is usually unknown beforehand which parameters are optimal for the classifiers at hand. Thus, we employ the 10-fold cross-validation technique [3] for parameter selection. In detail, each dataset is randomly partitioned into 10 subsets with similar sizes and distributions. Then, the union of 9 subsets is used as the training set, while the remaining one is used as the testing set. Furthermore, we apply the grid-based approach [3] to obtain the optimal parameters of each classifier. Namely, the penalty parameters c, c1, c2, ν, ν1 and the kernel parameter γ are selected from {2^i | i = −6, ..., 6}, while the margin parameter ε is chosen from {0.1, 0.2, ..., 0.9}. Once selected, these values are used to learn the final decision function; a minimal sketch of this selection loop is given after this list.
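The following MATLAB sketch outlines the selection loop described above; the helpers train_mnpksvc and predict_mnpksvc are hypothetical placeholders for the training and prediction procedures of Algorithm 1, and the single penalty c and margin eps0 stand in for the tied parameter setting.

% Minimal sketch: grid search over (c, epsilon) with 10-fold cross-validation.
% train_mnpksvc / predict_mnpksvc are placeholders for Algorithm 1.
grid_c   = 2.^(-6:6);                     % penalty candidates {2^i}
grid_eps = 0.1:0.1:0.9;                   % margin parameter candidates
m = numel(y);
folds = mod(randperm(m), 10) + 1;         % random 10-fold partition
best_acc = -inf;
for c = grid_c
    for eps0 = grid_eps
        acc = zeros(10, 1);
        for f = 1:10
            tr = (folds ~= f);  te = (folds == f);
            model  = train_mnpksvc(X(tr,:), y(tr), c, eps0);
            yhat   = predict_mnpksvc(model, X(te,:));
            acc(f) = mean(yhat == y(te)); % multiclass accuracy, cf. (62)
        end
        if mean(acc) > best_acc
            best_acc = mean(acc);  best_c = c;  best_eps = eps0;
        end
    end
end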

5.2. Result Comparison and Discussion


For comparison, we consider 10 real-world multiclass datasets from the UCI machine
learning repository (the UCI datasets are available at http://archive.ics.uci.edu/ml (accessed
on 10 September 2021)), whose statistics are summarized in Table 1. These datasets represent a
wide range of domains (including phytology, bioinformatics, pathology, and so on), sizes (from
178 to 2175), features (from 4 to 34), and classes (from 3 to 10). All datasets are normalized
before training such that features are transformed into [−1, 1]. Moreover, we carry out
experiments as follows. Firstly, each dataset is divided into 2 subsets: 70% for training and
30% for testing. Then, we train classifiers with 10-fold cross-validation executions. Finally, we
predict the testing set with the fine-tuning classifiers. Each experiment is repeated 10 times.

Table 1. Statistics for benchmark datasets used in experiments. # denotes the corresponding quantity.

Datasets #Instances #Training #Testing #Attributes #Class


Balance 625 438 187 4 3
Ecoli 327 229 98 7 5
Iris 150 105 45 4 3
Glass 214 150 64 13 6
Wine 178 125 53 13 3
Thyroid 215 150 65 5 3
Dermatology 358 251 107 34 6
Shuttle 2175 1522 653 9 5
Contraceptive 1473 1031 442 9 3
Pen Based 1100 770 330 16 10

Tables 2 and 3 summarize the learning results of the proposed MNP-KSVC model and the other compared methods using linear and nonlinear kernels, respectively. The results on the 10 benchmark datasets include the mean and standard deviation of the testing multiclass accuracy (%), with the best performance highlighted in bold. The comparison results reveal the following:
• MNP-KSVC yields better performance than other classifiers in terms of accuracy on
almost all datasets. This confirms the efficacy of the proposed MNP-KSVC on the
multiclass learning tasks.
• Nonparallel based models (MBSVM, MTPMSVM, Twin-KSVC, and MNP-KSVC)
outperform the traditional SVM model. The reason is that SVM only utilizes parallel hyperplanes to learn the decision function, leading to a weaker capability to capture the underlying multiclass distributions.
• MNP-KSVC has a better generalization ability than MBSVM and Twin-KSVC in most cases. For instance, MNP-KSVC obtains a higher accuracy (84.59%) than MBSVM (80.82%) and Twin-KSVC (80.12%) on the Ecoli dataset in the nonlinear case. Similar results can be obtained from the other datasets. Since MBSVM and Twin-KSVC only implement empirical risk minimization, they are prone to overfitting. In contrast, MNP-KSVC includes regularization terms that regulate the model complexity to avoid overfitting.
• MTPMSVM is another multiclass extension, which is based on the “one-versus-rest”
strategy. With the help of the “hybrid classification and regression” learning paradigm,
our MNP-KSVC can learn more multiclass discriminative information.
• Furthermore, we count the number of datasets on which MNP-KSVC is superior/inferior (W/L) to each compared classifier for both the linear and nonlinear cases, listed at the bottom of Tables 2 and 3. The results indicate that MNP-KSVC achieves the best results against the others in terms of both W/L and average accuracy.
Table 2. Performance comparison on benchmark datasets for linear classifiers, in terms of mean ± std
of the testing multiclass accuracy (%). The Win/Loss (W/L) denotes the number of datasets for
which MNP-KSVC is superior/inferior to the compared classifiers. Ave. Acc and rank denote each
classifier’s average accuracy and rank over all datasets.

Datasets Multi-SVM MBSVM MTPMSVM Twin-KSVC MNP-KSVC


Balance 79.91 ± 4.86 86.14 ± 4.63 87.43 ± 3.56 86.64 ± 1.92 88.33 ± 2.03
Ecoli 72.32 ± 7.03 73.62 ± 4.95 73.77 ± 4.33 74.62 ± 3.86 75.91 ± 2.52
Iris 93.24 ± 2.66 92.96 ± 2.24 93.21 ± 2.53 92.69 ± 3.24 94.13 ± 2.49
Glass 69.18 ± 9.85 72.89 ± 7.28 68.68 ± 6.79 71.65 ± 5.96 71.38 ± 5.38
Wine 93.24 ± 3.02 95.28 ± 1.46 98.19 ± 1.61 97.23 ± 1.12 97.14 ± 1.27
Thyroid 90.24 ± 2.53 93.74 ± 1.58 92.92 ± 1.43 96.97 ± 1.08 97.52 ± 1.54
Dermatology 81.82 ± 3.79 86.67 ± 1.69 84.46 ± 3.29 90.37 ± 2.32 89.06 ± 3.16
Shuttle 71.58 ± 4.78 84.04 ± 2.92 77.17 ± 3.3 78.76 ± 4.81 83.16 ± 1.92
Contraceptive 38.53 ± 3.76 43.95 ± 2.51 44.65 ± 2.93 42.25 ± 2.45 44.22 ± 2.03
Pen Based 79.59 ± 3.75 85.94 ± 1.26 81.94 ± 2.04 83.21 ± 2.77 86.78 ± 1.37
Ave. Acc 76.97 81.52 80.24 81.43 82.76
W/L 10/0 8/2 8/2 9/1 /
Ave. rank 4.6 2.9 3.1 2.67 1.7

To provide more statistical evidence [39–41], we have further performed the non-parametric Friedman test to check whether there are significant differences between MNP-KSVC and the other compared classifiers. The bottom lines of Tables 2 and 3 list the average ranks of each classifier calculated by the Friedman test according to the multiclass accuracy. The results show that MNP-KSVC obtains the first rank in both the linear and nonlinear cases, with Twin-KSVC and MTPMSVM among its closest competitors. Now, we calculate the X_F² value for the Friedman test as

\[
X_F^2 = \frac{12 N}{k(k+1)} \left[ \sum_{i=1}^{k} r_i^2 - \frac{k(k+1)^2}{4} \right],
\tag{63}
\]

where N is the number of datasets, k is the number of classifiers, and r_i is the average rank on the N datasets for the i-th model. For the linear case, we compute the term Σ r_i² from Table 2 as

\[
\sum_{i=1}^{k} r_i^2 = 4.6^2 + 2.9^2 + 3.1^2 + 2.67^2 + 1.7^2 \approx 49.1989.
\tag{64}
\]

Table 3. Performance comparison on benchmark datasets for nonlinear classifiers, in terms of the
mean ± std of the testing multiclass accuracy (%). The Win/Loss (W/L) denotes the number of
datasets for which MNP-KSVC is superior/inferior to the compared classifiers. Ave. Acc and rank
denote each classifier’s average accuracy and rank over all datasets.

Datasets Multi-SVM MBSVM MTPMSVM Twin-KSVC MNP-KSVC


Balance 79.94 ± 5.57 87.13 ± 4.57 89.92 ± 3.27 90.17 ± 4.08 91.41 ± 3.42
Ecoli 79.81 ± 5.19 80.82 ± 4.25 84.74 ± 3.49 80.12 ± 4.49 84.59 ± 3.83
Iris 90.26 ± 2.65 96.89 ± 1.83 97.39 ± 2.14 94.36 ± 2.03 98.04 ± 1.47
Glass 58.66 ± 4.76 52.64 ± 4.08 62.73 ± 3.95 56.01 ± 4.26 64.12 ± 2.98
Wine 94.36 ± 2.12 94.31 ± 1.87 98.38 ± 2.62 97.26 ± 1.61 98.04 ± 1.45
Thyroid 91.82 ± 1.75 93.27 ± 1.95 95.34 ± 0.89 94.14 ± 1.05 95.63 ± 0.84
Pen Based 86.53 ± 3.93 89.51 ± 3.29 85.06 ± 3.68 86.12 ± 2.47 88.78 ± 2.68
Dermatology 84.43 ± 4.29 83.82 ± 3.86 84.51 ± 2.64 83.33 ± 3.29 85.26 ± 3.12
Shuttle 74.36 ± 3.39 83.74 ± 2.73 87.06 ± 2.89 86.91 ± 2.38 89.37 ± 1.73
Contraceptive 42.09 ± 4.95 44.28 ± 4.02 45.93 ± 3.85 47.51 ± 3.57 47.47 ± 3.68
Ave. Acc 78.22 80.64 83.11 81.59 84.27
W/L 10/0 9/1 8/2 9/1 /
Ave. rank 4.3 3.69 2.3 3.3 1.4

Then, substituting k = 5, N = 10 and (64) into (63), we have

\[
X_F^2 = \frac{12 \times 10}{5(5+1)} \left[ 49.1989 - \frac{5(5+1)^2}{4} \right] \approx 16.7956.
\tag{65}
\]

Based on the above Friedman statistic X_F² = 16.7956, we calculate the F-distribution statistic F_F with (k − 1, (k − 1)(N − 1)) = (4, 36) degrees of freedom as

\[
F_F = \frac{(N-1) \times X_F^2}{N(k-1) - X_F^2} = \frac{9 \times 16.7956}{10 \times 4 - 16.7956} \approx 6.5143.
\tag{66}
\]

Moreover, we compute the p-value, which rejects the null hypothesis at the level
of significance α = 0.05. Similarly, we calculate the statistic for the nonlinear case, as
summarized in Table 4. The results reject the null hypothesis for both linear and nonlinear
cases and reveal the existence of significant differences in the performances of classifiers.
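For reference, the statistics in (63)-(66) can be reproduced with a few lines of MATLAB; the average ranks below are those reported for the linear case in Table 2.

% Reproducing the Friedman statistic (63)-(66) for the linear case.
r = [4.6, 2.9, 3.1, 2.67, 1.7];           % average ranks from Table 2
N = 10;  k = numel(r);                    % number of datasets and classifiers
XF2 = 12*N/(k*(k+1)) * (sum(r.^2) - k*(k+1)^2/4);   % cf. (63): about 16.7956
FF  = (N-1)*XF2 / (N*(k-1) - XF2);                  % cf. (66): about 6.5143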

Table 4. Results of Friedman test on learning results.

Statistic F_F p-Value Hypothesis


Linear 6.5143 2.9503 × 10−4 reject
Nonlinear 10.2004 1.2267 × 10−5 reject

Furthermore, we record the average learning time of each classifier for the above
UCI datasets experiments, as shown in Figures 2 and 3. The results show that our
MNP-KSVC is faster than Multi-SVM and Twin-KSVC, while slightly slower than MBSVM
and MTPMSVM for linear and nonlinear cases. Multi-SVM performs the slowest of all
classifiers because Multi-SVM needs to solve larger problems than the nonparallel-based
classifiers. Moreover, computing the Hessian matrices of the dual QPPs in MNP-KSVC avoids the time-costly matrix inversion, leading to greater efficiency than Twin-KSVC. Overall, the above results confirm the feasibility of MNP-KSVC.
Figure 2. The learning times on benchmark datasets for linear classifiers.

Figure 3. The learning times on benchmark datasets for nonlinear classifiers.



6. Discussion and Future Work


This paper proposes a novel K multiclass nonparallel parametric-margin support
vector machine, termed MNP-KSVC. Specifically, our MNP-KSVC has the following
attractive merits:
• For the K-class learning task, our MNP-KSVC first transforms the complicated multiclass
problem into K (K − 1)/2 subproblems via a “one-versus-one-versus-rest” strategy. Each
subproblem focuses on separating the two selected classes and the rest of the classes. That
is, we utilize {−1, 1} to represent the label of the two selected classes and 0 to label the
rest. Unlike the “one-versus-all” strategy used in Multi-SVM, MBSVM, and MTPMSVM,
this encoding strategy can alleviate the imbalanced issues that sometimes occur in
multiclass learning [32,35].
• For each subproblem, our MNP-KSVC aims to learn a pair of nonparallel parametric-
margin hyperplanes (36) with the ternary encoding {−1, 0, +1}. These parametric-
margin hyperplanes are closer to their corresponding class and at least one distance
from the other class. Meanwhile, they restrict the rest of the instances into an insensitive
region. A hybrid classification and regression loss joined with the regularization is
further utilized to formulate the optimization problems (10) and (11) of MNP-KSVC.
• Moreover, the nonlinear extension is also presented to deal with the nonlinear multiclass
learning tasks. In contrast to MBSVM [33] and Twin-KSVC [35], the linear and nonlinear
models in MNP-KSVC are consistent. Applying the linear kernel in the nonlinear
problems (44) and (45) results in the same formulations as the original linear problems
(14) and (15).
• Extensive experiments on various datasets demonstrate the effectiveness of the proposed
MNP-KSVC compared with Multi-SVM, MBSVM, MTPMSVM, and Twin-KSVC.
There are several interesting directions to research in the future, such as extensions
to semi-supervised learning [26,42], multi-label learning [22], and privilege-information
learning [43].

Author Contributions: Conceptualization, S.-W.D. and W.-J.C.; Funding acquisition, W.-J.C. and Y.-H.S.;
Investigation, M.-C.Z. and P.C.; Methodology, S.-W.D., M.-C.Z. and Y.-H.S.; Project administration,
S.-W.D. and W.-J.C.; Supervision, W.-J.C. and Y.-H.S.; Validation, M.-C.Z. and H.-F.S.; Visualization, P.C.
and H.-F.S.; Writing—original draft, S.-W.D., P.C. and H.-F.S.; Writing—review and editing, S.-W.D. and
W.-J.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, Grant
number: 61603338, 11871183, 61866010, and 11426202; the Natural Science Foundation of Zhejiang
Province of China, Grant number: LY21F030013; the Natural Science Foundation of Hainan Province
of China, Grant number: 120RC449; the Scientific Research Foundation of Hainan University, Grant
number: kyqd(sk)1804.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing not applicable. No new data were created or analyzed in
this study.
Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

QPP Quadratic Programming Problem


KKT Karush–Kuhn–Tucker
SVM Support Vector Machine
TWSVM Twin Support Vector Machine
Multi-SVM Multiclass Support Vector Machine
MBSVM Multiple Birth Support Vector Machine
MTPMSVM Multiple Twin Parametric Margin Support Vector Machine
Twin-KSVC Multiclass Twin Support Vector Classifier
MNP-KSVC Multiclass Nonparallel Parametric-Margin Support Vector Classifier

References
1. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques; Morgan Kaufmann: San Francisco, CA, USA, 2012.
2. Vapnik, V. Statistical Learning Theory; Wiley: New York, NY, USA, 1998.
3. Deng, N.; Tian, Y.; Zhang, C. Support Vector Machines: Theory, Algorithms and Extensions; CRC Press: Philadelphia, PA, USA, 2013.
4. Sitaula, C.; Aryal, S.; Xiang, Y.; Basnet, A.; Lu, X. Content and context features for scene image representation. Knowl.-Based Syst.
2021, 232, 107470. [CrossRef]
5. Ma, S.; Cheng, B.; Shang, Z.; Liu, G. Scattering transform and LSPTSVM based fault diagnosis of rotating machinery. Mech. Syst.
Signal Process. 2018, 104, 155–170. [CrossRef]
6. Liu, T.; Yan, D.; Wang, R.; Yan, N.; Chen, G. Identification of Fake Stereo Audio Using SVM and CNN. Information 2021, 12, 263.
[CrossRef]
7. You, S.D. Classification of Relaxation and Concentration Mental States with EEG. Information 2021, 12, 187. [CrossRef]
8. Kang, J.; Han, X.; Song, J.; Niu, Z.; Li, X. The identification of children with autism spectrum disorder by SVM approach on EEG
and eye-tracking data. Comput. Biol. Med. 2020, 120, 103722. [CrossRef]
9. Lazcano, R.; Salvador, R.; Marrero-Martin, M.; Leporati, F.; Juarez, E.; Callico, G.M.; Sanz, C.; Madronal, D.; Florimbi, G.;
Sancho, J.; et al. Parallel Implementations Assessment of a Spatial-Spectral Classifier for Hyperspectral Clinical Applications.
IEEE Access 2019, 7, 152316–152333. [CrossRef]
10. Fabelo, H.; Ortega, S.; Szolna, A.; Bulters, D.; Pineiro, J.F.; Kabwama, S.; J-O’Shanahan, A.; Bulstrode, H.; Bisshopp, S.;
Kiran, B.R.; et al. In-Vivo Hyperspectral Human Brain Image Database for Brain Cancer Detection. IEEE Access 2019,
7, 39098–39116. [CrossRef]
11. Roy, S.D.; Debbarma, S. A novel OC-SVM based ensemble learning framework for attack detection in AGC loop of power
systems. Electr. Power Syst. Res. 2022, 202, 107625. [CrossRef]
12. Mangasarian, O.L.; Wild, E.W. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE
Trans. Pattern Anal. Mach. Intell. 2006, 28, 69–74. [CrossRef] [PubMed]
13. Jayadeva.; Khemchandani, R.; Chandra, S. Twin Support Vector Machines for Pattern Classification. IEEE Trans. Pattern Anal.
Mach. Intell. 2007, 29, 905–910. [CrossRef] [PubMed]
14. Ding, S.; Hua, X. An overview on nonparallel hyperplane support vector machine algorithms. Neural Comput. Appl. 2014,
25, 975–982. [CrossRef]
15. Peng, X. TPMSVM: A novel twin parametric-margin support vector machine for pattern recognition. Pattern Recogn. 2011,
44, 2678–2692. [CrossRef]
16. Arun Kumar, M.; Gopal, M. Least squares twin support vector machines for pattern classification. Expert Syst. Appl. 2009,
36, 7535–7543. [CrossRef]
17. Shao, Y.; Zhang, C.; Wang, X.; Deng, N. Improvements on Twin Support Vector Machines. IEEE Trans. Neural Netw. 2011,
22, 962–968. [CrossRef]
18. Chen, W.J.; Shao, Y.H.; Li, C.N.; Liu, M.Z.; Wang, Z.; Deng, N.Y. ν-projection twin support vector machine for pattern classification.
Neurocomputing 2020, 376, 10–24. [CrossRef]
19. Tian, Y.; Qi, Z.; Ju, X.; Shi, Y.; Liu, X. Nonparallel Support Vector Machines for Pattern Classification. IEEE Trans. Cybern. 2014,
44, 1067–1079. [CrossRef]
20. Tian, Y.; Ping, Y. Large-scale linear nonparallel support vector machine solver. Neural Netw. 2014, 50, 166–174. [CrossRef]
[PubMed]
21. Chen, W.; Shao, Y.; Li, C.; Wang, Y.; Liu, M.; Wang, Z. NPrSVM: Nonparallel sparse projection support vector machine with
efficient algorithm. Appl. Soft Comput. 2020, 90, 106142. [CrossRef]
22. Chen, W.; Shao, Y.; Li, C.; Deng, N. MLTSVM: A novel twin support vector machine to multi-label learning. Pattern Recogn. 2016,
52, 61–74. [CrossRef]
23. Bai, L.; Shao, Y.H.; Wang, Z.; Chen, W.J.; Deng, N.Y. Multiple Flat Projections for Cross-Manifold Clustering. IEEE Trans. Cybern.
2021, 1–15. Available online: https://ieeexplore.ieee.org/document/9343292 (accessed on 20 October 2021).

24. Hou, Q.; Liu, L.; Zhen, L.; Jing, L. A novel projection nonparallel support vector machine for pattern classification. Eng. Appl.
Artif. Intel. 2018, 75, 64–75. [CrossRef]
25. Liu, L.; Chu, M.; Gong, R.; Zhang, L. An Improved Nonparallel Support Vector Machine. IEEE Trans. Neur. Net. Lear. 2021,
32, 5129–5143. [CrossRef]
26. Chen, W.; Shao, Y.; Deng, N.; Feng, Z. Laplacian least squares twin support vector machine for semi-supervised classification.
Neurocomputing 2014, 145, 465–476. [CrossRef]
27. Shao, Y.; Chen, W.; Zhang, J.; Wang, Z.; Deng, N. An efficient weighted Lagrangian twin support vector machine for imbalanced
data classification. Pattern Recogn. 2014, 47, 3158–3167. [CrossRef]
28. Chen, W.; Shao, Y.; Ning, H. Laplacian smooth twin support vector machine for semi-supervised classification. Int. J. Mach. Learn.
Cyber. 2014, 5, 459–468. [CrossRef]
29. Gao, Z.; Fang, S.C.; Gao, X.; Luo, J.; Medhin, N. A novel kernel-free least squares twin support vector machine for fast and
accurate multi-class classification. Knowl.-Based Syst. 2021, 226, 107123. [CrossRef]
30. Ding, S.; Zhao, X.; Zhang, J.; Zhang, X.; Xue, Y. A review on multi-class TWSVM. Artif. Intell. Rev. 2017, 52, 775–801. [CrossRef]
31. Qiang, W.; Zhang, J.; Zhen, L.; Jing, L. Robust weighted linear loss twin multi-class support vector regression for large-scale
classification. Signal Process. 2020, 170, 107449. [CrossRef]
32. de Lima, M.D.; Costa, N.L.; Barbosa, R. Improvements on least squares twin multi-class classification support vector machine.
Neurocomputing 2018, 313, 196–205. [CrossRef]
33. Yang, Z.; Shao, Y.; Zhang, X. Multiple birth support vector machine for multi-class classification. Neural Comput. Appl. 2013,
22, 153–161. [CrossRef]
34. Angulo, C.; Parra, X.; Català, A. K-SVCR. A support vector machine for multi-class classification. Neurocomputing 2003, 55, 57–77.
[CrossRef]
35. Xu, Y.; Guo, R.; Wang, L. A Twin Multi-Class Classification Support Vector Machine. Cogn. Comput. 2013, 5, 580–588. [CrossRef]
36. Nasiri, J.A.; Moghadam Charkari, N.; Jalili, S. Least squares twin multi-class classification support vector machine. Pattern
Recogn. 2015, 48, 984–992. [CrossRef]
37. Mangasarian, O.L. Nonlinear Programming; SIAM Press: Philadelphia, PA, USA, 1993.
38. Tomar, D.; Agarwal, S. A comparison on multi-class classification methods based on least squares twin support vector machine.
Knowl.-Based Syst. 2015, 81, 131–147. [CrossRef]
39. Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 1937,
200, 675–701. [CrossRef]
40. Yu, Z.; Wang, Z.; You, J.; Zhang, J.; Liu, J.; Wong, H.S.; Han, G. A New Kind of Nonparametric Test for Statistical Comparison of
Multiple Classifiers Over Multiple Datasets. IEEE Trans. Cybern. 2017, 47, 4418–4431. [CrossRef] [PubMed]
41. Hatamlou, A. Black hole: A new heuristic optimization approach for data clustering. Inform. Sci. 2013, 222, 175–184. [CrossRef]
42. Chen, W.; Shao, Y.; Xu, D.; Fu, Y. Manifold proximal support vector machine for semi-supervised classification. Appl. Intell. 2014,
40, 623–638. [CrossRef]
43. Li, Y.; Sun, H.; Yan, W.; Cui, Q. R-CTSVM+: Robust capped L1-norm twin support vector machine with privileged information.
Inform. Sci. 2021, 574, 12–32. [CrossRef]
