Yukun Bao, Zhongyi Hu, Tao Xiong
Neurocomputing 117 (2013) 98–106
http://dx.doi.org/10.1016/j.neucom.2013.01.027

Abstract

Addressing the issue of SVMs parameters optimization, this study proposes an efficient memetic algorithm based on the particle swarm optimization algorithm (PSO) and pattern search (PS). In the proposed memetic algorithm, PSO is responsible for the exploration of the search space and the detection of potential regions with optimum solutions, while pattern search (PS) is used to produce an effective exploitation of the potential regions obtained by PSO. Moreover, a novel probabilistic selection strategy is proposed to select appropriate individuals among the current population to undergo local refinement, keeping a good balance between exploration and exploitation. Experimental results confirm that the local refinement with PS and our proposed selection strategy are effective, and demonstrate the effectiveness and robustness of the proposed PSO-PS based MA for SVMs parameters optimization.

Keywords: Parameters optimization; Support vector machines; Memetic algorithms; Particle swarm optimization; Pattern search
[…] guaranteed to find the global optimum solution to a problem, though they are generally good at finding "acceptably good" or near-optimal solutions. Another drawback of EAs is that they are not well suited to performing finely tuned search; on the other hand, they are good at exploring the solution space, since they search from a set of designs and not from a single design.

Memetic algorithms (MAs), first coined by Moscato [20,21], have been regarded as a promising framework that combines evolutionary algorithms (EAs) with a problem-specific local searcher (LS), where the latter is often referred to as a meme, defined as a unit of cultural evolution that is capable of local refinements. From an optimization point of view, MAs are hybrid EAs that combine global and local search, using an EA to perform exploration while the local search method performs exploitation. However, few studies related to MAs are reported in the literature on SVMs parameters optimization. In this study, by combining the particle swarm optimization algorithm (PSO) and pattern search (PS), an efficient PSO-PS based memetic algorithm (MA) is proposed to optimize the parameters of SVMs. In the proposed PSO-PS based MA, PSO is responsible for the exploration of the search space and the detection of potential regions with optimum solutions, while a direct search algorithm, pattern search (PS) in this case, is used to produce an effective exploitation of the potential regions obtained by PSO. Furthermore, the problem of selecting promising individuals to undergo local refinement is also addressed, and a novel probabilistic selection strategy is proposed to keep a balance between exploration and exploitation. The performance of the proposed PSO-PS based MA for parameters optimization in SVMs is justified on several benchmarks against selected established counterparts. Experimental results and comparisons demonstrate the effectiveness of the proposed PSO-PS based MA for parameters optimization of SVMs.

The rest of the study is organized as follows. Section 2 presents a brief review of SVMs, Section 3 elaborates on the PSO-PS based MA proposed in this study, the results with discussions are reported in Section 4, and finally, we conclude this study in Section 5.

2. Support vector machines

Consider a binary classification problem involving a training dataset {(x1, y1), (x2, y2), ..., (xn, yn)} ⊂ R^n × R, where xi is an input vector, yi ∈ {−1, 1} is the label of xi, and n denotes the number of data items in the training set. Based on the structural risk minimization (SRM) principle [1], SVMs aim to generate an optimal hyper-plane that separates the two classes by minimizing the regularized training error

$\min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$
$\text{s.t.} \; y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, n$
$\xi_i \ge 0, \quad i = 1, 2, \ldots, n \qquad (1)$

where ⟨·,·⟩ denotes the inner product; w is the weight vector, which controls the smoothness of the model; b is a bias parameter; and ξi is a non-negative slack variable which defines the permitted misclassification error. In the regularized training error given by Eq. (1), the first term, (1/2)‖w‖², is the regularization term, used as a measure of the flatness or complexity of the function. The second term, Σ_{i=1}^{n} ξi, is the empirical risk. Hence, C is referred to as the penalty coefficient, and it specifies the trade-off between the empirical risk and the regularization term.

According to Wolfe's dual theorem and the saddle-point condition, the dual optimization problem of the above primal one is obtained in the following quadratic programming form:

$\max \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$

Here, K(xi, xj) is defined as the kernel function. The value of the kernel function is equal to the inner product of the two vectors xi and xj in the high-dimensional feature space, that is, K(xi, xj) = ⟨φ(xi), φ(xj)⟩. The elegance of using the kernel function is that one can deal with feature spaces of arbitrary dimensionality without having to compute the map φ(x) explicitly. Any function that satisfies Mercer's condition [27] can be used as a kernel function. Typical examples of kernel functions are as follows:

Linear kernel: K(xi, xj) = ⟨xi, xj⟩
Polynomial kernel: K(xi, xj) = (γ⟨xi, xj⟩ + r)^d, γ > 0
Radial basis function (RBF) kernel: K(xi, xj) = exp(−γ‖xi − xj‖²), γ > 0
Sigmoid kernel: K(xi, xj) = tanh(γ⟨xi, xj⟩ + r)

where γ, r and d are kernel parameters. The kernel parameters should be carefully chosen, as they implicitly define the structure of the high-dimensional feature space φ(x) and thus control the complexity of the final solution [28]. Generally, among these kernel functions, the RBF kernel is strongly recommended and widely used for its performance and complexity [2], and thus SVMs with the RBF kernel function are the ones studied here.

Overall, SVMs are a powerful classifier with strong theoretical foundations and excellent generalization performance. Note that before implementing SVMs with the RBF kernel, two parameters (the penalty parameter C and the RBF kernel parameter γ) have to be set. Previous studies show that these two parameters play an important role in the success of SVMs. In this study, addressing the selection of these two parameters, a PSO-PS based memetic algorithm (MA) is proposed and justified within the context of SVMs with the RBF kernel function.
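To make these two quantities concrete: γ enters through the RBF Gram matrix that the SVM solver consumes, while C appears in the training objective of Eq. (1). The following Python sketch of the Gram matrix is a generic illustration, not the authors' code:

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2), gamma > 0."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))      # clamp tiny negatives from round-off

# Larger gamma -> narrower kernel -> more complex decision boundary.
X = np.random.default_rng(0).standard_normal((5, 2))
K = rbf_gram(X, gamma=0.5)
```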
3. Proposed memetic algorithms

Memetic algorithms (MAs), first coined by Moscato [20,21], have come to light as a union of population-based stochastic global search and local, individual-based cultural evolution, which are inspired
by Darwinian principles of natural evolution and Dawkins' notion of a meme. The meme is defined as a unit of cultural evolution that is capable of local/individual refinements. Designed as a hybridization, MAs are expected to make full use of the balance between exploration and exploitation of the search space, to complement the advantages of population-based methods and local-based methods. Nowadays, MAs have revealed their success, with high performance and superior robustness, across a wide range of problem domains [29,30]. However, according to the No Free Lunch theorem, the hybridization can be more complex and expensive to implement. Considering the simplicity in implementation of particle swarm optimization (PSO) and its proven ability to produce good solutions at a low computational cost [16,31–33], this study proposes a PSO based memetic algorithm with pattern search (PS) as the local individual learner, to solve the parameters optimization problem in SVMs. The concept of the proposed memetic algorithm is illustrated in Fig. 1. In the proposed PSO-PS based MA, PSO is used to explore the global search space of the parameters, while pattern search plays the role of local exploitation, based on its advantages of simplicity, flexibility and robustness.

Fig. 1. The flowchart of the proposed PSO-PS based MA: initialize and evaluate the swarm (pbest, gbest); in each generation, evaluate the swarm and update pbest and gbest, select the particles to be refined, and refine them with pattern search; when the termination condition is satisfied, return the optimum.

In the following sub-sections, we explain the implementation of the proposed PSO-PS based MA for parameters optimization in detail.

3.1. Representation and initialization

In our MA, each particle/individual in the population is a potential solution to the SVMs parameters, and its status is characterized by its position and velocity. The D-dimensional position of particle i at iteration t can be represented as $x_i^t = \langle x_{i1}^t, \ldots, x_{iD}^t \rangle$. Likewise, the flight velocity (position change) of particle i at iteration t can be described as $v_i^t = \langle v_{i1}^t, \ldots, v_{iD}^t \rangle$. In this study, the position and velocity of a particle have only two dimensions, which denote the two parameters (C and γ) to be optimized in SVMs. A set of particles $P = \{x_1^t, \ldots, x_M^t\}$ is called a swarm (or population).

Initially, the particles in the population are randomly generated in the solution space according to Eqs. (4) and (5):

$x_{id} = x_{\min,d} + r\,(x_{\max,d} - x_{\min,d}) \qquad (4)$

$v_{id} = v_{\min,d} + r\,(v_{\max,d} - v_{\min,d}) \qquad (5)$

where $x_{id}$ and $v_{id}$ are the position value and velocity in the dth dimension of the ith particle, respectively, and r is a random number in the range [0,1]. $x_{\min,d}$ and $x_{\max,d}$ denote the search space of the dth dimension (that is, the lower and upper limits of the parameters in SVMs), while $v_{\min,d}$ and $v_{\max,d}$ are used to constrain the velocity of the particle, to avoid the particle flying outside the search space.

After the initialization of the population, the algorithm iteratively generates offspring using the PSO based operator and performs local refinement with pattern search, as discussed in the following sub-sections.
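As a sketch of Eqs. (4) and (5), the swarm can be initialized as below; the dimension D = 2 matches the (log2 C, log2 γ) encoding described in Section 4.1, while the velocity bounds here are illustrative assumptions rather than the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_swarm(m, x_min, x_max, v_min, v_max):
    """Eqs. (4)-(5): uniform random positions and velocities, one row per particle."""
    x_min, x_max = np.asarray(x_min, float), np.asarray(x_max, float)
    v_min, v_max = np.asarray(v_min, float), np.asarray(v_max, float)
    x = x_min + rng.random((m, x_min.size)) * (x_max - x_min)  # Eq. (4)
    v = v_min + rng.random((m, v_min.size)) * (v_max - v_min)  # Eq. (5)
    return x, v

# 15 particles (the population size of Section 4.2) over [-10, 10]^2;
# the velocity bounds [-2, 2] are an assumed choice for illustration.
positions, velocities = init_swarm(15, [-10, -10], [10, 10], [-2, -2], [2, 2])
```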
3.2. Fitness function

The objective of optimizing the parameters of a SVMs model is to maximize the generalization ability, or equivalently to minimize the generalization error, of the model. As cross-validation can provide an unbiased estimation of the generalization error, in this study we take the k-fold cross-validation error into consideration. Let kCVMR denote the mean of the k folds' misclassification rate (MR). With the use of cross-validation, kCVMR serves as a criterion for
the generalization ability of a model. Hence, kCVMR is used as the fitness function in this study. The fitness function and error are given as follows:

$Fitness = kCVMR, \qquad kCVMR = \frac{1}{k} \sum_{j=1}^{k} MR_j \qquad (6)$

where k is the number of folds. In this study, fivefold cross-validation is conducted, which is widely used and suggested in [34].
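A minimal sketch of this fitness evaluation, using scikit-learn's SVC (a wrapper around LIBSVM) in place of the authors' LibSVM 2.91/MATLAB setup; the decoding of a particle into (C, γ) assumes the exponential search space of Section 4.1:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def kcvmr(particle, X, y, k=5):
    """Eq. (6): mean misclassification rate over k folds (lower is better).
    `particle` holds (log2 C, log2 gamma), per the encoding of Section 4.1."""
    C, gamma = 2.0 ** particle[0], 2.0 ** particle[1]
    fold_acc = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=k)
    return 1.0 - fold_acc.mean()  # MR_j = 1 - accuracy_j, averaged over the folds
```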
3.3. PSO based operator

In the proposed PSO-PS based MA, PSO based operators are used to explore the search space. Particle swarm optimization (PSO) [35] is a population-based meta-heuristic that simulates social behavior, such as birds flocking to a promising position to achieve precise objectives (e.g., finding food) in a multi-dimensional space, by interacting with one another.

To search for the optimal solution, each particle adjusts its flight trajectory by using the following updating equations:

$v_{id}^{t+1} = w\, v_{id}^{t} + c_1 r_1 (p_{id} - x_{id}^{t}) + c_2 r_2 (p_{gd} - x_{id}^{t}) \qquad (7)$

$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \qquad (8)$

where c1, c2 ∈ R are acceleration coefficients, w is the inertia weight, and r1 and r2 are random numbers in the range [0,1]. $v_{id}^t$ and $x_{id}^t$ denote the velocity and position of the ith particle in the dth dimension at the tth iteration, respectively. $p_{id}$ is the value in dimension d of the best parameter combination (a particle) found so far by particle i; $p_i = \langle p_{i1}, \ldots, p_{iD} \rangle$ is called the personal best (pbest) position. $p_{gd}$ is the value in dimension d of the best parameter combination found so far in the swarm P; $p_g = \langle p_{g1}, \ldots, p_{gD} \rangle$ is considered the global best (gbest) position. Note that each particle takes individual (pbest) and social (gbest) information into account when updating its velocity and position. In the search space, particles track the individual best values and the global best values. The process is terminated when the number of iterations reaches the pre-determined maximum number of iterations.
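The updates of Eqs. (7) and (8) translate directly into array operations. In this sketch, clipping positions back into the search box is an illustrative assumption (the paper itself bounds movement through the velocity limits of Eq. (5)); pbest stacks the per-particle best positions, gbest broadcasts the swarm best, and w would follow the linearly decreasing schedule of Section 4.1:

```python
import numpy as np

rng = np.random.default_rng(1)

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, bounds=(-10.0, 10.0)):
    """One PSO iteration for a swarm of shape (M, D)."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (7)
    x = np.clip(x + v, *bounds)                                # Eq. (8), kept in the box
    return x, v
```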
3.4. Refinement with pattern search

In the proposed PSO-PS based MA, pattern search is employed to conduct the exploitation of the parameter solution space. Pattern search (PS) is a kind of numerical optimization method that does not require the gradient of the problem to be optimized. It is a simple, effective optimization technique suitable for a wide range of objective functions: it requires no derivatives, only direct function evaluations, which makes the method suitable for parameters optimization of SVMs with the validation error to be minimized. Its convergence to local minima for constrained as well as unconstrained problems has been proven in [36]. It investigates the nearest-neighbor parameter vectors of the current point and tries to find a better move. If all neighbor points fail to produce a decrease, the search step is reduced. The search stops once the search step becomes sufficiently small, which ensures convergence to a local minimum. The pattern search is based on a pattern P_k that defines the neighborhood of the current point. An often-used pattern is the five-point unit-size rood pattern, which can be represented by the generating matrix P_k in Eq. (9):

$P_k = \begin{pmatrix} 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 \end{pmatrix} \qquad (9)$

The procedure of pattern search is outlined in Fig. 2. Δ0 denotes the default search step of PS, Δ is the current search step, p_k is a column of P_k, and O denotes the neighborhood of the center point. The termination conditions are that the maximum number of iterations is met or that the search step reaches a predefined small value. To balance the amount of computational budget allocated for exploration versus exploitation, Δ0/8 is empirically selected as the minimum search step.

Fig. 2. The procedure of pattern search: evaluate the neighbors of the current point; if an improvement is obtained, move there; otherwise decrease the search step, until the termination condition is satisfied.
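A sketch of this polling loop follows. The rood pattern of Eq. (9) contributes the four axis directions (its fifth, zero column is the center point itself); halving the step on failure is an assumption, since the paper only states that the step is decreased, while the Δ0/8 minimum step follows the text above:

```python
import numpy as np

def pattern_search(f, x0, delta0=1.0, max_iter=200):
    """Derivative-free refinement: poll x +/- delta * e_d, move on improvement,
    otherwise shrink delta; stop once delta < delta0 / 8 (Section 3.4)."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    delta, min_step = delta0, delta0 / 8.0
    dirs = np.vstack([np.eye(x.size), -np.eye(x.size)])  # non-zero columns of P_k
    for _ in range(max_iter):
        if delta < min_step:
            break
        for d in dirs:
            trial = x + delta * d
            ft = f(trial)
            if ft < fx:                # improving neighbor: recenter the pattern
                x, fx = trial, ft
                break
        else:                          # no neighbor improved: decrease the step
            delta *= 0.5
    return x, fx

# Toy usage; in the MA, f would be the kCVMR fitness of Section 3.2.
x_best, f_best = pattern_search(lambda z: float(np.sum(z ** 2)), [3.0, -2.0])
```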
3.5. A probabilistic selection strategy

An important issue to be addressed in MAs is which particles from the population should be selected to undergo local improvement (see the "Selection of particles to be refined" procedure in Fig. 1). This selection directly defines the balance between evolution (exploration) and individual learning (exploitation) under a limited computational budget, and is thus of importance to the success of MAs. In previous studies, a common way is to apply a local search to all newly generated offspring, which suffers from overly strong local exploitation. Another way is to apply local search to particles with a predefined probability pl, which seems too blind, and pl is not easy to define [37]. Besides, as [38] suggests that solutions in the proximity of a promising basin of attraction should receive an extended budget, randomly selected particles may need more cost to reach the optimum. Hence, it seems a good choice that only particles with fitness larger than a threshold v are selected for local improvement [39]; however, the threshold is problem specific, and some of the selected particles may be crowded in the same promising region.

To this end, a probabilistic selection strategy that selects non-crowded individuals to undergo refinement is proposed in this study. Beginning with the best individual, each individual undergoes local search with probability pl(x) (Eq. (10)); an individual with better fitness has a greater chance of being selected for exploitation. To avoid exploiting the same region simultaneously, once an individual is selected to be refined, the individuals crowded around it (i.e., those whose Euclidean distance to it is less than or equal to r) are eliminated from the population. The only parameter in the proposed strategy is r, which denotes the radius of the exploitation region; r is selected empirically in this study, and the detail of the strategy is illustrated in Fig. 3.

Formally, the selection probability pl(x) of each solution x in the current population P is specified by the following roulette-wheel selection scheme with linear scaling:

$p_l(x) = \frac{f_{\max}(P) - f(x)}{\sum_{y \in P} \left( f_{\max}(P) - f(y) \right)} \qquad (10)$

where $f_{\max}(P)$ is the maximum (worst) fitness value among the current population P.

Fig. 3. The procedure of the probabilistic selection strategy: initialize P′ as a copy of P and Ω as a null set; while P′ is not empty, get the best-fitness individual x = arg min{f(x) | x ∈ P′}, calculate pl(x) according to Eq. (10), […]
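The strategy can be sketched as follows, assuming fitness is minimized (kCVMR), so the best individual has the lowest value; the bookkeeping with a working copy of the population mirrors Fig. 3, and the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def select_for_refinement(positions, fitness, radius=2.0):
    """Section 3.5: scan from best to worst, pick each survivor with
    probability p_l(x) from Eq. (10), then eliminate individuals within
    Euclidean distance `radius` of a pick so one basin is refined only once."""
    f = np.asarray(fitness, dtype=float)
    f_max = f.max()                         # worst (largest) fitness in P
    denom = np.sum(f_max - f)
    alive = np.ones(f.size, dtype=bool)     # the working copy P' of Fig. 3
    selected = []
    for i in np.argsort(f):                 # best fitness first
        if not alive[i]:
            continue
        p = (f_max - f[i]) / denom if denom > 0 else 0.0   # Eq. (10)
        alive[i] = False
        if rng.random() < p:
            selected.append(i)
            crowd = np.linalg.norm(positions - positions[i], axis=1) <= radius
            alive &= ~crowd                 # remove the crowded neighbors
    return selected
```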
4. Experimental results and discussions

4.1. Experimental setup

In this study, we conduct two experiments to justify the proposed PSO-PS based MA as well as the proposed probabilistic selection strategy. The first experiment investigates the performance of four variants of the proposed PSO-PS based MA with different selection strategies, from which the best variant is determined. The best variant is then employed to compare against established counterparts from the literature in the second experiment.

LibSVM (Version 2.91) [34] is used for the implementation of SVMs. PSO and the variants of the proposed MA are coded in MATLAB 2009b on a computer with an Intel Core 2 Duo T5750 CPU and 2 GB of RAM. As suggested by Hsu et al. [2], the search space of the parameters is defined as an exponentially growing space: X = log2 C, Y = log2 γ, with −10 ≤ X ≤ 10 and −10 ≤ Y ≤ 10. Through initial experiments, the parameters of PSO and the MA are set as follows. The acceleration coefficients c1 and c2 both equal 2. For w, a time-varying inertia weight linearly decreasing from 1.2 to 0.8 is employed. The default search step (Δ0) in pattern search is set to 1. The radius of the exploitation region (r) used in the probabilistic selection strategy equals 2.

To test the performance of the proposed PSO-PS based memetic algorithms, computational experiments are carried out on some well-studied benchmarks. In this study, thirteen commonly used benchmark datasets from the IDA Benchmark Repository¹ are used. These datasets, preprocessed by [40], consist of 100 random groups (20 in the case of the image and splice datasets) with training and testing sets (about 60%:40%) describing binary classification problems. The dataset in each group has already been normalized to zero mean and unit standard deviation. For each group, the training set is used to select the optimal parameters of the SVM based on fivefold cross-validation and to establish the classifier, and then the test set is used to assess the classifier with the optimal parameters. Table 1 summarizes the general information of these datasets.

¹ http://www.raetschlab.org/Members/raetsch/benchmark/.
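For concreteness, the mapping from the encoded space back to the SVM hyper-parameters, and the size of the grid swept by the grid-search baseline described in Section 4.2 (step 0.5 over the same box), can be sketched as:

```python
import numpy as np

def decode(particle):
    """Map (X, Y) = (log2 C, log2 gamma), each in [-10, 10], to (C, gamma)."""
    X, Y = np.clip(np.asarray(particle, dtype=float), -10.0, 10.0)
    return 2.0 ** X, 2.0 ** Y

# The GS baseline sweeps the same box with step 0.5: 41 x 41 = 1681 pairs.
grid = [(2.0 ** a, 2.0 ** b)
        for a in np.arange(-10, 10.5, 0.5)
        for b in np.arange(-10, 10.5, 0.5)]
```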
4.2. Experiment I

In the proposed PSO-PS based MA, a pattern search is integrated into PSO, so it is worth evaluating the influence of the pattern search. Besides, in order to establish an efficient and robust MA, the selection of the particles to be locally improved plays an important role. Hence, in the first experiment, a single PSO and four variants of the proposed MA with different selection strategies are considered. The following abbreviations represent the algorithms considered in this sub-section.

PSO: a single PSO without local search;
MA1: all newly generated particles are refined with PS;
MA2: PS is applied to each new particle with a predefined probability pl = 0.1;
MA3: PS is only applied to the two best-fitness particles;
MA4: PS is applied to the particles selected by the probabilistic selection strategy proposed in Section 3.5;
GS: a grid search method on the search space, with the search step equal to 0.5.

Table 1
Details of the benchmark datasets.

Datasets       Attributes  Training set  Test set  Groups
Banana         2           400           4900      100
Breast cancer  9           200           77        100
Diabetis       8           468           300       100
Flare solar    9           666           400       100
German         20          700           300       100
Heart          13          170           100       100
Image          18          1300          1010      20
Ringnorm       20          400           7000      100
Splice         60          1000          2175      20
Thyroid        5           140           75        100
Titanic        3           150           2051      100
Twonorm        20          400           7000      100
Waveform       21          400           4600      100

Table 2. Test error rates (%) and numbers of fitness evaluations (#eval) of PSO, MA1–MA4, and GS on the benchmark datasets.

[…] Heart). The population size for PSO and each MA is fixed to 15. The stopping criteria of PSO are set as follows: the number of […] evaluations. The "Ave" row shows the average error rate (%) and […] error rate is always better than that obtained by the single PSO, which demonstrates the effectiveness of incorporating the local refinement.
Moreover, the average number of fitness evaluations of each MA is larger than that of PSO by only at most 13% (e.g., MA3). Finally, compared with GS, both PSO and the MAs achieve better results in terms of test error rates and numbers of fitness evaluations. The facts above imply that, among the six algorithms investigated, the four variants of the proposed MA tend to achieve the smallest error rates with slightly more fitness evaluations than PSO, but still far fewer than grid search.

By comparing the four variants of the proposed MA, several results can be observed. It is interesting to note that MA1, where PS is applied to all newly generated individuals, does not achieve as good results as expected. This may be due to premature convergence of the solutions, because strong local refinement is applied to the whole population. MA2 performs comparably with MA1, which indicates that the random selection strategy with probability pl also fails. MA3 gains better results than MA1 and MA2, and even has results competitive with MA4 on some datasets (e.g., the Diabetis dataset). Finally, MA4 always performs best in terms of the error rate on each dataset and the average error rate over all datasets, while its number of fitness evaluations does not increase much over the other three variants. All the above results suggest the success of our proposed probabilistic selection strategy used in MA4, highlighted by its addressing of the balance of exploration and exploitation in the proposed MA. Hence, we can conclude that MA4 with the proposed probabilistic selection strategy is effective and robust for solving parameters optimization in SVMs modeling.

In addition, the average computational time of each method is given in Table 3. It is obvious that all four variants of the proposed MA consume more time than the single PSO on almost all the datasets, which is mainly due to the refinement with pattern search. Among the four variants with different selection strategies, MA4 consumes a little more time than the other three. Since the accuracy of an SVM model is often more important, and the time consumed by MA4 is still much less than that of grid search, MA4 is selected as the best variant for the further comparison with established counterparts from the literature in sub-section 4.3.

Table 3
Average computational time in seconds of each method.

Data sets      PSO     MA1     MA2     MA3     MA4     GS
Banana         126.35  129.89  147.81  145.76  149.57  187.8
Breast cancer  56.71   59.69   62.48   62.01   67.25   74.5
Diabetis       275.03  288.73  264.02  278.56  311.61  358.13
Flare solar    405.88  402.13  447.31  474.37  432.12  564.97
German         687.53  700.78  764.22  697.11  787.2   994.01
Heart          34.54   35.94   40.34   35.21   37.07   53.64
Ave            264.34  269.53  287.70  282.17  297.47  372.18
AveRank        1.33    2.50    3.67    3.00    4.50    6.00

4.3. Experiment II

Furthermore, following the same experimental settings as in [12,13,40], the second experiment directly compares the performance of the best variant of the proposed PSO-PS based MA, MA4, with established algorithms from previous studies.

The experiment is conducted as follows. First, on each of the first five groups of every benchmark dataset, the parameters are optimized using the proposed MA4, and the final parameters are computed as the median of the five estimates obtained. Then, the mean error rate and standard deviation of SVMs with these parameters on all 100 (20 for the image and splice datasets) groups of each dataset are taken as the results of the corresponding algorithm. Table 4 and Fig. 4 show the comparison of the mean testing error rates (%) and the standard deviations (%) obtained by the proposed PSO-PS based MA and the other algorithms proposed in previous studies [12,13,40]. For convenience of comparison, the best error for every dataset is bolded. The first column reports the results obtained by a two-stage grid search based on fivefold cross-validation, from [40]. The results of the second and third columns, both reported in [13], are obtained by a radius-margin method and a fast-efficient strategy, respectively. "L1-CB" and "L2-CB" show the results of the hybridization of CLPSO and BFGS minimizing the L1-SVMs and L2-SVMs generalization errors, respectively, in [12]. Although no method outperforms all the others on every dataset, the proposed PSO-PS based MA has the lowest error on more than half of the datasets. From the perspective of average rank, the proposed MA gains the rank 1.54 ± 1.88, the best among the six methods investigated. Finally, the Wilcoxon signed rank test is used to verify the performance of the proposed approach against "CV" on each dataset². The results show that the proposed MA is significantly superior to "CV" in terms of error rate on eight datasets. In conclusion, these results imply that our proposed approach is competitive with, and even more effective and robust than, the other methods, which indicates that our proposed PSO-PS based MA is a practical method for parameters optimization in SVMs.

² Only "CV" and the proposed MA are compared using the Wilcoxon signed rank test, because the detailed results of the other approaches on each dataset are not available. The detailed results of "CV" on each dataset are available at http://www.raetschlab.org/Members/raetsch/benchmark/.

5. Conclusions

The parameters of SVMs are of great importance to the success of SVMs. To optimize the parameters properly, this study proposed a PSO and pattern search (PS) based memetic algorithm. In the proposed PSO-PS based MA, PSO was used to explore the search space and detect the potential regions with optimum solutions, while pattern search (PS) was used to conduct an effective refinement of the potential regions obtained by PSO. To balance the exploration of PSO and the exploitation of PS, a probabilistic selection strategy was also introduced. The performance of the proposed PSO-PS based MA was confirmed through experiments on several benchmark datasets. The results can be summarized as follows: (1) with different kinds of selection strategies, the variants of the proposed MA are all superior to the single PSO and GS in terms of error rates and standard deviations, which confirms the success of the refinement with pattern search. (2) Although the numbers of fitness evaluations and the computational times of the MAs are slightly inferior to those of PSO, each MA consumes far fewer fitness evaluations and much less computational time than grid search. (3) Among the variants of the proposed MA, the MA with our proposed probabilistic selection strategy gains the best performance in terms of error rates and standard deviations. (4) Compared with established counterparts from previous studies, the proposed PSO-PS based MA using the proposed probabilistic selection strategy is capable of yielding higher and more robust performance. Hence, the proposed PSO-PS based MA can be a promising alternative for SVMs parameters optimization.

One should note, however, that this study only focuses on the PSO-PS based MA for parameters optimization in SVMs; other varieties of PSO and other evolutionary algorithms can also be used in the proposed MA. Besides, with the rapid development of hardware, the parallelization of the proposed MA to make full use […]
Table 4
Comparison with established algorithms (test error rate %, mean ± standard deviation). Asterisks mark datasets on which the proposed MA is significantly superior to "CV" under the Wilcoxon signed rank test.

Data sets      CV            RM            EF            L1-CB         L2-CB         Proposed MA
Banana         11.53 ± 0.66  10.48 ± 0.40  10.68 ± 0.50  11.65 ± 5.90  10.44 ± 0.46  10.35 ± 0.40***
Breast cancer  26.04 ± 4.74  27.83 ± 4.62  24.97 ± 4.62  28.75 ± 4.61  26.03 ± 4.60  24.37 ± 4.61*
Diabetis       25.53 ± 1.73  34.56 ± 2.17  23.16 ± 1.65  25.36 ± 2.95  23.50 ± 1.66  23.17 ± 1.35***
Flare solar    32.43 ± 1.82  35.51 ± 1.65  32.37 ± 1.78  33.15 ± 1.83  33.33 ± 1.79  32.07 ± 1.73*
German         23.61 ± 2.07  28.84 ± 2.03  23.41 ± 2.10  29.26 ± 2.88  24.41 ± 2.13  23.72 ± 2.05
Heart          15.95 ± 3.26  21.84 ± 3.70  16.62 ± 3.20  16.94 ± 3.71  16.02 ± 3.26  15.95 ± 2.11
Image          2.96 ± 0.60   4.19 ± 0.70   5.55 ± 0.60   3.34 ± 0.71   2.97 ± 0.45   2.90 ± 0.65
Ringnorm       1.66 ± 0.12   1.65 ± 0.11   2.03 ± 0.25   1.68 ± 0.13   1.68 ± 0.12   1.66 ± 0.12
Splice         10.88 ± 0.66  10.94 ± 0.70  11.11 ± 0.66  10.99 ± 0.74  10.84 ± 0.74  10.89 ± 0.63
Thyroid        4.80 ± 2.19   4.01 ± 2.18   4.4 ± 2.40    4.55 ± 2.10   4.20 ± 2.03   3.4 ± 2.07***
Titanic        22.42 ± 1.02  23.04 ± 1.18  22.87 ± 1.12  23.51 ± 2.92  22.89 ± 1.15  21.58 ± 1.06***
Twonorm        2.96 ± 0.23   3.07 ± 0.246  2.69 ± 0.19   2.90 ± 0.27   2.64 ± 0.20   2.49 ± 0.13**
Waveform       9.88 ± 0.43   11.10 ± 0.50  10 ± 0.39     9.78 ± 0.48   9.58 ± 0.37   9.30 ± 0.37***
Ave            14.67 ± 1.50  16.70 ± 1.55  14.60 ± 1.50  15.53 ± 2.25  14.50 ± 1.46  13.99 ± 1.33
AveRank        3.38 ± 3.73   4.62 ± 3.69   3.62 ± 3.42   4.81 ± 5.31   3.04 ± 2.96   1.54 ± 1.88
Fig. 4. Results on the 13 benchmark datasets: (a) mean of error rates; and (b) standard deviation of error rates, for CV, RM, EF, L1-CB, L2-CB, and the proposed MA.
References

[11] M.M. Adankon, M. Cheriet, Model selection for the LS-SVM. Application to handwriting recognition, Patt. Recogn. 42 (12) (2009) 3264–3270.
[12] S. Li, M. Tan, Tuning SVM parameters by using a hybrid CLPSO–BFGS algorithm, Neurocomputing 73 (2010) 2089–2096.
[13] Z.B. Xu, M.W. Dai, D.Y. Meng, Fast and efficient strategies for model selection of Gaussian support vector machine, IEEE Trans. Syst. Man Cybern. Part B–Cybernetics 39 (5) (2009) 1292–1307.
[14] F. Imbault, K. Lebart, A stochastic optimization approach for parameter tuning of Support Vector Machines, in: J. Kittler, M. Petrou, M. Nixon (Eds.), Proceedings of the 17th International Conference on Pattern Recognition, vol. 4, IEEE Computer Soc., Los Alamitos, 2004, pp. 597–600.
[15] X.C. Guo, J.H. Yang, C.G. Wu, C.Y. Wang, Y.C. Liang, A novel LS-SVMs hyper-parameter selection based on particle swarm optimization, Neurocomputing 71 (2008) 3211–3215.
[16] H.J. Escalante, M. Montes, L.E. Sucar, Particle swarm model selection, J. Mach. Learn. Res. 10 (2009) 405–440.
[17] X. Zhang, X. Chen, Z. He, An ACO-based algorithm for parameter optimization of support vector machines, Expert Syst. Appl. 37 (9) (2010) 6618–6628.
[18] T. Gomes, R. Prudencio, C. Soares, A. Rossi, A. Carvalho, Combining meta-learning and search techniques to select parameters for support vector machines, Neurocomputing 75 (2012) 3–13.
[19] M.N. Kapp, R. Sabourin, P. Maupin, A dynamic model selection strategy for support vector machine classifiers, Appl. Soft Comput. 12 (8) (2012) 2550–2565.
[20] P. Moscato, M.G. Norman, A 'memetic' approach for the traveling salesman problem: implementation of a computational ecology for combinatorial optimization on message-passing systems, Parallel Comput. Transputer Appl. (1992) 177–186.
[21] P. Moscato, On evolution, search, optimization, genetic algorithms and martial arts: towards memetic algorithms, Caltech Concurrent Computation Program, C3P Report 826, 1989.
[22] J. Tang, M.H. Lim, Y.S. Ong, Diversity-adaptive parallel memetic algorithm for solving large scale combinatorial optimization problems, Soft Comput. 11 (9) (2007) 873–888.
[23] S. Areibi, Z. Yang, Effective memetic algorithms for VLSI design = genetic algorithms + local search + multi-level clustering, Evol. Comput. 12 (3) (2004) 327–353.
[24] A. Elhossini, S. Areibi, R. Dony, Strength pareto particle swarm optimization and hybrid EA-PSO for multi-objective optimization, Evol. Comput. 18 (1) (2010) 127–156.
[25] A. Lara, G. Sanchez, C. Coello, O. Schuetze, HCS: a new local search strategy for memetic multiobjective evolutionary algorithms, IEEE Trans. Evol. Comput. 14 (1) (2010) 112–132.
[26] Q.H. Nguyen, Y.S. Ong, M.H. Lim, A probabilistic memetic framework, IEEE Trans. Evol. Comput. 13 (3) (2009) 604–623.
[27] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc., New York, 1998.
[28] F.E.H. Tay, L.J. Cao, Application of support vector machines in financial time series forecasting, Omega 29 (4) (2001) 309–317.
[29] X. Chen, Y. Ong, M. Lim, K. Tan, A multi-facet survey on memetic computation, IEEE Trans. Evol. Comput. 15 (5) (2011) 591–607.
[30] F. Neri, C. Cotta, Memetic algorithms and memetic computing optimization: a literature review, Swarm Evol. Comput. 2 (2012) 1–14.
[31] X. Hu, R.C. Eberhart, Y. Shi, Engineering optimization with particle swarm, in: Proceedings of the 2003 IEEE Swarm Intelligence Symposium, 2003, pp. 53–57.
[32] M. Reyes, C. Coello, Multi-objective particle swarm optimizers: a survey of the state-of-the-art, Int. J. Comput. Intell. Res. 2 (3) (2006) 287–308.
[33] J. Kennedy, R.C. Eberhart, Y. Shi, Swarm Intelligence, Morgan Kaufmann Publishers, San Francisco, 2001.
[34] C. Chang, C. Lin, LIBSVM: a library for support vector machines, 2001. Software available from http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[35] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, vol. 4, 1995, pp. 1942–1948.
[36] E.D. Dolan, R.M. Lewis, V. Torczon, On the local convergence of pattern search, SIAM J. Optim. 14 (2) (2003) 567–583.
[37] B. Liu, L. Wang, Y.H. Jin, An effective PSO-based memetic algorithm for flow shop scheduling, IEEE Trans. Syst. Man Cybern. Part B–Cybernetics 37 (1) (2007) 18–27.
[38] M. Lozano, F. Herrera, N. Krasnogor, D. Molina, Real-coded memetic algorithms with crossover hill-climbing, Evol. Comput. 12 (3) (2004) 273–302.
[39] M.L. Tang, X. Yao, A memetic algorithm for VLSI floorplanning, IEEE Trans. Syst. Man Cybern. Part B–Cybernetics 37 (1) (2007) 62–69.
[40] G. Rätsch, T. Onoda, K.R. Müller, Soft margins for AdaBoost, Mach. Learn. 42 (3) (2001) 287–320.

Yukun Bao is an Associate Professor at the Department of Management Science and Information Systems, School of Management, Huazhong University of Science and Technology, PR China. He received his BSc, MSc, and PhD in Management Science and Engineering from Huazhong University of Science and Technology, PR China, in 1996, 1999 and 2002, respectively. He has been principal investigator for two research projects funded by the Natural Science Foundation of China, has served as a referee for several IEEE and other international journals, and has been a PC member for several international academic conferences. His research interests are time series modeling and forecasting, business intelligence and data mining.

Zhongyi Hu received his BS degree in Information Systems in 2009 from Harbin Institute of Technology at Weihai, China, and his MS degree in Management Science and Engineering in 2011 from Huazhong University of Science and Technology, China. Currently, he is working toward the PhD degree in Management Science and Engineering at Huazhong University of Science and Technology, China. His research interests include support vector machines, swarm intelligence, memetic algorithms, and time series forecasting.

Tao Xiong received his BSc and MSc in Management Science and Engineering from Huazhong University of Science and Technology, PR China, in 2008 and 2010, respectively. Currently, he is working toward the PhD degree in Management Science and Engineering at Huazhong University of Science and Technology, China. His research interests include multi-step-ahead time series forecasting, interval data analysis, and computational intelligence.