
Neurocomputing 117 (2013) 98–106

Contents lists available at SciVerse ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

A PSO and pattern search based memetic algorithm for SVMs parameters optimization

Yukun Bao*, Zhongyi Hu, Tao Xiong

Department of Management Science and Information Systems, School of Management, Huazhong University of Science and Technology, Wuhan 430074, PR China

* Corresponding author. Tel.: +86 27 87558579; fax: +86 27 87556437. E-mail addresses: yukunbao@hust.edu.cn, y.bao@ieee.org, yukunbao@mail.hust.edu.cn (Y. Bao).

Article info

Article history:
Received 3 October 2012
Received in revised form 15 November 2012
Accepted 16 January 2013
Communicated by W.S. Hong
Available online 27 February 2013

Keywords:
Parameters optimization
Support vector machines
Memetic algorithms
Particle swarm optimization
Pattern search

Abstract

Addressing the issue of SVMs parameters optimization, this study proposes an efficient memetic algorithm based on particle swarm optimization (PSO) and pattern search (PS). In the proposed memetic algorithm, PSO is responsible for exploration of the search space and the detection of the potential regions with optimum solutions, while pattern search (PS) is used to produce an effective exploitation of the potential regions obtained by PSO. Moreover, a novel probabilistic selection strategy is proposed to select the appropriate individuals among the current population to undergo local refinement, keeping a good balance between exploration and exploitation. Experimental results confirm that the local refinement with PS and our proposed selection strategy are effective, and finally demonstrate the effectiveness and robustness of the proposed PSO-PS based MA for SVMs parameters optimization.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Support vector machines (SVMs), first presented by Vapnik [1] based on statistical learning theory (SLT) and the structural risk minimization (SRM) principle, solve the classification problem by maximizing the margin between the separating hyper-plane and the data. SVMs implement the SRM principle, which seeks to minimize an upper bound of the generalization error by using the penalty parameter C as a trade-off between training error and model complexity. The use of kernel tricks enables SVMs to deal with nonlinear features in a high dimensional feature space. Due to their excellent generalization performance, SVMs have been widely used in various areas, such as pattern recognition, text categorization, fault diagnosis and so on. However, the generalization ability of SVMs highly depends on an adequate setting of their parameters, such as the penalty coefficient C and the kernel parameters. Therefore, the selection of the optimal parameters is of critical importance for obtaining good performance in handling learning tasks with SVMs.

In this study, we mainly concentrate on the parameters optimization of SVMs, which has gained great attention in the past several years. The most popular and universal method is grid search, which conducts an exhaustive search on the parameters space so as to minimize a validation error (such as the fivefold cross-validation error). Although grid search can be easily parallelized and seems safe [2], its computational cost scales exponentially with the number of parameters and the number of sampling points for each parameter [3]. Besides, the performance of grid search is sensitive to the setting of the grid range and coarseness for each parameter, which are not easy to set without prior knowledge. Instead of minimizing the validation error, another stream of studies focused on minimizing approximated bounds on the generalization performance by numerical optimization methods [4–12]. Numerical optimization methods are generally more efficient than grid search owing to their fast convergence rate. However, this kind of method is only suitable for cases where the error bounds are differentiable and continuous with respect to the parameters of the SVMs. It is also worth noting that numerical optimization methods, such as gradient descent, may get stuck in local optima and highly depend on the starting points. Furthermore, experimental evidence showed that several established bound methods could not compete with the traditional fivefold cross-validation method [6], which indicates an inevitable gap between the approximation bounds and the real error [13]. Recently, evolutionary algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO) and simulated annealing (SA) have been employed to optimize the SVMs parameters for their better global search abilities compared with numerical optimization methods [14–19]. However, GAs and other evolutionary algorithms (EAs) are not

guaranteed to find the globally optimal solution to a problem, though they are generally good at finding "acceptably good" or near-optimal solutions. Another drawback of EAs is that they are not well suited to performing finely tuned search; on the other hand, they are good at exploring the solution space, since they search from a set of designs and not from a single design.

Memetic algorithms (MAs), first coined by Moscato [20,21], have been regarded as a promising framework that combines evolutionary algorithms (EAs) with a problem-specific local searcher (LS), where the latter is often referred to as a meme, defined as a unit of cultural evolution that is capable of local refinements. From an optimization point of view, MAs are hybrid EAs that combine global and local search, using an EA to perform exploration while the local search method performs exploitation. This gives them the ability to exploit the complementary advantages of EAs (generality, robustness, global search efficiency) and of problem-specific local search (exploiting application-specific problem structure, rapid convergence toward local minima) [22]. To date, MAs have been recognized as a powerful algorithmic paradigm for evolutionary computing in a wide variety of areas [23–26]. In particular, the relative advantage of MAs over EAs is quite consistent on complex search spaces.

Since the parameters space of SVMs is often considered complex, it is of interest to justify the use of MAs for SVMs parameters optimization. There have been few, if any, works related to MAs reported in the literature on SVMs parameters optimization. In this study, by combining particle swarm optimization (PSO) and pattern search (PS), an efficient PSO-PS based memetic algorithm (MA) is proposed to optimize the parameters of SVMs. In the proposed PSO-PS based MA, PSO is responsible for exploration of the search space and the detection of the potential regions with optimum solutions, while a direct search algorithm, pattern search (PS) in this case, is used to produce an effective exploitation of the potential regions obtained by PSO. Furthermore, the problem of selecting promising individuals to experience local refinement is also addressed, and thus a novel probabilistic selection strategy is proposed to keep a balance between exploration and exploitation. The performance of the proposed PSO-PS based MA for parameters optimization in SVMs is justified on several benchmarks against selected established counterparts. Experimental results and comparisons demonstrate the effectiveness of the proposed PSO-PS based MA for parameters optimization of SVMs.

The rest of the study is organized as follows. Section 2 presents a brief review of SVMs, Section 3 elaborates on the PSO-PS based MA proposed in this study, the results with discussions are reported in Section 4, and finally, we conclude this study in Section 5.

2. Support vector machines

Consider a binary classification problem involving a training dataset {(x_1,y_1),(x_2,y_2),...,(x_n,y_n)} ⊂ R^n × R, where x_i is an input vector, y_i ∈ {−1,1} is the label of x_i, and n denotes the number of data items in the training set. Based on the structural risk minimization (SRM) principle [1], SVMs aim to generate an optimal hyper-plane to separate the two classes by minimizing the regularized training error

$$\min\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i$$
$$\text{s.t.}\quad y_i(\langle w, x_i\rangle + b) \ge 1-\xi_i,\quad \xi_i \ge 0,\quad i=1,2,\ldots,n \qquad (1)$$

where ⟨·,·⟩ denotes the inner product; w is the weight vector, which controls the smoothness of the model; b is a bias parameter; and ξ_i is a non-negative slack variable which defines the permitted misclassification error. In the regularized training error given by Eq. (1), the first term, (1/2)‖w‖², is the regularization term, used as a measure of the flatness or complexity of the function. The second term, Σ_{i=1}^{n} ξ_i, is the empirical risk. Hence, C is referred to as the penalty coefficient, and it specifies the trade-off between the empirical risk and the regularization term.

According to Wolfe's dual theorem and the saddle-point condition, the dual of the above primal problem is obtained as the following quadratic program:

$$\max\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j\langle x_i, x_j\rangle$$
$$\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C,\quad i=1,2,\ldots,n \qquad (2)$$

where the α_i are nonnegative Lagrangian multipliers that can be obtained by solving the convex quadratic programming problem stated above.

Finally, by solving Eq. (2) and using the kernel trick, the decision function can be written in the following explicit form:

$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{n} y_i\,\alpha_i\,K(x_i, x) + b\right) \qquad (3)$$

Here, K(x_i,x) is the kernel function. The value of the kernel function equals the inner product of the two vectors x_i and x_j in the high-dimensional feature space, that is, K(x_i,x_j) = ⟨φ(x_i),φ(x_j)⟩. The elegance of using a kernel function is that one can deal with feature spaces of arbitrary dimensionality without having to compute the map φ(x) explicitly. Any function that satisfies Mercer's condition [27] can be used as a kernel function. Typical examples of kernel functions are as follows:

- Linear kernel: K(x_i,x_j) = ⟨x_i,x_j⟩
- Polynomial kernel: K(x_i,x_j) = (γ⟨x_i,x_j⟩ + r)^d, γ > 0
- Radial basis function (RBF) kernel: K(x_i,x_j) = exp(−γ‖x_i − x_j‖²), γ > 0
- Sigmoid kernel: K(x_i,x_j) = tanh(γ⟨x_i,x_j⟩ + r)

where γ, r and d are kernel parameters. The kernel parameters should be carefully chosen, as they implicitly define the structure of the high dimensional feature space φ(x) and thus control the complexity of the final solution [28]. Generally, among these kernel functions, the RBF kernel is strongly recommended and widely used for its performance and complexity [2], and thus SVMs with the RBF kernel are the ones studied here.

Overall, SVMs are powerful classifiers with strong theoretical foundations and excellent generalization performance. Note that before implementing an SVM with the RBF kernel, two parameters (the penalty parameter C and the RBF kernel parameter γ) have to be set. Previous studies show that these two parameters play an important role in the success of SVMs. In this study, addressing the selection of these two parameters, a PSO-PS based memetic algorithm (MA) is proposed and justified within the context of SVMs with the RBF kernel function.
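To make the role of the two RBF-SVM parameters concrete, the short sketch below trains an SVM for a fixed pair (C, γ) and evaluates its test error. This is a minimal illustration in Python with scikit-learn rather than the LibSVM 2.91/MATLAB setup used later in the paper; the synthetic data and the particular parameter values are our own assumptions, not the authors'.

```python
# Minimal sketch (not the authors' code): the two RBF-SVM parameters
# of interest, C (penalty) and gamma (RBF width), fixed by hand.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=2, n_redundant=0,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# Eq. (3): decision function sgn(sum_i y_i alpha_i K(x_i, x) + b),
# with the RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).
clf = SVC(kernel="rbf", C=2.0, gamma=0.5).fit(X_tr, y_tr)
print("test misclassification rate:", 1.0 - clf.score(X_te, y_te))
```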

3. Proposed memetic algorithms

Memetic algorithms (MAs), first coined by Moscato [20,21], have come to light as a union of population-based stochastic global search and locally based cultural evolution, inspired by Darwinian principles of natural evolution and Dawkins' notion of a meme. The meme is defined as a unit of cultural evolution that is capable of local/individual refinements. Being designed as hybridizations, MAs are expected to make full use of the balance between exploration and exploitation of the search space, so as to complement the advantages of population-based methods and local methods. Nowadays, MAs have demonstrated success, with high performance and superior robustness, across a wide range of problem domains [29,30]. However, according to the No Free Lunch theorem, the hybridization can be complex and expensive to implement. Considering the simplicity of implementing particle swarm optimization (PSO) and its proven ability to produce good solutions at a low computational cost [16,31–33], this study proposes a PSO based memetic algorithm with pattern search (PS) as the local individual learner to solve the parameters optimization problem in SVMs. The concept of the proposed memetic algorithm is illustrated in Fig. 1. In the proposed PSO-PS based MA, PSO is used for exploring the global search space of the parameters, while pattern search plays the role of local exploitation, based on its advantages of simplicity, flexibility and robustness.

In the following sub-sections, we explain the implementation of the proposed PSO-PS based MA for parameters optimization in detail.

3.1. Representation and initialization

In our MA, each particle/individual in the population is a potential solution to the SVMs parameters, and its status is characterized by its position and velocity. The D-dimensional position of particle i at iteration t can be represented as $x_i^t = \langle x_{i1}^t,\ldots,x_{iD}^t\rangle$. Likewise, the velocity (position change) of particle i at iteration t can be described as $v_i^t = \langle v_{i1}^t,\ldots,v_{iD}^t\rangle$. In this study, the position and velocity of a particle have only two dimensions, which denote the two parameters (C and γ) to be optimized in SVMs. A set of particles $P = \{x_1^t,\ldots,x_M^t\}$ is called a swarm (or population).

Initially, the particles in the population are randomly generated in the solution space according to Eqs. (4) and (5),

$$x_{id} = x_{\min,d} + r\,(x_{\max,d} - x_{\min,d}) \qquad (4)$$
$$v_{id} = v_{\min,d} + r\,(v_{\max,d} - v_{\min,d}) \qquad (5)$$

where x_{id} and v_{id} are the position value and velocity in the dth dimension of the ith particle, respectively, and r is a random number in the range [0,1]. x_{min,d} and x_{max,d} denote the search space of the dth dimension (that is, the lower and upper limits of the parameters in SVMs), while v_{min,d} and v_{max,d} are used to constrain the velocity of the particle so as to avoid the particle flying outside the search space.

After the initialization of the population, the algorithm iteratively generates offspring using the PSO based operator and undergoes local refinement with pattern search, as discussed in the following sub-sections.
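A minimal Python sketch of this encoding and of the initialization in Eqs. (4) and (5); the two dimensions hold (log2 C, log2 γ) with the bounds of Section 4.1, while the velocity bounds are our illustrative assumption.

```python
# Sketch of particle encoding and random initialization, Eqs. (4)-(5).
# Dimension 0 holds log2(C), dimension 1 holds log2(gamma); the position
# bounds follow Section 4.1, the velocity range is our illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
M, D = 15, 2                          # swarm size and dimensionality
x_min, x_max = np.array([-10., -10.]), np.array([10., 10.])
v_min, v_max = -0.25 * (x_max - x_min), 0.25 * (x_max - x_min)

# Eq. (4): x_id = x_min_d + r * (x_max_d - x_min_d), with r ~ U[0, 1]
positions = x_min + rng.random((M, D)) * (x_max - x_min)
# Eq. (5): v_id = v_min_d + r * (v_max_d - v_min_d)
velocities = v_min + rng.random((M, D)) * (v_max - v_min)
```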

Fig. 1. Flowchart of the proposed PSO-PS based memetic algorithm.
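As a reading aid, the overall loop of Fig. 1 can be sketched as follows. The helpers fitness, pso_generation, select_for_refinement and pattern_search stand for the components detailed in Sections 3.2–3.5 and are hypothetical interfaces of this sketch, not the authors' code.

```python
# Control-flow skeleton of Fig. 1 (a sketch, not the authors'
# implementation). The four helpers are hypothetical stand-ins for
# the components of Sections 3.2-3.5.
def memetic_loop(swarm, fitness, pso_generation, select_for_refinement,
                 pattern_search, max_evals=1500):
    evals = swarm.evaluate(fitness)            # initialize pbest and gbest
    while evals < max_evals:
        evals += pso_generation(swarm, fitness)        # exploration (3.3)
        for particle in select_for_refinement(swarm):  # selection (3.5)
            evals += pattern_search(particle, fitness) # exploitation (3.4)
        swarm.update_best()        # replace particles, update pbest/gbest
    return swarm.gbest
```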


Y. Bao et al. / Neurocomputing 117 (2013) 98–106 101

3.2. Fitness function

The objective of optimizing the parameters of an SVMs model is to maximize the generalization ability, or equivalently to minimize the generalization error, of the model. As cross-validation can provide an unbiased estimation of the generalization error, in this study we take the k-fold cross-validation error into consideration. Let kCVMR denote the mean of the k folds' misclassification rates (MR). With the use of cross-validation, kCVMR serves as a criterion for the generalization ability of a model. Hence, kCVMR is used as the fitness function in this study:

$$\mathrm{Fitness} = k\mathrm{CVMR} = \frac{1}{k}\sum_{j=1}^{k} MR_j \qquad (6)$$

where k is the number of folds. In this study, fivefold cross-validation is conducted, as widely used and suggested in [34].
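Under the same assumptions as the earlier sketches (scikit-learn standing in for the paper's LibSVM/MATLAB stack), Eq. (6) can be computed as below; the decoding of the log2-encoded particle follows Section 4.1.

```python
# Sketch of the fitness in Eq. (6): mean misclassification rate over
# k cross-validation folds, for a particle holding (log2 C, log2 gamma).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def kcvmr_fitness(particle, X, y, k=5):
    C, gamma = 2.0 ** particle[0], 2.0 ** particle[1]
    acc = cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma), X, y, cv=k)
    return 1.0 - acc.mean()   # kCVMR = (1/k) * sum of per-fold error rates
```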
3.3. PSO based operator

In the proposed PSO-PS based MA, PSO based operators are used to explore the search space. Particle swarm optimization (PSO) [35] is a population-based meta-heuristic that simulates social behavior, such as birds flocking to a promising position, to achieve precise objectives (e.g., finding food) in a multi-dimensional space through interaction among the particles.

To search for the optimal solution, each particle adjusts its flight trajectory by using the following updating equations:

$$v_{id}^{t+1} = w\,v_{id}^{t} + c_1 r_1 (p_{id} - x_{id}^{t}) + c_2 r_2 (p_{gd} - x_{id}^{t}) \qquad (7)$$
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \qquad (8)$$

where c_1, c_2 ∈ R are acceleration coefficients, w is the inertia weight, and r_1 and r_2 are random numbers in the range [0,1]. v_{id}^t and x_{id}^t denote the velocity and position of the ith particle in the dth dimension at the tth iteration, respectively. p_{id} is the value in dimension d of the best parameter combination (a particle) found so far by particle i; p_i = ⟨p_{i1},...,p_{iD}⟩ is called the personal best (pbest) position. p_{gd} is the value in dimension d of the best parameter combination (a particle) found so far in the swarm P; p_g = ⟨p_{g1},...,p_{gD}⟩ is considered the global best (gbest) position. Note that each particle takes individual (pbest) and social (gbest) information into account when updating its velocity and position. In the search space, particles track the individual best values and the global best values. The process is terminated if the number of iterations reaches the pre-determined maximum number of iterations.
Formally, the selection probability pl(x) of each solution x in
the current population P is specified by the following roulette-
3.4. Refinement with pattern search wheel selection scheme with the linear scaling

In the proposed PSO-PS based MA, pattern search is employed f ðPÞf ðxÞ
pl ðxÞ ¼ P max  ð10Þ
to conduct exploitation of the parameters solution space. Pattern f max ðPÞf ðyÞ
yAP
search (PS) is a kind of numerical optimization methods that do
not require the gradient of the problem to be optimized. It is a where f max ðPÞ is the maximum (worse) fitness value among the
simple effective optimization technique suitable for optimizing a current population P.
wide range of objective functions. It does not require derivatives
but direct function evaluation, making the method suitable for
parameters optimization of SVMs with the validation error mini- 4. Experimental results and discussions
mized. The convergence to local minima for constrained problems
as well as unconstrained problems has been proven in [36]. It 4.1. Experimental setup
investigates nearest neighbor parameter vectors of the current
point, and tries to find a better move. If all neighbor points fail to In this study, we conduct two experiments to justify the
produce a decrease, then the search step is reduced. This search proposed PSO-PS based MA as well as the proposed probabilistic
stops until the search step gets sufficiently small, which ensuring selection strategy. The first experiment aims to investigate the
the convergence to a local minimum. The pattern search is based performance of the four variants of the proposed PSO-PS based
on a pattern Pk that defines the neighborhood of current points. MA with different selection strategies and thus the best variant is
A well often used pattern is five-point unit-size rood pattern determined. Then, the best selected variant is employed to
which can be represented by the generating matrix Pk in Eq. (9). compare against the established counterparts in literature in the
  second experiment.
1 0 1 0 0
Pk ¼ ð9Þ LibSVM (Version 2.91) [34] is used for the implementation of
0 1 0 1 0
SVMs. PSO and several variants of proposed MA are coded in
The procedure of pattern search is outlined in Fig. 2. D0 MATLAB 2009b using computer with Intel Core 2 Duo CPU T5750,
denotes the default search step of PS, D is a search step, pk is a 2G RAM. As suggested by Hsu et al. [2], the search space of the

Fig. 2. Pseudocode of pattern search for local refinement.
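The pseudocode of Fig. 2 did not survive extraction; below is a minimal Python sketch of the rood-pattern search it describes, under the assumptions stated in Section 3.4. The step-reduction factor (halving) is our assumption; the minimum step of Δ_0/8 is stated by the paper.

```python
# Sketch of the local refinement in Fig. 2 (a reconstruction, not the
# authors' code): five-point rood pattern of Eq. (9); the step is halved
# when no neighbor improves (halving is our assumption) and the search
# stops once the step falls below delta0 / 8, as stated in Section 3.4.
import numpy as np

ROOD = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])  # nonzero columns of P_k

def pattern_search(x, fitness, delta0=1.0, max_iters=100):
    x, fx, delta, evals = np.asarray(x, float), fitness(x), delta0, 1
    for _ in range(max_iters):
        if delta < delta0 / 8.0:          # minimum search step reached
            break
        trials = x + delta * ROOD         # evaluate the four neighbors
        trial_f = [fitness(t) for t in trials]
        evals += len(trials)
        best = int(np.argmin(trial_f))
        if trial_f[best] < fx:            # improvement: move the center
            x, fx = trials[best], trial_f[best]
        else:                             # no improvement: shrink the step
            delta *= 0.5
    return x, fx, evals
```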

Algorithm 2. A probabilistic selection strategy.

Input: current population P, constant radius r
Output: Ω, the set of selected individuals that undergo local refinement

Begin
  Initialize: P' as a copy of P; Ω as an empty set;
  While (P' is not empty)
    Get the best-fitness individual x in P', i.e., x = arg min{f(x) | x ∈ P'};
    Calculate p_l(x) according to Eq. (10);
    If p_l(x) ≥ rand(0,1)
      Ω = Ω ∪ {x};
      For each individual y in P'
        If d(x,y) ≤ r, then remove y from P';
      End For
    End If
    Remove x from P';
  End While
End

Fig. 3. Pseudocode of proposed probabilistic selection strategy.
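For concreteness, the same strategy in Python; a direct transcription of Algorithm 2 under the fitness-minimization convention of Section 3.2.

```python
# Sketch of Algorithm 2 / Eq. (10): roulette-wheel selection with linear
# scaling plus crowding elimination within radius r (minimization).
import numpy as np

def probabilistic_selection(pop, fit, r=2.0, rng=np.random.default_rng()):
    pop, fit = np.asarray(pop, float), np.asarray(fit, float)
    scale = (fit.max() - fit).sum()       # denominator of Eq. (10) over P
    alive = list(np.argsort(fit))         # process best fitness first
    selected = []
    while alive:
        i = alive.pop(0)                  # current best individual in P'
        p_l = (fit.max() - fit[i]) / scale if scale > 0 else 0.0
        if p_l >= rng.random():
            selected.append(i)
            # eliminate individuals crowded within Euclidean radius r
            alive = [j for j in alive
                     if np.linalg.norm(pop[j] - pop[i]) > r]
    return selected
```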

4. Experimental results and discussions

4.1. Experimental setup

In this study, we conduct two experiments to justify the proposed PSO-PS based MA as well as the proposed probabilistic selection strategy. The first experiment investigates the performance of four variants of the proposed PSO-PS based MA with different selection strategies, from which the best variant is determined. The best variant is then employed in the second experiment to compare against established counterparts from the literature.

LibSVM (Version 2.91) [34] is used for the implementation of SVMs. PSO and the variants of the proposed MA are coded in MATLAB 2009b on a computer with an Intel Core 2 Duo CPU T5750 and 2 GB RAM. As suggested by Hsu et al. [2], the search space of the parameters is defined as an exponentially growing space: X = log2 C, Y = log2 γ with −10 ≤ X ≤ 10 and −10 ≤ Y ≤ 10. Through initial experiments, the parameters of PSO and the MA are set as follows. The acceleration coefficients c1 and c2 both equal 2. For w, a time-varying inertia weight linearly decreasing from 1.2 to 0.8 is employed. The default search step (Δ_0) in pattern search is set to 1. The radius of the exploitation region (r) used in the probabilistic selection strategy equals 2.
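For concreteness, the experimental configuration above can be collected as below; a sketch with our own naming, mirroring Sections 4.1 and 4.2.

```python
# Experimental setup of Sections 4.1-4.2 gathered in one place (a sketch;
# the names are ours). Particles live in the exponent space
# (X, Y) = (log2 C, log2 gamma) and are decoded before training an SVM.
SETUP = {
    "x_bounds": (-10.0, 10.0),   # bounds for X = log2(C) and Y = log2(gamma)
    "c1": 2.0, "c2": 2.0,        # PSO acceleration coefficients
    "w_range": (1.2, 0.8),       # inertia weight, linearly decreasing
    "delta0": 1.0,               # default pattern-search step
    "radius": 2.0,               # crowding radius r of Section 3.5
    "swarm_size": 15, "max_evals": 1500,
}

def decode(particle):
    """Map a particle (X, Y) to the SVM parameters (C, gamma)."""
    return 2.0 ** particle[0], 2.0 ** particle[1]
```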
To test the performance of the proposed PSO-PS based memetic algorithms, computational experiments are carried out on some well-studied benchmarks. In this study, thirteen commonly used benchmark datasets from the IDA Benchmark Repository (http://www.raetschlab.org/Members/raetsch/benchmark/) are used. These datasets, preprocessed by [40], consist of 100 random groups (20 in the case of the Image and Splice datasets) with training and testing sets (split about 60%:40%) describing binary classification problems. The dataset in each group has already been normalized to zero mean and unit standard deviation. For each group, the training set is used to select the optimal parameters of the SVM based on fivefold cross-validation and to establish the classifier, and then the test set is used to assess the classifier with the optimal parameters. Table 1 summarizes the general information of these datasets.

4.2. Experiment I

In the proposed PSO-PS based MA, a pattern search is integrated into PSO, so it is worth evaluating the influence of the pattern search. Besides, in order to establish an efficient and robust MA, the selection of the particles to be locally improved plays an important role. Hence, in the first experiment, a single PSO and four variants of the proposed MA with different selection strategies are considered.

Table 1
Details of the benchmark datasets.

Datasets        No. of attributes   Training set   Testing set   No. of groups
Banana          2                   400            4900          100
Breast cancer   9                   200            77            100
Diabetis        8                   468            300           100
Flare solar     9                   666            400           100
German          20                  700            300           100
Heart           13                  170            100           100
Image           18                  1300           1010          20
Ringnorm        20                  400            7000          100
Splice          60                  1000           2175          20
Thyroid         5                   140            75            100
Titanic         3                   150            2051          100
Twonorm         20                  400            7000          100
Waveform        21                  400            4600          100

The following abbreviations represent the algorithms considered in this sub-section:

- PSO: a single PSO without local search;
- MA1: all newly generated particles are refined with PS;
- MA2: PS is applied to each new particle with a predefined probability p_l = 0.1;
- MA3: PS is only applied to the two best-fitness particles;
- MA4: PS is applied to the particles selected by the probabilistic selection strategy proposed in Section 3.5;
- GS: a grid search on the search space, with the search step equal to 0.5 (a sketch follows the list).
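A sketch of the GS baseline under the stated settings: a step of 0.5 on both exponents over [−10, 10] gives the 41 × 41 = 1681 fitness evaluations mentioned below. The fitness argument is, e.g., the hypothetical kcvmr_fitness sketched in Section 3.2.

```python
# Grid-search baseline GS (a sketch): step 0.5 on both exponents over
# [-10, 10], i.e. 41 x 41 = 1681 fitness evaluations.
import numpy as np

def grid_search(X, y, fitness):
    grid = np.arange(-10.0, 10.25, 0.5)          # 41 points per axis
    best, best_f = None, np.inf
    for log2C in grid:
        for log2g in grid:
            f = fitness((log2C, log2g), X, y)    # e.g. kcvmr_fitness
            if f < best_f:
                best, best_f = (log2C, log2g), f
    return best, best_f
```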
This experiment is conducted on the first groups of the top six datasets (Banana, Breast cancer, Diabetis, Flare solar, German and Heart). The population size for PSO and each MA is fixed at 15. The stopping criteria of PSO are set as follows: the number of iterations reaches 100, or there is no improvement in the fitness for 30 consecutive iterations. To compare the algorithms fairly, each MA stops when the number of fitness evaluations reaches a maximum value of 1500, or when there is no improvement in the fitness value for 450 consecutive iterations. Besides, grid search (GS) evaluates the fitness 1681 times by sampling 41 × 41 points in the parameters' search space. For the purpose of reducing statistical errors, each benchmark is independently simulated 30 times, and the error rates and the numbers of fitness evaluations are collected for comparison.

Table 2 reports, for each algorithm, the mean and standard deviation of the error rates (%) over the 30 runs, together with the average number of fitness evaluations (#eva). The 'Ave' rows show the averages over all datasets, and the 'AveRank' rows report the average rank computed through the Friedman test for each algorithm over all datasets.

Table 2
Comparison of the variants of the proposed MA with PSO and GS: mean ± standard deviation of the test error rates (%) over 30 runs, with the average number of fitness evaluations (#eva) and the Friedman average ranks.

Data sets        PSO            MA1            MA2             MA3            MA4            GS
Banana           11.50 ± 0.75   11.03 ± 0.68   11.05 ± 0.68    10.75 ± 0.52   10.63 ± 0.48   11.9
Breast cancer    26.73 ± 5.16   25.58 ± 4.87   25.65 ± 4.80    24.91 ± 4.72   24.69 ± 4.75   25.97
Diabetis         25.31 ± 2.35   24.41 ± 2.30   24.59 ± 2.21    23.47 ± 1.98   23.47 ± 1.98   23.67
Flare solar      33.50 ± 2.27   33.31 ± 2.06   33.18 ± 2.20    32.71 ± 2.01   32.56 ± 1.80   34.25
German           28.03 ± 2.96   25.66 ± 2.50   25.49 ± 2.73    24.36 ± 2.43   24.01 ± 2.26   24.33
Heart            17.34 ± 3.90   16.85 ± 3.26   16.74 ± 3.26    16.18 ± 2.87   16.13 ± 2.45   19
Ave (error)      23.74 ± 2.90   22.81 ± 2.61   22.78 ± 2.65    22.06 ± 2.42   21.92 ± 2.29   23.19
AveRank (error)  5.00 ± 5.00    3.83 ± 3.50    3.83 ± 3.50     2.08 ± 1.75    1.08 ± 1.25    4.33
Ave (#eva)       981.8 ± 99.8   1059.1 ± 90.5  1093.9 ± 101.9  1117.1 ± 93.6  1060.2 ± 90.7  1681
AveRank (#eva)   1.0 ± 3.5      2.8 ± 2.3      4.2 ± 3.8       4.5 ± 3.2      2.5 ± 2.17     6.00

From Table 2, we can see that, no matter which kind of selection strategy is used in the MA, the performance in terms of mean error rate is always better than that obtained by the single PSO, which demonstrates the effectiveness of incorporating local refinement with PS into PSO. Besides, from the view of the standard deviations, it can be seen that the performance of the proposed PSO-PS based MA is more stable than that of the single PSO. As pattern search is employed to refine the individuals, the number of fitness evaluations of each MA is larger than that of the PSO, but this can be regarded as the cost of avoiding premature convergence and conducting finely tuned local refinement to obtain better performance. Actually,

the average number of fitness evaluations of an MA is larger than that of PSO by at most 13% (e.g., MA3). Finally, compared with GS, both PSO and the MAs achieve better results in terms of test error rates and number of fitness evaluations. The facts above imply that, among the six algorithms investigated, the four variants of the proposed MA tend to achieve the smallest error rates with a few more fitness evaluations than PSO but still far fewer than grid search.

By comparing the four variants of the proposed MA, several results can be observed. It is interesting to note that MA1, where PS is applied to all newly generated individuals, does not achieve as good results as expected. This may be due to premature convergence of the solutions caused by the strong local refinement applied to the whole population. MA2 performs comparably with MA1, which indicates that the random selection strategy with probability p_l also fails. MA3 gains better results than MA1 and MA2, and even has results competitive with MA4 on some datasets (e.g., the Diabetis dataset). Finally, MA4 always performs best in terms of the error rate on each dataset and the average error rate on all datasets, while its number of fitness evaluations does not increase much over the other three variants of the MA. All the above results suggest the success of our proposed probabilistic selection strategy used in MA4, highlighted by its addressing of the balance between exploration and exploitation in the proposed MA. Hence, we can conclude that MA4, with the proposed probabilistic selection strategy, is effective and robust for solving parameters optimization in SVMs modeling.

In addition, the average computational time of each method is given in Table 3. It is obvious that all four variants of the proposed MA consume more time than the single PSO on almost all the datasets, which is mainly due to the refinement with pattern search. Among the four variants of the proposed MA with different selection strategies, MA4 consumes a little more time than the other three variants. Since the accuracy of an SVM model is often more important, and the time consumed by MA4 is still much less than that of grid search, MA4 is selected as the best variant to perform a further comparison with the established counterparts in the literature in sub-section 4.3.

Table 3
Average computational time in seconds of each method.

Data sets       PSO      MA1      MA2      MA3      MA4      GS
Banana          126.35   129.89   147.81   145.76   149.57   187.8
Breast cancer   56.71    59.69    62.48    62.01    67.25    74.5
Diabetis        275.03   288.73   264.02   278.56   311.61   358.13
Flare solar     405.88   402.13   447.31   474.37   432.12   564.97
German          687.53   700.78   764.22   697.11   787.2    994.01
Heart           34.54    35.94    40.34    35.21    37.07    53.64
Ave             264.34   269.53   287.70   282.17   297.47   372.18
AveRank         1.33     2.50     3.67     3.00     4.50     6.00

4.3. Experiment II

Furthermore, following the same experimental settings as in [12,13,40], the second experiment directly compares the performance of the best variant of the proposed PSO-PS based MA, namely MA4, with established algorithms from previous studies. The experiment is conducted as follows. First, on each of the first five groups of every benchmark dataset, the parameters are optimized using the proposed MA4, and the final parameters are computed as the median of the five estimations obtained. Then, the mean error rate and standard deviation of SVMs with these parameters on all 100 (20 for the Image and Splice datasets) groups of each dataset are taken as the results of the corresponding algorithm. Table 4 and Fig. 4 show the comparison of the mean testing error rates (%) and the standard deviations (%) obtained by the proposed PSO-PS based MA and the algorithms proposed in previous studies [12,13,40]. The first column reports the results obtained by a two-stage grid search based on fivefold cross-validation from [40]. The results in the second and third columns, both reported in [13], are obtained by a radius-margin method and a fast-efficient strategy, respectively. 'L1-CB' and 'L2-CB' show the results of a hybridization of CLPSO and BFGS minimizing the L1-SVMs and L2-SVMs generalization errors, respectively, in [12]. Among them, although no method outperforms all the others on every dataset, the proposed PSO-PS based MA has the lowest error on more than half of the datasets. From the perspective of average rank, the proposed MA gains the rank 1.54 ± 1.88, which is the best among the six methods investigated. Finally, the Wilcoxon signed rank test is used to verify the performance of the proposed approach against 'CV' for each dataset (only 'CV' and the proposed MA are compared in this way, because the detailed per-dataset results of the other approaches are not available; the detailed results of 'CV' on each dataset are available at http://www.raetschlab.org/Members/raetsch/benchmark/). The results show that the proposed MA is significantly superior to 'CV' in terms of error rate on eight datasets. In conclusion, these results imply that our proposed approach is competitive with the other methods and even more effective and robust, which indicates that our proposed PSO-PS based MA is a practical method for parameters optimization in SVMs.

5. Conclusions

The parameters of SVMs are of great importance to the success of SVMs. To optimize these parameters properly, this study proposed a PSO and pattern search (PS) based memetic algorithm. In the proposed PSO-PS based MA, PSO was used to explore the search space and detect the potential regions with optimum solutions, while pattern search (PS) was used to conduct an effective refinement of the potential regions obtained by PSO. To balance the exploration of PSO and the exploitation of PS, a probabilistic selection strategy was also introduced. The performance of the proposed PSO-PS based MA was confirmed through experiments on several benchmark datasets. The results can be summarized as follows: (1) With different kinds of selection strategies, the variants of the proposed MA are all superior to the single PSO and to GS in terms of error rates and standard deviations, which confirms the value of the refinement with pattern search. (2) Although the number of fitness evaluations and the computational time of each MA are slightly inferior to those of PSO, each MA consumes far fewer fitness evaluations and much less computational time than grid search. (3) Among the variants of the proposed MA, the MA with our proposed probabilistic selection strategy gains the best performance in terms of error rates and standard deviations. (4) Compared with established counterparts from previous studies, the proposed PSO-PS based MA using the proposed probabilistic selection strategy is capable of yielding higher and more robust performance than the others. Hence, the proposed PSO-PS based MA can be a promising alternative for SVMs parameters optimization.

Table 4
Comparison with established algorithms (mean ± standard deviation of test error rates, %).

Datasets        CV [40]        RM [13]        FE [13]        L1-CB [12]     L2-CB [12]     Proposed MA
Banana          11.53 ± 0.66   10.48 ± 0.40   10.68 ± 0.50   11.65 ± 5.90   10.44 ± 0.46   10.35 ± 0.40***
Breast cancer   26.04 ± 4.74   27.83 ± 4.62   24.97 ± 4.62   28.75 ± 4.61   26.03 ± 4.60   24.37 ± 4.61*
Diabetis        25.53 ± 1.73   34.56 ± 2.17   23.16 ± 1.65   25.36 ± 2.95   23.50 ± 1.66   23.17 ± 1.35***
Flare solar     32.43 ± 1.82   35.51 ± 1.65   32.37 ± 1.78   33.15 ± 1.83   33.33 ± 1.79   32.07 ± 1.73*
German          23.61 ± 2.07   28.84 ± 2.03   23.41 ± 2.10   29.26 ± 2.88   24.41 ± 2.13   23.72 ± 2.05
Heart           15.95 ± 3.26   21.84 ± 3.70   16.62 ± 3.20   16.94 ± 3.71   16.02 ± 3.26   15.95 ± 2.11
Image           2.96 ± 0.60    4.19 ± 0.70    5.55 ± 0.60    3.34 ± 0.71    2.97 ± 0.45    2.90 ± 0.65
Ringnorm        1.66 ± 0.12    1.65 ± 0.11    2.03 ± 0.25    1.68 ± 0.13    1.68 ± 0.12    1.66 ± 0.12
Splice          10.88 ± 0.66   10.94 ± 0.70   11.11 ± 0.66   10.99 ± 0.74   10.84 ± 0.74   10.89 ± 0.63
Thyroid         4.80 ± 2.19    4.01 ± 2.18    4.4 ± 2.40     4.55 ± 2.10    4.20 ± 2.03    3.4 ± 2.07***
Titanic         22.42 ± 1.02   23.04 ± 1.18   22.87 ± 1.12   23.51 ± 2.92   22.89 ± 1.15   21.58 ± 1.06***
Twonorm         2.96 ± 0.23    3.07 ± 0.246   2.69 ± 0.19    2.90 ± 0.27    2.64 ± 0.20    2.49 ± 0.13**
Waveform        9.88 ± 0.43    11.10 ± 0.50   10 ± 0.39      9.78 ± 0.48    9.58 ± 0.37    9.30 ± 0.37***
Ave             14.67 ± 1.50   16.70 ± 1.55   14.60 ± 1.50   15.53 ± 2.25   14.50 ± 1.46   13.99 ± 1.33
AveRank         3.38 ± 3.73    4.62 ± 3.69    3.62 ± 3.42    4.81 ± 5.31    3.04 ± 2.96    1.54 ± 1.88

* Statistically significant at the level of 0.05 when compared to CV.
** Statistically significant at the level of 0.01 when compared to CV.
*** Statistically significant at the level of 0.001 when compared to CV.
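The significance markers come from Wilcoxon signed rank tests of the proposed MA against CV over each dataset's result groups. A sketch of such a test follows, with placeholder arrays standing in for the per-group error rates, which are not reproduced here:

```python
# Sketch of the per-dataset significance test: Wilcoxon signed rank
# test over paired per-group error rates. The two arrays below are
# placeholders, not the paper's actual per-group results.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
errors_cv = rng.normal(0.1153, 0.0066, size=100)   # e.g. 100 Banana groups
errors_ma = errors_cv - rng.normal(0.01, 0.005, size=100)

stat, p = wilcoxon(errors_cv, errors_ma)
print(f"Wilcoxon statistic={stat:.1f}, p-value={p:.2g}")
```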

Fig. 4. Results on 13 benchmark datasets: (a) mean of error rates; and (b) standard deviation of error rates.

One should note, however, that this study only focuses on the PSO-PS based MA for parameters optimization in SVMs; other varieties of PSO and other evolutionary algorithms could also be used in the proposed MA. Besides, given the rapid development of hardware, the parallelization of the proposed MA to make full use of the increasingly vast computing resources (e.g., multiple cores, GPUs, cloud computing) is another interesting topic. Other topics include investigating adaptive parameter settings of the proposed algorithms, comparing more extensively with other existing EAs or MAs, and developing more efficient memetic algorithms addressing other design issues, e.g., population management and intensity of refinement [29]. Future work will address the above cases and applications of the proposed approach to practical problems.

Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable suggestions and constructive comments. This work was supported by the Natural Science Foundation of China (Grant No. 70771042), the Fundamental Research Funds for the Central Universities (2012QN208-HUST) and a grant from the Modern Information Management Research Center at Huazhong University of Science and Technology.

References

[1] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[2] C.W. Hsu, C.C. Chang, C.J. Lin, A Practical Guide to Support Vector Classification, Department of Computer Science, National Taiwan University, 2003.
[3] G. Moore, C. Bergeron, K.P. Bennett, Model selection for primal SVM, Mach. Learn. 85 (2011) 175–208.
[4] O. Chapelle, V. Vapnik, O. Bousquet, S. Mukherjee, Choosing multiple parameters for support vector machines, Mach. Learn. 46 (1) (2002) 131–159.
[5] S.S. Keerthi, Efficient tuning of SVM hyperparameters using radius/margin bound and iterative algorithms, IEEE Trans. Neural Netw. 13 (5) (2002) 1225–1229.
[6] K. Duan, S.S. Keerthi, A.N. Poo, Evaluation of simple performance measures for tuning SVM hyperparameters, Neurocomputing 51 (2003) 41–59.
[7] K.M. Chung, W.C. Kao, C.L. Sun, L.L. Wang, C.J. Lin, Radius margin bounds for support vector machines with the RBF kernel, Neural Comput. 15 (2003) 2643–2681.
[8] N.E. Ayat, M. Cheriet, C.Y. Suen, Automatic model selection for the optimization of SVM kernels, Patt. Recogn. 38 (10) (2005) 1733–1745.
[9] M.M. Adankon, M. Cheriet, Optimizing resources in model selection for support vector machine, Patt. Recogn. 40 (3) (2007) 953–963.
[10] L. Wang, P. Xue, K.L. Chan, Two criteria for model selection in multiclass support vector machines, IEEE Trans. Syst. Man Cybern. Part B: Cybernetics 38 (6) (2008) 1432–1448.

[11] M.M. Adankon, M. Cheriet, Model selection for the LS-SVM. Application to handwriting recognition, Patt. Recogn. 42 (12) (2009) 3264–3270.
[12] S. Li, M. Tan, Tuning SVM parameters by using a hybrid CLPSO–BFGS algorithm, Neurocomputing 73 (2010) 2089–2096.
[13] Z.B. Xu, M.W. Dai, D.Y. Meng, Fast and efficient strategies for model selection of Gaussian support vector machine, IEEE Trans. Syst. Man Cybern. Part B: Cybernetics 39 (5) (2009) 1292–1307.
[14] F. Imbault, K. Lebart, A stochastic optimization approach for parameter tuning of support vector machines, in: J. Kittler, M. Petrou, M. Nixon (Eds.), Proceedings of the 17th International Conference on Pattern Recognition, vol. 4, IEEE Computer Soc., Los Alamitos, 2004, pp. 597–600.
[15] X.C. Guo, J.H. Yang, C.G. Wu, C.Y. Wang, Y.C. Liang, A novel LS-SVMs hyper-parameter selection based on particle swarm optimization, Neurocomputing 71 (2008) 3211–3215.
[16] H.J. Escalante, M. Montes, L.E. Sucar, Particle swarm model selection, J. Mach. Learn. Res. 10 (2009) 405–440.
[17] X. Zhang, X. Chen, Z. He, An ACO-based algorithm for parameter optimization of support vector machines, Expert Syst. Appl. 37 (9) (2010) 6618–6628.
[18] T. Gomes, R. Prudencio, C. Soares, A. Rossi, A. Carvalho, Combining meta-learning and search techniques to select parameters for support vector machines, Neurocomputing 75 (2012) 3–13.
[19] M.N. Kapp, R. Sabourin, P. Maupin, A dynamic model selection strategy for support vector machine classifiers, Appl. Soft Comput. 12 (8) (2012) 2550–2565.
[20] P. Moscato, M.G. Norman, A 'memetic' approach for the traveling salesman problem: implementation of a computational ecology for combinatorial optimization on message-passing systems, Parallel Comput. Transputer Appl. (1992) 177–186.
[21] P. Moscato, On evolution, search, optimization, genetic algorithms and martial arts: towards memetic algorithms, Caltech Concurrent Computation Program, C3P Report 826 (1989).
[22] J. Tang, M.H. Lim, Y.S. Ong, Diversity-adaptive parallel memetic algorithm for solving large scale combinatorial optimization problems, Soft Comput. 11 (9) (2007) 873–888.
[23] S. Areibi, Z. Yang, Effective memetic algorithms for VLSI design = genetic algorithms + local search + multi-level clustering, Evol. Comput. 12 (3) (2004) 327–353.
[24] A. Elhossini, S. Areibi, R. Dony, Strength Pareto particle swarm optimization and hybrid EA-PSO for multi-objective optimization, Evol. Comput. 18 (1) (2010) 127–156.
[25] A. Lara, G. Sanchez, C. Coello, O. Schuetze, HCS: a new local search strategy for memetic multiobjective evolutionary algorithms, IEEE Trans. Evol. Comput. 14 (1) (2010) 112–132.
[26] Q.H. Nguyen, Y.S. Ong, M.H. Lim, A probabilistic memetic framework, IEEE Trans. Evol. Comput. 13 (3) (2009) 604–623.
[27] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc., New York, 1998.
[28] F.E.H. Tay, L.J. Cao, Application of support vector machines in financial time series forecasting, Omega 29 (4) (2001) 309–317.
[29] X. Chen, Y. Ong, M. Lim, K. Tan, A multi-facet survey on memetic computation, IEEE Trans. Evol. Comput. 15 (5) (2011) 591–607.
[30] F. Neri, C. Cotta, Memetic algorithms and memetic computing optimization: a literature review, Swarm Evol. Comput. 2 (2012) 1–14.
[31] X. Hu, R.C. Eberhart, Y. Shi, Engineering optimization with particle swarm, in: Proceedings of the 2003 IEEE Swarm Intelligence Symposium, 2003, pp. 53–57.
[32] M. Reyes, C. Coello, Multi-objective particle swarm optimizers: a survey of the state-of-the-art, Int. J. Comput. Intell. Res. 2 (3) (2006) 287–308.
[33] J. Kennedy, R.C. Eberhart, Y. Shi, Swarm Intelligence, Morgan Kaufmann Publishers, San Francisco, 2001.
[34] C. Chang, C. Lin, LIBSVM: a library for support vector machines, 2001. Software available from http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[35] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, vol. 4, 1995, pp. 1942–1948.
[36] E.D. Dolan, R.M. Lewis, V. Torczon, On the local convergence of pattern search, SIAM J. Optim. 14 (2) (2003) 567–583.
[37] B. Liu, L. Wang, Y.H. Jin, An effective PSO-based memetic algorithm for flow shop scheduling, IEEE Trans. Syst. Man Cybern. Part B: Cybernetics 37 (1) (2007) 18–27.
[38] M. Lozano, F. Herrera, N. Krasnogor, D. Molina, Real-coded memetic algorithms with crossover hill-climbing, Evol. Comput. 12 (3) (2004) 273–302.
[39] M.L. Tang, X. Yao, A memetic algorithm for VLSI floorplanning, IEEE Trans. Syst. Man Cybern. Part B: Cybernetics 37 (1) (2007) 62–69.
[40] G. Rätsch, T. Onoda, K.R. Müller, Soft margins for AdaBoost, Mach. Learn. 42 (3) (2001) 287–320.

Yukun Bao is an Associate Professor at the Department of Management Science and Information Systems, School of Management, Huazhong University of Science and Technology, PR China. He received his BSc, MSc, and PhD in Management Science and Engineering from Huazhong University of Science and Technology, PR China, in 1996, 1999 and 2002, respectively. He has been principal investigator for two research projects funded by the Natural Science Foundation of China, and has served as a referee for several IEEE and other international journals and as a PC member for several international academic conferences. His research interests are time series modeling and forecasting, business intelligence and data mining.

Zhongyi Hu received the BS degree in Information Systems in 2009 from Harbin Institute of Technology at Weihai, China. He received his MS degree in Management Science and Engineering in 2011 from Huazhong University of Science and Technology, China. Currently, he is working toward the PhD degree in Management Science and Engineering at Huazhong University of Science and Technology, China. His research interests include support vector machines, swarm intelligence, memetic algorithms, and time series forecasting.

Tao Xiong received his BSc and MSc in Management Science and Engineering from Huazhong University of Science and Technology, PR China, in 2008 and 2010, respectively. Currently, he is working toward the PhD degree in Management Science and Engineering at Huazhong University of Science and Technology, China. His research interests include multi-step-ahead time series forecasting, interval data analysis, and computational intelligence.
