Tumor Biol.
(2013) 34:1547–1552
DOI 10.1007/s13277-013-0683-5
 RESEARCH ARTICLE
A support vector machine model for predicting non-sentinel
lymph node status in patients with sentinel lymph node
positive breast cancer
Xiaowen Ding & Shangnao Xie & Jie Chen & Wenju Mo &
Hongjian Yang
Received: 1 November 2012 / Accepted: 28 January 2013 / Published online: 10 February 2013
© International Society of Oncology and BioMarkers (ISOBM) 2013
Abstract This study aimed to investigate the accuracy and feasibility of support vector machine (SVM) modeling in predicting non-sentinel lymph node (NSLN) status in patients with sentinel lymph node (SLN)-positive breast cancer. Clinicopathological data were collected from 201 breast cancer cases that underwent sentinel lymph node biopsy and included patient age, tumor size, histological type and grade, vascular invasion, estrogen receptor status, progesterone receptor status, CerbB2 status, size and number of positive SLNs, number of negative SLNs, and positive SLN membrane invasion. Feature vector selection was based on a combination of statistical filtration and model-dependent screening. The arbitrary combination with the smallest p value for SVM input was selected, the predictive results of the model were evaluated by a 10-fold cross-validation, and a training model was established. Using SLN-positive patients as a double-blind test set, 85 patients were input into the model to analyze its sensitivity and specificity. The combination with the highest cross-validation accuracy was selected for the SVM model and consisted of the following: the number and size of positive SLNs, the number of negative SLNs, and the membrane invasion of positive SLNs. The training accuracy of the model established with the four variables was 92 %, and its cross-validation accuracy was 87.6 %. The accuracy of an 85-patient double-blind test of the SVM model was 91.8 %. In conclusion, this SVM model is an accurate and feasible method for the prediction of NSLN status in SLN-positive breast cancer and can help guide clinical treatment.

Keywords Breast cancer · SLNB · Prediction · Data model

X. Ding (*) · S. Xie · J. Chen · W. Mo · H. Yang
Department of Breast Surgery, Zhejiang Cancer Hospital, Hangzhou 310022, China
e-mail: dingxwhz@yeah.net

Introduction

Breast cancer is the most common solid tumor in women globally [1]. While the incidence of breast cancer in Western countries has stabilized and even decreased, the incidence rates in Asia, including China, have been increasing rapidly [2–4]. Several large, randomized prospective trials have shown that sentinel lymph node biopsy (SLNB) substantially reduces the morbidity associated with axillary surgery compared with formal axillary lymph node dissection (ALND) [5–11]. Moreover, the National Surgical Adjuvant Breast and Bowel Project B-32 trial has demonstrated that when the sentinel node reveals no evidence of metastatic disease, no further ALND is required [12]. Recently, the results of the American College of Surgeons Oncology Group Z0011 trial have challenged the notion that all patients with metastases to the sentinel node require ALND. The results of this trial suggest that in selected sentinel node-positive patients, ALND can potentially be avoided [13]. However, a large number of studies show that the lymph node metastases in approximately 40–70 % of SLN-positive breast cancer patients are confined to the SLN [14–17]. ALND in this cohort of patients would not, therefore, clear residual metastatic lesions in the axilla or provide further staging information. Furthermore, ALND triples the risk of postoperative complications (e.g., upper limb lymphedema and sensory and functional disturbance), affects the quality of life of the patient, and greatly increases medical expenses, making it inadvisable unless necessary. It is obvious that an accurate assessment of non-
sentinel lymph node (NSLN) status in patients with SLN-positive breast cancer is of great clinical significance.
   For this purpose, models have been developed to predict NSLN status in patients with SLN-positive breast cancer. There are several commonly used assessment models. However, each of these models has limitations [18–20]. Support vector machines (SVM), developed by Vapnik and Burges, are considered to be superior to traditional linear approaches due to their capability to represent non-linear features within data [21]. Therefore, we established and validated a discriminative SVM model, based on previous SLNB cases in our hospital, to predict the NSLN status of SLN-positive patients.

Patients and methods

From 2003 to 2010, 925 samples of non-selective SLNB breast cancer were collected at our hospital, of which 201 met our modeling standards. The inclusion criteria were as follows: no preoperative systemic treatment, such as neoadjuvant chemotherapy, was received; a successful SLNB surgery was performed; SLNs were positive; and an ALND was completed. The exclusion criteria were as follows: receipt of preoperative systemic treatment such as neoadjuvant chemotherapy, negative SLNs, or the lack of an ALND. Two milliliters of methylene blue was injected subcutaneously into the periareolar area for lymphatic mapping. The SLNB procedure was started 5 min after injection, and blue-dyed lymph nodes with definite blue-dyed afferent drainage lymphatic vessels were considered sentinel nodes. Additional palpable, enlarged lymph nodes surrounding the blue-dyed node or nodes, if present, were also removed. SLNB was aborted if no blue-dyed nodes were identified within 20 min after injection. Results of SLNB were based on the final pathological report originating from HE-stained sections, irrespective of whether frozen sections were performed. Clinicopathological data of the 201 patients were collected and included the patient's age, tumor size, histological type, histological grade, vascular invasion, estrogen receptor (ER) status, progesterone receptor (PR) status, CerbB2 status, size of positive SLNs, number of positive SLNs, multifocality, number of negative SLNs, and positive SLN membrane invasion.

Table 1 Clinicopathological features of modeling cases

Patient characteristics and        NSLN−, n (%)      NSLN+, n (%)
tumor pathology                    91 cases          110 cases
Median age (years)                 46 (29–78)        49 (25–69)
ER status
 Positive                          25 (27.5)         39 (35.5)
 Negative                          58 (63.7)         65 (59.1)
 Unknown                           8 (8.8)           6 (5.5)
PR status
 Positive                          33 (36.3)         48 (43.6)
 Negative                          49 (53.8)         56 (50.9)
 Unknown                           8 (8.8)           6 (5.5)
CerbB2 status
 Positive                          17 (18.7)         30 (27.3)
 Negative                          34 (37.4)         37 (33.6)
 Unknown                           40 (44.0)         43 (39.1)
Vascular invasion
 +                                 40 (44.0)         95 (86.4)
 −                                 51 (56.0)         15 (13.6)
Tumor size
 T1                                47 (51.6)         31 (28.2)
 T2                                41 (45.1)         76 (69.1)
 T3                                3 (3.3)           3 (2.7)
Multifocality
 +                                 1 (1.1)           2 (1.8)
 −                                 90 (98.9)         108 (98.2)
Histological type and grade
 IDC grade 1                       5 (5.5)           4 (3.6)
 IDC grade 2                       59 (64.8)         88 (80.0)
 IDC grade 3                       7 (7.7)           7 (6.4)
 ILC                               5 (5.5)           4 (3.6)
 Mixed type                        15 (16.5)         7 (6.4)
Size of positive SLNs
 <0.5 cm                           43 (47.3)         3 (2.7)
 0.5–1.0 cm                        44 (48.4)         79 (71.8)
 1.0–1.5 cm                        4 (4.4)           27 (24.5)
 >1.5 cm                           0                 1 (0.9)
Number of positive SLNs
 1                                 68 (74.7)         51 (46.4)
 2                                 19 (20.9)         37 (33.6)
 3                                 4 (4.4)           12 (10.9)
 >3                                0                 10 (9.1)
Positive SLN membrane invasion
 +                                 34 (37.4)         97 (88.2)
 −                                 57 (62.6)         13 (11.8)
Number of negative SLNs
 0                                 11 (12.1)         68 (61.8)
 1                                 19 (20.9)         37 (33.6)
 2                                 36 (39.6)         3 (2.7)
 3                                 14 (15.4)         1 (0.9)
 >3                                11 (12.1)         1 (0.9)

   We established a discriminative model according to NSLN status (positive or negative) using an SVM method. We employed the leave-one-out cross-validation method to evaluate the discrimination results of the model. The SVM adopted a radial basis function kernel, gamma was set at 0.6, and the penalty parameter (C) was set at 100. Feature vector selection was based on a combination of statistical filtration and model-dependent screening. The Wilcoxon rank sum test was performed on each peak of the mass-to-charge ratio, the
arbitrary combination with the smallest p value for SVM input was selected, the predictive results of the model were evaluated by a 10-fold cross-validation, and the combination with the highest Youden index as predicted by the SVM was chosen as the basis for establishment of the training model. To make an accurate assessment of the model, we re-enrolled 85 patients in accordance with the modeling standards. These patients were input into the model as a double-blind test set to evaluate and calculate the sensitivity and specificity of the model.

Results

Clinicopathological data of the 201 patients evaluated for modeling

The clinicopathological data of the 201 patients with SLN-positive breast cancer used for modeling are shown in Table 1. The lesion was confined to SLNs (NSLN−) in 91 cases (45.3 %, 91 of 201), the median age was 46, ER status was unknown in eight cases (8.8 %, eight of 91), and CerbB2 status was unknown (including cases of CerbB2 IHC++ and untested CerbB2 status) in 40 cases (44.0 %, 40 of 91). NSLN+ cases numbered 110 (54.7 %, 110 of 201), the median age was 49, ER status was unknown in six cases (5.5 %, six of 110), and CerbB2 status was unknown in 43 cases (39.1 %, 43 of 110).

Clinicopathological features of cases for the double-blind test

Clinicopathological data of the 85 patients used for the double-blind test are shown in Table 2. For 43 cases (50.6 %, 43 of 85), the lesion was confined to the SLNs (NSLN−), the median age was 43, and CerbB2 status was unknown in 13 cases (30.2 %, 13 of 43). Positive NSLN (NSLN+) was found in 42 cases (49.4 %, 42 of 85), the median age was 42, ER status was unknown in one case (2.4 %, one of 42), and CerbB2 status was unknown in 15 cases (35.7 %, 15 of 42).

Table 2 Clinicopathological features of the double-blind test cases

Patient characteristics and        NSLN−             NSLN+
tumor pathology                    43 cases          42 cases
Median age (years)                 48 (27–71)        47 (29–73)
ER status
 Positive                          15 (34.9)         15 (35.7)
 Negative                          28 (65.1)         26 (61.9)
 Unknown                           0                 1 (2.4)
PR status
 Positive                          20 (46.5)         13 (31.0)
 Negative                          23 (53.5)         28 (66.7)
 Unknown                           0                 1 (2.4)
CerbB2 status
 Positive                          11 (25.6)         8 (19.0)
 Negative                          19 (44.2)         19 (45.2)
 Unknown                           13 (30.2)         15 (35.7)
Vascular invasion
 +                                 26 (60.5)         36 (85.7)
 −                                 17 (39.5)         6 (14.3)
Tumor size
 T1                                17 (39.5)         9 (21.4)
 T2                                23 (53.5)         28 (66.7)
 T3                                3 (7.0)           5 (11.9)
Multifocality
 +                                 0                 2 (4.8)
 −                                 43 (100.0)        40 (95.2)
Histological type and grade
 IDC grade 1                       3 (7.0)           1 (2.4)
 IDC grade 2                       17 (39.5)         22 (52.4)
 IDC grade 3                       11 (25.6)         11 (26.2)
 ILC                               2 (4.7)           5 (11.9)
 Mixed type                        10 (23.3)         3 (7.1)
Size of positive SLNs
 <0.5 cm                           34 (79.1)         3 (7.1)
 0.5–1.0 cm                        8 (18.6)          36 (85.7)
 1.0–1.5 cm                        1 (2.3)           3 (7.1)
Number of positive SLNs
 1                                 32 (74.4)         24 (57.1)
 2                                 10 (23.3)         12 (28.6)
 3                                 1 (2.3)           4 (9.5)
 >3                                0                 2 (4.8)
Positive SLN membrane invasion
 +                                 32 (74.4)         39 (92.9)
 −                                 11 (25.6)         3 (7.1)
Number of negative SLNs
 0                                 8 (18.6)          35 (83.3)
 1                                 15 (34.9)         6 (14.3)
 2                                 9 (20.9)          1 (2.3)
 3                                 8 (18.6)          0
 >3                                3 (7.0)           0

Variable selection for use with the SVM model

Thirteen variables including patient age, tumor size, histological type, histological grade, vascular invasion, ER status, PR status, CerbB2 status, positive SLN size and number, negative SLN number, multifocality, and positive SLN membrane invasion were screened with the SVM. The following six variables had statistically significant p values: vascular invasion, the size of the positive SLNs, the number of positive SLNs, the number of negative SLNs, positive SLN membrane invasion, and tumor size (Table 3).

Table 3 Variables selected for the SVM model

Variables                          p value
Age                                0.1440008312
Tumor size                         0.0015053351
Histological type                  0.1163538893
Histological grade                 0.810639
ER status                          0.809162
PR status                          0.598087
Vascular invasion                  0.0000091261
Size of positive SLNs              0
Number of positive SLNs            0.0000085344
Positive SLN membrane invasion     0
Multifocality                      0.6959953426
Number of negative SLNs            0
CerbB2 status                      0.238588

Fig. 1 The training accuracy of the model and the accuracy of cross-validation
Fig. 2 The training accuracy of the model, the cross-validation accuracy, and the accuracy of the double-blind test
Fig. 3 Training of the SVM model
Fig. 4 The SVM model

   These six variables were combined arbitrarily, and an SVM training model was constructed for cross-validation. The combination with the highest accuracy in the cross-validation test included the number of positive SLNs, the size of positive SLNs, the number of negative SLNs, and positive SLN membrane invasion. The training accuracy of the model constructed with these four variables was 92 %, and the accuracy of cross-validation was 87.6 % (Fig. 1).
   The histogram depicts the 13 variables in order from smallest p value to greatest. The four variables with the smallest p values were selected for the model. Variables 1–13 correspond to the size of the positive SLNs, the number of negative SLNs, positive SLN membrane invasion, the number of positive SLNs, vascular invasion, tumor size, histological type, age, CerbB2, PR, multifocality, ER, and histological grade, respectively.
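
The screening step described above can be illustrated with a short sketch. It assumes that "combined arbitrarily" means every non-empty subset of the six significant variables was a candidate SVM input set, which the paper does not state explicitly. The names SIGNIFICANT_VARIABLES, candidate_subsets, and youden_index are illustrative and are not taken from the authors' software. The sketch only counts the candidate subsets and computes the Youden index (sensitivity + specificity - 1) from the sensitivity and specificity pairs reported in this paper; it does not train an SVM.

```python
from itertools import combinations

# The six variables reported as statistically significant (Table 3).
SIGNIFICANT_VARIABLES = [
    "vascular invasion",
    "size of positive SLNs",
    "number of positive SLNs",
    "number of negative SLNs",
    "positive SLN membrane invasion",
    "tumor size",
]


def candidate_subsets(variables):
    """Yield every non-empty subset of the given variables (assumed search space)."""
    for k in range(1, len(variables) + 1):
        yield from combinations(variables, k)


def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1, with inputs given as fractions."""
    return sensitivity + specificity - 1.0


subsets = list(candidate_subsets(SIGNIFICANT_VARIABLES))
print(len(subsets))  # 2**6 - 1 = 63 candidate input sets

# Sensitivity and specificity pairs as reported in the paper's results.
reported = {
    "training": (0.900, 0.946),
    "leave-one-out cross-validation": (0.844, 0.910),
    "double-blind test": (0.977, 0.857),
}
for name, (sens, spec) in reported.items():
    print(f"{name}: J = {youden_index(sens, spec):.3f}")
```

Under this assumption there are only 63 candidate subsets, so each one can be scored by cross-validation and the subset with the highest Youden index or accuracy kept.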
Training and test results of the model

The training accuracy of the model established with the four variables was 92 %, the cross-validation accuracy was 87.6 %, and the accuracy of the double-blind test was 91.8 %. Training of the SVM model on 201 patients revealed a sensitivity of 90.0 %, a specificity of 94.6 %, and an accuracy of 92 %, as shown in Figs. 2 and 3. A double-blind test on the 85 follow-up cases revealed a sensitivity of 97.7 %, a specificity of 85.7 %, and an accuracy of 91.8 %, as shown in Fig. 4. By leave-one-out cross-validation, the sensitivity of the model was 84.4 %, the specificity was 91.0 %, and the accuracy was 87.6 %.

Discussion

Many methods are currently available for pattern classification. Traditional methods include the Bayesian method, the distance discrimination method, and the Fisher discrimination method. SVM can successfully address regression problems (time series analysis), pattern recognition (classification, discriminant analysis), and many other issues. The main advantage of the SVM method is that it seeks the optimum answer given the information available in a limited sample, not just an optimum value when the number of samples tends toward infinity [21].
   In recent years, four models have been used for assessing the NSLN status in SLN-positive breast cancer, including the BCN of MSKCC, the Tenon model, the Stanford model, and the Cambridge model. The results show that the BCN of MSKCC is the most reliable. Different models contain different variables, most often relating to the primary tumor and SLN, such as primary tumor size, grade, ER and CerbB2 expression, vascular invasion, the size of positive SLNs, the number of positive SLNs, positive SLN membrane extension, and positive SLN testing method (HE or IHC). The variables that have predictive value are primary tumor size, the size and number of positive SLNs, and vascular invasion.
   In this project, using the SVM method, we isolated six variables with statistical significance, including the number of positive SLNs, vascular invasion, the size of positive SLNs, the number of negative SLNs, positive SLN membrane invasion, and tumor size. These six variables were combined arbitrarily, and an SVM training model was constructed for cross-validation. The combination with the highest accuracy according to the cross-validation test consisted of the number of positive SLNs, the size of positive SLNs, the number of negative SLNs, and positive SLN membrane invasion. The training accuracy of the model constructed with the four variables was 92 %, the accuracy of cross-validation was 87.6 %, and the accuracy of the double-blind test was 91.8 %. These data demonstrate that this model is stable and reliable.
   Nevertheless, because all of the data for modeling were obtained from a single tumor treatment center, this method will require a larger number of cases, different groups of people, and multiple treatment centers for verification. If the final results show that the model is reliable and accurate, the SVM model will be used to predict NSLN status in SLN-positive breast cancer cases and to carry out a study aimed at determining an alternative to ALND as the next step in the treatment of early breast cancer. SLNB, as a substitute for ALND, will be carried out on these patients on the premise that locoregional control and long-term survival rate are not affected, thus reducing surgical complications, improving the quality of life of the patients, shortening the duration of the operation, decreasing treatment expenses and medical costs, and protecting social productivity. This model therefore has the potential to be of great value to theoretical research and in clinical applications, providing huge economic and social benefits.

Conflicts of interest None

References

 1. Nakhlis F, Golshan M. Bevacizumab: where do we go from here in breast cancer? Transl Cancer Res. 2012;1(1):55–6.
 2. Chen W, Zheng R, Zhang S, Zhao P, Li G, Wu L, et al. Report of incidence and mortality in China cancer registries, 2009. Chin J Cancer Res. 2013;25(1):10–21.
 3. Zhang BN, Cao XC, Chen JY, Chen J, Fu L, Hu XC, et al. Guidelines on the diagnosis and treatment of breast cancer (2011 edition). Gland Surg. 2012;1(1):39–61.
 4. Rashid OM, Takabe K. The evolution of the role of surgery in the management of breast cancer lung metastasis. J Thorac Dis. 2012;4(4):420–4.
 5. Veronesi U, Paganelli G, Viale G, Luini A, Zurrida S, Galimberti V, et al. A randomized comparison of sentinel-node biopsy with routine axillary dissection in breast cancer. N Engl J Med. 2003;349(6):546–53.
 6. Mansel RE, Fallowfield L, Kissin M, Goyal A, Newcombe RG, Dixon JM, et al. Randomized multicenter trial of sentinel node biopsy versus standard axillary treatment in operable breast cancer: the ALMANAC Trial. J Natl Cancer Inst. 2006;98(9):599–609.
 7. Fan F. Sentinel lymph node micrometastases and isolated tumor cells in breast cancer: an evolving field. Gland Surg. 2012;1(1):5–6.
 8. Purushotham AD, Upponi S, Klevesath MB, Bobrow L, Millar K, Myles JP, et al. Morbidity after sentinel lymph node biopsy in primary breast cancer: results from a randomized controlled trial. J Clin Oncol. 2005;23(19):4312–21.
 9. Bernet L, Cano R. Metastatic sentinel node and axillary lymphadenectomy revisited. Gland Surg. 2012;1(1):7–8.
10. McLaughlin SA, Wright MJ, Morris KT, Giron GL, Sampson MR, Brockway JP, et al. Prevalence of lymphedema in women with breast cancer 5 years after sentinel lymph node biopsy or axillary dissection: objective measurements. J Clin Oncol. 2008;26(32):5213–9.
11. Nelson V, Rademaker A, Kaklamani V. Paradigm of polyendocrine therapy in endocrine responsive breast cancer: the role of fulvestrant. Chin Clin Oncol. 2013;2(1):10.
12. Ashikaga T, Krag DN, Land SR, Julian TB, Anderson SJ, Brown AM, et al. Morbidity results from the NSABP B-32 trial comparing sentinel lymph node dissection versus axillary dissection. J Surg Oncol. 2010;102(2):111–8.
13. Shah-Khan M, Boughey JC. Evolution of axillary nodal staging in breast cancer: clinical implications of the ACOSOG Z0011 Trial. Cancer Control. 2012;19(4):267–76.
14. Giuliano AE, Haigh PI, Brennan MB, Hansen NM, Kelley MC, Ye W, et al. Prospective observational study of sentinel lymphadenectomy without further axillary dissection in patients with sentinel node-negative breast cancer. J Clin Oncol. 2000;18(13):2553–9.
15. Turner RR, Ollila DW, Krasne DL, Giuliano AE. Histopathologic validation of the sentinel lymph node hypothesis for breast carcinoma. Ann Surg. 1997;226(3):271–6.
16. Kamath VJ, Giuliano R, Dauway EL, Cantor A, Berman C, Ku NN, et al. Characteristics of the sentinel lymph node in breast cancer predict further involvement of higher-echelon nodes in the axilla: a study to evaluate the need for complete axillary lymph node dissection. Arch Surg. 2001;136(6):688–92.
17. Van Zee KJ, Manasseh DM, Bevilacqua JL, Boolbol SK, Fey JV, Tan LK, et al. A nomogram for predicting the likelihood of additional nodal metastases in breast cancer patients with a positive sentinel node biopsy. Ann Surg Oncol. 2003;10(10):1140–51.
18. Barranger E, Coutant C, Flahault A, Delpech Y, Darai E, Uzan S. An axilla scoring system to predict non-sentinel lymph node status in breast cancer patients with sentinel lymph node involvement. Breast Cancer Res Treat. 2005;91(2):113–9.
19. Pal A, Provenzano E, Duffy SW, Pinder SE, Purushotham AD. A model for predicting non-sentinel lymph node metastatic disease when the sentinel lymph node is positive. Br J Surg. 2008;95(3):302–9.
20. Kohrt HE, Olshen RA, Bermas HR, Goodson WH, Wood DJ, Henry S, et al. Bay Area SLN Study. New models and online calculator for predicting non-sentinel lymph node status in sentinel lymph node positive breast cancer patients. BMC Cancer. 2008;8:66.
21. Sattlecker M, Bessant C, Smith J, Stone N. Investigation of support vector machines and Raman spectroscopy for lymph node diagnostics. Analyst. 2010;135(5):895–901.