Acute Lymphoblastic Leukemia Classification in Nucleus Microscopic Images Using Convolutional Neural Networks and Transfer Learning
Abstract—Leukemia is a disease caused by the abnormal production of blood cells. In Acute Lymphoblastic Leukemia (ALL), lymphoblast cells do not develop into lymphocytes. To diagnose the disease, we need to differentiate between lymphocytes and lymphoblasts; however, the two cell types have similar morphologies. Several computer vision approaches have been developed to distinguish lymphocytes from lymphoblasts. This study compares a deep and a wide deep learning architecture for classifying segmented blood cell images. The "deep" architecture employed in this study was DenseNet-201, whereas the "wide" architecture was Wide-ResNet-50-2; ResNet-50 served as a baseline. The study uses transfer learning to reduce the number of training steps needed. In addition, we measured the impact of preprocessing the dataset with histogram equalization on classifier performance. We found that the DenseNet-201 model has the best performance, with an AUC score of 86.73%, and that histogram equalization makes the performance worse.

Keywords—leukemia, computer vision, transfer learning, convolutional neural networks, image classification

I. INTRODUCTION

Blood cancer, or leukemia, is a disease caused by the uncontrolled duplication of abnormal blood cells. Among the several types of leukemia, Acute Lymphoblastic Leukemia (ALL) is caused by the uncontrolled growth of immature lymphocytes, known as lymphoblasts. This type of leukemia mostly affects adults over 50 years old and children in the 2-5 year age group. Studies show that 80% of leukemia cases in children are ALL [1], [2]. Because ALL progresses quickly, early diagnosis is crucial for starting treatment immediately. Cell counting from microscopic images by a pathologist is still used to diagnose ALL because it is easy to implement, even though it is subjective and time-consuming [3], [4]. The difficulty comes from the similar morphology of lymphocytes and lymphoblasts.

Previous studies focused on automating this test with computer-aided diagnosis to speed up the process [1], [3]. More recently, ALL diagnosis using convolutional neural networks (CNNs) has shown promising results [1], [5]–[9]. Our study compares the performance of deep and wide convolutional neural network architectures in distinguishing lymphoblast cells from lymphocyte cells. We use DenseNet-201 [10] as the deep CNN and Wide-ResNet-50-2 [11] as the wide CNN base models for transfer learning. As a benchmark, we include the result of transfer learning with ResNet-50 as the base deep residual network [12]. Moreover, we investigated the effect of histogram equalization, applied as a preprocessing step, on the performance.

II. DATASET

In this study, the dataset was obtained from the Cancer Imaging Archive [13]. It was used in the IEEE ISBI 2019 challenge "Classification of Normal vs Malignant Cells in B-ALL White Blood Cancer Microscopic Images". Each image is 450x450 px, stored in .bmp format with RGB channels. The data were segmented from the ALL-IDB dataset in [14]–[16]. Fig. 1 shows sample data from the two classes. We used 12,528 images: 8,491 lymphoblast (positive) images and 4,037 lymphocyte (negative) images. We split the data into training, validation, and test sets.

Fig. 1. Sample images: (a) lymphocyte cell; (b) lymphoblast cell.

III. METHOD

A. Histogram Equalization

Histogram equalization flattens the grey-level distribution of an image and maps the equalized levels back onto the image. The method enhances contrast in
images and is widely used for medical images [9], [17]. The method starts by computing the probability of each grey level, as shown in Eq. 1, where x represents the image, n_i is the number of pixels with grey level i, n is the total number of pixels, and L is the maximum grey level.

p_x(i) = p(x = i) = n_i / n,  0 ≤ i ≤ L    (1)

The next step is to determine the cumulative distribution function (CDF) of this distribution for all possible values of i, as described in Eq. 2.

cdf_x(i) = Σ_{j=0}^{i} p_x(x = j)    (2)
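To make these two steps concrete, the following is a minimal NumPy sketch of histogram equalization for a single-channel 8-bit image. It only illustrates Eq. 1, Eq. 2, and the final grey-level remapping; it is not the exact preprocessing code used in this study, and the function name is ours.

import numpy as np

def equalize_histogram(image: np.ndarray, levels: int = 256) -> np.ndarray:
    """Equalize a single-channel 8-bit image following Eqs. 1 and 2."""
    # Eq. 1: probability of each grey level, p_x(i) = n_i / n
    counts = np.bincount(image.ravel(), minlength=levels)
    p = counts / image.size
    # Eq. 2: cumulative distribution function of the grey levels
    cdf = np.cumsum(p)
    # Remap every grey level so the intensities spread over [0, levels - 1]
    mapping = np.round((levels - 1) * cdf).astype(np.uint8)
    return mapping[image]

For the RGB microscope images used here, the equalization can be applied per channel or on a luminance channel; the study does not state which variant was used.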
Wide Residual Networks (WRN) [11] widen the residual blocks of ResNet and add a drop-out layer to regularize the features. Thus, WRN can optimize the residual layers of ResNet, making it faster and less computationally expensive than the original ResNet.

E. Densely Connected Convolutional Networks (DenseNet)

Densely Connected Convolutional Networks (DenseNet) [10] is a deep neural network architecture. It mitigates the vanishing gradient problem by concatenating the feature maps within each dense block; this concatenation also lets the network reuse the knowledge collected by the previous layers.
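To illustrate the concatenation idea only (this is not the full DenseNet-201 block, which also uses a 1x1 bottleneck convolution), a single dense layer can be sketched in PyTorch as follows; the channel sizes and class name are placeholders.

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One layer of a dense block: its output is concatenated with its input."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the new feature maps with all previously computed ones,
        # so later layers can directly reuse the knowledge of earlier layers.
        return torch.cat([x, self.conv(x)], dim=1)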
1) Stochastic Gradient Descent (SGD): The weights are updated at each iteration as θ_t = θ_{t−1} − α ∇_θ J(θ; x^(i), y^(i)), where θ is the weight, t is the iteration, α is the learning rate, and J(θ; x^(i), y^(i)) is the loss function that we want to minimize.
2) Adaptive Moment Estimation (Adam): While the learning rate in SGD remains constant throughout training, Adam adapts the learning rate using estimates of the lower-order moments of the gradient and typically requires minimal tuning [22]. Eq. 6 shows the formulation used to update the weights at each time step,

θ_t = θ_{t−1} − α_t · m_t / (√v_t + ε̂)    (6)

where m_t is the biased first moment estimate, v_t is the biased second raw moment estimate, ε̂ is a small constant that prevents division by zero, and α_t is the learning rate adapted from the decay rates β_1 and β_2 according to Eq. 7.

α_t = α · √(1 − β_2^t) / (1 − β_1^t)    (7)
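To make the update rules concrete, the following NumPy sketch applies one SGD step and one Adam step to a weight vector, mirroring the equations above. It is illustrative only; in practice one would rely on a framework's built-in optimizers, and the default hyperparameter values shown are those suggested in [22].

import numpy as np

def sgd_step(theta, grad, alpha=1e-3):
    """Plain SGD: theta_t = theta_{t-1} - alpha * grad."""
    return theta - alpha * grad

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step in the efficient form of Eqs. 6 and 7 (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad          # biased first moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # biased second raw moment estimate
    alpha_t = alpha * np.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)  # Eq. 7
    theta = theta - alpha_t * m / (np.sqrt(v) + eps)                  # Eq. 6
    return theta, m, v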
IV. RESULTS

This section elaborates the results of the study. The first step was preprocessing the dataset with histogram equalization. Fig. 3a shows samples of the original images and their histograms; the top of the figure shows a lymphoblast (cancer) sample and the bottom shows a lymphocyte (healthy) sample. The maximum grey level in the original images is 150. Fig. 3b shows the histograms of the preprocessed images. The intensity is now better distributed, ranging from 0 to 255, and the nucleus appears brighter than in the original images.

Fig. 3. Comparing the effect of preprocessing the images on the grey-level distributions: (a) histograms of the original images; (b) histograms of the preprocessed images.
After creating the enhanced dataset, the next step was hyperparameter tuning. Table I shows the hyperparameters explored in this study. Each model was trained for 50 epochs with early stopping. Fig. 4 shows the classification layer that we added and trained on top of each base model. The learning rates were selected based on experiments with the values 10^-4, 10^-3, 10^-2, and 10^-1; we found that most of the models using 10^-1 did not learn at all.

Fig. 4. Classification layer on top of each architecture.
TABLE I. HYPERPARAMETER VALUES TO TEST

Hyperparameter    Values
Optimizer         [Adam, SGD]
Learning rate     [10^-4, 10^-3, 10^-2]
Scheduler         [True, False]
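As a sketch of the transfer-learning setup described above (and not a reproduction of the exact layers in Fig. 4), the base model can be loaded with pretrained ImageNet weights and its classifier replaced by a two-class head. The single linear layer, the StepLR scheduler, and fine-tuning all layers are illustrative assumptions rather than the authors' exact configuration.

import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone; DenseNet-201 is shown, ResNet-50 and Wide-ResNet-50-2
# are analogous (their classification head is the attribute .fc, not .classifier).
# Older torchvision API shown; newer versions use the weights= argument instead.
base = models.densenet201(pretrained=True)

# Replace the ImageNet classifier with a head for the two classes
# (lymphoblast vs. lymphocyte). A single linear layer is an assumption;
# the actual head corresponds to Fig. 4.
base.classifier = nn.Linear(base.classifier.in_features, 2)

# Hyperparameters drawn from Table I: Adam optimizer and learning rate 1e-4.
optimizer = torch.optim.Adam(base.parameters(), lr=1e-4)

# The paper does not state which scheduler was used; StepLR is a placeholder.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)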
We split the data set into 70% for training and 30% for validation. Since the data set is imbalanced, we undersampled the majority class. We cropped the center of each image to 300x300 px to reduce the black pixels and then resized the crop to match the input size of the model.
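A minimal sketch of this data preparation is shown below, assuming the images are handled as Python lists of samples; the 224x224 network input size, the random undersampling strategy, and the helper names are our assumptions.

import random
from torchvision import transforms

# Center-crop to 300x300 to drop black border pixels, then resize to the
# assumed network input size of 224x224 and convert to a tensor.
preprocess = transforms.Compose([
    transforms.CenterCrop(300),
    transforms.Resize(224),
    transforms.ToTensor(),
])

def undersample(positives, negatives, seed=0):
    """Randomly drop samples from the majority class so both classes match."""
    random.seed(seed)
    n = min(len(positives), len(negatives))
    return random.sample(positives, n), random.sample(negatives, n)

def split_70_30(items, seed=0):
    """Shuffle and split a list of samples into 70% training / 30% validation."""
    random.seed(seed)
    items = list(items)
    random.shuffle(items)
    cut = int(0.7 * len(items))
    return items[:cut], items[cut:]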
Fig. 5 shows the learning result of each model. Fig. 5a shows that most of the models trained without a scheduler have curves that keep fluctuating even after several epochs, compared to the ones trained with a scheduler. In addition, the models with a scheduler converged faster. For example, the WRN model with the Adam optimizer and a learning rate of 10^-4 only improved its accuracy after 20 epochs, while the same model with a scheduler improved its accuracy before 10 epochs; however, it could not reach the highest accuracy of the model without a scheduler.

Two WRN models trained with the Adam optimizer show an interesting result: they stopped improving after several epochs. We argue that these models could not find the optimum because the learning rate was too large, so descending the error curve took a long time and triggered early stopping. The only model not affected was the one with a 10^-4 learning rate, and this held both with and without a scheduler. The same behaviour also appeared in the baseline ResNet-50 and in the model trained with the SGD optimizer at a learning rate of 10^-4. For the DenseNet models, only the model using the Adam optimizer with a 10^-2 learning rate and a scheduler showed no improvement in its learning.

Fig. 5b shows the learning curves obtained from the preprocessed data. The models without a scheduler were more stable than those trained on the original data; however, they took a long time to converge. Using data preprocessing increased the amount
Fig. 5. Learning curves of each model: (a) models using the original data; (b) models using the preprocessed data.
TABLE II. TRAINING RESULTS

Model               Optimizer   Scheduler   lr       Acc      Loss     Specificity   Sensitivity
Original data
ResNet-50           Adam        True        10^-4    0.9616   0.3521   0.9607        0.9640
DenseNet-201        Adam        True        10^-4    0.9685   0.3457   0.9706        0.9623
Wide-ResNet-50-2    Adam        True        10^-4    0.9677   0.3465   0.9657        0.9732
Preprocessed data
ResNet-50           SGD         False       10^-3    0.9449   0.3674   0.9450        0.9444
DenseNet-201        SGD         False       10^-3    0.9479   0.3646   0.9480        0.9479
Wide-ResNet-50-2    SGD         False       10^-3    0.9472   0.3647   0.9511        0.9356
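For reference, the specificity and sensitivity reported in Table II can be computed from binary predictions as in the sketch below, where the positive class is the lymphoblast class; libraries such as scikit-learn provide equivalent utilities, and the function name is ours.

import numpy as np

def sensitivity_specificity(y_true: np.ndarray, y_pred: np.ndarray):
    """Sensitivity is recall on the positive (lymphoblast) class,
    specificity is recall on the negative (lymphocyte) class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity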
features and ensure that the features can describe the data for each class. It also helps to classify data that have similar forms.

From this study, we can also observe that a scheduler could help the models converge. In addition, the scheduler helped the models use the information from training optimally. However, the scheduler might also limit a model's ability to reach higher accuracy. In this study, the preprocessed data could not improve the learning. Preprocessing might remove discriminative features from the images and make the classes harder to differentiate. Moreover, the undersampling method might reduce the variance of the image data, because lymphoblasts vary in their forms.

VI. CONCLUSION

We have compared three models to find the best model for classifying cell nuclei for ALL diagnosis. DenseNet-201, the model with a deep architecture, achieved the best overall performance on the test set with an AUC of 86.73%. While the model achieved a specificity of 93.49%, its sensitivity is still 76.55%. In a field like healthcare, we are more concerned with finding the positive cases. Thus, we need to tune the model further, or penalize the model more heavily on false negatives, to improve the sensitivity. On the other hand, we can see that preprocessing the data actually made the performance worse.
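One way to penalize false negatives more heavily, sketched here as an illustration rather than a tested configuration, is to weight the positive class more strongly in the loss during fine-tuning; the weight value below is an arbitrary placeholder.

import torch
import torch.nn as nn

# Give the positive (lymphoblast) class a larger weight so that missing a
# positive case (a false negative) contributes more to the loss.
# Assumed class order: index 0 = lymphocyte (negative), index 1 = lymphoblast (positive).
class_weights = torch.tensor([1.0, 2.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)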
In the future, we also think it is necessary to be able to interpret the model itself, i.e., to obtain visual explanations for the decisions that the model makes. Thus, employing a method like Grad-CAM [23] can be beneficial. Further work also includes using the model to count cells and comparing the results of this method with the gold standard.

REFERENCES

[1] V. Singhal and P. Singh, “Local binary pattern for automatic detection of acute lymphoblastic leukemia,” 02 2014, pp. 1–5.
[2] S. Mohapatra, D. Patra, and S. Satpathy, “An ensemble classifier system for early diagnosis of acute lymphoblastic leukemia in blood microscopic images,” Neural Computing and Applications, vol. 24, no. 7-8, pp. 1887–1904, 2014.
[3] M. Mohamed, B. Far, and A. Guaily, “An efficient technique for white blood cells nuclei automatic segmentation,” in 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2012, pp. 220–225.
[4] Y. Liu and F. Long, “Acute lymphoblastic leukemia cells image analysis with deep bagging ensemble learning,” in ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging. Springer, 2019, pp. 113–121.
[5] A. M. Abdeldaim, A. T. Sahlol, M. Elhoseny, and A. E. Hassanien, “Computer-aided acute lymphoblastic leukemia diagnosis system based on image analysis,” in Advances in Soft Computing and Machine Learning in Image Processing. Springer, 2018, pp. 131–147.
[6] L. H. S. Vogado, R. D. M. S. Veras, A. R. Andrade, F. H. D. De Araujo, R. R. V. e Silva, and K. R. T. Aires, “Diagnosing leukemia in blood smear images using an ensemble of classifiers and pre-trained convolutional neural networks,” in 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE, 2017, pp. 367–373.
[7] W. Yu, J. Chang, C. Yang, L. Zhang, H. Shen, Y. Xia, and J. Sha, “Automatic classification of leukocytes using deep neural network,” in 2017 IEEE 12th International Conference on ASIC (ASICON). IEEE, 2017, pp. 1041–1044.
[8] S. Mourya, S. Kant, P. Kumar, A. Gupta, and R. Gupta, “LeukoNet: DCT-based CNN architecture for the classification of normal versus leukemic blasts in B-ALL cancer,” arXiv preprint arXiv:1810.07961, 2018.
[9] T. Thanh, C. Vununu, S. Atoev, S.-H. Lee, and K.-R. Kwon, “Leukemia blood cell image classification using convolutional neural network,” International Journal of Computer Theory and Engineering, vol. 10, no. 2, pp. 54–58, 2018.
[10] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[11] S. Zagoruyko and N. Komodakis, “Wide residual networks,” arXiv preprint arXiv:1605.07146, 2016.
[12] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[13] “Wiki - The Cancer Imaging Archive (TCIA) public access - Cancer Imaging Archive wiki,” https://wiki.cancerimagingarchive.net/, accessed on 12/24/2019.
[14] R. Duggal, A. Gupta, R. Gupta, M. Wadhwa, and C. Ahuja, “Overlapping cell nuclei segmentation in microscopic images using deep belief networks,” in Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing. ACM, 2016, p. 82.
[15] R. Duggal, A. Gupta, and R. Gupta, “Segmentation of overlapping/touching white blood cell nuclei using artificial neural networks,” CME Series on Hemato-Oncopathology, All India Institute of Medical Sciences (AIIMS), New Delhi, India, 2016.
[16] R. Duggal, A. Gupta, R. Gupta, and P. Mallick, “SD-Layer: stain deconvolutional layer for CNNs in medical microscopic imaging,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 435–443.
[17] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Computer Vision, Graphics, and Image Processing, vol. 39, no. 3, pp. 355–368, 1987.
[18] P. Garg and T. Jain, “A comparative study on histogram equalization and cumulative histogram equalization,” International Journal of New Technology and Research, vol. 3, no. 9, 2017.
[19] L. Torrey and J. Shavlik, “Transfer learning,” in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. IGI Global, 2010, pp. 242–264.
[20] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.
[21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[22] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[23] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.