Acute Lymphoblastic Leukemia Classification in Nucleus Microscopic Images Using Convolutional Neural Networks and Transfer Learning
Abstract—Leukemia is a disease caused by the uncontrolled production of abnormal blood cells. In Acute Lymphoblastic Leukemia (ALL), lymphoblast cells do not develop into lymphocytes. To diagnose the disease, we need to differentiate between lymphocytes and lymphoblasts. However, lymphocytes and lymphoblasts have similar morphologies. Several studies using computer vision have been developed to distinguish lymphocytes from lymphoblasts. This study aims to compare deep and wide deep-learning architectures for classifying segmented blood cell images. The "deep" architecture employed in this study was DenseNet201, whereas the "wide" architecture was Wide-ResNet-50-2. We also employed ResNet50 as a baseline. This study utilizes transfer learning to reduce the training steps needed. In addition, we measured the impact of preprocessing the dataset with histogram equalization on classifier performance. We found that the DenseNet201 model has the best performance, with an AUC score of 86.73%, and that histogram equalization makes the performance worse.

Keywords—leukemia, computer vision, transfer learning, convolutional neural networks, image classification

I. INTRODUCTION

Blood cancer, or leukemia, is a disease caused by the uncontrolled duplication of abnormal blood cells. Among the several types of leukemia, Acute Lymphoblastic Leukemia (ALL) is the type caused by the uncontrolled growth of immature lymphocytes, known as lymphoblasts. This type of leukemia mostly affects adults over 50 years old and children in the 2-5 year age group. Studies show that 80% of leukemia cases in children are ALL [1], [2]. Due to the fast progression of ALL, early diagnosis is crucial for immediate treatment.

Cell counting from microscopic images by a pathologist is still used to diagnose ALL because of its ease of implementation, despite being subjective and time-consuming [3], [4]. The problem comes from the similar morphology of lymphocytes and lymphoblasts.

Previous studies focused on automating this test using computer-aided diagnosis to speed up the process [1], [3]. In more recent studies, ALL diagnosis using convolutional neural networks (CNNs) has shown promising results [1], [5]–[9]. Our study compares the performance of deep and wide convolutional neural network architectures in distinguishing lymphoblast cells from lymphocyte cells. We use DenseNet-201 [10] for the deep CNN and Wide-ResNet-50-2 [11] for the wide CNN as the base models for transfer learning. As a benchmark, we include the result of transfer learning with ResNet50 as the base deep residual network [12]. Moreover, we investigated the effect of histogram equalization, applied as a preprocessing step, on the performance.

II. DATASET

In this study, the dataset was obtained from The Cancer Imaging Archive [13]. This dataset was used in the IEEE ISBI 2019 conference challenge: Classification of Normal vs Malignant Cells in B-ALL White Blood Cancer Microscopic Images. The image size is 450x450 px, in .bmp format with RGB channels. The data was segmented from the ALL-IDB dataset in [14]–[16]. Fig. 1 shows sample data from the two classes. In this study, we used 12,528 images: 8,491 lymphoblast (positive) images and 4,037 lymphocyte (negative) images. We split the data into training, validation, and test sets.

Fig. 1. Sample images. (a) Lymphocyte cell. (b) Lymphoblast cell.

III. METHOD

A. Histogram Equalization

Histogram equalization flattens the grey-level distribution of an image and maps it back onto the image. This method enhances contrast on
Authorized licensed use limited to: Malnad College of Engineering. Downloaded on October 18,2024 at 07:27:26 UTC from IEEE Xplore. Restrictions apply.
images and is mostly used in medical images [9], [17]. This method starts by computing the probability of each grey level, as shown in Eq. 1, where x represents the image, n_i represents the number of pixels at grey level i, n is the total number of pixels, and L represents the maximum grey level.

    p_x(i) = p(x = i) = n_i / n,   0 ≤ i ≤ L        (1)

The next step is determining the cumulative distribution function (CDF) of this distribution for all possible values of i, as described in Eq. 2.

    cdf_x(i) = Σ_{j=0}^{i} p_x(x = j)               (2)

a drop-out layer to regularize the features. Thus, WRN can optimize the residual layers of ResNet and makes it faster and computationally less expensive than the original ResNet.

E. Densely Connected Convolutional Networks (DenseNet)

Densely Connected Convolutional Networks (DenseNet) [10] is a deep neural network architecture. It resolves the vanishing gradient problem by using concatenation within each dense block. Moreover, the concatenation helps the network collect knowledge from the previous layers.
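As a concrete illustration of Eqs. 1-2, the CDF can be turned into a lookup table that remaps each grey level onto the full intensity range. This is a minimal NumPy sketch, not the authors' implementation:

```python
import numpy as np

def equalize_histogram(img, levels=256):
    """Histogram equalization following Eqs. 1-2."""
    # Eq. 1: p_x(i) = n_i / n for each grey level i (L = levels - 1)
    hist = np.bincount(img.ravel(), minlength=levels)
    p = hist / img.size
    # Eq. 2: cdf_x(i) = sum of p_x(j) for j <= i
    cdf = np.cumsum(p)
    # Remap each grey level through the CDF onto [0, levels - 1]
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)
    return lut[img]
```

Applying the lookup table stretches a narrow intensity distribution (e.g. a maximum grey level well below 255) across the whole range, which is the contrast enhancement described above.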
where θ is the weight, t is the iteration, α is the learning rate, and J(θ; x^(i), y^(i)) is the loss function that we want to minimize.
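The update these symbols describe is the standard SGD step, θ_{t+1} = θ_t − α · ∇J(θ; x^(i), y^(i)), with α held constant throughout training. A minimal sketch:

```python
import numpy as np

def sgd_step(theta, grad, alpha=1e-3):
    """One SGD update: theta_{t+1} = theta_t - alpha * grad J.
    The learning rate alpha stays constant across iterations."""
    return theta - alpha * grad
```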
2) Adaptive Moment Estimation (Adam): While the learning rate in SGD remains constant throughout the learning process, Adam adapts the learning rate and typically requires minimal tuning, making use of the lower-order moments [22]. Eq. 6 shows the formulation to update the weights at each time step:

    θ_t = θ_{t−1} − α_t · m_t / (√v_t + ε̂)          (6)

where m_t is the biased first moment estimate, v_t is the biased second raw moment estimate, ε̂ is a small constant to prevent division by zero, and α_t is the learning rate adapted from the decay rates β_1 and β_2, following Eq. 7:

    α_t = α · √(1 − β_2^t) / (1 − β_1^t)            (7)
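Eqs. 6-7 can be sketched as a single update step. This is an illustrative NumPy version of the Adam rule from [22], using the biased moments m_t and v_t with the bias correction folded into the step size α_t, not the authors' training code:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3,
              beta1=0.9, beta2=0.999, eps_hat=1e-8):
    """One Adam update (t is the 1-indexed iteration number)."""
    m = beta1 * m + (1 - beta1) * grad       # biased first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # biased second raw moment estimate
    # Eq. 7: adapt the step size from the decay rates beta1, beta2
    alpha_t = alpha * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # Eq. 6: theta_t = theta_{t-1} - alpha_t * m_t / (sqrt(v_t) + eps_hat)
    theta = theta - alpha_t * m / (np.sqrt(v) + eps_hat)
    return theta, m, v
```

Note that on the first step the effective update magnitude is approximately α, regardless of the gradient scale, which is what makes Adam relatively insensitive to tuning.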
IV. RESULTS

This section elaborates the results of the study. The first step was preprocessing the dataset with histogram equalization. Fig. 3a shows samples of the original histograms and images: the top of the figure is a lymphoblast (cancer) sample and the bottom is a lymphocyte (healthy) sample. The maximum grey level in the original images is 150. Fig. 3b shows the histograms of the preprocessed images. We can observe that the intensity is now better distributed in the histograms, ranging from 0 to 255. In the images, we can also observe that the nucleus looks brighter than in the original.

Fig. 3. Comparing the effect of preprocessing the images on the grey-level distributions. (a) Histogram of the original images. (b) Histogram of the preprocessed images.
After creating the enhanced dataset, the next step was hyperparameter tuning. Table I shows the hyperparameters used in this study. Each model was trained for 50 epochs with early stopping. Fig. 4 shows the classification layer we added and trained on top of the base model. The selected learning rates are based on experiments with the following values: 10^-4, 10^-3, 10^-2, and 10^-1. We found that most of the models using 10^-1 did not learn at all.

Fig. 4. Classification layer on top of each architecture

TABLE I. HYPERPARAMETER VALUES TO TEST

Hyperparameter | Values
Optimizer      | [Adam, SGD]
Learning Rates | [10^-4, 10^-3, 10^-2]
Scheduler      | [True, False]

We split the data set into 70% for training and 30% for validation. Since the data set is imbalanced, we undersampled the majority class. We cropped the center of the images to 300x300 px to reduce black pixels and resized them to match the input size of the model.

Fig. 5 shows the learning result of each model. Fig. 5a shows that most of the models without a scheduler have fluctuating curves even after several epochs compared to the ones with a scheduler. In addition, the models with a scheduler were faster to converge. For example, the WRN model with the Adam optimizer and learning rate 10^-4 improved its accuracy only after 20 epochs, while the same model with a scheduler improved its accuracy before 10 epochs; however, it could not reach the highest accuracy of the model without a scheduler.

Two WRN models with the Adam optimizer show an interesting result: there was no improvement after several epochs. We argue that these models could not find the optimum value because the learning rate was too large. Therefore, descending the error curve took a long time and triggered the early stopping. The only model for which this was not the case was the one with a 10^-4 learning rate, with or without a scheduler. In addition, the same happened to the baseline ResNet50, as well as to the model with the SGD optimizer and learning rate 10^-4. For the DenseNet models, only the model using the Adam optimizer with a 10^-2 learning rate and a scheduler showed no improvement in its learning.

Fig. 5b shows the learning curves from the preprocessed data. The models without a scheduler were more stable than those using the original data. However, the models took a longer time to converge. Using data preprocessing increased the amount
Fig. 5. Learning curves of the models. (a) Models using original data. (b) Models using preprocessed data.
TABLE II. TRAINING RESULTS

Model            | Optimizer | Scheduler | lr    | Acc    | Loss   | Specificity | Sensitivity
Original data:
ResNet-50        | Adam      | True      | 10^-4 | 0.9616 | 0.3521 | 0.9607      | 0.9640
DenseNet-201     | Adam      | True      | 10^-4 | 0.9685 | 0.3457 | 0.9706      | 0.9623
Wide-ResNet-50-2 | Adam      | True      | 10^-4 | 0.9677 | 0.3465 | 0.9657      | 0.9732
Preprocessed data:
ResNet-50        | SGD       | False     | 10^-3 | 0.9449 | 0.3674 | 0.9450      | 0.9444
DenseNet-201     | SGD       | False     | 10^-3 | 0.9479 | 0.3646 | 0.9480      | 0.9479
Wide-ResNet-50-2 | SGD       | False     | 10^-3 | 0.9472 | 0.3647 | 0.9511      | 0.9356
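The data preparation described in the Results section (undersampling the majority class, center-cropping the 450x450 px images to 300x300 px, and resizing to the base model's input size) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' pipeline: the 224 px target is an assumption (the paper only says the images were resized to the model's input size), and a real pipeline would use bilinear interpolation (e.g. torchvision's Resize) rather than nearest-neighbour.

```python
import numpy as np

def undersample(labels, rng):
    """Balance a binary label array by undersampling the majority class;
    returns the indices of the kept samples."""
    idx_pos = np.flatnonzero(labels == 1)
    idx_neg = np.flatnonzero(labels == 0)
    minority = min(len(idx_pos), len(idx_neg))
    keep_pos = rng.choice(idx_pos, size=minority, replace=False)
    keep_neg = rng.choice(idx_neg, size=minority, replace=False)
    return np.concatenate([keep_pos, keep_neg])

def center_crop(img, size=300):
    """Crop the central size x size region (450x450 -> 300x300 here)."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def resize_nearest(img, size=224):
    """Nearest-neighbour resize to the assumed model input size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]
```

For the class ratio reported in Sec. II (8,491 positive vs 4,037 negative), `undersample` would keep 4,037 images from each class.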
of features and ensure that the features can describe the data for each class. It also helps to classify data which have similar forms.

From this study, we can also observe that a scheduler could help the model to converge. In addition, the scheduler helped the model to use the information from training optimally. However, the scheduler might limit the model's ability to reach higher accuracy. In this study, the preprocessed data could not improve the learning. Preprocessing might remove features of the images and make them harder to differentiate. Moreover, the undersampling method might reduce the variance of the image data, because lymphoblasts vary in their forms.

VI. CONCLUSION

We have compared three models to find the best model to classify nucleus images for ALL. DenseNet-201, as a model with a deep architecture, achieved the best overall performance on the test set with an AUC of 86.73%. While the model could achieve a specificity of 93.49%, the sensitivity is still at 76.55%. In a field like healthcare, we are more concerned with finding the positive cases. Thus, we need to tune the model further or penalize the model more on false negatives to improve the sensitivity. On the other hand, we can see that preprocessing the data actually made the performance worse.

In the future, we also think that it is necessary to be able to interpret the model itself, i.e. to get visual explanations for the decisions that the model makes. Thus, employing a method like Grad-CAM [23] can be beneficial. Further work also includes using the model to count the cells and comparing the result of this method with the gold standard.

REFERENCES

[1] V. Singhal and P. Singh, "Local binary pattern for automatic detection of acute lymphoblastic leukemia," Feb. 2014, pp. 1–5.
[2] S. Mohapatra, D. Patra, and S. Satpathy, "An ensemble classifier system for early diagnosis of acute lymphoblastic leukemia in blood microscopic images," Neural Computing and Applications, vol. 24, no. 7-8, pp. 1887–1904, 2014.
[3] M. Mohamed, B. Far, and A. Guaily, "An efficient technique for white blood cells nuclei automatic segmentation," in 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2012, pp. 220–225.
[4] Y. Liu and F. Long, "Acute lymphoblastic leukemia cells image analysis with deep bagging ensemble learning," in ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging. Springer, 2019, pp. 113–121.
[5] A. M. Abdeldaim, A. T. Sahlol, M. Elhoseny, and A. E. Hassanien, "Computer-aided acute lymphoblastic leukemia diagnosis system based on image analysis," in Advances in Soft Computing and Machine Learning in Image Processing. Springer, 2018, pp. 131–147.
[6] L. H. S. Vogado, R. D. M. S. Veras, A. R. Andrade, F. H. D. De Araujo, R. R. V. e Silva, and K. R. T. Aires, "Diagnosing leukemia in blood smear images using an ensemble of classifiers and pre-trained convolutional neural networks," in 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE, 2017, pp. 367–373.
[7] W. Yu, J. Chang, C. Yang, L. Zhang, H. Shen, Y. Xia, and J. Sha, "Automatic classification of leukocytes using deep neural network," in 2017 IEEE 12th International Conference on ASIC (ASICON). IEEE, 2017, pp. 1041–1044.
[8] S. Mourya, S. Kant, P. Kumar, A. Gupta, and R. Gupta, "LeukoNet: DCT-based CNN architecture for the classification of normal versus leukemic blasts in B-ALL cancer," arXiv preprint arXiv:1810.07961, 2018.
[9] T. Thanh, C. Vununu, S. Atoev, S.-H. Lee, and K.-R. Kwon, "Leukemia blood cell image classification using convolutional neural network," International Journal of Computer Theory and Engineering, vol. 10, no. 2, pp. 54–58, 2018.
[10] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[11] S. Zagoruyko and N. Komodakis, "Wide residual networks," arXiv preprint arXiv:1605.07146, 2016.
[12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[13] "Wiki - The Cancer Imaging Archive (TCIA) public access," https://wiki.cancerimagingarchive.net/, (accessed on 12/24/2019).
[14] R. Duggal, A. Gupta, R. Gupta, M. Wadhwa, and C. Ahuja, "Overlapping cell nuclei segmentation in microscopic images using deep belief networks," in Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing. ACM, 2016, p. 82.
[15] R. Duggal, A. Gupta, and R. Gupta, "Segmentation of overlapping/touching white blood cell nuclei using artificial neural networks," CME Series on Hemato-Oncopathology, All India Institute of Medical Sciences (AIIMS), New Delhi, India, 2016.
[16] R. Duggal, A. Gupta, R. Gupta, and P. Mallick, "SD-Layer: stain deconvolutional layer for CNNs in medical microscopic imaging," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 435–443.
[17] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz,
T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld,
“Adaptive histogram equalization and its variations,” Computer vision,
graphics, and image processing, vol. 39, no. 3, pp. 355–368, 1987.
[18] P. Garg and T. Jain, “A comparative study on histogram equalization
and cumulative histogram equalization,” International Journal of New
Technology and Research, vol. 3, no. 9, 2017.
[19] L. Torrey and J. Shavlik, “Transfer learning,” in Handbook of research
on machine learning applications and trends: algorithms, methods, and
techniques. IGI Global, 2010, pp. 242–264.
[20] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are
features in deep neural networks?” in Advances in neural information
processing systems, 2014, pp. 3320–3328.
[21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[22] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
arXiv preprint arXiv:1412.6980, 2014.
[23] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and
D. Batra, “Grad-cam: Visual explanations from deep networks via
gradient-based localization,” in Proceedings of the IEEE International
Conference on Computer Vision (ICCV), Oct 2017.