A Multi-view Deep Convolutional Neural Networks for Lung Nodule
Segmentation
Shuo Wang, Mu Zhou, Olivier Gevaert, Zhenchao Tang, Di Dong, Zhenyu Liu, and Jie Tian, Fellow, IEEE
This work was supported by the National Natural Science Foundation of China under Grant Nos. 81227901 and 81527805.
+S. Wang (wangshuo2014@ia.ac.cn), *D. Dong, *Z. Liu, and *J. Tian (tian@ieee.org) are with the CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences; Beijing Key Laboratory of Molecular Imaging; University of Chinese Academy of Sciences, Beijing 100190, China. * denotes corresponding authors. + denotes co-first authors.
+M. Zhou and O. Gevaert are with the Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, CA 94305, USA.
Z. Tang is with the School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, Shandong Province, 264209, China.
978-1-5090-2809-2/17/$31.00 ©2017 IEEE

Abstract— We present a multi-view convolutional neural network (MV-CNN) for lung nodule segmentation. The MV-CNN specializes in capturing a diverse set of nodule-sensitive features from axial, coronal, and sagittal views in CT images simultaneously. The proposed network architecture consists of three CNN branches, where each branch includes seven stacked layers and takes multi-scale nodule patches as input. The three CNN branches are then integrated with a fully connected layer to predict whether the patch center voxel belongs to the nodule. The proposed method was evaluated on 893 nodules from the public LIDC-IDRI dataset, where ground-truth annotations and CT imaging data were provided. MV-CNN demonstrated encouraging performance for segmenting various types of nodules, including juxta-pleural, cavitary, and non-solid nodules, achieving an average Dice similarity coefficient (DSC) of 77.67% and an average surface distance (ASD) of 0.24 mm, outperforming conventional image segmentation approaches.

I. INTRODUCTION

Automated lung nodule segmentation from computed tomography (CT) images provides valuable information for computer-aided diagnosis of lung cancer. Given the growing volume of lung nodule CT images, developing a robust automated segmentation model is of great clinical importance to avoid tedious manual processing and reduce inter-observer variability among human experts.

Despite the development of different approaches for lung nodule segmentation in recent years [1]–[4], accurate segmentation remains difficult because of two primary challenges. First, intensity-based methods with morphology operations [1], [2] and region growing [3] perform well on isolated nodules but fail to segment nodules in challenging locations, especially nodules attached to surrounding structures, which commonly appear in CT. Second, sophisticated model-based methods [4], [5] often involve shape hypotheses or user-interactive parameters that can be sensitive to different types of lung nodules.

Our study is motivated by the recent success of convolutional neural networks (CNNs) in medical image analysis and pattern recognition [6]–[9]. As opposed to traditional morphology operation and region growing methods, a CNN automatically learns discriminative features that are adaptive to specific tasks [10]. For instance, Havaei et al. [9] applied a CNN model to brain tumor segmentation, showing improved performance over hand-crafted image features. In our study, instead of relying on lung nodule shape hypotheses or tuning model-based parameters, we propose a multi-view convolutional neural network (MV-CNN) [11] to distinguish nodule voxels from background voxels in CT imaging. Our model learned nodule-sensitive features automatically from 0.34 million voxel patches and produced promising segmentation results for various types of lung nodules.

Overall, our contributions are as follows: 1) The proposed MV-CNN can segment lung nodules in CT images without any shape hypothesis or user-interactive parameter settings, and it learns discriminative nodule-sensitive features automatically from a large amount of image data; 2) We propose a multi-scale patch strategy as the input of the MV-CNN to capture both detailed textures and nodule shape information; 3) The MV-CNN integrates three branches that can learn deep features from three orthogonal image views in CT (Fig. 1).

This paper is organized as follows. A detailed description of the MV-CNN is presented in Section II. Experimental datasets and evaluation criteria are introduced in Section III. Section IV provides the quantitative performance. Finally, conclusions and further discussion are presented in Section V.

II. METHOD

Our MV-CNN is designed to be an efficient CNN-based architecture for lung nodule segmentation. It converts lung nodule segmentation into CT voxel classification (Fig. 1). Given a voxel in a CT image, we extract three multi-scale patches centered on this voxel as the input to the CNN model and predict whether this voxel belongs to the nodule.

The proposed MV-CNN incorporates three branches that process voxel patches from axial, coronal, and sagittal view CT images, respectively. The three branches share the same structure, which consists of six convolutional layers (C1 to C6), two max-pooling layers (Max pooling 1, 2), and one fully connected layer (F7). The six convolutional layers in each CNN branch are divided into three blocks, where each block has the same structure: two convolutional layers of kernel size 3 × 3. Between blocks, a max-pooling operation with a 2 × 2 pooling window and a stride of 2 is applied for feature selection. At the end of the CNN model, the three branches are merged through a fully connected
layer (F8) to output the voxel label. The detailed architecture is illustrated in Fig. 1. After each convolutional layer and the first fully connected layer (F7), a parametric rectified linear unit (PReLU) [12] is used as the nonlinear activation function, and batch normalization is applied to accelerate training [13].

[Figure 1: network diagram. Each of the three branches (axial, coronal, and sagittal view) takes an input patch rescaled to 2×35×35 and passes it through C1 and C2 (32@3×3, feature maps 35×35), Max Pooling 1, C3 and C4 (48@3×3, feature maps 18×18), Max Pooling 2, C5 and C6 (64@3×3, feature maps 9×9), and F7 (labeled 256); the branches merge in F8 to produce the segmentation result. The bottom panel shows C1 feature maps for the axial, coronal, and sagittal views.]
Fig. 1. Illustration of the MV-CNN architecture. The network contains three branches that capture features from axial, coronal, and sagittal image views, and each branch takes a two-scale patch as input. Each CNN branch includes six convolutional layers (C1 to C6), two max-pooling layers, and a fully connected layer (F7). The three branches are finally merged in a fully connected layer (F8). The convolutional kernel size is denoted as filter number@filter width × filter height (e.g., 32@3 × 3 represents 32 filters of kernel size 3 × 3). The number below each layer indicates the feature map size after convolution. The bottom figure shows the feature maps of the C1 layer on the three branches, indicating that the learned filters can capture different characteristics of nodules from the input CT image (e.g., the edge or solid part of a nodule).

The input to each CNN branch is a two-channel multi-scale patch rather than a single-scale patch. The two patches are of size 65 × 65 and 35 × 35 and are rescaled to 35 × 35 using third-order spline interpolation to form a two-channel patch. The small-scale patch contains detailed texture information, which is important for identifying voxel labels. Meanwhile, the large-scale patch provides a broad view of the nodule shape. The multi-view structure takes 3-D information into consideration without involving much redundant image information compared with inputting a whole 3-D volume [14].

The output layer (F8) consists of two units, whose activation values are fed into a binary softmax function and converted into a probability distribution over the class labels. Namely, if $o_k$ is the k-th output of the network for a given input, the probability assigned to the k-th class is the output of the softmax function:

\[ p_k = \frac{\exp(o_k)}{\sum_{h \in \{0,1\}} \exp(o_h)} \tag{1} \]

where k = 0 and k = 1 represent non-nodule and nodule voxels, respectively.

The goal of network training is to maximize the probability of the correct class. This is achieved by minimizing the cross-entropy loss over the training samples. Given the true label $y \in \{0,1\}$ of an input patch, the loss function is defined as:

\[ L(W) = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log (1 - \hat{y}_n) \right] + \lambda |W| \tag{2} \]

where $\hat{y}_n$ represents the predicted probability from the MV-CNN and N is the number of samples. To avoid overfitting, 1-norm regularization is applied to the model weights W. The parameter $\lambda$ controls the regularization strength and is set to $5 \times 10^{-4}$ in our model. The loss function is minimized during training by computing the gradient of L with respect to the network parameters W. During this process, the weights of the CNN are initialized with the Xavier algorithm [15], and they are updated by the stochastic gradient descent (SGD)
algorithm using a momentum of 0.9 and a batch size of 128. The learning rate is initially set to $6 \times 10^{-5}$ and is reduced by a factor of 10 after every four epochs.

III. EXPERIMENT

A. Dataset

We used the public Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset [16] for experimental evaluation. All the nodules in this dataset are annotated by up to four board-certified radiologists. Only the nodules with annotations from all four radiologists are used in our experiment (a total of 893 nodules). Nodule diameters in this dataset range from 2.03 mm to 38.12 mm, and the slice interval ranges from 0.45 mm to 5.0 mm. Because of the variability among the four radiologists, a 50% consensus criterion [1] is adopted to generate a single ground-truth boundary.

To perform a rigorous evaluation of our method, we randomly partitioned the 893 nodules into training, validation, and testing sets comprising 450, 50, and 393 nodules, respectively. We train the MV-CNN on the training set, and the validation set is used to determine the number of training epochs. Finally, the testing set is used to evaluate the model performance.

B. Evaluation criteria

Given the ground truth segmentation Gt and the automated segmentation result Auto, the Dice similarity coefficient (DSC) and average surface distance (ASD) are used as the primary evaluation criteria for assessing segmentation accuracy [17]. In addition, we use the sensitivity (SEN) and positive predictive value (PPV) to assess the voxel classification accuracy. Full definitions are given in Eqs. 3 to 5:

\[ \mathrm{DSC} = \frac{2 \cdot V(Gt \cap Auto)}{V(Gt) + V(Auto)} \tag{3} \]

\[ \mathrm{ASD} = \frac{1}{2} \left( \underset{i \in Gt}{\operatorname{mean}} \min_{j \in Auto} d(i, j) + \underset{i \in Auto}{\operatorname{mean}} \min_{j \in Gt} d(i, j) \right) \tag{4} \]

\[ \mathrm{SEN} = \frac{V(Gt \cap Auto)}{V(Gt)}, \qquad \mathrm{PPV} = \frac{V(Gt \cap Auto)}{V(Auto)} \tag{5} \]

where V is the volume size counted in voxels and d(i, j) denotes the Euclidean distance between voxel i and voxel j, measured in millimeters.

C. Model training process

When generating training samples, we first identified the 3-D bounding cuboid for each nodule in the training set and then extended the cuboid by eight voxels along each axis to include additional non-nodule tissue. Next, one quarter of the voxels in this expanded cuboid were sampled uniformly. Finally, equal numbers of nodule and non-nodule samples were randomly selected to balance the training labels. For each sampled voxel, multi-scale patches as presented in Section II were extracted from axial, coronal, and sagittal view images. As a result, 0.34 million patches were used for model training. The DSC and ASD values on the validation set stabilized after 20 epochs of training; we therefore chose 20 epochs for MV-CNN training. After model training was completed using the Caffe toolkit [18], we report segmentation results on the testing set.

IV. RESULTS

A. Quantitative performance

To evaluate the performance of the proposed MV-CNN, two widely used methods, level set and graph cut, were used for comparison. Both are provided in the public Fiji software [19], and their parameters were optimized through grid search with this software. As listed in Table I, the proposed MV-CNN outperformed the graph cut and level set methods. Moreover, Fig. 2 shows the DSC score distribution of the LIDC-IDRI testing set.

TABLE I
MEAN AND STANDARD DEVIATION OF QUANTITATIVE RESULTS FOR VARIOUS SEGMENTATION METHODS. THE BEST PERFORMANCE IS INDICATED IN BOLD FONT.

Method      DSC (%)        ASD (mm)     SEN (%)        PPV (%)
Level Set   60.09 (16.83)  0.52 (0.28)  65.55 (21.61)  68.34 (25.76)
Graph Cut   69.52 (17.32)  0.47 (0.31)  81.15 (14.29)  65.43 (25.18)
MV-CNN      77.67 (15.71)  0.24 (0.33)  83.72 (20.71)  77.58 (15.83)

Fig. 2. DSC score distribution of the LIDC-IDRI testing set.

B. Visualization

The segmentation results are visualized to allow comparison of the different approaches. We show six representative nodules from the LIDC-IDRI testing set (Fig. 3). For isolated solid nodules (L1), both our method and the state-of-the-art methods perform well. However, for nodules attached to surrounding tissues, the level set and graph cut methods perform worse because they are unable to separate nodules from the pleura (L2) or vessels (L3). In contrast, the proposed MV-CNN remains robust when segmenting such nodules, which can probably be attributed to the ability of the MV-CNN to learn discriminative features from different image views. For cavitary (L4) and calcific (L5) nodules,
level set and graph cut methods only identify part of the nodules. However, the MV-CNN is able to preserve the complete nodule shape. In addition, ground-glass opacity (GGO) nodules (L6) present another challenge for the level set and graph cut methods, which tend to over-segment because of the low contrast between nodules and the normal lung field, whereas the proposed method performs reasonably well in capturing the shape of GGO nodules.

Fig. 3. Segmentation visualization. From left to right: nodule with ground truth, MV-CNN segmentation, level set segmentation, and graph cut segmentation. L1–L6 are nodules of different types from the LIDC-IDRI testing set.

V. CONCLUSION

In this paper, we presented a deep learning model, MV-CNN, for lung nodule segmentation, integrating three branches to extract features from three orthogonal image views in CT. An advantage of the proposed model is that it does not involve any nodule shape hypothesis or user-interactive parameter settings. After training on 0.34 million voxel patches, MV-CNN achieved encouraging performance on 393 nodules from the public LIDC-IDRI dataset (DSC = 77.67% and ASD = 0.24 mm). In future work, we plan to train the model on a larger dataset and explore whether the network depth affects the model performance.

REFERENCES

[1] T. Kubota, A. K. Jerebko, M. Dewan, M. Salganicoff, and A. Krishnan, “Segmentation of pulmonary nodules of various densities with morphological approaches and convexity models,” Medical Image Analysis, vol. 15, no. 1, pp. 133–154, 2011.
[2] T. Messay, R. C. Hardie, and S. K. Rogers, “A new computationally efficient CAD system for pulmonary nodule detection in CT imagery,” Medical Image Analysis, vol. 14, no. 3, pp. 390–406, 2010.
[3] J. Dehmeshki, H. Amin, M. Valdivieso, and X. Ye, “Segmentation of pulmonary nodules in thoracic CT scans: a region growing approach,” IEEE Transactions on Medical Imaging, vol. 27, no. 4, pp. 467–480, 2008.
[4] A. A. Farag, H. E. A. El Munim, J. H. Graham, and A. A. Farag, “A novel approach for lung nodules segmentation in chest CT using level sets,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 5202–5213, 2013.
[5] M. Keshani, Z. Azimifar, F. Tajeripour, and R. Boostani, “Lung nodule segmentation and recognition using SVM classifier and active contour modeling: a complete intelligent system,” Computers in Biology and Medicine, vol. 43, no. 4, pp. 287–300, 2013.
[6] W. Shen, M. Zhou, F. Yang, C. Yang, and J. Tian, “Multi-scale convolutional neural networks for lung nodule classification,” in International Conference on Information Processing in Medical Imaging. Springer, 2015, pp. 588–599.
[7] W. Shen, M. Zhou, F. Yang, D. Dong, C. Yang, Y. Zang, and J. Tian, “Learning from experts: Developing transferable deep features for patient-level lung cancer prediction,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 124–131.
[8] W. Zhang, R. Li, H. Deng, L. Wang, W. Lin, S. Ji, and D. Shen, “Deep convolutional neural networks for multi-modality isointense infant brain image segmentation,” NeuroImage, vol. 108, pp. 214–224, 2015.
[9] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P. Jodoin, and H. Larochelle, “Brain tumor segmentation with deep neural networks,” Medical Image Analysis, 2016.
[10] W. Shen, M. Zhou, F. Yang, D. Yu, D. Dong, C. Yang, Y. Zang, and J. Tian, “Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification,” Pattern Recognition, 2016.
[11] A. A. A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S. J. van Riel, M. M. W. Wille, M. Naqibullah, C. I. Sánchez, and B. van Ginneken, “Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1160–1169, 2016.
[12] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[13] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of The International Conference on Machine Learning, 2015, pp. 448–456.
[14] M. Lai, “Deep learning for medical image segmentation,” arXiv preprint arXiv:1505.02000, 2015.
[15] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[16] S. G. Armato III, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer, A. P. Reeves et al., “The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans,” Medical Physics, vol. 38, no. 2, pp. 915–931, 2011.
[17] Y. Gao, Y. Shao, J. Lian, A. Z. Wang, R. C. Chen, and D. Shen, “Accurate segmentation of CT male pelvic organs via regression-based deformable models and multi-task random forests,” IEEE Transactions on Medical Imaging, vol. 35, no. 6, pp. 1532–1543, 2016.
[18] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.
[19] J. Schindelin, I. Arganda-Carreras, E. Frise et al., “Fiji: an open-source platform for biological-image analysis,” Nature Methods, vol. 9, no. 7, pp. 676–682, 2012.
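
Worked example for the feature-map sizes in Fig. 1 (Section II). The sketch below traces the spatial size of a 35 × 35 input through one branch: three blocks of two 3 × 3 convolutions, with 2 × 2 max pooling (stride 2) between blocks. It assumes a convolution padding of 1 and pooling that rounds up, which is how Caffe's pooling layer computes its output size; neither detail is stated in the paper, but both are consistent with the 35, 18, and 9 sizes shown in the figure. The function names are illustrative only.

```python
import math


def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution (pad=1 is an assumption)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial output size of max pooling with ceil rounding (assumed)."""
    return math.ceil((size - window) / stride) + 1


def trace_branch(size=35):
    """Return the feature-map size after each of C1..C6."""
    sizes = []
    for block in range(3):
        for _ in range(2):          # two 3x3 convolutions per block
            size = conv_out(size)
            sizes.append(size)
        if block < 2:               # pooling only between blocks
            size = pool_out(size)
    return sizes


print(trace_branch(35))
```

Under these assumptions the trace reproduces the sizes listed below each layer in Fig. 1; with floor rounding instead, the first pooling step would give 17 rather than 18.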
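
Worked example for Eqs. (1) and (2) (Section II). This is a minimal pure-Python sketch of the binary softmax and the regularized cross-entropy loss, not the authors' Caffe implementation. The function names, the flat list used for the weights, and the small clamping constant that guards against log(0) are assumptions made for illustration.

```python
import math


def softmax2(o0, o1):
    """Eq. (1): probabilities for non-nodule (k=0) and nodule (k=1)."""
    m = max(o0, o1)                      # subtract the max for stability
    e0, e1 = math.exp(o0 - m), math.exp(o1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def loss(outputs, labels, weights, lam=5e-4, eps=1e-12):
    """Eq. (2): mean cross-entropy plus lam times the 1-norm of the weights.

    outputs: list of (o0, o1) activations of the two F8 units
    labels:  list of true labels in {0, 1}
    weights: flat list of model weights (illustrative stand-in for W)
    """
    total = 0.0
    for (o0, o1), y in zip(outputs, labels):
        _, p1 = softmax2(o0, o1)
        p1 = min(max(p1, eps), 1.0 - eps)   # assumed guard against log(0)
        total += y * math.log(p1) + (1 - y) * math.log(1.0 - p1)
    l1 = sum(abs(w) for w in weights)
    return -total / len(labels) + lam * l1


print(softmax2(0.0, 0.0))
print(loss([(0.0, 0.0)], [1], [1.0, -2.0]))
```

With equal activations the two classes receive probability 0.5 each, so the single-sample loss is log 2 plus the penalty λ·(|1| + |−2|).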
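
Worked example for the learning-rate schedule (Section II). The text states an initial rate of 6 × 10^-5 that is reduced by a factor of 10 after every four epochs, with 20 training epochs in total (Section III-C). The sketch below expresses this as a step function; counting epochs from zero and the function name are assumptions.

```python
def learning_rate(epoch, base_lr=6e-5, drop=10.0, step=4):
    """Step schedule: divide base_lr by `drop` for every `step` completed epochs.

    `epoch` is zero-based (an assumption): epochs 0-3 use base_lr,
    epochs 4-7 use base_lr / 10, and so on.
    """
    return base_lr / (drop ** (epoch // step))


for epoch in range(0, 20, 4):
    print(epoch, learning_rate(epoch))
```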
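
Worked example for the evaluation criteria in Eqs. (3) to (5) (Section III-B). The sketch represents each segmentation as a set of integer voxel coordinates and computes DSC, SEN, PPV, and ASD directly from the definitions. It is an illustration, not the evaluation code used in the paper: the function names and the `spacing` argument (voxel size in millimeters) are assumptions, and the ASD helper expects the caller to pass the voxels over which Eq. (4) is evaluated. The brute-force nearest-neighbour search is quadratic and only suitable for small inputs.

```python
import math


def overlap_metrics(gt, auto):
    """DSC, SEN, PPV (Eqs. 3 and 5) for two non-empty sets of voxels."""
    inter = len(gt & auto)
    dsc = 2 * inter / (len(gt) + len(auto))
    return dsc, inter / len(gt), inter / len(auto)


def asd(gt, auto, spacing=(1.0, 1.0, 1.0)):
    """Eq. (4): symmetric mean of nearest-neighbour distances, in mm."""
    def to_mm(voxel):
        return tuple(c * s for c, s in zip(voxel, spacing))

    a = [to_mm(v) for v in gt]
    b = [to_mm(v) for v in auto]

    def mean_min(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (mean_min(a, b) + mean_min(b, a))


gt = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
auto = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
print(overlap_metrics(gt, auto))
print(asd(gt, auto))
```

In this toy case two of three voxels overlap, so DSC, SEN, and PPV all equal 2/3, and each set has one voxel lying one voxel away from the other set, giving an ASD of 1/3 mm at unit spacing.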