Detecting and Recognising
Lung Cancer
Using Convolutional Neural Networks
                            Major Project by
                161112001           Abhishek Pandey
                161112031           Lokesh Lovewanshi
                161112046           Sudhanshu Ranjan
                161112049           Shubham Kose
AGENDA
Abstract
Introduction
Literature Review
Methodology and Work Description
Tools and Technology to be Used
Implementation and Coding
Result Analysis
Conclusion and Future Scope
Abstract
 ●   Lung cancer is one of the deadliest diseases in developing countries, with a mortality
     rate of 19.4%. Early detection of lung tumors is performed using imaging techniques
     such as Computed Tomography (CT), Sputum Cytology, Chest X-ray and Magnetic
     Resonance Imaging (MRI).
 ●   The chance of survival is much lower when cancer is diagnosed at an advanced stage
     than when it is detected early. Manual analysis and diagnosis can be greatly improved
     with the implementation of image processing techniques.
 ●   Neural networks play a key role in recognising cancer cells among normal tissue,
     which in turn provides an effective tool for building an assistive AI-based cancer
     detection system. Cancer treatment is effective only when the tumor cells are
     accurately separated from the normal cells.
 ●   Classifying the tumor cells and training the neural network form the basis of
     machine-learning-based cancer diagnosis. This major project presents a
     Convolutional Neural Network (CNN) based technique to classify lung tumors as
     malignant or benign.
  Introduction
01  Neural Networks
    A neural network, in the modern sense, is a network or circuit of artificial
    neurons. The connections between neurons are modeled as weights: a positive
    weight reflects an excitatory connection, while a negative weight means an
    inhibitory connection.
02  Convolutional Neural Networks
    CNNs, like ordinary neural networks, are made up of neurons with learnable
    weights and biases. Each neuron receives several inputs, takes a weighted sum
    over them, passes it through an activation function and responds with an
    output. The whole network has a loss function, and all the tips and tricks
    developed for neural networks still apply to CNNs.
03  Uses of CNNs
    ●   Image Recognition
    ●   Video Analysis
    ●   Natural Language Processing
    ●   Drug Discovery
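As an illustration of the neuron described above, the following minimal sketch (plain NumPy, not part of the project code) computes a weighted sum of the inputs and passes it through a ReLU activation:

```python
import numpy as np

def neuron_forward(x, w, b):
    """One artificial neuron: a weighted sum of the inputs plus a bias,
    passed through a ReLU activation function."""
    z = np.dot(w, x) + b      # weighted sum over the inputs
    return max(0.0, z)        # ReLU: fires only if the net input is positive

# Positive weights excite the neuron, negative weights inhibit it:
out = neuron_forward(np.array([1.0, 2.0]), np.array([0.5, -0.2]), 0.1)
```

Stacking many such neurons into layers, with the weights learned from data, gives the networks used in the rest of this project.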
       Literature Review
●   Computer-Aided Diagnostic (CAD) approaches use a filter for enhancement of lesions as a
    preprocessing step for improving sensitivity and specificity. However, existing filters often fail to
    enhance actual lesions. Suzuki et al (2005) proposed a supervised filter for enhancement of lesions by
    use of a Massive-Training Artificial Neural Network (MTANN) in a CAD scheme for detection of lung
    nodules in CT. The MTANN filter was trained with actual nodules in CT images to enhance actual
    patterns of nodules. By use of the MTANN filter, the sensitivity and specificity of this CAD approach
    were improved. On a database of 69 lung cancers, the CAD approach with the MTANN filter achieved
    97% sensitivity with 6.7 false positives (FPs) per section, whereas a conventional CAD technique with
    a difference-image technique achieved 96% sensitivity.
●   In (Nikita, 2012), a Sobel edge detection method based on the image gradient was used. The
    method relies on the fact that the gradient of image intensity is maximum where two dissimilar
    regions meet, so an edge must exist there. On this basis the authors found the nodules in CT
    images. In (Parsh, 2011), a variational level set algorithm without re-initialization was used. The
    authors also used thresholding to reduce the noise component of the images.
●   In (Sonith, 2012), an overview of the entire process of digital image processing for lung cancer
    detection is given. The paper also describes all the essential steps required for better performance,
    from pre-processing through to the final phase of feature extraction.
       Literature Review
●   Regarding lung cancer diagnosis, methods proposed so far have dealt mostly with radiology. In image-
    based radiomics, features strongly related to survival are extracted from positron emission
    tomography-computed tomography (PET/CT) scans. A CNN employed to classify lung nodule
    images yielded an accuracy of 86.4%. In digital pathology tasks, CNNs have been used at the cell
    level for mitosis detection and cell nuclei detection. CAMELYON16 was the first challenge dealing
    with whole-slide images (WSIs), to detect breast cancer metastases in lymph nodes.
●   Thanks to the availability of a large annotated training set in this challenge, it was possible to train
    deeper and more powerful CNN architectures such as GoogLeNet, VGG-Net and ResNet. The best-
    performing method in this challenge performs patch-based classification to discriminate tumor
    patches from normal patches using a combination of two GoogLeNet architectures, one trained
    with and the other without hard-negative mining.
●   The aim of the TUPAC challenge was WSI-based mitosis detection in breast cancer tissue and
    tumor grade prediction. In the best-performing method, ROI regions are first extracted from the
    WSI based on cell density, followed by mitosis detection using a ResNet CNN architecture. Finally,
    each WSI is represented by a feature vector including the number of mitoses and cells in each
    patch, as well as other features derived from statistics.
Methodology and Work Description
For the purpose of the project, we are using the Kaggle dataset and the LUNA dataset.
The CNN is developed with variable depth so that the performance of these models can be
evaluated for lung tumor classification.
The first part of the network consists of M convolutional layers that can include spatial batch
normalization, dropout, and max-pooling in addition to the convolution operation and ReLU
nonlinearity, which always exist in these layers. After the M convolutional layers, the network
leads into N fully connected layers that always have an affine operation and ReLU nonlinearity,
and can include batch normalization and dropout. Finally, the network ends with an affine
layer that computes the class scores and a softmax loss function.
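The layer pattern described above can be sketched as a small helper that lists the layer sequence for given M and N. This is an illustrative sketch only; the function name and option flags are our own, not the project's API:

```python
def cnn_layer_plan(M, N, batch_norm=True, dropout=True, max_pool=True):
    """Lay out the layer sequence described above: M convolutional
    blocks (conv + ReLU always; batch norm, dropout and max-pooling
    optional), then N fully connected blocks (affine + ReLU always),
    ending with a final affine layer and softmax loss."""
    plan = []
    for _ in range(M):
        plan += ["conv", "relu"]          # always present in a conv block
        if batch_norm:
            plan.append("batch_norm")
        if dropout:
            plan.append("dropout")
        if max_pool:
            plan.append("max_pool")
    for _ in range(N):
        plan += ["affine", "relu"]        # always present in an FC block
    plan += ["affine", "softmax_loss"]    # final scores + loss
    return plan
```

For example, `cnn_layer_plan(2, 2, batch_norm=False, dropout=False)` yields two conv/ReLU/max-pool blocks followed by two affine/ReLU blocks and the final scoring layer.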
Methodology and Work Description
  Fig. 1 - Non Cancerous Lung      Fig. 2 - Cancerous Lung with Nodule
Methodology and Work Description
The developed model gives the user the freedom to decide the number of convolutional
and fully connected layers, as well as whether batch normalization and max-pooling
layers are present. Along with batch normalization, regularization was included in the
implementation. Furthermore, the number of filters and layers can be specified by the user
for the best results.
Based on the results of previous publications, we decided to create a CNN ourselves and
train it from scratch: a 9-layer CNN with two convolutional layers, two pooling layers and
four fully connected dense layers, along with a matrix-flattening layer. The structure of the
CNN is shown in Fig. 3.
Fig. 3 - CNN Architecture Flowchart
Methodology and Work Description
1. Extracting Effective Features
   In this module, the system first takes an image from the dataset. The input image is
   checked for lung X-ray features. If the image does not contain lung features, nothing
   is detected. If the input image contains lung features, the features are detected and
   the lung is extracted from the image, as shown in Fig. 5.
  Fig. 4 - Image Fed to Model                                    Fig. 5 - Extracting Lung
Methodology and Work Description
2. Feature Point Detection
   For lung detection, the image is first converted from an RGB format to a binary format.
   The next step is to find the ribs in the binary image. The system starts scanning from the
   middle of the image, looking for runs of continuous white pixels that follow a run of
   continuous black pixels. The goal is to find the maximum width of the white run,
   searching vertically on both the left and right sides. If a new width is smaller than half
   of the previous maximum width, the scan stops, because at that point it has reached the
   diaphragm. The lung is then cut from the starting position of the X-ray, with a height of
   1.5 times its width. The processed image contains the lung, hotspot and body.
                                 Fig. 6 - Feature Point Detection
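The stopping rule above (halt the scan once the white run narrows to less than half of the maximum width seen, i.e. at the diaphragm) can be sketched on a toy binary image. The function name and layout are illustrative, not the project's code:

```python
import numpy as np

def scan_lung_width(binary, start_row):
    """Scan rows downward from start_row, measuring the widest continuous
    run of white (1) pixels per row; stop when the run narrows to less
    than half of the maximum width seen so far (the diaphragm)."""
    max_width, last_row = 0, start_row
    for r in range(start_row, binary.shape[0]):
        # longest run of white pixels in this row
        run, best = 0, 0
        for v in binary[r]:
            run = run + 1 if v else 0
            best = max(best, run)
        if max_width and best < max_width / 2:
            break                      # reached the diaphragm
        max_width = max(max_width, best)
        last_row = r
    return last_row, max_width

# Toy binary image: rows of white-run widths 4, 6, 6, then a sudden 2
img = np.zeros((5, 8), dtype=int)
img[0, 2:6] = 1
img[1, 1:7] = 1
img[2, 1:7] = 1
img[3, 3:5] = 1
```

On this toy image the scan stops at the fourth row, where the run of width 2 is less than half of the maximum width 6.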
Tools and Technology to be used
 1.   Software Requirements
       a.   Python
       b.   Keras Library
       c.   Anaconda Navigator
       d.   Numpy Library
 2.   Hardware Requirements
       a.   Windows XP or Above
       b.   2GB of RAM
       c.   Any Dual Core Processor or above
        Implementation and Coding
S. No        Layer           Parameters
 1           Convolution2D   (64 filters of 3 x 3, input_shape = (64, 64, 3), activation = ‘relu’)
 2           MaxPooling2D    (pool_size = (2, 2))
 3           Convolution2D   (32 filters of 3 x 3, activation = ‘relu’)
 4           MaxPooling2D    (pool_size = (2, 2))
 5           Flatten         Flattens the matrix
 6           Dense           (output_dim = 128, activation = ‘relu’)
 7           Dense           (output_dim = 128, activation = ‘relu’)
 8           Dense           (output_dim = 128, activation = ‘relu’)
 9           Dense           (output_dim = 2, activation = ‘softmax’)
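Tracing the shapes through this table (assuming ‘valid’ convolutions and non-overlapping 2 x 2 pooling, the defaults in the Keras version whose `Convolution2D`/`output_dim` syntax the table uses) shows how the 64 x 64 x 3 input becomes the vector fed to the dense layers:

```python
def conv_out(size, k):
    """Spatial size after a 'valid' k x k convolution."""
    return size - k + 1

def pool_out(size, p):
    """Spatial size after non-overlapping p x p max-pooling."""
    return size // p

h = w = 64                               # input: 64 x 64 x 3
h, w = conv_out(h, 3), conv_out(w, 3)    # Conv2D, 64 filters -> 62 x 62 x 64
h, w = pool_out(h, 2), pool_out(w, 2)    # MaxPooling2D       -> 31 x 31 x 64
h, w = conv_out(h, 3), conv_out(w, 3)    # Conv2D, 32 filters -> 29 x 29 x 32
h, w = pool_out(h, 2), pool_out(w, 2)    # MaxPooling2D       -> 14 x 14 x 32
flat = h * w * 32                        # Flatten -> 6272-dimensional vector
```

The three Dense(128) layers then reduce this 6272-dimensional vector step by step to the final two-way softmax output.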
Result Analysis
S. No        Image               Output              Accuracy
 1           (figure omitted)    (figure omitted)    Correct
 2           (figure omitted)    (figure omitted)    Correct
 3           (figure omitted)    (figure omitted)    Incorrect
 4           (figure omitted)    (figure omitted)    Incorrect
 5           (figure omitted)    (figure omitted)    Correct
 6           (figure omitted)    (figure omitted)    Incorrect
 7           (figure omitted)    (figure omitted)    Correct
 8           (figure omitted)    (figure omitted)    Correct
 9           (figure omitted)    (figure omitted)    Correct
10           (figure omitted)    (figure omitted)    Correct
11           (figure omitted)    (figure omitted)    Correct
12           (figure omitted)    (figure omitted)    Incorrect
13           (figure omitted)    (figure omitted)    Correct
14           (figure omitted)    (figure omitted)    Correct
15           (figure omitted)    (figure omitted)    Correct
Accuracy on the above images = (Number of correct instances/Number of total instances) x 100
                  Accuracy on the above images = (11/15) x 100 = 73.33%
                            Accuracy on training dataset = 98%
                            Accuracy on the test dataset = 76%
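The accuracy figure above is a straightforward ratio over the per-image outcomes in the result table, and can be checked directly:

```python
# Per-image outcomes from the result table (True = Correct)
outcomes = [True, True, False, False, True, False, True, True,
            True, True, True, False, True, True, True]

# Accuracy = (number of correct instances / total instances) x 100
accuracy = 100.0 * sum(outcomes) / len(outcomes)   # (11 / 15) * 100
```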
Conclusion and Future Scope
The first novelty in our project is using the K-means algorithm to pre-classify the images into
groups of similar slices, so that the DNN can specialize in classifying images of the same
slice.
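This pre-classification step could look like the following minimal one-dimensional K-means sketch, here grouping slices by mean intensity. The feature choice and function are illustrative assumptions, not the project's implementation:

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    """Tiny K-means on scalar features (e.g. mean slice intensity),
    grouping CT slices into k piles of similar slices."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(values, size=k, replace=False).astype(float)
    for _ in range(iters):
        # assign each slice to its nearest center
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # move each center to the mean of its assigned slices
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels, centers

# mean intensities of six slices: two clearly separated groups
feats = np.array([0.10, 0.15, 0.20, 0.90, 0.95, 1.00])
labels, centers = kmeans_1d(feats, k=2)
```

Each resulting group can then be fed to a DNN specialized for that slice position.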
The second novelty is the additional convolutional layer with edge-sharpening filters, to
search thoroughly for cancer. Finally, the main novelty is testing our Deep Neural Network
with lung carcinoma images from Tx stages 2, 3 and 4 and determining at which Tx stage the
two algorithms can detect the possibility of cancer. The results were analyzed with medical
personnel from the oncology department and were judged satisfactory for detecting cancer
at the T3 stage.
For future work, we plan an additional analysis in which we change the DNN to output two
values (0 and 1) and determine which one has the higher classification certainty. This way,
we can not only classify the image as a decimal value between 0.0 and 1.0, but also compare
how much of it is 0 (not cancer) and how much is 1 (cancer). As further future work, similar
to Cruz-Roa and Ovalle, who used RGB (color) images to highlight the area of malignant
cells, we plan to modify the DNN to indicate where (the location) on the CT image it has
detected a cancer.
Thank you!