=Paper=
{{Paper
|id=Vol-2031/p7
|storemode=property
|title=Algorithm for the Detection of Breast Cancer in Digital Mammograms Using Deep Learning
|pdfUrl=https://ceur-ws.org/Vol-2031/p7.pdf
|volume=Vol-2031
|authors=Natalia Pirouzbakht,Jose Mejía
}}
==Algorithm for the Detection of Breast Cancer in Digital Mammograms Using Deep Learning==
<pdf width="1500px">https://ceur-ws.org/Vol-2031/p7.pdf</pdf>
<pre>
RCCS+SPIDTEC2 2017, PIROUZBAKHT & MEJÍA                                                                                                        46


    Algorithm for the Detection of Breast Cancer in
     Digital Mammograms Using Deep Learning
                         Natalia Pirouzbakht, Jose Mejía, Eléctrica y Computación, IIT/UACJ.

     Abstract—Breast cancer is one of the most frequent malignant tumors in women worldwide, the detection of this disease in time
     increases the possibility of receiving a less aggressive treatment and increases the survival rate. In this paper, we developed a cancer
     detection system that could be beneficial to help radiologists in cancer detection. To this end, we used a deep-learning network
     architecture. The proposed network consists of three convolutional layers, each followed by pooling, and finally four fully connected
     layers that provide the output of the network. We also propose feeding the network contrast-enhanced images to improve
     performance.

     Index Terms—Deep learning, mammography, breast cancer, convolutional neural network.


1   INTRODUCTION


Nowadays, breast cancer is the most frequent malignant
tumor causing the highest number of deaths in
women worldwide [14]. In Mexico, in 2014, of the total
                                                                            with false positive reduction using support vector machines,
                                                                            where they obtained a mass detection sensitivity of 78.2%
                                                                            at 1.48 false positives per image. Finally, in
number of cancer cases diagnosed in the population over                     2016, T. Kooi et al. [10] worked on large-scale deep learning
20 years of age, breast cancer had the greatest                             for computer-aided detection of mammographic lesions.
impact, with 19.4%. In the same year, the mortality rate from               Their research offered a direct comparison between an ad-
malignant breast tumors was 15 deaths per 100,000 women over                vanced mammography CAD system, based on a set of man-
20 years of age. In 2015, the incidence of malignant breast                 ually designed features, and a convolutional neural network,
tumors was 14.80 new cases per 100,000 people. Globally, an                 with the aim of having a system that can, ultimately, read
estimated 1.38 million new cases and 458,000 deaths are                     mammograms independently. Later in 2016, S. Suzuki et al.
reported each year [8]. Women who undergo a mammogram                       [18] adopted a deep convolutional neural network architecture
annually can detect this disease in time and therefore                      (DCNN) that consisted of eight layers with weights, includ-
have the possibility of receiving a less aggressive                         ing five convolutional layers and three fully-connected
treatment. Although this test has been effective in early                   layers. They first trained the DCNN using
detection, there is still a high percentage of false positives              about 1.2 million natural images for classification of 1,000
and false negatives, which cause patients to undergo more                   classes. Then, they modified the last fully-connected layer
invasive, unnecessary treatment and/or testing, causing                     of the DCNN and subsequently trained the DCNN using
anxiety, increased costs, and long-term psychosocial dam-                   1,656 regions of interest in mammographic images for
age. Young women are more likely to get false negatives                     two-class classification: mass and normal. The detection test
and positives. The main cause is the density of the breast:                 was conducted on 198 mammographic images including
the denser it is, the greater the probability of obtaining                  99 mass images and 99 normal images. The experimental
erroneous results since the visualization of the neoplasm is                results showed that the sensitivity of the mass detection
more difficult. Also, false positives often occur when women                was 89.9% and the false positive rate was 19.2%. J. Arevalo et
take estrogen, when they have had biopsies or when they                     al. [1] worked on a hybrid CNN method to learn image-
have a family history of breast cancer. According to the                    based features in a supervised way for mammography mass
federally funded Breast Cancer Surveillance Consortium in                   lesion classification. The developed method comprises two
the United States, for every 1,000 women who undergo the                    main stages: (i) preprocessing to enhance image details
test, 100 are further tested, but only 5 have breast cancer [2],            and (ii) supervised training for learning both the features
[4], [5], [13].                                                             and the breast imaging lesion classifier. As a result, their
    Computer-aided detection in the field of medicine was                   method exhibited significantly improved performance over
developed, among other things, to assist radiologists in the                image descriptors such as histogram of oriented gradients
interpretation of mammograms [6]. In 2014, M. Tan et al.                    (HOG) and histogram of the gradient divergence (HGD),
[19] worked on reducing false-positive recalls using a com-                 increasing the performance from 0.787 to 0.822 in terms of
puterized mammographic image feature analysis scheme,                       the area under the ROC curve (AUC). Furthermore, in 2017,
where they analyzed the global mammogram texture and                        W. Sun, T. Tseng, J. Zhang and W. Qian [17] developed a
density characteristics calculated from four-view images                    graph-based semi-supervised learning (SSL) scheme using a
with the help of artificial neural networks.                                deep convolutional neural network (CNN) for breast cancer
    In the same year, X. Liu and Z. Zeng [11] proposed                      diagnosis with a small portion of labeled data in the training
a new automatic mass detection method for breast cancer                     set. Four modules were included in the diagnosis system:
                                                                            data weighing, feature selection,

                                                                       data using contrast enhancement in the contourlet domain.
                                                                       We expect that this preprocessing helps the network to
                                                                       generalize even with low data volumes in the training set.
                                                                       The preprocessing enhances several features in the images,
                                                                       such as microcalcifications that could help detect a cancer
                                                                       case more easily.
Figure 1. Preprocessing the image. a) Typical mammogram image his-
togram. b) Binary image with two objects: breast and label artifact.
                                                                          The contributions of this study are:
                                                                             •    Preprocessing of the mammogram images using the
                                                                                  contourlet transform
                                                                             •    A new neural network topology of layers adapted to
                                                                                  the task of breast cancer detection.
                                                                           The rest of the paper is organized as follows. In Section
                                                                       2 we describe our proposed model to detect breast cancer
                                                                       cases; in Section 3, experiments and results are shown; finally,
Figure 2. Processed image with the NSCT method.                        conclusions are provided in Section 4.


                                                                       2     METHODS
                                                                       In this section, we describe the proposed algorithm, which is
                                                                       composed of two stages. The first stage, described in Section
                                                                       2.1, consists of preprocessing the data, where the images
                                                                       are prepared to be fed into the network. The second
Figure 3. A mammogram image: a) with the label artifact in the         stage, which consists of feeding the data to a convolutional
upper-left corner, b) without the artifact.                            neural network, is described in Section 2.2, where we outline
                                                                       the proposed network topology.

dividing co-training data labeling, and CNN. They achieved
an area under the curve (AUC) of 0.8818, and the accuracy              2.1       Preprocessing of the data
of CNN was 0.8243 using the mixed labeled and unlabeled                The raw images from the database of mammogram images
data.                                                                  are not suitable to be fed directly into the network
    One of the difficulties of mammography is that the                 because they contain a number of artifacts and because
images generally have low contrast, making it difficult for            of their high dynamic range.
radiologists to interpret results. In addition, it has been                 To alleviate this, we began the preprocessing of the
shown that mammography is susceptible to false positives               images by first removing the label artifact that all images
and false negatives.                                                   in the database contain (see Figure 3a). To this end, we
    A study conducted in the United States in 2015 showed              used binary image techniques. We obtained a binary image
that women between 40 and 49 years of age constitute the               from the original in order to separate foreground (objects)
highest percentage of false positive mammography results               from background. We selected a suitable threshold using
with the recommendation to perform other studies (33.1%).              the histogram of the image. The threshold is obtained as the
On average, 100 of every 1,000 women who get a mammogram               intensity value midway between the mean intensity
will have to undergo further tests, but only 5 of those 100            of the background and the mean intensity of the object.
actually have breast cancer. In the case of false negatives, 6%        Next, we assigned a “0”, or black value, to the pixels of
to 46% of women with invasive cancer will receive negative             the background, and a “1”, or white value, to the pixels
mammograms, especially if they are young or have dense                 of the objects, or foreground (see
breasts [3], [13].                                                     Figure 1b).
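The thresholding step of Section 2.1 can be illustrated with a minimal sketch. The paper does not say how the two mean intensities are estimated before a threshold exists, so this example assumes an iterative scheme: start from the global mean, split the pixels, and move the threshold to the midpoint of the two class means until it stops changing. The function names, the tolerance and the toy image are hypothetical.

```python
import numpy as np


def midpoint_threshold(image, tol=0.5, max_iter=100):
    """Threshold halfway between background and object mean intensities.

    Assumption: the class means are estimated iteratively, starting
    from the global mean of the image.
    """
    img = np.asarray(image, dtype=np.float64)
    t = img.mean()  # initial guess (assumption, not stated in the paper)
    for _ in range(max_iter):
        foreground = img[img > t]
        background = img[img <= t]
        if foreground.size == 0 or background.size == 0:
            break  # degenerate image: keep the current threshold
        new_t = 0.5 * (foreground.mean() + background.mean())
        converged = abs(new_t - t) < tol
        t = new_t
        if converged:
            break
    return t


def binarize(image, threshold):
    """Assign 1 (white) to object pixels and 0 (black) to background."""
    return (np.asarray(image) > threshold).astype(np.uint8)


# Toy 16-bit image: dark background, a bright "breast" block and a
# small bright "label" pixel (all values are made up for illustration).
toy = np.zeros((6, 6), dtype=np.uint16)
toy[1:5, 1:4] = 40000
toy[0, 5] = 30000

threshold = midpoint_threshold(toy)
binary = binarize(toy, threshold)
```

On the toy image both bright regions end up in the foreground, which matches the situation in Figure 1b, where the binary image still contains two objects.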
    The development of a cancer detection system could                      Once the binary image was obtained, we found the objects
help radiologists in their interpretation and lead to                  in the image as sets of white pixels connected using an 8-
a better diagnosis. In addition, the adoption of such a sys-           neighborhood. Then we filtered the objects by area, that is,
tem could reduce the workload of experts. Furthermore, in              we only kept objects with at least a certain area. In our
terms of economic benefit, a detection system could achieve            experiments, an area of 1000 was sufficient to separate the
a cost reduction, as it could eliminate double reading, in             object that contains the breast from the label artifacts,
addition to providing a faster diagnosis.                              which have a smaller area. This value was obtained
    Therefore, the development of an algorithm that, by                empirically from a set of 20 images; since the proportion of
means of deep learning techniques, can determine whether a             the area of the label to that of the breast is almost constant
digital mammogram shows breast cancer could help                       in all images, the value found worked for the entire
radiologists reduce the rate of false positives and nega-              database. We used the filtered binary image as a mask to
tives.                                                                 filter the original image in order to remove the label
    In this paper, an approach to detect mammograms with               artifacts; an example of the result is shown in Figure 3b.
a possible tumor is presented. Our approach is based on a                   The next step in the preprocessing was to equalize the
deep learning architecture. We propose to preprocess the               intensity values in the image and reduce its dynamic range.
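The label-removal procedure described in Section 2.1 (8-connected objects, area filtering, masking of the original image) can be sketched as follows. This is an illustrative NumPy-only version, not the authors' code. The function names and the toy image are hypothetical, and the area is assumed to be counted in pixels.

```python
from collections import deque

import numpy as np


def label_components_8(binary):
    """Label 8-connected sets of white pixels; return labels and areas."""
    binary = np.asarray(binary).astype(bool)
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=np.int32)
    areas = {}
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c] or labels[r, c]:
                continue
            current += 1
            labels[r, c] = current
            queue = deque([(r, c)])
            area = 0
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny, nx] and not labels[ny, nx]):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
            areas[current] = area
    return labels, areas


def remove_small_objects(image, binary, min_area=1000):
    """Keep only objects with at least min_area pixels and mask the image."""
    labels, areas = label_components_8(binary)
    keep = [lab for lab, area in areas.items() if area >= min_area]
    mask = np.isin(labels, keep)
    return np.where(mask, image, 0), mask


# Toy image: a large "breast" block (1200 px) and a small "label" (40 px).
toy = np.zeros((60, 60), dtype=np.uint16)
toy[10:50, 5:35] = 500
toy[2:7, 45:53] = 900
cleaned, mask = remove_small_objects(toy, toy > 0, min_area=1000)
```

The pure-Python flood fill is only meant to make the 8-neighborhood rule explicit; for full-resolution mammograms a compiled labeling routine from an image-processing library would be a more practical choice.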

The original images in the database have a dynamic range                    layer takes large images and shrinks them down. We used
of 0 to 65535 intensity values which, besides occupying much                three pooling layers of size 2 × 2 with a stride of 2. The
space, is not fully utilized (see Figure 1a). This could                    process consists of walking a small window across a
affect the time or success of network training because only                 filtered image of the convolution layer output and taking
a portion of the dynamic range provides information. We                     the maximum value from the window, so it preserves the
reduced the dynamic range by first equalizing the image                     best fits of each feature within the window.
intensity using the technique of histogram equalization [7]                      Finally, we used four fully connected layers, identified
and then mapping it to the range of 0 to 255.                               as ip1, ip2, ip3 and ip4, with 105, 25, 7 and 2 neurons
    The final preprocessing step was a contrast enhance-                    respectively, which take every single value and translate
ment. To this end, we used the technique of [12], which                     them into votes. We used the rectified linear unit (ReLU) as
improves the contrast of all structures in the mammogram,                   the nonlinear activation function.
and improves visibility of small lesions such as microcal-                       In this work, we only had two categories, images with
cifications, which are known to be indicative of lesions                    and without cancer, so we ended ip4 with two neurons.
such as tumors [15], [16]. We expected this to help the                     The obtained votes are expressed as weights between each
network learn, especially by improving generalization                       value. Then, the answer with the most votes wins and is
when using small databases of images, which is the case of                  declared the category of the input. The network was
the mammogram database used.                                                implemented using the Caffe framework described in [9].
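The equalization and range-reduction step described in the left column above (histogram equalization followed by a mapping to 0 to 255) can be sketched with a lookup table built from the cumulative histogram. This is one common formulation and is assumed here, since the paper only cites [7] for the technique. The function name and the 16-bit input assumption are illustrative.

```python
import numpy as np


def equalize_to_uint8(image, n_levels=65536):
    """Histogram-equalize an unsigned 16-bit image and map it to 0..255.

    Assumption: intensities are non-negative integers below n_levels.
    """
    img = np.asarray(image)
    hist = np.bincount(img.ravel(), minlength=n_levels)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]      # CDF value of the darkest level in use
    denom = cdf[-1] - cdf_min
    if denom == 0:                 # constant image: nothing to equalize
        return np.zeros(img.shape, dtype=np.uint8)
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                # apply the lookup table per pixel


# Toy example: three distinct levels spread over the 16-bit range.
toy = np.array([[0, 100], [100, 60000]], dtype=np.uint16)
equalized = equalize_to_uint8(toy)
```

Only the intensity levels that actually occur influence the mapping, which is the point made in the text: the unused part of the original dynamic range carries no information.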
    Next, we describe the method used to enhance the mam-
mogram; for further details see [12]. The process begins by                 3   RESULTS
transforming the mammogram using the nonsubsampled
                                                                            This section contains the results of training and testing
contourlet transform:
                                                                            the proposed network with the mammogram database. All
                                                                            experiments were performed on a computer with a Core i7-
                           Y = \mathrm{NSCT}(I)
                                                                            6700HQ, 2.6GHz × 8 processor and 31.3 GB of RAM; no
where I is a mammogram image, NSCT(·) is the                                GPU was used.
nonsubsampled contourlet transform operator, and Y is the                       The database used is publicly available and was provided
mammogram image in the transformed domain.                                  by the Group Health Cooperative for “The Digital
    This transform decomposes the input image, I, into                      Mammography DREAM Challenge”. The dataset is composed of
several subbands y_{i,j}; that is, Y is a set of subbands                   500 mammogram images of different sizes, ranging from
{y_{1,1}, y_{2,1}, ..., y_{i,j}, ...}, where i is the level index           3328x2560 to 5928x4728 pixels, in DICOM format. The database
and j is the direction index in the transform.                              also includes annotation files to distinguish normal from
    The subbands of Y are then processed using                              cancer cases.
  y'_{i,j}(n_1, n_2) = \begin{cases}                                            To speed up the training process, we changed the origi-
    w_1 y_{i,j}(n_1, n_2) & \text{if } b_{i,j}(n_1, n_2) = 0 \\
    w_2 y_{i,j}(n_1, n_2) & \text{if } b_{i,j}(n_1, n_2) = 1                nal format to portable network graphics (PNG), and reduced
  \end{cases}                                                               the size of all images to 208 x 208 pixels, with one channel
where y'_{i,j} is the processed subband, w_1 and w_2 are weights            (grayscale).
used for the tissue and microcalcifications respectively, b_{i,j}               Since the cases with cancer were only 32 of 500 cases, we
is a binary image where points of high gradient are the                     selected the training set as 29 + 41 = 80 images, with 29
foreground, and (n_1, n_2) are the coordinates of the subband               of the images presenting cancer cases and the rest normal
processed. In this work, we used the values suggested in                    cases. We used a test set composed of 3 images with cancer
[12] for the weights. Figure 2 shows an example of an                       and 7 without cancer.
image processed with this technique.                                            The training phase consisted of 4000 iterations, which
                                                                            were completed in 1 hour and 20 minutes approximately.
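The subband weighting rule above (w_1 where b_{i,j} = 0, w_2 where b_{i,j} = 1) can be sketched on a single subband. The NSCT decomposition itself is not reproduced here: the subband is assumed to be already available as a 2-D array. How [12] builds the binary image b_{i,j} is not detailed in this paper, so the gradient-magnitude threshold below is an assumption, and the weights in the example are placeholders rather than the values suggested in [12].

```python
import numpy as np


def gradient_mask(subband, threshold):
    """Binary image b: 1 where the gradient magnitude exceeds threshold.

    Assumption: "points of high gradient" are selected by a fixed
    threshold on the finite-difference gradient magnitude.
    """
    gy, gx = np.gradient(np.asarray(subband, dtype=np.float64))
    return (np.hypot(gx, gy) > threshold).astype(np.uint8)


def weight_subband(subband, mask, w1, w2):
    """Apply y' = w1 * y where mask == 0 and y' = w2 * y where mask == 1."""
    subband = np.asarray(subband, dtype=np.float64)
    return np.where(mask == 1, w2 * subband, w1 * subband)


# Toy subband and mask; w1 and w2 are placeholder weights.
toy_subband = np.array([[1.0, 2.0], [3.0, 4.0]])
toy_mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
enhanced = weight_subband(toy_subband, toy_mask, w1=0.5, w2=2.0)
```

Choosing w_2 larger than w_1 amplifies coefficients at high-gradient points relative to the surrounding tissue, which is the effect the text attributes to the enhancement.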
2.2   Net Architecture                                                      We tested the resulting network on the test set, obtaining
                                                                            100% accuracy.
A convolutional neural network consists of a number of
                                                                                Figure 5 shows the final filter weights of the first
convolutional, pooling, and fully connected layers. In our
                                                                            convolution layer. We note that it is difficult to visually
proposed network (see Figure 4), the first step is a convolu-
                                                                            determine a predominant pattern or characteristic in the data
tional layer, where we used 30 filters, with a kernel size of
                                                                            that could have been used by the network in its classification task.
5 x 5. To calculate the match of a feature to a patch of the
image, each pixel in the kernel is multiplied by the value
of the corresponding pixel in the image, and the products                   4   CONCLUSION
are summed. To complete the convolution, we repeat the                      In this paper, a novel algorithm for detecting breast cancer
process, lining up the kernel with every possible image patch.              is presented. We preprocessed the mammogram image to
     The feature map is a map of where in the image the                     remove artifacts and enhanced contrast by means of the
feature is found; as a result, we get a set of filtered images,             NSCT; subsequently, we fed the image to a deep neural
one for each of the filters. It is possible to repeat this process          network. We obtained favorable results, which we attribute
as many times as wanted; in this work, we used                              to the preprocessing of the images in the database, which helps
3 convolutional layers of the same size but with different                  to enhance the structure of the mammogram. Thus, this
numbers of filters: 30, 50 and 40, respectively.                            preprocessing allowed the filters in the convolutional
     The next step is the pooling layer, also known as max-                 layers to adapt and obtain characteristics of im-
pooling because we chose the maximum as the statistic. This                 portance to correctly classify these images, even though the
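The convolution and max-pooling operations described in Section 2.2 can be sketched for a single filter. The paper gives the kernel size (5 x 5) and the pooling window (2 × 2, stride 2) but not the convolution stride or padding, so this example assumes stride 1 and no padding. The function names and the toy input are hypothetical, and the sketch is not the Caffe implementation used by the authors.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Slide the kernel over every full image patch (stride 1, no padding).

    Each output value is the sum of kernel pixels multiplied by the
    corresponding image pixels, as described in the text.
    """
    image = np.asarray(image, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    fm = np.asarray(feature_map, dtype=np.float64)
    out_h = (fm.shape[0] - size) // stride + 1
    out_w = (fm.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = fm[r * stride:r * stride + size,
                        c * stride:c * stride + size]
            out[r, c] = window.max()
    return out


# Toy example: a 6 x 6 ramp image and a 5 x 5 averaging kernel.
toy = np.arange(36, dtype=np.float64).reshape(6, 6)
kernel = np.full((5, 5), 1.0 / 25.0)
feature_map = correlate2d_valid(toy, kernel)   # shape (2, 2)
pooled = max_pool2d(feature_map)               # shape (1, 1)
```

In a full layer the same operation would be repeated once per filter (30, 50 and 40 filters in the three layers), producing one feature map per filter before pooling.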


Figure 4. Network architecture used in this work.


                                                                                 [9]  Jia Y., Shelhamer E., Donahue J., Karayev S., Long J., Girshick R.,
                                                                                      and Darrell T. (2014). Caffe: Convolutional architecture for fast feature
                                                                                      embedding. Proceedings of the 22nd ACM international conference on
                                                                                      Multimedia, pp. 675-678.
                                                                                 [10] Kooi T. et al. (2016). Large scale deep learning for computer aided
                                                                                      detection of mammographic lesions, Medical Image Analysis, vol. 35,
                                                                                      pp. 303-312.
                                                                                 [11] Liu X., and Zeng Z. (2014). A new automatic mass detection method for
                                                                                      breast cancer with false positive reduction, Neurocomputing, vol. 152,
                                                                                      pp. 388-402.
                                                                                 [12] Mejia J., Domínguez H. D. J. O., Villegas O. O. V., Sánchez V. G. C.,
                                                                                      and Maynez L. O. (2009). The nonsubsampled contourlet transform for
                                                                                      enhancement of microcalcifications in digital mammograms, In Mexican
                                                                                      International Conference on Artificial Intelligence, Springer, Berlin,
Figure 5. Filter weights for the first convolutional layer, after the network         Heidelberg, pp. 292-302.
is fully trained using 4000 iterations.                                          [13] Myers E. R. et al. (2015). Benefits and Harms of Breast Cancer
                                                                                      Screening: A Systematic Review, Jama, vol. 314, no. 15, pp. 1615-1634.
                                                                                 [14] Reynoso-Noverón N., Villaseñor-Navarro Y., Hernández-Ávila M.,
                                                                                      and Mohar-Betancourt A. (2013). Carcinoma in situ e infiltrante iden-
training database was small. As future work, we suggest                               tificado por tamizaje mamográfico oportunista en mujeres asintomáticas
testing the algorithm with a larger database, in order to have                        de la Ciudad de México, Salud pública de México, vol. 55, no. 5, pp.
                                                                                      469-477.
a better idea of its performance and to avoid possible over-
                                                                                 [15] Sickles E. A. (1984). Mammographic Features of Early Breast Cancer,
fitting.                                                                              American Journal of Roentgenology, vol. 143, pp. 461-464.
                                                                                 [16] Sickles E. A. (1986). Mammographic Features of 300 Consecutive
                                                                                      Nonpalpable Breast Cancers, American Journal of Roentgenology, vol.
                                                                                      146, pp. 661-663.
REFERENCES                                                                       [17] Sun W., Tseng T., Zhang J. and Qian W. (2017). Enhancing deep
[1]   Arevalo J. et al. (2016). Representation learning for mammography               convolutional neural network scheme for breast cancer diagnosis with
      mass lesion classification with convolutional neural networks, Elsevier,        unlabeled data, Elsevier, vol. 57, pp. 4-9.
      vol. 37, pp. 248-257.                                                      [18] Suzuki S. et al. (2016). Mass Detection Using Deep Convolutional
[2]   Brodersen J., and Siersma V. D. (2013). Long-Term Psychosocial                  Neural Network for Mammographic Computer–Aided Diagnosis, in
      Consequences of False-Positive Screening Mammography, Annals of                 Proceedings of the SICE Annual Conference 2016, pp. 1382-1386.
      Family Medicine, vol. 11, no. 2, pp. 106-115.                              [19] Tan M., Pu J., and Zheng B. (2014). Reduction of false-positive recalls
[3]   National Cancer Institute. (2017, February 22). Breast Cancer                   using a computerized mammographic image feature analysis scheme,
      Screening (PDQ®)–Health Professional Version [Online]. Available:               Physics in Medicine and Biology, vol. 59, no. 15.
      https://www.cancer.gov/types/breast/hp/breast-screening-pdq.
[4]   American Cancer Society. (2017). Limitations of Mammograms
      [Online]. Available:
      https://www.cancer.org/cancer/breast-cancer/screening-tests-and-early-detection/mammograms/limitations-of-mammograms.html.
[5]   Elmore J. G., Barton M. B., Moceri V. M., Bolk S., Arena P. J.,
      and Fletcher S. W. (1998). Ten-Year Risk of False Positive Screening
      Mammograms and Clinical Breast Examinations, The New England
      Journal of Medicine, vol. 338, no. 16, pp. 1089-1096.
[6]   Fenton J. et al. (2007). Influence of Computer-Aided Detection on
      Performance of Screening Mammography, The New England Journal
      of Medicine, vol. 356, no. 14, pp. 1399-1409.
[7]   Gonzalez R., Eddins S., and Woods R. E. (2004). Digital image
      processing using MATLAB.
[8]   Instituto Nacional de Estadísticas y Geografía (2016). Estadísticas a
      propósito del día mundial de la lucha contra el cáncer de mama.

</pre>