=Paper=
{{Paper
|id=None
|storemode=property
|title=SDCA: System to Detect Cancerous Abnormalities
|pdfUrl=https://ceur-ws.org/Vol-804/11_LANMR11.pdf
|volume=Vol-804
|dblpUrl=https://dblp.org/rec/conf/lanmr/CruzADP11
}}
==SDCA: System to Detect Cancerous Abnormalities==
<pdf width="1500px">https://ceur-ws.org/Vol-804/11_LANMR11.pdf</pdf>
<pre>
SDCA: System to detect cancerous abnormalities

    Eddy Sánchez de la Cruz1, Homero Alpuín-Jiménez1, Humberto de Jesús
                  Ochoa Domínguez2, and Pilar Pozos-Parra1

      1 Universidad Juárez Autónoma de Tabasco. Cunduacán, Tabasco, México
        eddsacx@gmail.com, {homero.alpuin,pilar.pozos}@dais.ujat.mx
      2 Universidad Autónoma de Ciudad Juárez. Ciudad Juárez, Chihuahua, México
        hochoa@uacj.mx


          Abstract. In this article we present SDCA, a system to detect
          cancerous abnormalities in digital mammograms. SDCA aims to give
          the radiologist a second opinion in the analysis of a digital
          mammogram, to increase the reliability of breast cancer detection.
          SDCA is a semi-automation of the KDD (Knowledge Discovery in
          Databases) process. The KDD process is a method that uses
          Artificial Intelligence (AI) strategies to extract patterns of
          behavior from databases with large volumes of information. Two
          outstanding characteristics of SDCA are 1) the implementation of
          the Mejía filtering method in the data cleansing module, and
          2) the implementation of the Decorate classification strategy in
          the classification module.
          The results show that SDCA classifies 95% of detections correctly.
          SDCA was developed using Matlab GUIDE, and tests were done with
          the MIAS digital mammography database (DB).


          Key words: SDCA, KDD process, detect cancerous abnormalities,
          digital mammograms, classification.


1    Introduction

For the radiologist, mammograms are highly useful for identifying abnormalities
with carcinogenic potential. The difficulty is that the radiologist's review
does not guarantee the detection of cancerous abnormalities. This research
therefore supports the detection of abnormal regions; its area of interest is
the analysis of medical images using the KDD process. It gives the radiologist
a second opinion on the detection of abnormalities and thus increases the
reliability of the diagnosis.
SDCA is an extension of work previously presented in [11] and [10]. This article
describes the filtering, segmentation and classification modules.
The rest of the paper is organized as follows: Section 2 briefly describes the
KDD process for this research. Section 3 describes the filtering module.
Section 4 describes the segmentation module. Section 5 describes the
classification module. Section 6 shows experimental results, and Section 7 ends
with conclusions and future work.


2   KDD process

The following describes the KDD process steps for this research:
 – Selection: Among the different digital mammography DBs available, the most
   representative one was selected according to its use in other research.
   MIAS (see Table 1) is a reduced version of the original DB; it has long
   been used to test research such as [9], [1], [3], [8], [7], and it is also
   strongly recommended by the University of South Florida [5].


Name  Description
MIAS  Small database that contains the same images as the original
      version, but reduced to a size of 1024 x 1024 pixels; available at
      http://www.wiau.man.ac.uk/services/MIAS/MIASmini.html
                              Table 1. MIAS DB


 – Data Preparation: The images are filtered using the Mejía filtering method
   to reduce noise.
 – Data Transformation: The radiologist manually segments the area of
   interest; each segmented area is normalized to the same dimensions;
   finally, the frequency histogram of gray levels of each segmented area is
   computed and stored to create a testing base.
 – Data Mining: New samples are obtained and the Decorate classification
   strategy is applied to classify the abnormality, if one exists.
 – Pattern Evaluation: The patterns obtained are evaluated by the radiologist.


3   Filtering module

This module was integrated into the prototype because this filtering method
gives excellent results in the enhancement of abnormalities. The method was
presented in [7]. It uses the Nonsubsampled Contourlet Transform (NSCT) and the
Prewitt filter, and follows the classical approach used in image processing
methods.


For a more detailed explanation see [7].
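
As an illustration of the Prewitt stage only, the following is a minimal sketch
in Python with NumPy. It does not reproduce the method of [7]: the NSCT
decomposition is omitted, and the function names, the edge-replicating border
handling and the use of the gradient magnitude are assumptions made for this
example.

```python
import numpy as np

# Standard 3x3 Prewitt kernels (horizontal and vertical differences).
PREWITT_X = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)
PREWITT_Y = PREWITT_X.T


def correlate3x3(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    rows, cols = image.shape
    out = np.zeros((rows, cols), dtype=float)
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return out


def prewitt_magnitude(image):
    """Gradient magnitude from the two Prewitt responses (hypothetical helper)."""
    gx = correlate3x3(image, PREWITT_X)
    gy = correlate3x3(image, PREWITT_Y)
    return np.hypot(gx, gy)


# Toy image with a vertical step edge between columns 2 and 3.
step = np.zeros((5, 6))
step[:, 3:] = 10
edges = prewitt_magnitude(step)
```

On the toy image the response is concentrated on the two columns adjacent to
the step and is zero in the flat areas.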


4   Segmentation module

In this module, the radiologist manually segments the area that he or she
considers abnormal. The result is shown in Fig. 1.


                          Fig. 1. Segmentation module


Then the area of interest is normalized so that all images have the same
dimensions.


After the area of interest has been normalized, the frequency histogram of its
gray levels, ranging from 0 to 255, is computed (see Fig. 2).


                   Fig. 2. Frequency histogram of gray levels
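
The normalization and histogram steps can be sketched as follows in Python with
NumPy. This is only an illustration under stated assumptions: the paper does
not give the normalized dimensions or the resampling method, so the 64 x 64
target size, the nearest-neighbor resampling and the function names are
hypothetical.

```python
import numpy as np


def normalize_region(region, size=(64, 64)):
    """Resample a 2-D 8-bit region to a fixed size (nearest neighbor).

    The target size is an assumption; the paper only states that all
    segmented areas end up with the same dimensions.
    """
    rows, cols = region.shape
    row_idx = np.arange(size[0]) * rows // size[0]
    col_idx = np.arange(size[1]) * cols // size[1]
    return region[np.ix_(row_idx, col_idx)]


def gray_histogram(region):
    """Return 256 counts, one per gray level from 0 to 255."""
    return np.bincount(region.ravel(), minlength=256)


# Toy region: every pixel has gray level 7.
toy_region = np.full((100, 50), 7, dtype=np.uint8)
toy_hist = gray_histogram(normalize_region(toy_region))
```

Because every region is resampled to the same size, all histograms sum to the
same pixel count and can be compared directly as feature vectors.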


This histogram is stored in an xlsx file called Histograma.xlsx in order to
build the testing base.
Then the radiologist chooses a tag indicating whether the detected abnormality
is benign (B), malignant (M) or normal (N).


Finally, in this module, the file Histograma.xlsx is opened and saved in CSV
(comma-delimited) format for later use.
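
A minimal sketch of how one labeled histogram could be stored as a
comma-delimited row, written in Python with the standard csv module. The row
layout (256 counts followed by the tag) and the function names are assumptions;
the paper does not describe the internal layout of Histograma.csv.

```python
import csv

VALID_LABELS = {"B", "M", "N"}  # benign, malignant, normal


def append_sample(csv_path, histogram, label):
    """Append one row: 256 gray-level counts followed by the tag (assumed layout)."""
    if label not in VALID_LABELS:
        raise ValueError("label must be one of B, M or N")
    if len(histogram) != 256:
        raise ValueError("histogram must have 256 bins")
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow([int(count) for count in histogram] + [label])


def load_samples(csv_path):
    """Read back (histogram, label) pairs from the CSV file."""
    with open(csv_path, newline="") as handle:
        return [([int(value) for value in row[:-1]], row[-1])
                for row in csv.reader(handle)]
```

For example, append_sample("Histograma.csv", hist, "B") would add one benign
sample, where hist is any sequence of 256 counts.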


5   Classification module


In this module the Histograma.csv file is selected in order to classify the
abnormality.

Classification is started by pressing the Detección button, which calls the
Decorate strategy. In this case Decorate uses LADTree as its base algorithm,
which in turn is based on the LogitBoost algorithm (Algorithm 1.1).

The basic LogitBoost algorithm:

Algorithm 1.1: LogitBoost algorithm
Result: If p(1|a) > 0.5 predict the first class, else the second
begin
   for j = 1 until t do
      for each instance a[i] do
         Assign the target value for the regression to
            z[i] = (y[i] − p(1|a[i])) / (p(1|a[i]) * (1 − p(1|a[i])))
         Assign the weight of the instance to
            w[i] = p(1|a[i]) * (1 − p(1|a[i]))
      Fit a regression model f_j to the data with class values z[i] and
      weights w[i]
end
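
The loop of Algorithm 1.1 can be sketched in Python with NumPy as shown below.
This is an illustration under stated assumptions, not the LADTree or Decorate
implementation used by SDCA: the base regressor is a weighted least-squares
linear model, p(1|a) is estimated as 1 / (1 + exp(−Σ f_j(a))), and the clipping
of p and z is a numerical safeguard added for this example.

```python
import numpy as np


def fit_weighted_linear(X, z, w):
    """Weighted least-squares linear fit with an intercept (assumed base model)."""
    design = np.hstack([np.ones((X.shape[0], 1)), X])
    root_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], z * root_w, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def logitboost_fit(X, y, t=10, eps=1e-6, z_max=4.0):
    """Two-class LogitBoost; y holds 1 for the first class, 0 for the second."""
    models = []
    score = np.zeros(X.shape[0])  # running sum of f_j(a[i])
    for _ in range(t):
        p = np.clip(1.0 / (1.0 + np.exp(-score)), eps, 1.0 - eps)
        w = p * (1.0 - p)                        # instance weights w[i]
        z = np.clip((y - p) / w, -z_max, z_max)  # regression targets z[i]
        coef = fit_weighted_linear(X, z, w)      # regression model f_j
        models.append(coef)
        score += predict_linear(coef, X)
    return models


def logitboost_proba(models, X):
    score = sum(predict_linear(coef, X) for coef in models)
    return 1.0 / (1.0 + np.exp(-score))


def logitboost_predict(models, X):
    """Return 1 (first class) where p(1|a) > 0.5, otherwise 0 (second class)."""
    return (logitboost_proba(models, X) > 0.5).astype(int)


# Toy one-feature data set, separable around x = 5.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_models = logitboost_fit(X_toy, y_toy, t=10)
toy_pred = logitboost_predict(toy_models, X_toy)
```

Each iteration fits the base regressor to the targets z[i] with weights w[i]
and adds it to the running score, which is the step repeated t times in
Algorithm 1.1.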


Finally, the result is displayed (see Fig. 3) for analysis by the radiologist.


                               Fig. 3. Classification results


The table in Fig. 3 shows four predictions made by SDCA. The first three
samples are needed for the system to operate properly; the sample of interest
to the radiologist is the fourth. For this new sample the radiologist labeled
the area of interest as the first type, i.e., benign (1: B) (blue circle), but
the system indicates that the abnormality may be of the third type, i.e.,
malignant (3: M) (red circle). This aid increases reliability, since another
mammogram must now be taken from another angle to confirm whether the
abnormality is of type 3: M.


6   Results

Classification algorithms. We tested the classification algorithms of each
strategy. The strategies are: Bayesian algorithms (bayes), classification
functions (functions), rule-generating algorithms (rules), meta classifiers
(meta), lazy algorithms (lazy), decision-tree-generating algorithms (trees) and
miscellaneous algorithms (misc).

We used the 322 digitized mammograms of the MIAS DB (Table 1) for the first
test and to choose the algorithm that performed best. As shown in the table
below, the best result was obtained with the Decorate strategy, which
classified 95% of instances correctly.


                          Strategy   Algorithm     CCI*  % of CCI*
                          Bayes      NaiveBayes     13      65%
                          Functions  Logistic       14      70%
                          Rules      PART           12      60%
                          Meta       Decorate       19      95%
                          Lazy       IB1            12      60%
                          Trees      RandomForest   17      85%
                          Misc       HyperPipes     10      50%
                      * CCI = Correctly Classified Instances
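
The percentages in the table can be checked from the CCI counts with a short
Python sketch. The test-set size of 20 instances is an assumption inferred from
the reported values (for example, 19 correct out of 20 gives 95%); the table
itself does not state it.

```python
# Correctly classified instances (CCI) per algorithm, as reported in the table.
cci = {
    "NaiveBayes": 13, "Logistic": 14, "PART": 12, "Decorate": 19,
    "IB1": 12, "RandomForest": 17, "HyperPipes": 10,
}
TEST_SET_SIZE = 20  # assumption: inferred from the percentages, not stated

percent = {name: 100.0 * count / TEST_SET_SIZE for name, count in cci.items()}
best = max(percent, key=percent.get)  # algorithm with the highest percentage
```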


Experiments and analysis. For testing we used a sample of eighty instances,
divided into four sets of twenty instances each. These results have been
presented previously in [10] and [11].

The first dataset yielded 95% of instances correctly classified; the second,
likewise, 95%; the third, 90%; and the fourth, 100%. These results range
between 90% and 100%, giving an average of 95% of instances correctly
classified. This indicates that SDCA is highly reliable for the classification
of cancerous abnormalities.
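
The reported average can be reproduced with a few lines of Python, using only
the four per-set percentages and the set size of twenty given above; the
variable names are chosen for this example.

```python
set_accuracy = [95, 95, 90, 100]  # percent correct in each of the four sets
SET_SIZE = 20                     # instances per set, as stated in the text

mean_accuracy = sum(set_accuracy) / len(set_accuracy)

# Equivalent check on pooled counts: correct instances over all eighty.
correct = sum(acc * SET_SIZE // 100 for acc in set_accuracy)
pooled_accuracy = 100 * correct / (SET_SIZE * len(set_accuracy))
```

Because the four sets have the same size, the mean of the per-set percentages
and the pooled percentage coincide.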


Evaluation. In recent years several contributions have been presented to help
the radiologist in the diagnosis of breast cancer. The tests in these works
have been carried out with different DBs.
Table 2 shows that [6] obtained 95.35%; however, the DB used indicates that
these results are specific to the Spanish population and, in addition, the
sample size is small. [8] obtained 91% using the same DB used in this research;
however, the sample size is very small. [2] obtained 73% using a different DB,
a considerably lower result, and the sample size is not mentioned. Finally,
this research obtains 95%, a satisfactory result considering that the DB used
is widely accepted by the scientific community and that the sample size is
considerable. We therefore conclude that SDCA is highly reliable for supporting
the radiologist's diagnosis.


Method  Year  DB used                               Sample size  % of CCI*
SDCA    2011  MIAS                                      80          95%
[2]     2008  DDSM [5]                                  n/a         73%
[8]     2008  MIAS                                      30          91%
[6]     2005  Hospital Puerta de Hierro de Madrid       43          95.35%
                 * CCI = Correctly Classified Instances
Table 2. Comparison of the results obtained with other works previously
reported in the literature


7   Conclusion and future work


In this work we presented SDCA, a system to detect cancerous abnormalities in
digital mammograms. SDCA is a semi-automation of the KDD process. The results
obtained show that SDCA increases the reliability of the detection of cancerous
abnormalities by giving the radiologist a second opinion on the review of the
mammogram.


As future work, we propose to increase the amount of test data in order to
verify that the average reliability remains around 95%. In addition, given the
good results obtained here, and inspired by [4], whose authors worked first
with mammograms and then with Pap smear microscopic images, we intend to
migrate the KDD process to the detection of cervical cancer.


References
 1. M. Antonie, O. Zaiane, and A. Coman. Application of data mining techniques
    for medical image classification. In Proc. of Second Intl. Workshop on
    Multimedia Data Mining (MDM/KDD2001) in conjunction with Seventh ACM
    SIGKDD, pages 94–101, 2001.
 2. Enrique Calot, Hernán Merlino, Paola Britos, and Ramón García-Martínez.
    Clasificación de tumores en mamografías mediante uso combinado de RBP y
    filtros Sobel. 2008.
 3. Ahmed Farag and Samia Mashali. DCT based features for the detection of
    microcalcifications in digital mammograms. 2004. Univ. of Texas at El Paso.
    IEEE.
 4. Francisco Gallegos-Funes, Margarita Gómez-Mayorga, José Lopez-Bonilla, and
    Rene Cruz-Santiago. Rank M-type radial basis function (RMRBF) neural
    network for Pap smear microscopic image classification. C. Roy Keys Inc.
    http://redshift.vif.com, 16:4, 2009.
 5. M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. Kegelmeyer. The digital
    database for screening mammography (DDSM). Medical Physics Publishing.
    ISBN: 1-930524-00-5, pages 212–218, 2001. University of South Florida
    Digital Mammography Home Page
    http://marathon.csee.usf.edu/Mammography/Database.html.
 6. José Avelino Manzano Lizcano, Laura Moyano Pérez, and Carmen Sánchez Ávila.
    Sistema para la detección automatizada de microcalcificaciones en
    mamografía digitalizada utilizando la transformada contourlet. Congreso de
    Métodos Numéricos en Ingeniería. ISBN: 978-607-7557-71-5, pages 2–11, 2005.
 7. José M. Mejía Muñoz. The nonsubsampled contourlet transform for enhancement
    of microcalcifications in digital mammograms. 2009. 8th Mexican
    International Conference on Artificial Intelligence MICAI-2009. Guanajuato,
    México.
 8. Lorena Vargas Quintero, Leiner Barba Jiménez, Cesar Torres, and Lorenzo
    Mattos. Transformada wavelet y técnicas de filtrado no lineal aplicadas a
    la detección de microcalcificaciones en mamografías digitales. Memorias.
    XIII Simposio de Tratamiento de Señales, Imágenes y Visión Artificial
    STSIVA. ISSN 978-958-8477-00-8, II:23–26, 2008.
 9. R. Rangayyan, N. El-Faramawy, Leo Desautels, and O. Alim. Measures of
    acutance and shape for classification of breast tumors. IEEE Transactions
    on Medical Imaging, 16:799, 1997.
10. Eddy Sánchez, Pilar Pozos-Parra, and Homero Alpuín-Jiménez. Cancer detection
    using the KDD process. Advances in Soft Computing Algorithms. ISSN:
    1870-4079, 49:109–117, 2010.
11. Eddy Sánchez, Pilar Pozos-Parra, and Homero Alpuín-Jiménez. Detección de
    cáncer de mama usando el proceso KDD en mastografías digitales. Avances en
    Informática y Sistemas Computacionales. ISBN: 978-607-7557-71-5, V:40–51,
    2010.



</pre>