=Paper= {{Paper |id=None |storemode=property |title=SDCA: System to Detect Cancerous Abnormalities |pdfUrl=https://ceur-ws.org/Vol-804/11_LANMR11.pdf |volume=Vol-804 |dblpUrl=https://dblp.org/rec/conf/lanmr/CruzADP11 }} ==SDCA: System to Detect Cancerous Abnormalities== https://ceur-ws.org/Vol-804/11_LANMR11.pdf
SDCA: System to detect cancerous abnormalities

    Eddy Sánchez de la Cruz1 , Homero Alpuı́n-Jiménez1 , Humberto de Jesús
                  Ochoa Domı́nguez2 , and Pilar Pozos-Parra1
      1
       Universidad Juárez Autónoma de Tabasco. Cunduacán, Tabasco, México
        eddsacx@gmail.com, {homero.alpuin,pilar.pozos}@dais.ujat.mx
    2
      Universidad Autónoma de Ciudad Juárez.Ciudad Juárez, Chihuahua, México
                                  hochoa@uacj.mx


          Abstract. In this article we present SDCA, which is a system to detect
          cancerous abnormalities in digital mammograms. The SDCA try to give
          at radiologist a second opinion in the analysis of a digital mammogram
          to increase the reliability of detecting breast cancer. SDCA is a semi-
          automation of KDD process (Knowledge Discovery in Databases). The
          KDD process is a method that uses strategies of Artificial Intelligence
          (AI) to extract patterns of behavior in databases with large volumes of
          information. Two SDCA characteristics outstanding are 1) the imple-
          mentation of Mejı́a filtering method in the data cleansing module, and
          2) the implementation of Decorate strategy Classification in the classifi-
          cation module.
          The results shows that SDCA get 95% of detections classified correctly.
          SDCA was developed using Matlab GUIDE, and tests were done with
          the database (DB) of digital mammographic MIAS.


          Key words: SDCA, KDD process, detect cancerous abnormalities, dig-
          ital mammograms, classification.



1    Introduction

For the radiologist, mammograms are highly useful to identify abnormalities
carcinogenic potential. The difficulty arises when the radiologist’s review does
not guarantee the detection of cancerous abnormalities. Therefore, this research
serves as a support in the detection of abnormal regions. Therefore, the area
of interest for this research is the analysis of medical images using the KDD
process.
This research, then, serves to give the radiologist a second opinion on the detec-
tion of abnormal and thus increase the reliability of diagnosis.
SDCA is an extension of work previously presented in [11] and [10]. This article
describes the segmentation, filtering and classification modules.
The rest of the paper is divided as follows: Section 2 briefly describes the KDD
process for this research. Section 3 describes the filter module. Section 4 de-
scribes the segmentation module. Section 5 describes the classification module.



                                            115
Section shows experimental results and finally in section ends with a conclusion
and future work.


2   KDD process

The following describes the KDD process steps for this research:
 – Selection: Given a set of different digital mammography DB, the most rep-
   resentative was selected respect its use in other research. MIAS (see Table
   1) is a reduced version of the original, and for a long time, been used to
   test research in [9], [1], [3], [8], [7], and also strongly recommended by the
   University of South Florida [5].


Name Description
MIAS Small database that contains the same images as the
     original version, but reduced to a size of 1024 x 1024 pixels,
     available in http://www.wiau.man.ac.uk/services/MIAS/MIASmini.html
                              Table 1. DB MIAS




 – Data Preparation: The images are filtered using the Mejı́a filtering method,
   to reduce noise.
 – Data Transformation: Segmented manually, the area of interest by the ra-
   diologist, normalizes each segmented area to have the same dimensions and
   finally gives the frequency histogram of gray levels of each segmented area,
   and stored for create a testing base.
 – Data Mining: new samples are obtained and applied Decorate classification
   strategy to classify the abnormality, if it exists.
 – Patterns evaluation: The patterns obtained are evaluated by the radiologist.


3   Filtering module

This module was integrated into the prototype, because with this filtering method
is obtained excellent results in the enhancement of abnormalities. This work was
presented in [7]. The method works by using the Transform Contourlet Nonsub-
sampled (NSCT) and the Prewitt filter. The method is based on the classical
approach used in the processing methods for image processing.


For a more detailed explanation see [7].



                                      116
4   Segmentation module

In this module, the radiologist manually segmented the area he considers abnor-
malities. The result is shown in Fig. 2.




                          Fig. 1. Segmentation module



Then the area of interest is normalized so that all images have the same dimen-
sions.


After being normalized interest area, gives the frequency histogram of gray levels,
ranging from 0 to 255 (see Fig. 3).




                   Fig. 2. Frequency histogram of gray levels



This histogram is stored in a xlsx file called Histograma.xlsx for purpose of
building the testing base.
Then the radiologist choose a tag if the detected abnormality is benign (B),
malignant (M) or normal (N).



                                        117
    Finally for this module, open the file Histograma.xlsx and saved in CSV format
    (comma delimited) for later use.



    5     Classification module


    In this module selects the Histograma.csv file to implement the classification of
    the abnormality:

    The classification is activated by pressing the Detección button, which indicates
    the call to Decorate strategy, which uses, in this case, the algorithm LADTree
    base, and this, in turn, is based on the algorithm LoogitBoost 1.1.

    The basic algorithm LoogitBoost:

  Algorithm 1.1: LogitBoost algorithm
  Result: If p (1|a)>0.5 predict the first class Else the second
1 begin
2    for j = 1 until t do
3       for a [i] do
4           Assign the target value for the regression to
            z [i] = (y [i] − p (1 |a [i])) (p (1 |a [i])Λ (1 − p (1 |a [i])) Assign the
            weight of the instance to [i] = p (1 |a [i])Λ (1 − p (1 |a [i]))
5           Fit a regression model fj at data with class values z [i] and weights
            w [i]
6
7
8   end


    Finally, the result is released (see Fig. 6) for analysis by the radiologist.




                               Fig. 3. Classification results



                                             118
This table shows four predictions made by SDCA, the first three samples are
needed for the proper operation of the system, however, the new sample, which
interests the radiologist is the fourth, and in this new sample the radiologist
predicts the area of interest was the first type, i.e., benign (1: B) (blue circle),
but the system indicates that normality can be either three, or malignant (3: M)
(red circle). This aid increase the reliability, as is now required to take another
mammogram from another angle to confirm whether the abnormality is type 3:
M.


6   Results

Classification Algorithms We tested each classification algorithm for each
strategy, the strategies are: bayesian algorithms (bayes) classification functions
(functions), algorithms to generate rules (rules), meta classifiers (meta), lazy
algorithms (lazy), algorithms generation of decision trees (trees) and miscella-
neous algorithms (misc).

We used the 322 digitized mammograms of MIAS BD (Table 1) for the first test
and even to choose the algorithm that performed better. As shown in the table
below, the best result was obtained with the strategy Decorate. This strategy
was 95% of instances correctly classified.


                          Strategy Algorithm         CCI* % of CCI*
                          Bayes     NaiveBayes        13       65%
                          Functions Logistic          14       70%
                          Rules     PART              12       60%
                          Meta      Decorate          19       95%
                          Lazy      IB1               12       60%
                          Trees     RandomForest      17       85%
                          Misc      HyperPipes        10       50%
                      * CCI ← Correctly Classified Instances




Experiments and analysis For testing we used a data sample of eighty, di-
vided into four sets of twenty instances each. These results have been presented
previously [10] and [11].

In the first dataset is obtained 95%; in the second dataset is obtained, similarly,
95%; in the third dataset is obtained 90% and finally, in the fourth dataset is ob-
tained 100% of instances correctly classified. These results fluctuate between 90%
and 100%, giving an average of 95% of instances correctly classified. This shows
that SDAC is very reliable to use in the classification of cancerous abnormalities.




                                               119
Evaluación. En los últimos años se han presentado aportaciones para ayudar
al radiólogo en el diagnóstico de cáncer de mama. Las pruebas en estos trabajos
se han realizado con diferentes BDs.
En la tabla 2, vemos que [6] obtuvo 95.35%, sin embargo, la BD nos dice que
estos resultados son propios para la comunidad española y, además, el tamaño de
la muestra es pequeño. [8] obtuvo 91% utilizando la misma BD que se usa en esta
investigación, sin embargo, el tamaño de la muestra es muy pequeño. En [2] se
obtuvo 73% utilizando una BD diferente, resultados realmente bajos y, además,
no se menciona el tamaño de la muestra. Finalmente, en esta invetigación, se
obtiene 95%, resultados satisfactorios, teniendo en cuenta que la BD utilizada
goza de amplia aprobación por la comunidad cientı́fica, y que el tamaño de la
muestra es considerable. Por lo que se concluye que SDAC es altamente fiable
para apoyar el diagnóstico del radiólogo.



Método año BD utilizada                           Tamaño muestra % de ICC*
SDAC     2011 MIAS                                         80            95%
[2]      2008 DDSM[5]                                     s/n            73%
[8]      2008 MIAS                                         30            91%
[6]      2005 Hospital Puerta de Hierro de Madrid          43           95.35%
                 * ICC ← Instancias Correctamente Clasificadas
Table 2. Comparación de resultados obtenidos con otros trabajos previamente
reportados en la literatura




7   Conclusión y trabajos futuros



En este trabajo, presentamos SDCA para detectar anormalilades cancerı́genas
en mastografı́as digitales. SDCA es una semi-automaización del proceso KDD.
Los resultados obtenidos muestran que SDCA aumenta la fiabilidad en la de-
tección de anormalidades cancerı́genas, dando al radiólogo una segunda opinión
sobre la revisión de la mastografı́a.


Como trabajo futuro se propone aumentar el número de datos de pruebas, para
verificar que se mantenga el promedio de fiabilidad alrededor de 95%. Además,
teniendo en cuenta los buenos resultados aqui obtenidos e inspirados en [4],
quienes primero trabajaron con mastografı́as y luego con Pap Smear Microscopic
Image, se pretende migrar el proceso KDD para detectar cáncer cervicoúterino.




                                       120
References
 1. M. Antonie, O. Zaiane, and A. Coman. Application of data mining techniques
    for medical image classification. In Proc. Of Second Intl. Workshop on Multime-
    dia Data Minino (MDM/KDD2001) in conjunction with Seventh ACM SIGKDD,
    pages 94–101, 2001.
 2. Enrique Calot, Hernán Merlino, and Paola Britos Ramón Garcı́a-Martı́nez. Clasi-
    ficación de tumores en mamografı́as mediante uso combinado de rbp y filtros sobel.
    2008.
 3. Ahmed Farag and Samia Mashali. Dct based features for the detection of micro-
    calcifications in digital mammograms. 2004. Univ of Texas at El Paso. IEEE.
 4. Francisco Gallegos-Funes, Margarita Gómez-Mayorga, José Lopez-Bonilla, and
    Rene Cruz-Santiago. Rank m-type radial basis function (rmrbf) neural net-
    work for pap smear microscopic image classification. C. Roy Keys Inc. http:
    // redshift. vif. com , 16:4, 2009.
 5. M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. Kegelmeyer. The digital
    database for screening mammography (ddsm). Medical Physics Publishing. ISBN:
    1-930524-00-5, pages 212–218, 2001. University of South Florida Digital Mammog-
    raphy Home Page http://marathon.csee.usf.edu/Mammography/Database.html.
 6. José Avelino Manzano Lizcano, Laura Moyano Pérez, and Carmen Sánchez Ávila.
    Sistema para la detección automatizada de microcalcificaciones en mamografı́a dig-
    italizada utilizando la transformada contourlet. Congreso de Métodos Numéricos
    en Ingenierı́a. ISBN: 978-607-7557-71-5, pages 2–11, 2005.
 7. José M. Mejı́a Mu noz. The nonsubsampled contourlet transform for enhancement
    of microcalcifications in digital mammograms. 2009. 8th Mexican International
    Conference on Artificial Intelligence MICAI-2009. Guanajuato, México.
 8. Lorena Vargas Quintero, Leiner Barba Jiménez, Cesar Torres, and Lorenzo Mat-
    tos. Transformada wavelet y técnicas de filtrado no lineal aplicadas a la detección
    de microcalcificaciones en mamografı́as digitales. Memorias. XIII Simposio de
    Tratamiento de Señales, Imágenes y Visión Artificial STSIVA. ISSN 978-958-
    8477-00-8, II:23–26, 2008.
 9. R. Rangayyan, N. El-Faramawy, Leo Desautels, and O. Alim. Measures of acu-
    tance and shape for classification of breast tumors. IEEE Transactions on Medical
    Imaging, 16:799, 1997.
10. Eddy Sánchez, Pilar Pozos-Parra, and Homero Alpuı́n-Jiménez. Cancer detection
    using the kdd process. Advances in Soft Computing Algorithms. ISSN: 1870-4079,
    49:109–117, 2010.
11. Eddy Sánchez, Pilar Pozos-Parra, and Homero Alpuı́n-Jiménez. Detección de
    cáncer de mama usando el proceso kdd en mastografı́as digitales. Avances en In-
    formática y Sistemas Computacionales. ISBN: 978-607-7557-71-5, V:40–51, 2010.




                                          121
122