<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Data Mining with Ensemble Classification of Breast Cancer Masses in Digital Mammograms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shima Ghassem Pour</string-name>
          <email>shima.ghassempour@gamil.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Mc Leod</string-name>
          <email>mcleod.ptr@gamil.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brijesh Verma</string-name>
          <email>b.verma@cqu.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anthony Maeder</string-name>
          <email>a.maeder@uws.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing, Engineering and Mathematics, University of Western Sydney Campbelltown</institution>
          ,
          <addr-line>New South Wales</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information and Communication Technology, Central Queensland University Rockhampton</institution>
          ,
          <addr-line>Queensland</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <fpage>55</fpage>
      <lpage>63</lpage>
      <abstract>
        <p>Medical diagnosis sometimes involves detecting subtle indications of a disease or condition amongst a background of diverse healthy individuals. The amount of information available for discovering such indications in mammography is large and has been growing at an exponential rate, due to population-wide screening programmes. In order to analyse this information, data mining techniques have been utilised by various researchers. A question that arises is: do flexible data mining techniques have comparable accuracy to dedicated classification techniques for medical diagnostic processes? This research compares a model-based data mining technique with a neural network classification technique and the improvements possible using an ensemble approach. A publicly available breast cancer benchmark database is used to determine the utility of the techniques and compare the accuracies obtained.</p>
      </abstract>
      <kwd-group>
        <kwd>latent class analysis</kwd>
        <kwd>digital mammography</kwd>
        <kwd>breast cancer</kwd>
        <kwd>clustering</kwd>
        <kwd>classification</kwd>
        <kwd>neural network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Medical diagnosis is an active area of pattern recognition, with different
techniques being employed [
        <xref ref-type="bibr" rid="ref12 ref17 ref19">17, 19, 12</xref>
        ]. The expansion of digital information for
different cohorts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] has allowed researchers to examine relationships that previously
remained uncovered, due both to the limited nature of the available information and to a
lack of techniques for the analysis of large data sets. Flexible
data mining techniques have the capacity to predict disease and reveal previously
unknown trends.
      </p>
      <p>
        The question that arises is whether the relationships revealed by
those techniques are as accurate as those found by techniques
specifically developed for the purpose, such as a diagnostic system for a particular
disease or condition. This research contrasts the cluster analysis
technique (Latent Class Analysis) of Ghassem Pour, Maeder and Jorm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with a
baseline neural network classifier, and then considers the effects of applying an
ensemble technique to improve the accuracies obtained.
      </p>
      <p>This paper is organised as follows: section two provides a background on
the approaches that have been utilised for breast cancer diagnosis, sections three
and four detail the proposed techniques for comparison, section five outlines the
experimental results obtained, and conclusions are presented in section six.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Medical diagnosis is a problematic paradigm in that complex relationships can
exist in the diagnostic features that are utilised to map to a resultant diagnosis
about the disease state. In different cases the state of the disease condition itself
can be marked by stages where the diagnostic symptoms or signs can be subtle
or different from those at other stages of the disease. This means that there is often not a
clean mapping between the diagnostic features and the diagnosis [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].
      </p>
      <p>
        Breast cancer screening using mammography provides an exemplar of this
situation. Early detection and treatment have been the most effective way of
reducing mortality [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; however, Christoyianni et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] noted that 10-30% of
breast cancers remain undetected, while only 15-30% of biopsies are cancerous.
Taylor and Potts [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] made similar observations in their research. There are many
reasons why cancers can remain undetected. These include the
obfuscation of anomalies by surrounding breast tissue, the asymmetry of the breast,
prior surgery, natural differences in breast appearance on mammograms, the low
contrast nature of the mammogram itself, distortion from the mammographic
process, and even talc or powder on the outside of the breast making it hard to
identify and discriminate anomalies. Even when an anomaly is detected, a high rate
of false positives exists [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ].
      </p>
      <p>
        Clustering has provided a widely used mechanism for organising data into
similar groupings. The usage of clustering has also been extended to classifiers
and detection systems in order to improve detection and provide greater
classification accuracy. Kim et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] developed a classifier based on Adaptive
Resonance Theory (ART2) in which micro-calcifications were grouped into different
classes, with a three-layered back propagation network performing the
classification. The system achieved 90% sensitivity (Az of 0.997) with a low false positive
rate of 0.67 per cropped image.
      </p>
      <p>
        Other researchers such as Mohanty, Senapati and Lenka [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] explored the
application of data mining techniques to breast cancer diagnosis. They
indicated that data mining medical images would allow for the collection of effective
models, rules and patterns, and reveal abnormalities from large datasets.
Their approach was to use a hybrid feature selection technique with a decision
tree classifier to classify breast cancer. They utilised 300 images from the MIAS
database and achieved a classification accuracy of 97.7%; however, their dataset
images contained microcalcifications as well as mass anomalies.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Latent Class Analysis and Data Mining</title>
      <p>
        Latent Class Analysis (LCA) has been proposed as a mechanism for improved
clustering of data over traditional clustering algorithms like k-means [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. LCA
classifies subjects into one of K unobserved classes based on the observed data,
where K is a constant and known parameter. These latent or potential classes
are then refined based upon their statistical relationships with the observed
variables.
      </p>
      <p>
        LCA is a probabilistic clustering approach: although each object is assumed
to belong to one cluster, there is uncertainty about an object's membership of
a cluster [
        <xref ref-type="bibr" rid="ref10 ref11">11, 10</xref>
        ]. This type of approach offers some advantages in dealing with
noisy data or data with complex relationships between variables, although as an
iterative method there is always some chance that it will be susceptible to noise
and in some cases fail to converge.
      </p>
      <p>
        An advantage of using a statistical model is that the choice of the
cluster criterion is less arbitrary. Nevertheless, the log-likelihood functions
corresponding to LC cluster models may be similar to the criteria used by certain
non-hierarchical cluster techniques [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Another advantage of the model-based
clustering approach is that no decisions have to be made about the scaling of the
observed variables: for instance, when working with normal distributions with
unknown variances, the results will be the same irrespective of whether the
variables are normalized or not.
      </p>
      <p>
        Other advantages are that it is relatively easy to deal with variables of mixed
measurement levels (different scale types) and that there are more formal
criteria for making decisions about the number of clusters and other model features
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We have successfully applied LCA for cases in health data mining where the
anomalous range of variables results in more clusters than expected
from a causal or hypothesis-based approach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This implies that in some cases
LCA may be used to reveal associations between variables that are more subtle
and complex.
      </p>
      <p>Unsupervised clustering requires prior specification of the number of clusters
K to be constructed, implying that a model for the data is necessary which
provides K. The binary nature of the diagnosis problem implies that K=2 should
be used in ideal circumstances, but the possibility exists that allowing more
clusters would give a better solution (e.g. by allowing several different classes
within the positive or negative groups). Consequently a figure of merit is needed
to establish that the chosen K value is optimal. In this research the Bayesian
Information Criterion (BIC) is determined for the mass dataset in order to gauge
the best number of clusters.</p>
      <p>Repeated application of the clustering approach can also lead to different
solutions due to randomness in starting conditions. In this work we used multiple
applications of the clustering calculations to allow improvement in the results,
in an ensemble-like approach. Our improvement strategy was based on selecting
the most frequent class membership per element, over different numbers
of clustering repetitions.</p>
    </sec>
    <sec id="sec-4">
      <title>Neural Network and Ensemble Methods</title>
      <p>
        Neural networks have been advocated for breast cancer detection by many
researchers. Various efforts to refine classification performance have been made,
using a number of strategies involving some means of choice between alternatives.
Ensembles have been proposed as a mechanism for improving the classification
accuracy of existing classifiers [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], provided that the constituents are diverse.
      </p>
      <p>
        Zhang et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] partitioned their mass dataset, obtained from the DDSM,
into several subsets based on mass shape and age. Several classifiers were then
tested and the best performing classifier on each subset was chosen. They used
SVM, k-nearest neighbour and Decision Tree (DT) classifiers in their ensemble
and achieved a combined classification accuracy of 72%, better than
any individual classifier.
      </p>
      <p>
        Surrendiran and Vadivel [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] proposed a technique that could determine which
features had the strongest influence on classification accuracy, and
achieved 87.3% classification accuracy. They did this by using ANOVA
DA, Principal Component Analysis and Stepwise ANOVA analysis to determine
the relationship between input features and classification accuracy.
      </p>
      <p>
        Mc Leod and Verma [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] utilised a clustered ensemble technique that relied
on the notion that some patterns could be readily identified through
clustering (atomic). Other patterns that were not so easily separable (non-atomic)
were classified by a neural network. The classification process involved an initial
lookup to determine if a pattern was associated with an atomic class; for
non-atomic classes, a neural network ensemble created through
an iterative clustering mechanism (to introduce diversity into the ensemble) was
employed. The advantage of this technique is that the ensemble was not
adversely affected by outliers (atomic clusters). This technique was applied to the
same mass dataset as utilised in this research and achieved a classification
accuracy of 91%.
      </p>
      <p>The ensemble utilised in this research was created by fusing together (using
the majority vote algorithm) constituent neural networks that were created by
varying the number of neurons in the hidden layer, producing diverse networks for
incorporation into an ensemble classifier.</p>
    </sec>
    <sec id="sec-5">
      <title>Experimental Results</title>
      <p>
        The experiments were conducted for the LCA and neural network techniques and
the related ensemble approaches using mass type anomalies from the Digital
Database for Screening Mammography (DDSM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The features used for
classification purposes coincided with the Breast Imaging Reporting and Data System
(BI-RADS), as this is how radiologists classify breast cancer. The BI-RADS
features of density, mass shape, mass margin and abnormality assessment rank are
used as they have been proven to provide good classification accuracy [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. These
features are then combined with patient age and a subtlety value [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Experiments were performed utilising the clustering technique of Ghassem
Pour, Maeder and Jorm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on this dataset. This was achieved using the
Latent GOLD software package. The first step was to utilise the analysis feature of
Latent GOLD to calculate the BIC value and the classification error rate. This
information appears in Table 1 below, with Npar designating the resulting
parameter value associated with the LCA.
      </p>
      <p>Minimisation of the BIC and the classification error determines the best number
of clusters for the LCA analysis in terms of classification accuracy, and this was
found to be 2 clusters. Nevertheless, it might be expected that some further
complexity could be identified with higher numbers of clusters, where multiple
clusters may exist for either the positive or negative classes. The results obtained
when cases of more than 2 clusters were merged to form the dominant positive
and negative classes are detailed in Table 2. These results show the instability
of LCA classification for this dataset at higher numbers of clusters; for example,
the 2-cluster solution gives better accuracy than the 3-cluster solution (merged
into 2 clusters), and so forth. From this we conclude that the natural 2-cluster
solution is indeed optimal.</p>
      <p>In order to provide a comparison, further experiments were performed using
a neural network and then applying an ensemble classifier. The neural network
and ensemble techniques were implemented in MATLAB utilising the neural
network toolbox. The parameters utilised are detailed in Table 3 below.
Experiments were first performed with a neural network classifier alone, in order
to provide a baseline for measuring the classification accuracy on the selected
dataset. The results obtained are detailed in Table 4 below. Further experiments
were then performed utilising an ensemble technique, with a summary of the
neural network test results using ten-fold cross validation detailed in Table
5 below.</p>
      <p>Experiments were also performed for the ensemble-like optimising of results
from the LCA technique. It is difficult to match this process directly with the
complexity used for the NN-ensemble experiments, so the number of repetitions
was modelled on a plausible choice based on a dataset size of 100 cases. The
results for these experiments are shown in Table 6 below.
Examination of the results from Tables 1 to 6 demonstrates that the accuracy
obtained with the LCA technique is below that of the baseline classification
performed with the neural network. However, an ensemble oriented approach
enabled improvement of the results from both techniques.</p>
      <p>In order to examine the results more closely, the sensitivity, specificity and
positive predictive value have been calculated for the best performing results for
each of the trialled techniques, shown below in Table 7.</p>
      <p>Sensitivity is the True Positive count divided by the sum of the True Positive and
False Negative counts. Sensitivity can be thought of as the probability of
detecting cancer when it exists.</p>
      <p>Specificity is the True Negative count divided by the sum of the True Negative
and False Positive counts. Specificity can be thought of as
the probability of being correctly diagnosed as not having cancer.</p>
      <p>Positive Predictive Value (PPV) is the True Positive count divided by
the sum of the True Positive and False Positive counts. PPV is the
accuracy with which malignant abnormalities are identified. The latent class analysis
technique was not as sensitive as the neural network but had better specificity
and a higher positive predictive value. Both ensemble
approaches resulted in substantially better performance, which of course must
be traded off against the increased computational cost. The NN-ensemble
technique performed best, with good sensitivity, good specificity and a high positive
predictive value.</p>
      <p>The flexibility of clustering techniques such as LCA provides a mechanism for
gaining insight from large data repositories. However, once patterns in the data
become evident, it would appear that other less flexible but more specialised
techniques could be utilised to analyse the data in question at a higher degree of
granularity.</p>
      <p>A summary of the overall performance of the techniques employed in this
paper is presented in Figure 1. The optimal LCA-ensemble result, while lower
than the optimal NN-ensemble result, is obtained with somewhat less processing
effort and complexity, and further improvement may be possible.</p>
      <p>Future work could look at extending the comparison of LCA with other
data mining algorithms to determine their applicability. Breast cancer represents
only one problem domain, and applying these methods to other datasets would
be a logical extension. Our future research will include more experiments with
Latent GOLD on other breast cancer datasets to determine how different numbers
of clusters produce different classification results, for a more detailed analysis.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Christoyianni</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dermatas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kokkinakis</surname>
          </string-name>
          , G.:
          <article-title>Computer Aided Diagnosis of Breast Cancer in Digitized Mammograms</article-title>
          .
          <source>Computerized Medical Imaging and Graphics</source>
          <volume>26</volume>
          (
          <issue>5</issue>
          ),
          <fpage>309</fpage>
          -
          <lpage>319</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>DeSantis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siegel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jemal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Breast Cancer Statistics, 2011</article-title>
          .
          <source>CA: A Cancer Journal for Clinicians</source>
          <volume>61</volume>
          (
          <issue>6</issue>
          ),
          <fpage>408</fpage>
          -
          <lpage>418</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fraley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raftery</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Model-based Clustering, Discriminant Analysis, and Density Estimation</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>97</volume>
          (
          <issue>458</issue>
          ),
          <fpage>611</fpage>
          -
          <lpage>631</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ghassem Pour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeder</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jorm</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Constructing a Synthetic Longitudinal Health Dataset for Data Mining</article-title>
          .
          <source>DBKDA</source>
          <year>2012</year>
          ,
          <source>The Fourth International Conference on Advances in Databases, Knowledge, and Data Applications</source>
          .
          <fpage>86</fpage>
          -
          <lpage>90</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ghassem Pour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeder</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jorm</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Validating Synthetic Health Datasets for Longitudinal Clustering</article-title>
          .
          <source>The Australasian Workshop on Health Informatics and Knowledge Management (HIKM</source>
          <year>2013</year>
          )
          <volume>142</volume>
          , to appear (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhuang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Algorithm of Partition Based Network Boosting for Imbalanced Data Classification</article-title>
          .
          <source>The International Joint Conference on Neural Networks (IJCNN)</source>
          .
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . IEEE (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowyer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopans</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kegelmeyer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The Digital Database for Screening Mammography</article-title>
          .
          <source>Proceedings of the 5th International Workshop on Digital Mammography</source>
          .
          <fpage>212</fpage>
          -
          <lpage>218</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hofvind</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patnick</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ascunce</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Njor</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broeders</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giordano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frigerio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tornberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>False-positive Results in Mammographic Screening for Breast Cancer in Europe: a literature review and survey of service screening programmes</article-title>
          .
          <source>Journal of Medical Screening</source>
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
          </string-name>
          , H.:
          <article-title>Detection of Clustered Microcalcifications on Mammograms Using Surrounding Region Dependence Method and Artificial Neural Network</article-title>
          .
          <source>The Journal of VLSI Signal Processing</source>
          <volume>18</volume>
          (
          <issue>3</issue>
          ),
          <fpage>251</fpage>
          -
          <lpage>262</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lanza</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flaherty</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collins</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Latent Class and Latent Transition Analysis</article-title>
          .
          <source>Handbook of Psychology</source>
          .
          <volume>663</volume>
          -
          <fpage>685</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Magidson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vermunt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Latent Class Models for Clustering: A Comparison with k-means</article-title>
          .
          <source>Canadian Journal of Marketing Research</source>
          <volume>20</volume>
          (
          <issue>1</issue>
          ),
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Malich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Facius</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>The Performance of Computer-aided Detection when Analyzing Prior Mammograms of Newly Detected Breast Cancers with Special Focus on the Time Interval from Initial Imaging to Detection</article-title>
          .
          <source>European Journal of Radiology</source>
          <volume>69</volume>
          (
          <issue>3</issue>
          ),
          <fpage>574</fpage>
          -
          <lpage>578</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Data Mining: Machine Learning, Statistics, and Databases</article-title>
          .
          <source>Proceedings of the Eighth International Conference on Scientific and Statistical Database Systems</source>
          ,
          <fpage>2</fpage>
          -
          <lpage>9</lpage>
          . IEEE
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>McLeod</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Clustered Ensemble Neural Network for Breast Mass Classification in Digital Mammography</article-title>
          .
          <source>In: The International Joint Conference on Neural Networks (IJCNN)</source>
          .
          <fpage>1266</fpage>
          -
          <lpage>1271</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mealing</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banks</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jorm</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clements</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Investigation of Relative Risk Estimates from Studies of the Same Population with Contrasting Response rates and Designs</article-title>
          .
          <source>BMC Medical Research Methodology</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>10</fpage>
          -
          <lpage>26</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mohanty</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senapati</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A Novel Image Mining Technique for Classification of Mammograms Using Hybrid Feature Selection</article-title>
          .
          <source>Neural Computing &amp; Applications</source>
          .
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Nishikawa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kallergi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.:
          <article-title>Computer-aided Detection, in its present form, is not an Effective aid for Screening Mammography</article-title>
          .
          <source>Medical Physics</source>
          <volume>33</volume>
          (
          <issue>4</issue>
          ),
          <fpage>811</fpage>
          -
          <lpage>814</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Nylund</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asparouhov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muthen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study</article-title>
          .
          <source>Structural Equation Modeling</source>
          <volume>14</volume>
          (
          <issue>4</issue>
          ),
          <fpage>535</fpage>
          -
          <lpage>569</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Oh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Ensemble Learning with Active Example Selection for Imbalanced Biomedical Data Classification</article-title>
          .
          <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source>
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <fpage>316</fpage>
          -
          <lpage>325</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sampat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bovik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Classification of Mammographic Lesions into BIRADS Shape Categories Using the Beamlet Transform</article-title>
          .
          <source>In: Proceedings of SPIE, Medical Imaging: Image Processing</source>
          .
          <fpage>16</fpage>
          -
          <lpage>25</lpage>
          . SPIE (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Surrendiran</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vadivel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Feature Selection Using Stepwise ANOVA, Discriminant Analysis for Mammogram Mass Classification</article-title>
          .
          <source>International Journal of Recent Trends in Engineering and Technology</source>
          <volume>3</volume>
          ,
          <fpage>55</fpage>
          -
          <lpage>57</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Computer Aids and Human Second Reading as Interventions in Screening Mammography: two systematic reviews to compare effects on cancer detection and recall rate</article-title>
          .
          <source>European Journal of Cancer</source>
          <volume>44</volume>
          (
          <issue>6</issue>
          ),
          <fpage>798</fpage>
          -
          <lpage>807</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomuro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furst</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raicu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Building an Ensemble System for Diagnosing Masses in Mammograms</article-title>
          .
          <source>International Journal of Computer Assisted Radiology and Surgery</source>
          <volume>7</volume>
          (
          <issue>2</issue>
          ),
          <fpage>323</fpage>
          -
          <lpage>329</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>