<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Data Mining with Ensemble Classification of Breast Cancer Masses in Digital Mammograms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shima Ghassem Pour</string-name>
          <email>shima.ghassempour@gamil.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Mc Leod</string-name>
          <email>mcleod.ptr@gamil.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brijesh Verma</string-name>
          <email>b.verma@cqu.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anthony Maeder</string-name>
          <email>a.maeder@uws.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing, Engineering and Mathematics, University of Western Sydney Campbelltown</institution>
          ,
          <addr-line>New South Wales</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information and Communication Technology, Central Queensland University Rockhampton</institution>
          ,
          <addr-line>Queensland</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <fpage>55</fpage>
      <lpage>63</lpage>
      <abstract>
        <p>Medical diagnosis sometimes involves detecting subtle indications of a disease or condition amongst a background of diverse healthy individuals. The amount of information available for discovering such indications in mammography is large and has been growing at an exponential rate, due to population-wide screening programmes. In order to analyse this information, data mining techniques have been utilised by various researchers. A question that arises is: do flexible data mining techniques have comparable accuracy to dedicated classification techniques for medical diagnostic processes? This research compares a model-based data mining technique with a neural network classification technique and the improvements possible using an ensemble approach. A publicly available breast cancer benchmark database is used to determine the utility of the techniques and compare the accuracies obtained.</p>
      </abstract>
      <kwd-group>
        <kwd>latent class analysis</kwd>
        <kwd>digital mammography</kwd>
        <kwd>breast cancer</kwd>
        <kwd>clustering</kwd>
        <kwd>classification</kwd>
        <kwd>neural network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Medical diagnosis is an active area of pattern recognition, with different
techniques being employed [
        <xref ref-type="bibr" rid="ref12 ref17 ref19">17, 19, 12</xref>
        ]. The expansion of digital information for
different cohorts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] has allowed researchers to examine relationships that previously
remained uncovered, due both to the limited nature of the available information and to a
lack of techniques for the analysis of large data sets. Flexible
data mining techniques have the capacity to predict disease and reveal previously
unknown trends.
      </p>
      <p>
        The question that arises is whether the relationships revealed by
those techniques are as accurate as those found by techniques
specifically developed for the purpose, such as a diagnostic system for a particular
disease or condition. This research contrasts the cluster analysis
technique (Latent Class Analysis) of Ghassem Pour, Maeder and Jorm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with a
baseline neural network classifier, and then considers the effects of applying an
ensemble technique to improve the accuracies obtained.
      </p>
      <p>This paper is organised as follows: section two provides a background on
the approaches that have been utilised for breast cancer diagnosis, sections three
and four detail the proposed techniques for comparison, section five outlines the
experimental results obtained, and conclusions are presented in section six.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Medical diagnosis is a problematic paradigm in that complex relationships can
exist in the diagnostic features that are utilised to map to a resultant diagnosis
about the disease state. In different cases the state of the disease condition itself
can be marked by stages where the diagnostic symptoms or signs can be subtle
or different from those at other stages of the disease. This means that there is often not a
clean mapping between the diagnostic features and the diagnosis [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].
      </p>
      <p>
        Breast cancer screening using mammography provides an exemplar of this
situation. Early detection and treatment have been the most effective way of
reducing mortality [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; however, Christoyianni et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] noted that 10-30% of
breast cancers remain undetected, while only 15-30% of biopsies are cancerous.
Taylor and Potts [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] made similar observations in their research. There are many
reasons why cancers can remain undetected. These include the
obfuscation of anomalies by surrounding breast tissue, the asymmetry of the breast,
prior surgery, natural differences in breast appearance on mammograms, the low
contrast nature of the mammogram itself, distortion from the mammographic
process, and even talc or powder on the outside of the breast making it hard to
identify and discriminate anomalies. Even when an anomaly is detected, a high rate
of false positives exists [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ].
      </p>
      <p>
        Clustering has provided a widely used mechanism for organising data into
similar groupings. The usage of clustering has also been extended to classifiers
and detection systems in order to improve detection and provide greater
classification accuracy. Kim et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] developed a classifier based on Adaptive
Resonance Theory (ART2) in which micro-calcifications were grouped into different
classes, with a three-layered back propagation network performing the
classification. The system achieved 90% sensitivity (Az of 0.997) with a low false positive
rate of 0.67 per cropped image.
      </p>
      <p>
        Other researchers such as Mohanty, Senapati and Lenka [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] explored the
application of data mining techniques to breast cancer diagnosis. They
indicated that data mining medical images would allow for the collection of effective
models, rules and patterns, and reveal abnormalities from large datasets.
Their approach was to use a hybrid feature selection technique with a decision
tree classifier to classify breast cancer. They utilised 300 images from the MIAS
database and achieved a classification accuracy of 97.7%; however, their dataset
images contained microcalcifications as well as mass anomalies.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Latent Class Analysis and Data Mining</title>
      <p>
        Latent Class Analysis (LCA) has been proposed as a mechanism for improved
clustering of data over traditional clustering algorithms like k-means [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. LCA
classifies subjects into one of K unobserved classes based on the observed data,
where K is a constant and known parameter. These latent or potential classes
are then refined based upon their statistical relationships with the observed
variables.
      </p>
      <p>
        LCA is a probabilistic clustering approach: although each object is assumed
to belong to one cluster, there is uncertainty about an object's membership of
a cluster [
        <xref ref-type="bibr" rid="ref10 ref11">11, 10</xref>
        ]. This type of approach offers some advantages in dealing with
noisy data or data with complex relationships between variables, although as an
iterative method there is always some chance that it will be susceptible to noise
and in some cases fail to converge.
      </p>
      <p>
        An advantage of using a statistical model is that the choice of the
cluster criterion is less arbitrary. Nevertheless, the log-likelihood functions
corresponding to LC cluster models may be similar to the criteria used by certain
non-hierarchical cluster techniques [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Another advantage of the model-based
clustering approach is that no decisions have to be made about the scaling of the
observed variables: for instance, when working with normal distributions with
unknown variances, the results will be the same irrespective of whether the
variables are normalized or not.
      </p>
      <p>
        Other advantages are that it is relatively easy to deal with variables of mixed
measurement levels (different scale types) and that there are more formal
criteria for making decisions about the number of clusters and other model features
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We have successfully applied LCA for cases in health data mining where the
anomalous range of variables results in more clusters than expected
from a causal or hypothesis-based approach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This implies that in some cases
LCA may be used to reveal associations between variables that are more subtle
and complex.
      </p>
      <p>Unsupervised clustering requires prior specification of the number of clusters
K to be constructed, implying that a model for the data is necessary which
provides K. The binary nature of the diagnosis problem implies that K=2 should
be used in ideal circumstances, but the possibility exists that allowing more
clusters would give a better solution (e.g. by allowing several different classes
within the positive or negative groups). Consequently a figure of merit is needed
to establish that the chosen K value is optimal. In this research the Bayesian
Information Criterion (BIC) is determined for the mass dataset in order to gauge
the best number of clusters.</p>
      <p>Repeated application of the clustering approach can also lead to different
solutions due to randomness in starting conditions. In this work we used multiple
applications of the clustering calculations to allow improvement in the results,
in an ensemble-like approach. Our improvement strategy was based on selecting
the most frequent class membership per element, over different numbers
of clustering repetitions.</p>
    </sec>
    <sec id="sec-4">
      <title>Neural Network and Ensemble Methods</title>
      <p>
        Neural networks have been advocated for breast cancer detection by many
researchers. Various efforts to refine classification performance have been made,
using a number of strategies involving some means of choice between alternatives.
Ensembles have been proposed as a mechanism for improving the classification
accuracy of existing classifiers [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], provided that the constituents are diverse.
      </p>
      <p>
        Zhang et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] partitioned their mass dataset, obtained from the DDSM,
into several subsets based on mass shape and age. Several classifiers were then
tested and the best performing classifier on each subset was chosen. They used
SVM, k-nearest neighbour and Decision Tree (DT) classifiers in their ensemble
and achieved a combined classification accuracy of 72%, better than
any individual classifier.
      </p>
      <p>
        Surrendiran and Vadivel [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] proposed a technique that could determine which
features had the strongest influence on classification accuracy, and
achieved 87.3% classification accuracy. They did this by using ANOVA
DA, Principal Component Analysis and Stepwise ANOVA analysis to determine
the relationship between input features and classification accuracy.
      </p>
      <p>
        Mc Leod and Verma [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] utilised a clustered ensemble technique that relied
on the notion that some patterns could be readily identified through
clustering (atomic). Other patterns that were not so easily separable (non-atomic)
were classified by a neural network. The classification process involved an initial
lookup to determine if a pattern was associated with an atomic class; for
non-atomic classes, a neural network ensemble created through
an iterative clustering mechanism (to introduce diversity into the ensemble) was
employed. The advantage of this technique is that the ensemble was not
adversely affected by outliers (atomic clusters). This technique was applied to the
same mass dataset as utilised in this research and achieved a classification
accuracy of 91%.
      </p>
      <p>The ensemble utilised in this research was created by fusing together (using
the majority vote algorithm) constituent neural networks that were created by
varying the number of neurons in the hidden layer, producing diverse networks for
incorporation into an ensemble classifier.</p>
    </sec>
    <sec id="sec-5">
      <title>Experimental Results</title>
      <p>
        The experiments were conducted for the LCA and neural network techniques and
the related ensemble approaches using mass type anomalies from the Digital
Database for Screening Mammography (DDSM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The features used for
classification purposes coincided with the Breast Imaging Reporting and Data System
(BI-RADS), as this is how radiologists classify breast cancer. The BI-RADS
features of density, mass shape, mass margin and abnormality assessment rank are
used as they have been proven to provide good classification accuracy [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. These
features are then combined with patient age and a subtlety value [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Experiments were performed utilising the clustering technique of Ghassem
Pour, Maeder and Jorm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on this dataset. This was achieved using the
Latent GOLD software package. The first step was to utilise the analysis feature of
Latent GOLD to calculate the BIC value and the classification error rate. This
information appears in Table 1 below, with Npar designating the resulting
parameter value associated with the LCA.
      </p>
      <p>Minimisation of the BIC and the classification error determines the best number
of clusters for the LCA analysis in terms of classification accuracy, and this was
found to be 2 clusters. Nevertheless, it might be expected that some further
complexity could be identified with higher numbers of clusters, where multiple
clusters may exist for either the positive or negative classes. The results obtained
when cases of more than 2 clusters were merged to form the dominant positive
and negative classes are detailed in Table 2. These results show the instability
of LCA classification for this dataset at higher numbers of clusters; for example,
the 2-cluster solution gives better accuracy than the 3-cluster solution (merged
into 2 clusters), and so forth. From this we conclude that the natural 2-cluster
solution is indeed optimal.</p>
      <p>In order to provide a comparison, further experiments were performed using
a neural network and then applying an ensemble classifier. The neural network
and ensemble techniques were implemented in MATLAB utilising the neural
network toolbox. The parameters utilised are detailed in Table 3 below.
Experiments were first performed with a neural network classifier alone, in order
to provide a baseline for measuring the classification accuracy on the selected
dataset. The results obtained are detailed in Table 4 below. Further experiments
were then performed utilising an ensemble technique, with a summary of the
neural network test results using ten-fold cross validation detailed in Table
5 below.</p>
      <p>Experiments were also performed for the ensemble-like optimising of results
from the LCA technique. It is difficult to match this process directly with the
complexity used for the NN-ensemble experiments, so the number of repetitions
was modelled on a plausible choice based on a dataset size of 100 cases. The
results for these experiments are shown in Table 6 below.
Examination of the results from Tables 1 to 6 demonstrates that the accuracy
obtained with the LCA technique is below that of the baseline classification
performed with the neural network. However, an ensemble oriented approach
enabled improvement of the results from both techniques.</p>
      <p>In order to examine the results more closely, the sensitivity, specificity and
positive predictive value have been calculated for the best performing results for
each of the trialled techniques, shown below in Table 7.</p>
      <p>Sensitivity is the True Positive count divided by the sum of the True Positive and
False Negative counts. Sensitivity can be thought of as the probability of
detecting cancer when it exists.</p>
      <p>Specificity is the True Negative count divided by the sum of the True Negative
and False Positive counts. Specificity can be thought of as
the probability of being correctly diagnosed as not having cancer.</p>
      <p>Positive Predictive Value (PPV) is the True Positive count divided by
the sum of the True Positive and False Positive counts. PPV is the
accuracy with which malignant abnormalities are identified. The latent class analysis
technique was not as sensitive as the neural network but had better specificity
and a higher positive predictive value. Both ensemble
approaches resulted in substantially better performance, which of course must
be traded off against the increased computational cost. The NN-ensemble
technique performed best, with good sensitivity, good specificity and a high positive
predictive value.</p>
      <p>The flexibility of clustering techniques such as LCA provides a mechanism for
gaining insight from large data repositories. However, once patterns in the data
become evident, it would appear that other less flexible but more specialised
techniques could be utilised to analyse the data in question at a higher degree of
granularity.</p>
      <p>A summary of the overall performance of the techniques employed in this
paper is presented in Figure 1. The optimal LCA-ensemble result, while lower
than the optimal NN-ensemble result, is obtained with somewhat less processing
effort and complexity, and further improvement may be possible.</p>
      <p>Future work could look at extending the comparison of LCA with other
data mining algorithms to determine their applicability. Breast cancer represents
only one problem domain, and applying these methods to other datasets would
be a logical extension. Our future research will include more experiments with
Latent GOLD on other breast cancer datasets to determine how different numbers
of clusters produce different classification results, for a more detailed analysis.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Christoyianni</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dermatas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kokkinakis</surname>
          </string-name>
          , G.:
          <article-title>Computer Aided Diagnosis of Breast Cancer in Digitized Mammograms</article-title>
          .
          <source>Computerized Medical Imaging and Graphics</source>
          <volume>26</volume>
          (
          <issue>5</issue>
          ),
          <fpage>309</fpage>
          -
          <lpage>319</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>DeSantis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siegel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jemal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Breast Cancer Statistics, 2011</article-title>
          .
          <source>CA: A Cancer Journal for Clinicians</source>
          <volume>61</volume>
          (
          <issue>6</issue>
          ),
          <fpage>408</fpage>
          -
          <lpage>418</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fraley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raftery</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Model-based Clustering, Discriminant Analysis, and Density Estimation</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>97</volume>
          (
          <issue>458</issue>
          ),
          <fpage>611</fpage>
          -
          <lpage>631</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ghassem Pour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeder</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jorm</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Constructing a Synthetic Longitudinal Health Dataset for Data Mining</article-title>
          .
          <source>DBKDA</source>
          <year>2012</year>
          ,
          <source>The Fourth International Conference on Advances in Databases, Knowledge, and Data Applications</source>
          .
          <fpage>86</fpage>
          -
          <lpage>90</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ghassem Pour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeder</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jorm</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Validating Synthetic Health Datasets for Longitudinal Clustering</article-title>
          .
          <source>The Australasian Workshop on Health Informatics and Knowledge Management (HIKM</source>
          <year>2013</year>
          )
          <volume>142</volume>
          , to appear (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhuang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Algorithm of Partition Based Network Boosting for Imbalanced Data Classification</article-title>
          .
          <source>The International Joint Conference on Neural Networks (IJCNN)</source>
          .
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . IEEE (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowyer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopans</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kegelmeyer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The Digital Database for Screening Mammography</article-title>
          .
          <source>Proceedings of the 5th International Workshop on Digital Mammography</source>
          .
          <fpage>212</fpage>
          -
          <lpage>218</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hofvind</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patnick</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ascunce</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Njor</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broeders</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giordano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frigerio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tornberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>False-positive Results in Mammographic Screening for Breast Cancer in Europe: a literature review and survey of service screening programmes</article-title>
          .
          <source>Journal of Medical Screening</source>
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
          </string-name>
          , H.:
          <article-title>Detection of Clustered Microcalcifications on Mammograms Using Surrounding Region Dependence Method and Artificial Neural Network</article-title>
          .
          <source>The Journal of VLSI Signal Processing</source>
          <volume>18</volume>
          (
          <issue>3</issue>
          ),
          <fpage>251</fpage>
          -
          <lpage>262</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lanza</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flaherty</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collins</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Latent Class and Latent Transition Analysis</article-title>
          .
          <source>Handbook of Psychology</source>
          .
          <volume>663</volume>
          -
          <fpage>685</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Magidson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vermunt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Latent Class Models for Clustering: A Comparison with k-means</article-title>
          .
          <source>Canadian Journal of Marketing Research</source>
          <volume>20</volume>
          (
          <issue>1</issue>
          ),
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Malich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Facius</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>The Performance of Computer-aided Detection when Analyzing Prior Mammograms of Newly Detected Breast Cancers with Special Focus on the Time Interval from Initial Imaging to Detection</article-title>
          .
          <source>European Journal of Radiology</source>
          <volume>69</volume>
          (
          <issue>3</issue>
          ),
          <fpage>574</fpage>
          -
          <lpage>578</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Data Mining: Machine Learning, Statistics, and Databases</article-title>
          .
          <source>Proceedings of the Eighth International Conference on Scientific and Statistical Database Systems</source>
          ,
          <fpage>2</fpage>
          -
          <lpage>9</lpage>
          . IEEE
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>McLeod</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Clustered Ensemble Neural Network for Breast Mass Classification in Digital Mammography</article-title>
          .
          <source>In: The International Joint Conference on Neural Networks (IJCNN)</source>
          .
          <fpage>1266</fpage>
          -
          <lpage>1271</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mealing</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banks</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jorm</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clements</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Investigation of Relative Risk Estimates from Studies of the Same Population with Contrasting Response rates and Designs</article-title>
          .
          <source>BMC Medical Research Methodology</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>10</fpage>
          -
          <lpage>26</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mohanty</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senapati</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A Novel Image Mining Technique for Classification of Mammograms Using Hybrid Feature Selection</article-title>
          .
          <source>Neural Computing &amp; Applications</source>
          .
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Nishikawa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kallergi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.:
          <article-title>Computer-aided Detection, in its present form, is not an Effective aid for Screening Mammography</article-title>
          .
          <source>Medical Physics</source>
          <volume>33</volume>
          (
          <issue>4</issue>
          ),
          <fpage>811</fpage>
          -
          <lpage>814</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Nylund</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asparouhov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muthen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study</article-title>
          .
          <source>Structural Equation Modeling</source>
          <volume>14</volume>
          (
          <issue>4</issue>
          ),
          <fpage>535</fpage>
          -
          <lpage>569</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Oh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Ensemble Learning with Active Example Selection for Imbalanced Biomedical Data Classification</article-title>
          .
          <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source>
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <fpage>316</fpage>
          -
          <lpage>325</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sampat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bovik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Classification of Mammographic Lesions into BIRADS Shape Categories Using the Beamlet Transform</article-title>
          .
          <source>In: Proceedings of SPIE, Medical Imaging: Image Processing</source>
          .
          <fpage>16</fpage>
          -
          <lpage>25</lpage>
          . SPIE (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Surrendiran</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vadivel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Feature Selection Using Stepwise ANOVA, Discriminant Analysis for Mammogram Mass Classification</article-title>
          .
          <source>International Journal of Recent Trends in Engineering and Technology</source>
          <volume>3</volume>
          ,
          <fpage>55</fpage>
          -
          <lpage>57</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Computer Aids and Human Second Reading as Interventions in Screening Mammography: two systematic reviews to compare effects on cancer detection and recall rate</article-title>
          .
          <source>European Journal of Cancer</source>
          <volume>44</volume>
          (
          <issue>6</issue>
          ),
          <fpage>798</fpage>
          -
          <lpage>807</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomuro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furst</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raicu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Building an Ensemble System for Diagnosing Masses in Mammograms</article-title>
          .
          <source>International Journal of Computer Assisted Radiology and Surgery</source>
          <volume>7</volume>
          (
          <issue>2</issue>
          ),
          <fpage>323</fpage>
          -
          <lpage>329</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>