<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FS-XCS vs. GRD-XCS: An analysis using high-dimensional DNA microarray gene expression data sets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mani Abedini</string-name>
          <email>mabedini@csse.unimelb.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Kirley</string-name>
          <email>mkirley@csse.unimelb.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raymond Chiong</string-name>
          <email>rchiong@csse.unimelb.edu.au</email>
          <email>rchiong@swin.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computing and Information Systems, The University of Melbourne</institution>
          ,
          <addr-line>Victoria 3010</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Higher Education Lilydale, Swinburne University of Technology</institution>
          ,
          <addr-line>Victoria 3140</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <fpage>21</fpage>
      <lpage>32</lpage>
      <abstract>
        <p>XCS, a Genetic Based Machine Learning model that combines reinforcement learning with evolutionary algorithms to evolve a population of classifiers in the form of condition-action rules, has been used successfully for many classification tasks. However, like many other machine learning algorithms, XCS becomes less effective when it is applied to high-dimensional data sets. In this paper, we present an analysis of two XCS extensions - FS-XCS and GRD-XCS - in an attempt to overcome the dimensionality issue. FS-XCS is a standard combination of a feature selection method and XCS. As for GRD-XCS, we use feature quality information to bias the evolutionary operators without removing any features from the data sets. Comprehensive numerical simulation experiments show that both approaches can effectively enhance the learning performance of XCS. While GRD-XCS has obtained significantly more accurate classification results than FS-XCS, the latter has produced much quicker execution time than the former.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Classification tasks arise in many areas of science and engineering. One such
example is disease classification based on gene expression profiles in bioinformatics.
Gene expression profiles provide important insights into, and further our
understanding of, biological processes. They are key tools used in medical diagnosis,
treatment, and drug design [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. From a clinical perspective, the classification
of gene expression data is an important problem and a very active research area
(see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for a review). DNA microarray technology has advanced a great deal
in recent years. It is now possible to simultaneously measure the expression levels
of thousands of genes under particular experimental environments and
conditions [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. However, the number of samples tends to be much smaller than the
number of genes (features)<fn id="fn1"><label>1</label><p>Generally speaking, the number of samples must be larger than the number of features for good classification performance.</p></fn>. Consequently, the high dimensionality of a given
data set poses many statistical and analytical challenges, which often degrade
the performance of the classification methods used.
      </p>
      <p>
        XCS, the eXtended Classifier System, is a Genetic Based Machine Learning
(GBML) method that has been successfully used for a wide variety of
classification applications, including medical data mining. XCS can learn from sample
data in multiple iterative cycles. This is a valuable characteristic, but XCS also
exhibits two pitfalls common to most classification methods: sensitivity to
data noise and “the curse of dimensionality” [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Both issues can easily
jeopardise the learning process. A well-known solution is to use a cleansing stage. For
example, feature selection/ranking techniques can remove unnecessary features
from the data set. Reducing the dimensionality and removing noisy features can
improve learning performance. Nevertheless, there exist data sets with highly
co-expressed features, such as those studying epistasis phenomena, that do not
allow effective feature reduction. Examples include protein structure
prediction and protein-protein interaction.
      </p>
      <p>
        In this paper, we study two extensions of XCS inspired by feature selection
techniques commonly used in machine learning: FS-XCS, which has effective feature
reduction in place, and GRD-XCS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which does not remove any features. GRD-XCS
instead uses prior knowledge, provided by a feature ranking method,
to bias the discovery operators of XCS. A series of comprehensive numerical
experiments on high-dimensional medical data sets has been conducted. The
results of these simulation experiments suggest that both extensions can effectively
enhance the learning performance of XCS. While GRD-XCS has performed
significantly more accurately than FS-XCS, the latter is shown to have a much shorter
execution time than the former.
      </p>
      <p>The remainder of this paper is organised as follows: Section 2 briefly describes
some related work on XCS. In Section 3, we present the details of our proposed
model. Section 4 discusses the experimental settings and results. Finally, we draw
conclusions and highlight future possibilities in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        GBML concerns applying evolutionary algorithms (EAs) to machine learning.
EAs belong to the family of nature-inspired optimisation algorithms [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. As
a manifestation of population-based, stochastic search algorithms that mimic
natural evolution, EAs use genetic operators such as crossover and mutation for
the search process to generate new solutions through a repeated application of
variation and selection [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>It is well documented in the evolutionary computation literature that the
implementation of an EA’s genetic operators can influence the trajectory of the
evolving population. However, few studies have focused specifically
on the impact of selected evolutionary operator implementations in Learning
Classifier Systems (LCSs), a type of GBML algorithm for rule induction. Here,
we briefly describe some of the key studies related to LCSs in general and XCS,
a Michigan-style LCS, in particular.</p>
      <p>
        In one of the first studies focused specifically on the rule discovery component
of XCS, Butz et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] showed that uniform crossover can ensure
successful learning in many tasks. In subsequent work, Butz et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] introduced an
informed crossover operator, which extended the usual uniform operator such
that exchanges of effective building blocks occurred. This approach helped to
avoid the over-generalisation phenomenon inherent in XCS [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In other work,
Bacardit et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] customised the GAssist crossover operator to switch between
the standard crossover and a new smart crossover, SX. The SX operator uses a
heuristic selection approach to take, from more than two parents, a minimum
number of rules that can obtain maximum accuracy. Morales-Ortigosa et
al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] have also proposed a new XCS crossover operator, BLX, which allows
for the creation of multiple offspring, with a diversity parameter to control
differences between offspring and parents. In a more comprehensive overview paper,
Morales-Ortigosa et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] presented a systematic experimental analysis of the
rule discovery component in LCSs. Subsequently, they developed crossover
operators based on evolution strategies to enhance the discovery component, with
significant performance improvements.
      </p>
      <p>
        Other work focused on biased evolutionary operators in LCSs includes the
work of Jos-Revuelta [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], who introduced a hybridised Genetic Algorithm-Tabu
Search (GA-TS) method that employed modified mutation and crossover
operators. Here, the operator probabilities were tuned by analysing all the fitness
values of individuals during the evolution process. Wang et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] used
Information Gain as part of the fitness function in an EA. They reported improved
results when comparing their model to other machine learning algorithms.
Recently, Huerta et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] combined linear discriminant analysis with a GA to
evaluate the fitness of individuals and associated discriminant coefficients for
crossover and mutation operators. Moore et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] argued that biasing the
initial population, based on expert knowledge preprocessing, would lead to
improved performance of the evolutionary based model. In their approach, a
statistical method, Tuned ReliefF, was used to determine the dependencies between
features to seed the initial population. A modified fitness function and a new
guided mutation operator based on feature dependencies were also introduced,
leading to significantly improved performance.
      </p>
    </sec>
    <sec id="sec-3">
      <title>The Model</title>
      <p>We have designed and developed two extensions of XCS, both inspired by
feature selection techniques commonly used in machine learning. The first
extension, which we call FS-XCS, is a combination of a Feature Selection method and
the original XCS. The second extension, which we call GRD-XCS, incorporates
a probabilistically Guided Rule Discovery mechanism for XCS. The
motivation behind both extensions was to improve classifier performance (in terms
of accuracy and execution time), especially for high-dimensional classification
problems.</p>
      <p>FS-XCS uses feature ranking methods to reduce the dimension of a given
data set before XCS starts to process it. It is a fairly
straightforward hybrid approach. In GRD-XCS, by contrast, information gathered from feature
ranking methods is used to build a probability model that biases the
evolutionary operators of XCS. The feature ranking probability distribution values are
recorded in a Rule Discovery Probability (RDP) vector. Each value of the RDP
vector (∈ [0, 1]) is associated with a corresponding feature. The RDP vector
is then used to bias the feature-wise uniform crossover, mutation, and don’t care
operators, which are part of the XCS rule discovery component.</p>
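      <p>As a rough illustration of the FS-XCS preprocessing step described above, the following Python sketch keeps only the Ω highest-scoring features before any learning takes place. The function names, the score list and the toy samples are hypothetical and serve only to show the idea; they are not part of the implementation used in the experiments.</p>
      <preformat>
```python
def select_top_features(scores, omega):
    """Return the indices of the omega highest-scoring features, best first."""
    if omega > len(scores) or 1 > omega:
        raise ValueError("omega must lie between 1 and the number of features")
    order = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return order[:omega]


def reduce_dataset(rows, keep):
    """Project every sample onto the kept feature indices."""
    return [[row[f] for f in keep] for row in rows]


# Hypothetical ranking scores for four features (higher means more informative).
scores = [0.10, 0.80, 0.05, 0.40]
# Two toy samples with four features each.
rows = [[1, 0, 1, 1], [0, 1, 0, 1]]

keep = select_top_features(scores, omega=2)
print(keep)                        # [1, 3]
print(reduce_dataset(rows, keep))  # [[0, 1], [1, 1]]
```
      </preformat>
      <p>XCS would then be trained on the reduced samples only, which is why the execution time of this approach depends on Ω rather than on the original number of features.</p>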
      <p>The actual values in the RDP vector are calculated based on the rank of the
corresponding feature as described below:</p>
      <p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math><![CDATA[
\mathrm{RDP}_i =
\begin{cases}
\dfrac{1-\gamma}{\Omega}\,(\Omega - i) + \gamma & \text{if } i \leq \Omega \\
\xi & \text{otherwise}
\end{cases}
]]></tex-math>
        </disp-formula>
where i is the rank index of a feature, in ascending order, and Ω is the number of
selected top ranked features. The probability values associated with the top ranked
features are relatively large (∈ [γ, 1]), depending on the feature rank; all other
features are given a very low probability value ξ. Thus, all features have a chance
to participate in the rule discovery process. However, the Ω top ranked features
have a greater chance of being selected (see Figure 1 for an example).</p>
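      <p>Equation 1 can be sketched in a few lines of Python. This is a minimal illustration, not the C++ implementation used in the experiments; the function name build_rdp and the convention that ranks start at 1 for the best feature are assumptions made for the example. The default values of γ and ξ are the ones reported in Section 4.1.</p>
      <preformat>
```python
def build_rdp(ranks, omega, gamma=0.5, xi=0.1):
    """Return one RDP value per feature, following Equation 1.

    ranks[f] is the rank of feature f, with 1 for the best feature
    (this indexing convention is an assumption of the sketch).
    """
    rdp = []
    for rank in ranks:
        if omega >= rank:
            # Top ranked feature: value decreases linearly towards gamma.
            rdp.append((1.0 - gamma) / omega * (omega - rank) + gamma)
        else:
            # Remaining features keep a small, non-zero probability.
            rdp.append(xi)
    return rdp


# Six features, of which the four best ranked are treated as informative.
print(build_rdp(ranks=[3, 1, 4, 2, 6, 5], omega=4))
# [0.625, 0.875, 0.5, 0.75, 0.1, 0.1]
```
      </preformat>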
      <p>GRD-XCS uses the probability values recorded in the RDP vector in the
preprocessing phase to bias the evolutionary operators used in the rule discovery
phase of XCS. The modified algorithms describing the crossover, mutation and
don’t care operators in GRD-XCS are very similar to the standard XCS operators:</p>
      <list list-type="bullet">
        <list-item><p>GRD-XCS crossover operator: This is a hybrid uniform/n-point function. An
additional check of each feature is carried out before the exchange of genetic
material. If Random[0, 1) &lt; RDP[i], then feature i is swapped between the
selected parents (Algorithm 1).</p></list-item>
        <list-item><p>GRD-XCS mutation operator: It uses the RDP vector to determine whether feature
i is to undergo mutation; the base-line mutation probability is multiplied by
RDP[i] for each feature i. The mutation probability is therefore no longer
uniform across features: the more informative features have a better chance of
being selected for mutation (Algorithm 2).</p></list-item>
        <list-item><p>GRD-XCS don’t care operator: In this special mutation operator, the values
in the RDP vector are used in reverse. That is, if feature i has
been selected to be mutated and Random[0, 1) &lt; (1 − RDP[i]), then feature
i is changed to # (“don’t care”) (see Algorithm 3).</p></list-item>
      </list>
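      <p>The three biased operators can be sketched as follows. This is an illustrative Python reading of the descriptions above, not a transcription of Algorithms 1 to 3. Conditions are assumed to be lists over the ternary alphabet {0, 1, #}, the crossover shown is the purely feature-wise variant, and the branch that copies the input value when a mutated position is not turned into # is an assumption borrowed from niche mutation in standard XCS. All function and parameter names are hypothetical.</p>
      <preformat>
```python
import random


def grd_crossover(parent_a, parent_b, rdp, rng):
    """Swap position i between the parents when Random[0, 1) is below rdp[i]."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i, p in enumerate(rdp):
        if p > rng.random():
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


def grd_mutate(condition, sample, rdp, mu, rng):
    """Mutate position i with probability mu * rdp[i].

    A selected position becomes '#' with probability 1 - rdp[i] (the
    don't care operator); otherwise it is set to the matching input
    value, which is an assumption of this sketch.
    """
    out = list(condition)
    for i, p in enumerate(rdp):
        if mu * p > rng.random():
            if (1.0 - p) > rng.random():
                out[i] = "#"
            else:
                out[i] = sample[i]
    return out


rng = random.Random(42)
rdp = [0.875, 0.75, 0.1, 0.1]  # two top ranked and two low ranked features
a, b = grd_crossover(list("10#1"), list("0#10"), rdp, rng)
print("".join(a), "".join(b))
print("".join(grd_mutate(a, list("1011"), rdp, mu=0.04, rng=rng)))
```
      </preformat>
      <p>With these definitions a low ranked feature is rarely exchanged or mutated, and when it is mutated it most often becomes a don’t care symbol.</p>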
      <p>The application of the RDP vector reduces the crossover and mutation
probabilities for “uninformative” features. At the same time, it increases the
“don’t care” operator probability for those same features. Therefore, the more
informative features should appear in rules more often than the “uninformative” ones.</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>We have conducted a series of independent experiments to compare the
performance of FS-XCS and GRD-XCS. A suite of feature selection techniques have</p>
      <sec id="sec-4-1">
        <title>Table 1. Data set details</title>
        <p>been tested: Correlation based Feature Selection (CFS), Gain Ratio,
Information Gain, One Rule, ReliefF and Support Vector Machine (SVM). Four DNA
microarray gene expression data sets have been used in the experiments. The
details of these data sets are reported in Table 1.</p>
        <p>Our algorithms were implemented in C++, based on Butz’s XCS code<sup>2</sup>.
The WEKA package (version 3.6.1)<sup>3</sup> was used for feature ranking. All
experiments were performed on the VPAC<sup>4</sup> Tango Cluster server. Tango has 111
computing nodes. Each node is equipped with two 2.3 GHz AMD-based quad-core
Opteron processors, 32 GB of RAM and four 320 GB hard drives. Tango’s
operating system is the Linux distribution CentOS (version 5).</p>
        <p>
          <bold>Parameter settings.</bold>
Default parameter values as recommended in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] have largely been used to
configure the underlying XCS model. For parameters specific to our proposed model,
we have carried out a detailed analysis to determine the optimal settings. In
particular, we have tested a range of Ω values (Ω = 10, 20, 32, 64, 128, 256) and
population sizes (pop size = 500, 1000, 2000, 5000). The analysis suggested that
Ω = 20 with a population size of 2000 can provide an acceptable accuracy level
within a reasonable execution time for FS-XCS. As for GRD-XCS, the setting of
Ω = 128 and pop size = 500 was found to produce the best results. As
such, these parameter values were used for the results presented in Section 4.3.
        </p>
        <p>The limits used in the probability value calculations in Equation 1 were set to
γ = 0.5 and ξ = 0.1. In all experiments, the number of iterations was capped at
5000.</p>
        <p><bold>Evaluation.</bold>
For each scenario (parameter value–data set combination), we performed N-fold
cross validation experiments over 100 trials (see Table 1). The average accuracy</p>
        <sec id="sec-4-1-1">
          <label>2</label>
          <p>The source code is available at the Illinois Genetic Algorithms Laboratory (IlliGAL)
site http://www.illigal.org/</p>
        </sec>
        <sec id="sec-4-1-2">
          <label>3</label>
          <p>Weka 3 is an open source data mining tool (in Java), with a collection of machine
learning algorithms developed by the Machine Learning Group at the University
of Waikato: http://www.cs.waikato.ac.nz/ml/weka/</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4 Victorian Partnership for Advanced Computing: www.vpac.org</title>
          <p>values for specific parameter combinations have been reported using the Area
Under the ROC Curve – the AUC value. The ROC curve is a graphical way to
depict the tradeoff between the True Positive rate (TPR) on the Y axis and the
False Positive rate (FPR) on the X axis. The AUC values obtained from the
ROC graphs allow for easy comparison between two or more plots. Larger AUC
values represent higher overall accuracy.</p>
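          <p>For illustration, the AUC can be computed without drawing the curve, using the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The Python sketch below follows that definition; the function name and the toy labels and scores are made up for this example, and it is not the procedure used to produce the values reported in this paper.</p>
          <preformat>
```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels holds 1 for positive and 0 for negative samples. A tie between
    a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 8 of the 9 positive/negative pairs are ordered correctly.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))
```
          </preformat>
          <p>The pairwise loop is quadratic in the number of samples, which is acceptable for data sets with few samples such as the microarray sets considered here.</p>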
          <p>Appropriate statistical analyses using paired t-tests were conducted to
determine whether there were statistically significant differences between particular
scenarios in terms of both accuracy and execution time. Scatter plots of the
observed and fitted values and Q-Q plots were used to verify normality
assumptions. These statistical analyses were performed using the IBM SPSS Statistics
(version 19) software.</p>
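          <p>The paired t statistic behind such a comparison is simple to compute by hand. The Python sketch below uses only the standard library and returns the statistic together with its degrees of freedom; converting it to a p value requires the Student t distribution, which is left to a statistics package. The function name and the sample numbers are hypothetical and are not results from this study.</p>
          <preformat>
```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for the paired samples a and b."""
    if len(a) != len(b) or 2 > len(a):
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Made-up AUC values for two methods over the same five trials.
method_a = [0.98, 0.97, 0.99, 0.96, 0.98]
method_b = [0.88, 0.90, 0.87, 0.89, 0.86]
t, df = paired_t_statistic(method_a, method_b)
print(round(t, 3), df)
```
          </preformat>
          <p>The test is paired because both methods are evaluated on the same trials, so only the per-trial differences enter the statistic.</p>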
          <p><bold>FS-XCS vs. GRD-XCS.</bold>
To begin with, we have compared the average accuracy of FS-XCS and
GRD-XCS with the base-line XCS (without feature selection) using all the
aforementioned feature ranking methods on the microarray gene expression data sets
listed in Table 1. The results, as shown in Table 2, indicate that GRD-XCS has
better overall accuracy than FS-XCS: the average FS-XCS accuracy using
various feature selection techniques is 0.88, while the average accuracy of
GRD-XCS using the same feature ranking methods is 0.98. Meanwhile, both FS-XCS
and GRD-XCS are better than the base-line XCS, which has managed only
an average accuracy of 0.77. For the rest of this section, we will focus on a
detailed comparison between FS-XCS and GRD-XCS.</p>
          <p>Figure 2 shows the AUC values of FS-XCS and GRD-XCS when different
feature ranking methods are used. From the figure, it is clear that GRD-XCS is
significantly more accurate than FS-XCS. The accuracy results of FS-XCS
and GRD-XCS differ significantly (p &lt; 0.001) for every feature ranking method,
except Information Gain on the Breast Cancer data set.</p>
          <p>In Figure 3, FS-XCS is shown to be significantly faster than GRD-XCS (p &lt;
0.001) in terms of execution time. This is expected, since FS-XCS works
with only a fraction of the original data set size (i.e., 20 features) while GRD-XCS
still accepts the entire data set with thousands of features. The only exception
is when Gain Ratio has been applied to the Breast Cancer data set. In this
case, there is no significant difference between the average execution times of
FS-XCS and GRD-XCS (p = 0.94).</p>
          <p>Figures 4 and 5 give some general insight into the population diversity. In
the majority of cases, GRD-XCS has less diversity.</p>
          <p>[Figure panels: (c) Leukemia, (d) Colon Cancer; series: GRD-XCS, FS-XCS]</p>
          <p>The average length of each classifier in GRD-XCS is almost always
significantly smaller than in FS-XCS (p &lt; 0.05). The exceptions, where no
significant difference was found, are Gain
Ratio (p = 0.80) and ReliefF (p = 0.26) on the Prostate Cancer data set.</p>
          <p>The average number of macro classifiers in GRD-XCS is significantly smaller
than the average number of macro classifiers in FS-XCS. As can be seen in
Figures 5(b) and (d), the difference becomes more pronounced as the dimensionality
increases (for Prostate Cancer and Colon Cancer). The Breast Cancer data set is an
exception: there, the average number of macro
classifiers in the GRD-XCS population is larger than in FS-XCS. It is fair to
conclude that GRD-XCS explores the solution space in a more
focused manner than FS-XCS. In other words, the guided rule discovery approach
forces the learning process to generate and test less diverse hypotheses; however,
this behaviour can evolve more accurate classifiers.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we have analysed the performance of FS-XCS and GRD-XCS
on some high-dimensional classification problems. Comprehensive
numerical simulations have established that GRD-XCS is significantly more accurate
than FS-XCS in terms of classification results. On the other hand, FS-XCS is
significantly faster than GRD-XCS in terms of execution time. The results of
FS-XCS suggest that the 20 top-ranked features are normally enough to build
a good classifier, although this classifier is significantly less accurate than the
equivalent GRD-XCS model. Nevertheless, both models have performed better
than the base-line XCS.</p>
      <p>To sum up, using feature selection to highlight the more informative features
and using them to guide the XCS rule discovery process is more accurate than applying
feature reduction approaches. This is mainly due to the fact that GRD-XCS can
transform poor classifiers (created from the uninformative features) into highly
accurate classifiers. From the empirical analysis presented, it is clear that the
performance of different feature selection techniques varies depending
on the characteristics of the data set. Future work will therefore attempt to address
this through ensemble learning. That is, we can build an ensemble
classifier from multiple XCS based models (be they FS-XCS or GRD-XCS).
Each of these XCS cores can use a distinct feature selection method. The
results of all XCS cores are then combined to form the ensemble result, for
instance by using a majority voting technique.</p>
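      <p>As a sketch of the majority voting idea, the following Python fragment combines the predictions of several classifiers for one sample. The function names and the three stand-in models are hypothetical; in the proposed ensemble each model would be an XCS core trained with a different feature selection method. Ties are broken in favour of the earliest prediction, an arbitrary choice made for this sketch.</p>
      <preformat>
```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; the earliest prediction wins a tie."""
    if not predictions:
        raise ValueError("no predictions to combine")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(models, sample):
    """Ask every model for a label and combine the answers by majority vote."""
    return majority_vote([model(sample) for model in models])


# Three stand-in classifiers; each would be a trained XCS core in practice.
models = [
    lambda x: "tumour" if x[0] > 0.5 else "normal",
    lambda x: "tumour" if x[1] > 0.5 else "normal",
    lambda x: "tumour" if x[0] + x[1] > 1.2 else "normal",
]
print(ensemble_predict(models, [0.9, 0.2]))  # normal
```
      </preformat>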
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Abedini</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kirley</surname>
          </string-name>
          .
          <article-title>An enhanced XCS rule discovery module using feature ranking</article-title>
          .
          <source>International Journal of Machine Learning and Cybernetics</source>
          ,
          <pub-id pub-id-type="doi">10.1007/s13042-012-0085-9</pub-id>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>U.</given-names>
            <surname>Alon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Barkai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Notterman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ybarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mack</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Levine</surname>
          </string-name>
          .
          <article-title>Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays</article-title>
          .
          <source>Proc. of the National Academy of Sciences of the USA</source>
          ,
          <volume>96</volume>
          :
          <fpage>6745</fpage>
          -
          <lpage>6750</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Asyali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Colak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Demirkaya</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Inan</surname>
          </string-name>
          .
          <article-title>Gene expression profile classification: A review</article-title>
          .
          <source>Current Bioinformatics</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>55</fpage>
          -
          <lpage>73</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Bacardit</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Krasnogor</surname>
          </string-name>
          .
          <article-title>Smart crossover operator with multiple parents for a Pittsburgh learning classifier system</article-title>
          .
          <source>In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)</source>
          , pages
          <fpage>1441</fpage>
          -
          <lpage>1448</lpage>
          . ACM Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>E. Bonilla</given-names>
            <surname>Huerta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Hernandez Hernandez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. A. Hernandez</given-names>
            <surname>Montiel</surname>
          </string-name>
          .
          <article-title>A new combined filter-wrapper framework for gene subset selection with specialized genetic operators</article-title>
          .
          <source>In Advances in Pattern Recognition</source>
          , volume
          <volume>6256</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>250</fpage>
          -
          <lpage>259</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Butz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pelikan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Llorà</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David E.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <article-title>Automated global structure extraction for effective local building block processing in XCS</article-title>
          .
          <source>Evolutionary Computation</source>
          ,
          <volume>14</volume>
          (
          <issue>3</issue>
          ):
          <fpage>345</fpage>
          -
          <lpage>380</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Tharakunnel</surname>
          </string-name>
          .
          <article-title>Analysis and improvement of fitness exploitation in XCS: Bounding models, tournament selection, and bilateral accuracy</article-title>
          .
          <source>Evolutionary Computation</source>
          ,
          <volume>11</volume>
          (
          <issue>3</issue>
          ):
          <fpage>239</fpage>
          -
          <lpage>277</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Wilson</surname>
          </string-name>
          .
          <article-title>An algorithmic description of XCS</article-title>
          .
          <source>Soft Computing</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          -4):
          <fpage>144</fpage>
          -
          <lpage>153</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. R. Chiong, editor.
          <source>Nature-Inspired Algorithms for Optimisation</source>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>R.</given-names>
            <surname>Chiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Neri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. I.</given-names>
            <surname>McKay</surname>
          </string-name>
          .
          <article-title>Nature that breeds solutions</article-title>
          . In R. Chiong, editor,
          <source>Nature-Inspired Informatics for Intelligent Applications and Knowledge Discovery: Implications in Business, Science and Engineering</source>
          , chapter 1
          , pages
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          . Information Science Reference, Hershey, PA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>R.</given-names>
            <surname>Chiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weise</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Michalewicz</surname>
          </string-name>
          , editors.
          <source>Variants of Evolutionary Algorithms for Real-World Applications</source>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Golub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Slonim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tamayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaasenbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Mesirov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Coller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Loh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Downing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Caligiuri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Bloomfield</surname>
          </string-name>
          .
          <article-title>Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring</article-title>
          .
          <source>Science</source>
          ,
          <volume>286</volume>
          :
          <fpage>531</fpage>
          -
          <lpage>537</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>I.</given-names>
            <surname>Hedenfalk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Duggan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Radmacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bittner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Meltzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gusterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Esteller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. P.</given-names>
            <surname>Kallioniemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilfond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Trent</surname>
          </string-name>
          .
          <article-title>Gene-expression profiles in hereditary breast cancer</article-title>
          .
          <source>The New England Journal of Medicine</source>
          ,
          <volume>344</volume>
          (
          <issue>8</issue>
          ):
          <fpage>539</fpage>
          -
          <lpage>548</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>P. L.</given-names>
            <surname>Lanzi</surname>
          </string-name>
          .
          <article-title>A study of the generalization capabilities of XCS</article-title>
          . In Thomas Bäck, editor,
          <source>Proceedings of the 7th International Conference on Genetic Algorithms</source>
          , pages
          <fpage>418</fpage>
          -
          <lpage>425</lpage>
          . Morgan Kaufmann,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Moore</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. C.</given-names>
            <surname>White</surname>
          </string-name>
          .
          <article-title>Exploiting expert knowledge in genetic programming for genome-wide genetic analysis</article-title>
          .
          In
          <source>PPSN</source>
          , volume
          <volume>4193</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>969</fpage>
          -
          <lpage>977</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>S.</given-names>
            <surname>Morales-Ortigosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orriols-Puig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Bernadó-Mansilla</surname>
          </string-name>
          .
          <article-title>New crossover operator for evolutionary rule discovery in XCS</article-title>
          .
          In
          <source>Proceedings of the 8th International Conference on Hybrid Intelligent Systems</source>
          , pages
          <fpage>867</fpage>
          -
          <lpage>872</lpage>
          . IEEE Computer Society,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>S.</given-names>
            <surname>Morales-Ortigosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orriols-Puig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Bernadó-Mansilla</surname>
          </string-name>
          .
          <article-title>Analysis and improvement of the genetic discovery component of XCS</article-title>
          .
          <source>International Journal of Hybrid Intelligent Systems</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>95</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>L. M.</given-names>
            <surname>San Jose-Revuelta</surname>
          </string-name>
          .
          <article-title>A Hybrid GA-TS Technique with Dynamic Operators and its Application to Channel Equalization and Fiber Tracking</article-title>
          . I-Tech Education and Publishing,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>D.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Febbo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Jackson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Manola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ladd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tamayo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Renshaw</surname>
          </string-name>
          .
          <article-title>Gene expression correlates of clinical prostate cancer behavior</article-title>
          .
          <source>Cancer Cell</source>
          ,
          <volume>1</volume>
          :
          <fpage>203</fpage>
          -
          <lpage>209</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weise</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Chiong</surname>
          </string-name>
          .
          <article-title>Novel evolutionary algorithms for supervised classification problems: An experimental study</article-title>
          .
          <source>Evolutionary Intelligence</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>F.-X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Kusalik</surname>
          </string-name>
          .
          <article-title>On determination of minimum sample size for discovery of temporal gene expression patterns</article-title>
          .
          In
          <source>Proceedings of the First International Multi-Symposiums on Computer and Computational Sciences</source>
          , pages
          <fpage>96</fpage>
          -
          <lpage>103</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Rajapakse</surname>
          </string-name>
          .
          <source>Machine Learning in Bioinformatics</source>
          . Wiley Series in Bioinformatics, 1st edition
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>