<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title></journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1573-0565</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/BF00116037</article-id>
      <title-group>
        <article-title>Sequential Model Building in Symbolic Regression</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Žegklitz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petr Pošík</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Czech Academy of Sciences, Institute of Computer Science</institution>
          ,
          <addr-line>Pod Vodárenskou věží 271/2, 182 07 Praha 8, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Czech Technical University in Prague, Faculty of Electrical Engineering</institution>
          ,
          <addr-line>Technická 2, 166 27 Praha 6, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <volume>1</volume>
      <fpage>77</fpage>
      <lpage>80</lpage>
      <abstract>
        <p>Symbolic Regression is a supervised learning technique for regression based on Genetic Programming. A popular algorithm is Multi-Gene Genetic Programming (MGGP), which builds models as a linear combination of a number of components that are all built together. In recent years, however, a different approach has emerged, represented by the Sequential Symbolic Regression (SSR) algorithm, which builds the model sequentially, one component at a time, combining the components using a method based on geometric semantic crossover. In this article we show that the SSR algorithm effectively produces a linear combination of components, and we introduce another sequential approach very similar to the classical ensemble method of boosting. All algorithms are compared with MGGP as a baseline on a number of real-world datasets. The results show that the sequential approaches are overall worse than MGGP both in terms of accuracy and model size.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Symbolic Regression (SR) is a supervised learning task whose goal is to find a mathematical function (preferably a simple one) of a number of variables that fits the available training data. Regression is a well-known task with a number of well-tested and successful approaches, e.g. neural networks, support vector machines and random forests, to name a few. However, these conventional approaches produce models which, while useful, are often essentially black boxes that are difficult to interpret. One of the goals of SR is to produce “white-box” models in the form of a mathematical expression. The overwhelming majority of SR approaches utilize some form of Genetic Programming (GP) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>This article is structured as follows: Section 2 introduces previous research relevant to this article and provides a new view of the main algorithm examined here; in Section 3 we propose a simple boosting-based sequential SR algorithm; Sections 4 and 5 present the experiments and the results, respectively, and provide a discussion of these results. Finally, Section 6 concludes the article and proposes further research.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>In this section we briefly revisit important approaches and
algorithms related to the sequential model building and to
the experiments performed later in this article.</p>
      <p>First we introduce Multi-Gene Genetic Programming, which is not a sequential algorithm but is important for this article. Then we briefly revisit boosting, a classical machine learning ensemble method. Finally, we introduce the most important related research, the Sequential Symbolic Regression algorithm.</p>
      <sec id="sec-2-1">
        <title>Multi-Gene Genetic Programming</title>
        <p>
          Multi-Gene GP (MGGP) [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3, 4</xref>
          ] is an extension of
classical GP for symbolic regression. The main idea behind
MGGP is that each individual is composed of one or more
independent trees, called genes, and these then form the
complete model together. To make the complete model,
the outputs of the genes are linearly combined with the
coefficients determined using linear regression. In this
article, we call this linear combination of genes a “top-level
linear combination”.
        </p>
        <p>MGGP does not build the model sequentially. However, since we use the concept of the top-level linear combination to put SSR (see the next section) into a similar framework, and since we use the algorithm in the experiments, it needs to be mentioned here.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Boosting</title>
        <p>Building predictive models sequentially is a well-known machine learning technique. The prominent example of this approach is the boosting ensemble method [5, 6, 7]. Boosting in the context of least-squares regression (i.e. regression using the squared error as the criterion) works in the following (simplified) way:
1. Start with a constant model f0(x) = c, e.g. using the mean target value: f0(x) = ȳ.
2. Iterate for k = 1, 2, …, M, where M is the desired number of iterations:
(a) Compute the residuals ỹ = y − fk−1(x).
(b) Train a model component h(x) using ỹ as the target values.
(c) Update the complete model: fk(x) = fk−1(x) + h(x).
3. fM(x) is the final model.</p>
        <p>One possible view is that the later model components correct the mistakes of the previous ones.</p>
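        <p>The boosting scheme above can be sketched in a few lines. The following is an illustration only, with a hypothetical weak learner (a best-single-feature linear fit) standing in for the base regressor; it is not the implementation used in the experiments later in this article:</p>

```python
import numpy as np

def fit_stump(X, y):
    # Hypothetical weak learner: least-squares fit a*x_j + b on the single
    # feature j that yields the smallest squared error.
    best = None
    for j in range(X.shape[1]):
        A = np.stack([X[:, j], np.ones(len(y))], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
        err = np.sum((a * X[:, j] + b - y) ** 2)
        if best is None or best[0] > err:
            best = (err, j, a, b)
    _, j, a, b = best
    return lambda Z, j=j, a=a, b=b: a * Z[:, j] + b

def boost(X, y, M=20):
    # Step 1: start from the constant model f0(x) = mean(y).
    components = [lambda Z, c=y.mean(): np.full(len(Z), c)]
    for _ in range(M):                                   # step 2
        residuals = y - sum(h(X) for h in components)    # (a)
        components.append(fit_stump(X, residuals))       # (b) and (c)
    return lambda Z: sum(h(Z) for h in components)       # step 3: f_M

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + X[:, 1] ** 2
model = boost(X, y)
mse = np.mean((model(X) - y) ** 2)
```

        <p>Each new component is fitted to the residuals of the model built so far, so the training error can only decrease.</p>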
      </sec>
      <sec id="sec-2-3">
        <title>Sequential Symbolic Regression</title>
        <p>
          SSR [
          <xref ref-type="bibr" rid="ref4 ref5">8, 9</xref>
          ] is a GP-based method that builds the model in a sequential fashion. The overall structure of the SSR method is the following:
1. An SR algorithm (e.g. GP) is executed on the training set, producing a model component.
2. The model component is added to the complete model and the training data is modified.
3. Unless the stopping criteria are met, go to step 1 and repeat, using the modified training data.
        </p>
        <p>This structure is somewhat similar to boosting: a model component is trained, added to the complete model, and the next one is trained using modified training data. However, SSR does not do boosting. The difference is in step 2, i.e. in how the new component is combined with the other components of the complete model and how the training data is modified.</p>
        <p>
          For the first iteration, the target values are not modified and the new component simply becomes the complete model, as at that time it is the only component. In the subsequent iterations, SSR utilizes the Geometric Semantic Crossover (GSX) from Geometric Semantic Genetic Programming (GSGP) [
          <xref ref-type="bibr" rid="ref6">10</xref>
          ] to add the new component to the complete model. GSX performs a convex combination of two functions:
f(x) = r f(x) + (1 − r) f′(x),   (1)
where r is a random number from the interval [0, 1), and f and f′ are the two combined functions.
        </p>
        <p>In SSR, the GSX is employed in a different way than to combine two known functions. Only one of the two functions is known (the component from the previous iteration, i.e. f). SSR therefore performs an “incomplete” GSX, meaning that f′ is only a placeholder for a function to be found in the next iteration using modified training data. The training data is derived by the following reasoning. The result of the GSX should optimally produce zero error, i.e. f(x) = y. Substituting this into Equation 1 and slightly rearranging it, the following equation is obtained:
f′(x) = (y − r f(x)) / (1 − r).   (2)
This is, in effect, a requirement on what the outputs of f′ should look like, i.e. it is a definition of its target values for the next iteration. Therefore the problem is transformed and the next iteration with different training data is started. It is important to note that the convex combinations are “nested” in depth in the second term of the GSX expression. In the last iteration, the f′ placeholder is not replaced with another GSX result but with the function found in that iteration, terminating the nesting. The complete SSR procedure has the following form:
1. y1 = y, k ← 1.
2. Run the base algorithm with yk as the target values, producing a component fk.
3. Sample a random number rk ∈ [0, 1).
4. Update the complete model:
if k = 1 and it is not the last iteration: f = rk fk + (1 − rk) f′;
if k = 1 and it is the last iteration: f = fk, terminate;
if k &gt; 1 and it is not the last iteration: f′ ← rk fk + (1 − rk) f′;
if k &gt; 1 and it is the last iteration: f′ ← fk, terminate.
5. Adjust the target values: yk+1 = (yk − rk fk(x)) / (1 − rk).
6. k ← k + 1; repeat from step 2 unless a stopping condition is met.</p>
        <p>7. Return f, where f′ is the placeholder. The nesting happens in the third case of step 4, where the placeholder is replaced with the GSX of the new function and the placeholder again.</p>
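        <p>The procedure can be checked numerically. In the sketch below (ours, not the paper's code), the base algorithm of step 2 is replaced by an oracle that simply memorizes its targets; with such perfect components the target values never change (Equation 2 maps y back to y), the coefficients of the unrolled combination sum to 1, and the final model reproduces y exactly:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=50)  # target values

def base_algorithm(targets):
    # Oracle stand-in for the GP run of step 2: a "component" that
    # reproduces its training targets exactly.
    return lambda t=targets.copy(): t

K = 4
targets, rs, components = y, [], []
for k in range(K):
    f_k = base_algorithm(targets)        # step 2
    r_k = rng.uniform(0.0, 1.0)          # step 3: r_k in [0, 1)
    rs.append(r_k)
    components.append(f_k)
    # step 5 (Equation 2): target values for the next iteration
    targets = (targets - r_k * f_k()) / (1.0 - r_k)

# Steps 4 and 7 unrolled: the k-th coefficient is r̄1 ... r̄(k-1) * r_k,
# and the last one is the product of all the r̄'s.
coeffs, prod = [], 1.0
for k in range(K):
    coeffs.append(prod * (rs[k] if k + 1 != K else 1.0))
    prod *= 1.0 - rs[k]

model = sum(c * f() for c, f in zip(coeffs, components))
```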
      </sec>
      <sec id="sec-2-4">
        <title>SSR as Top-Level Linear Combination</title>
        <p>In the previous subsection we briefly described the SSR procedure and how it constructs models via GSX. Now we rewrite the procedure in such a way that the result is a linear combination of expressions, as in MGGP and Boost-GP.</p>
        <p>By iterating Equation 1 (with added subscripts to indicate the iteration number) and expanding the multiplication at each step, we can see what the model looks like as if it were finished at that iteration, i.e. in the second and fourth case of step 4 of the SSR procedure from the previous subsection (for the sake of simplicity, let r̄k = 1 − rk):
k = 1: f1 = f1
k = 2: f2 = r1 f1 + r̄1 f2
k = 3: f3 = r1 f1 + r̄1 (r2 f2 + r̄2 f3) = r1 f1 + r̄1 r2 f2 + r̄1 r̄2 f3
k = 4: f4 = r1 f1 + r̄1 r2 f2 + r̄1 r̄2 (r3 f3 + r̄3 f4) = r1 f1 + r̄1 r2 f2 + r̄1 r̄2 r3 f3 + r̄1 r̄2 r̄3 f4   (3)</p>
        <p>From the above iteration we can see that the model is in fact a linear combination of several expressions. We can also see a pattern in the multiplicative coefficients: each time a component fk is added to the model, the coefficient of component fk−1 is multiplied by rk−1, and the coefficient of the new component fk is the coefficient of component fk−1 multiplied by r̄k−1. This allows for an efficient implementation using a list of expressions and a corresponding list of their coefficients: just multiply one number in the second list and append an element to the end of both lists. This view also allows further manipulation of the coefficients and model components in a way similar to MGGP, e.g. performing an additional linear regression.</p>
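        <p>The list-based implementation suggested above can be sketched as follows (an illustration of the coefficient bookkeeping, not the paper's code); it rebuilds the k = 4 case of Equation 3:</p>

```python
def add_component(coeffs, exprs, new_expr, r_prev=None):
    # Append component f_k: multiply the coefficient of f_(k-1) by r_(k-1)
    # and give the new component the old coefficient of f_(k-1) times
    # (1 - r_(k-1)).
    if coeffs:
        last = coeffs[-1]
        coeffs[-1] = last * r_prev
        coeffs.append(last * (1.0 - r_prev))
    else:
        coeffs.append(1.0)  # k = 1: the single component is the whole model
    exprs.append(new_expr)

r = [0.3, 0.6, 0.8]  # hypothetical sampled values r1, r2, r3
coeffs, exprs = [], []
for k in range(4):
    add_component(coeffs, exprs, "f%d" % (k + 1), r[k - 1] if k > 0 else None)
```

        <p>The resulting coefficient lists match the expanded products of Equation 3 and always sum to 1.</p>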
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Boosting-GP</title>
      <p>In the previous section we briefly introduced the Sequential Symbolic Regression (SSR) algorithm, which utilizes the Geometric Semantic Crossover to sequentially combine the components of the complete model and to derive modified training data for the subsequent iterations. We propose an alternative approach that is almost identical to the boosting scheme (see Section 2.2), hence we call it Boost-GP.</p>
      <p>The procedure matches that of boosting as already described, except for a few details. In Boost-GP we do not start with a constant model; instead we start with no prior model at all and the first component is found using GP too. The procedure that finds the individual components is a GP-based algorithm. If the underlying algorithm produces a linear combination of several expressions (as MGGP does), those are treated as several components and are added to the model at the same time, summing possible constant offsets. Other than these details, the procedure is exactly the same as that of boosting.</p>
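      <p>Under these conventions, Boost-GP can be sketched as follows; the base algorithm here is a hypothetical stand-in (a plain least-squares fit returning a single component), where the real Boost-GP would run GP or MGGP and possibly return several components at once:</p>

```python
import numpy as np

def least_squares_base(X, targets):
    # Stand-in base algorithm: ordinary least squares on the raw features,
    # returned as a list with a single component.
    A = np.hstack([X, np.ones((len(targets), 1))])
    w, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return [lambda Z, w=w: np.hstack([Z, np.ones((len(Z), 1))]) @ w]

def boost_gp(X, y, run_base, M=3):
    # No initial constant model: the first component is also produced by the
    # base algorithm; every returned component is appended to the sum.
    components = []
    for _ in range(M):
        residuals = y - sum((h(X) for h in components), np.zeros(len(y)))
        components.extend(run_base(X, residuals))
    return lambda Z: sum((h(Z) for h in components), np.zeros(len(Z)))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 4.0
model = boost_gp(X, y, least_squares_base)
```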
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>We perform a series of experiments with the aim of comparing the performance of a number of algorithms both in terms of model accuracy and complexity. First we describe the algorithms used in the experiments and their settings, then we describe the datasets, and finally we describe the experimental protocol.</p>
      <sec id="sec-4-1">
        <title>Algorithms</title>
        <p>
          For the experimental evaluation we selected a number of algorithms and their variants based on SSR, boosting with GP, MGGP, and linear regression as a baseline.
MGGP: This algorithm produces models composed of multiple components which are all evolved simultaneously.
1G: The same as MGGP, but the individuals are limited to a single gene, making it effectively the Scaled Symbolic Regression algorithm [
          <xref ref-type="bibr" rid="ref7">11</xref>
          ] (not to be confused with Sequential Symbolic Regression, discussed in this article, which has the same abbreviation SSR). In order to provide similar complexity, the tree depth limit is increased to allow for at least the same number of nodes as in MGGP.
SSR-GP, SSR-1G: The Sequential Symbolic Regression algorithm (see Section 2.3) with ordinary GP and 1G (see above) as the base algorithm, respectively.
nSSR-GP, nSSR-1G: Similar to SSR-GP and SSR-1G, but using the normalization and denormalization described in [
          <xref ref-type="bibr" rid="ref5">9</xref>
          ].
        </p>
        <p>SSR with final fitting: All the SSR-based algorithms mentioned above (denoted identically but suffixed with the word “fit”, e.g. “SSR-GP fit”) are also used in such a way that after the algorithm completes, the coefficients of the individual components are recalculated using linear regression, in the same manner as in MGGP.
Boost-GP, Boost-1G: The boosting-type sequential approach (see Section 3) with ordinary GP and 1G (see above) as the base algorithm, respectively.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Algorithm settings</title>
        <p>The settings of the algorithms (except for LR, which has no settings) are as follows. The MGGP, SSR and Boost algorithms are set to produce a model of 10 components with a maximum depth of 10. That means that MGGP has the gene number limit set to 10, and the SSR and Boost algorithms are run for 10 iterations. The stopping criterion for MGGP is a wall-clock time limit of 1800 s; for the SSR and Boost algorithms it is 10 iterations, each with a wall-clock time limit of 180 s, so all algorithms have an equal amount of time available to find a good solution. The 1G algorithm produces only one component, and in order to have roughly the same number of nodes available, its maximum depth is set to 16. 1G is also run for 1800 s. Other settings, common to the algorithms, are described in Table 1.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Datasets</title>
        <p>LR: The first algorithm, which serves purely as a baseline, is ordinary least-squares linear regression on the problem’s features. Its purpose is to put all the other algorithms into a well-established context.</p>
        <p>
          We use four real-world datasets freely available from the
UCI repository [
          <xref ref-type="bibr" rid="ref8">12</xref>
          ]. The datasets are described in the
following paragraphs. Summary information about these
datasets is in Table 2.
        </p>
        <p>Table 1. Common settings of the algorithms: population size 800; # of elites 10 % of the population; tournament size 4 % of the population; prob. of mutation 0.2; prob. of crossover 0.7; constant/subtree mutation ratio 0.3/0.7; σ of constant mutation 10; function set a + b, a − b, a · b, a/√(1 + b²), e^a, |a|, sin a, a².</p>
        <p>Table 2. Summary of the datasets (ASN, CCS, ENC, ENH): the number of dimensions and the numbers of training and testing samples.</p>
        <p>ASN Airfoil self-noise (ASN) is a 5-dimensional dataset
describing acoustic pressure for various airfoils during
wind tunnel tests.</p>
        <p>CCS Concrete compressive strength (CCS) is
an 8-dimensional dataset describing compressive strength
of concrete based on its ingredients.</p>
        <p>ENC, ENH Energy efficiency of cooling/heating (ENC, ENH) are 8-dimensional datasets describing the energy efficiency of cooling and heating of buildings.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Experimental protocol</title>
        <p>Each dataset is randomly split into a training and a testing subset 25 times. Each algorithm is run once for each split of each dataset, i.e. 25 runs per algorithm per dataset in total. Each of the 25 runs of an algorithm on a dataset uses a different seed for the random number generator used by the algorithm (except for LR, which is fully deterministic).</p>
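        <p>The protocol can be sketched as follows (our illustration; the test fraction and the seeding scheme are assumptions, as the article specifies only the per-run seeds and the sample counts of Table 2):</p>

```python
import numpy as np

def evaluate(X, y, run_algorithm, n_splits=25, test_fraction=0.3):
    # One run per random split; each run gets its own RNG seed.
    scores = []
    for seed in range(n_splits):
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        n_test = int(len(y) * test_fraction)
        test, train = idx[:n_test], idx[n_test:]
        predict = run_algorithm(X[train], y[train], seed)
        y_hat = predict(X[test])              # models are judged on the test set
        ss_res = np.sum((y[test] - y_hat) ** 2)
        ss_tot = np.sum((y[test] - np.mean(y[test])) ** 2)
        scores.append(1.0 - ss_res / ss_tot)  # test-set R2
    return float(np.median(scores))

def linear_fit(Xtr, ytr, seed):
    # Deterministic stand-in for an SR algorithm (like LR, it ignores the seed).
    A = np.hstack([Xtr, np.ones((len(ytr), 1))])
    w, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = X @ np.array([2.0, 1.0]) + 3.0
median_r2 = evaluate(X, y, linear_fit)
```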
        <p>During the algorithm run, only the training set is used. After the run completes, the models produced by the algorithms are evaluated on the testing set. Performance is measured using the coefficient of determination, or R2 score.1 The model complexity is measured as the number of nodes in the model. The coefficients of the linear combination that combines the components of the model are not counted towards this number. These two measures (performance and complexity) are then aggregated over the 25 runs of each algorithm for each dataset.</p>
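        <p>For concreteness, the two measures can be computed as follows; the nested-tuple expression tree used by `count_nodes` is our hypothetical representation, not the one used by the implementations compared here:</p>

```python
import numpy as np

def r2_score(y_true, y_pred):
    # Coefficient of determination: R2 = 1 - SS_res / SS_tot.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def count_nodes(tree):
    # Number of nodes of an expression tree given as nested tuples,
    # e.g. ('+', ('*', 'x0', 2.0), 'x1') represents x0 * 2.0 + x1.
    if isinstance(tree, tuple):
        return 1 + sum(count_nodes(child) for child in tree[1:])
    return 1

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score(y, y)                      # 1.0 for an exact model
baseline = r2_score(y, np.full(4, y.mean()))  # 0.0 for the mean predictor
n_nodes = count_nodes(('+', ('*', 'x0', 2.0), 'x1'))
```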
        <p>All runs were performed on machines of identical configuration that were part of the MetaCentrum grid (see Acknowledgements).</p>
        <p>In this section we present the results of the experiments. As described in Section 4.4, we focus on the performance (R2) and the complexity (number of nodes) of the models. Test-set performance along with complexity is depicted in Figure 1. Each result is displayed as two crossing lines whose intersection is at the median R2 and the median number of nodes, and which stretch from the 1st to the 3rd quartile in both measures. SSR algorithms that share a configuration except for final fitting are plotted in the same color for easier visual assessment of the effect of final fitting.</p>
        <p>From Figure 1 it is clear that MGGP produces the smallest models for all four datasets. We hypothesize that this is because all the components are evolved together, so they can complement each other, while in the other algorithms (except for 1G) each component is forced to evolve on its own. While a component is being evolved, it does not “know” about the other components, so the evolution tries to solve the current problem with that single component, producing a big component in the process. When it is time for the next component, the previous one is fixed and cannot get smaller anymore.</p>
        <p>In terms of test-set performance, MGGP, Boost-GP and
Boost-1G are the best performers overall, with (n)SSR-1G
fit being close. On the other hand, SSR-GP and nSSR-GP
are, overall, the worst performers, performing even worse
than LR in some cases. Even though the Boost algorithms
are competitive in terms of performance, they are in line
with the SSR algorithms regarding the number of nodes,
which is much larger than that of MGGP or 1G.</p>
        <p>(Figure 1: test-set R2 vs. number of nodes per algorithm and dataset, shown as medians with quartile ranges; panel (a) ASN.)</p>
        <p>The SSR algorithms, especially the variants without final fitting, perform rather poorly compared to the other algorithms except for 1G and LR. It can be seen that the final fitting improves the performance considerably. We hypothesize that this is caused by the fact that the original coefficients are chosen randomly and from a bounded interval. Even though the target values are modified in such a way that the result should be close to the optimum if the new component models the new target values well, the contribution of the previous component can be multiplied by a very small number, which makes the new target values not much “easier” than those for the previous components. Also, looking at Equation 3, each component is multiplied by a number of values, each smaller than or equal to 1; the later a component is added, the more such multipliers it has, and therefore it contributes less and less to the final model. However, SSR-1G (and its fitted version) perform much better. The reason is that with 1G the component is scaled by a factor, mitigating the just-described issue to some extent.</p>
        <p>Another notable pattern is that the algorithms using 1G as the base algorithm perform considerably better than those using GP. This is to be expected, since even simple linear scaling can considerably reduce the error, especially if the inner structure of the component is good and only lacks such scaling.</p>
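        <p>The benefit of 1G's scaling can be illustrated directly: for a fixed component g, the optimal slope and intercept are a least-squares fit, and they remove all of the error whenever the component's inner structure is right up to scale and offset (a sketch under these assumptions, not the paper's code):</p>

```python
import numpy as np

def linear_scaling(g, X, y):
    # Optimal a, b minimizing sum((a * g(x) + b - y)^2), as in scaled
    # symbolic regression (Keijzer).
    out = g(X)
    A = np.stack([out, np.ones(len(y))], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

# A component with the right inner structure but wrong scale and offset:
X = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
g = lambda Z: np.sin(3.0 * Z[:, 0])
y = 5.0 * np.sin(3.0 * X[:, 0]) - 2.0
a, b = linear_scaling(g, X, y)
scaled_error = np.sum((a * g(X) + b - y) ** 2)
```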
        <p>The final interesting pattern is that the normalization in SSR has almost no effect in terms of performance when the base algorithm is 1G, and a quite inconsistent effect when the base algorithm is GP. The reason for the small effect with 1G is again the fact that 1G introduces its own multiplicative and additive constants, so the normalization and denormalization coefficients become redundant to some extent.</p>
        <p>Figure 2 displays the results on the training and testing sets side by side. With a few exceptions, it can be said that no algorithm suffered from overfitting, as the testing performance values are about as much smaller than the training ones as is exhibited by linear regression. The most notable exceptions are the Boost algorithms, especially on the ASN and CCS datasets. Boost-1G produced two significant outliers in both datasets, and the other testing values are also generally smaller than the training ones. This is best seen on the CCS dataset, where Boost-1G is the best performer by training performance but the testing performance is significantly2 worse than the training performance.</p>
        <p>With the exception of nSSR-GP fit on the CCS dataset, all the SSR algorithms have, overall, the smallest difference between training and testing performance. In some cases the testing performance is even better than the training performance. This could indicate potential underfitting, either of the whole algorithm or of the individual components.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>In this article we reviewed the idea of a sequential approach to symbolic regression. We reviewed an existing approach, called Sequential Symbolic Regression, which is unique in using Geometric Semantic Crossover to combine the individual model components and to derive the training data for subsequent component evolution.</p>
      <p>We have shown that the models SSR produces are
effectively a linear combination of a number of
components, similar to MGGP, but constructed by a different
approach. We have also proposed a simple boosting-inspired
approach which uses the residuals directly as the
training target values for the next component and combines the
components just by adding them together.</p>
      <p>We have performed a series of experiments with the algorithms, using the different base algorithms GP and 1G, and different variants of SSR, most importantly with final fitting of the coefficients by linear regression. The results have shown that MGGP, a non-sequential algorithm, is the best or one of the best algorithms on each dataset while producing by far the smallest models. Of the sequential algorithms, SSR is, overall, the worst, except for the variants using final fitting and 1G as the base, which are comparable to the boosting-inspired algorithms on two of the four datasets.</p>
      <p>Except for the boosting-inspired algorithms in some cases, no algorithm experienced overfitting, which might be useful information for some users; it also shows that big and complex models do not necessarily mean overfitting.</p>
      <p>2 Mann-Whitney U test with significance level α = 0.01.</p>
      <p>This article investigated the sequential algorithms in their basic form only, using a predefined number of components and a predefined scheme of the evolution of the individual components. However, the sequential algorithms naturally lend themselves to tuning and tweaking. One possible approach is not to switch components regularly but based on some other scheme, e.g. the performance of the current component. Another approach worth investigating might be using full MGGP as the base algorithm in the sequential algorithms, which would mean that multiple components are produced at each stage, not just one. This would allow for a “continuum” of algorithms, with MGGP on one side as a totally non-sequential algorithm and SSR or the boosting-inspired algorithms described in this article on the other side, as totally sequential algorithms.</p>
      <p>Another area worth exploring is the interplay of the
complexity of individual components or limits on their
size, the number of components, and the amount of
computation resources available for a single component. The
classical boosting works with so-called weak learners,
i.e. models which by themselves are not capable of
solving the problem to any reasonable degree, and uses many
of such models. A similar approach could be used for the
sequential approaches reviewed in this article, by setting
tighter limits on the complexity and/or computational
resources for the evolution of each component.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The reported research was supported by the Czech Science Foundation grant No. 17-01251. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated.</p>
      <p>(Figure 2 appeared here: training and testing R2 per algorithm on the four datasets, including panels (b) CCS, (c) ENC and (d) ENH. In panel (b), two outlier values of testing Boost-1G at −127797 and −85.6 and one value of testing nSSR-GP fit at −2.91 are not shown.)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992. ISBN 0-262-11170-5. URL http://mitpress.mit.edu/books/genetic-programming.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Mark Hinchliffe, Hugo Hiden, Ben McKay, Mark Willis, Ming Tham, and Geoffery Barton. Modelling chemical process systems using a multi-gene genetic programming algorithm. In Late Breaking Papers, GP'96, pages 56-65, Stanford, USA, 1996.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Dominic P. Searson, David E. Leahy, and Mark J. Willis. GPTIPS: an open source genetic programming toolbox for multigene symbolic regression. In Proceedings of the International MultiConference of</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[6] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[7] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag New York, 2nd edition, 2009. ISBN 978-0-387-84858-7.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[8] Luiz Otávio V. B. Oliveira, Fernando E. B. Otero, Gisele L. Pappa, and Julio Albinati. Sequential Symbolic Regression with Genetic Programming, pages 73-90. Springer International Publishing, Cham, 2015. ISBN 978-3-319-16030-6. doi: 10.1007/978-3-319-16030-6_5. URL https://doi.org/10.1007/978-3-319-16030-6_5.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[9] L. O. V. B. Oliveira, F. E. B. Otero, L. F. Miranda, and G. L. Pappa. Revisiting the sequential symbolic regression genetic programming. In 2016 5th Brazilian Conference on Intelligent Systems (BRACIS), pages 163-168, Oct 2016. doi: 10.1109/BRACIS.2016.039.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[10] Alberto Moraglio, Krzysztof Krawiec, and Colin G. Johnson. Geometric semantic genetic programming. In Carlos A. Coello Coello, Vincenzo Cutello, Kalyanmoy Deb, Stephanie Forrest, Giuseppe Nicosia, and Mario Pavone, editors, Parallel Problem Solving from Nature - PPSN XII, volume 7491 of Lecture Notes in Computer Science, pages 21-31. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32936-4. doi: 10.1007/978-3-642-32937-1_3. URL http://dx.doi.org/10.1007/978-3-642-32937-1_3.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[11] Maarten Keijzer. Scaled symbolic regression. Genetic Programming and Evolvable Machines, 5(3):259-269, 2004. ISSN 1573-7632. doi: 10.1023/B:GENP.0000030195.77571.f9. URL http://dx.doi.org/10.1023/B:GENP.0000030195.77571.f9.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[12] K. Bache and M. Lichman. UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>