=Paper=
{{Paper
|id=Vol-2473/paper5
|storemode=property
|title=Sequential Model Building in Symbolic Regression
|pdfUrl=https://ceur-ws.org/Vol-2473/paper5.pdf
|volume=Vol-2473
|authors=Jan Žegklitz,Petr Pošík
|dblpUrl=https://dblp.org/rec/conf/itat/ZegklitzP19
}}
==Sequential Model Building in Symbolic Regression==
Jan Žegklitz1,2 and Petr Pošík1
1 Czech Technical University in Prague, Faculty of Electrical Engineering,
Technická 2, 166 27 Praha 6, Prague, Czech Republic
2 Czech Academy of Sciences, Institute of Computer Science,
Pod Vodárenskou věží 271/2, 182 07 Praha 8, Prague, Czech Republic
Abstract: Symbolic Regression is a supervised learning technique for regression based on Genetic Programming. A popular algorithm is Multi-Gene Genetic Programming (MGGP), which builds models as a linear combination of a number of components that are all built together. In recent years, however, a different approach emerged, represented by the Sequential Symbolic Regression (SSR) algorithm, which builds the model sequentially, one component at a time, and combines the components using a method based on geometric semantic crossover. In this article we show that the SSR algorithm effectively produces a linear combination of components, and we introduce another sequential approach very similar to the classical ensemble method of boosting. All algorithms are compared with MGGP as a baseline on a number of real-world datasets. The results show that the sequential approaches are overall worse than MGGP both in terms of accuracy and model size.

1 Introduction

Symbolic Regression (SR) is a supervised learning task whose goal is to find a mathematical function (preferably a simple one) of a number of variables that fits the available training data. Regression is a well-known task with a number of well-tested and successful approaches, e.g. neural networks, support vector machines and random forests, to name a few. However, these conventional approaches produce models which, while useful, are often essentially black boxes that are difficult to interpret. One of the goals of SR is to produce "white-box" models in the form of a mathematical expression. The overwhelming majority of SR approaches utilize some form of Genetic Programming (GP) [1].

This article is structured as follows: Section 2 introduces previous research relevant for this article and provides a new view on the main algorithm examined here, Section 3 proposes a simple boosting-based sequential SR algorithm, Sections 4 and 5 present the experiments and results respectively and provide a discussion of these results. Finally, Section 6 concludes the article and proposes further research.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Related Work

In this section we briefly revisit important approaches and algorithms related to sequential model building and to the experiments performed later in this article. First we introduce Multi-Gene Genetic Programming, which is not a sequential algorithm but is important for this article. Then we briefly revisit boosting, a classical machine learning ensemble method. Finally we introduce the most important related research, the Sequential Symbolic Regression algorithm.

2.1 Multi-Gene Genetic Programming

Multi-Gene GP (MGGP) [2, 3, 4] is an extension of classical GP for symbolic regression. The main idea behind MGGP is that each individual is composed of one or more independent trees, called genes, which together form the complete model. To make the complete model, the outputs of the genes are linearly combined, with the coefficients determined using linear regression. In this article, we call this linear combination of genes a "top-level linear combination".

MGGP does not build the model sequentially. However, since we use the concept of the top-level linear combination to put SSR (see Section 2.3) in a similar framework, and since we use the algorithm in the experiments, it needs to be mentioned here.
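The coefficients of the top-level linear combination are obtained by ordinary least squares on the gene outputs. The following is a minimal sketch of such a fit (our own illustration in NumPy, not the MGGP implementation; the gene outputs are assumed to be already evaluated on the training data, and whether an explicit constant offset is included depends on the implementation):

```python
import numpy as np

def fit_top_level_combination(gene_outputs, y):
    """Fit the coefficients of the top-level linear combination by least squares.

    gene_outputs: array of shape (n_samples, n_genes); each column holds the
                  output of one gene (tree) evaluated on the training data.
    y:            array of shape (n_samples,) with the target values.
    Returns (offset, coefficients) so that the complete model is
    offset + gene_outputs @ coefficients.
    """
    # Prepend a column of ones so the fit also determines a constant offset.
    X = np.column_stack([np.ones(len(y)), gene_outputs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1:]
```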
2.2 Boosting

Building predictive models sequentially is a well-known machine learning technique. The prominent example of this approach is the boosting ensemble method [5, 6, 7]. Boosting in the context of least-squares regression (i.e. regression using the squared error as the criterion) works, in a simplified form, as follows:

1. Start with a constant model f0(x) = c, e.g. the mean target value: f0(x) = ȳ.

2. Iterate for k = 1, 2, ..., M, where M is the desired number of iterations:

   (a) Compute the residuals ỹ = y − fk−1(x).
   (b) Train a model component h(x) using ỹ as the target values.
   (c) Update the complete model: fk(x) = fk−1(x) + h(x).

3. fM(x) is the final model.

One possible view is that the later model components correct the mistakes of the previous ones.
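The procedure above can be written down directly. The following sketch is our own simplified illustration (not code from the paper); `fit_component` is a hypothetical stand-in for the base learner, and X, y are assumed to be NumPy arrays:

```python
import numpy as np

def boost(X, y, fit_component, n_iterations):
    """Least-squares boosting as described above (a simplified sketch).

    fit_component(X, targets) stands in for the base learner (e.g. a GP run);
    it must return a callable h with h(X) -> predictions.
    """
    f0 = float(np.mean(y))                       # step 1: constant model, the mean target value
    components = []
    prediction = np.full(len(y), f0)

    for _ in range(n_iterations):                # step 2
        residuals = y - prediction               # (a) residuals of the current model
        h = fit_component(X, residuals)          # (b) fit a component to the residuals
        components.append(h)
        prediction = prediction + h(X)           # (c) add the component to the complete model

    return lambda X_new: f0 + sum(h(X_new) for h in components)   # step 3: final model
```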
2.3 Sequential Symbolic Regression

SSR [8, 9] is a GP-based method that builds the model in a sequential fashion. The overall structure of the SSR method is the following:

1. An SR algorithm (e.g. GP) is executed with the training set, producing a model component.

2. The model component is added to the complete model and the training data is modified.

3. Unless stopping criteria are met, go to step 1 and repeat, using the modified training data.

This structure is somewhat similar to boosting: a model component is trained, added to the complete model, and the next one is trained using modified training data. However, SSR does not do boosting. The difference is in step 2, i.e. in how the new component is combined with the other components of the complete model and how the training data is modified.

In the first iteration, the target values are not modified and the new component simply becomes the complete model, as at that time it is the only component. In the subsequent iterations, SSR utilizes the Geometric Semantic Crossover (GSX) from Geometric Semantic Genetic Programming (GSGP) [10] to add the new component to the complete model. GSX performs a convex combination of two functions:

    f*(x) = r f(x) + (1 − r) f′(x)                                  (1)

where r is a random number from the interval [0, 1), and f and f′ are the two combined functions.

In SSR, GSX is employed differently than for combining two known functions: only one of the two functions is known (the component from the previous iteration, i.e. f). SSR therefore performs an "incomplete" GSX, meaning that f′ is only a placeholder for a function to be found in the next iteration using modified training data. The modified training data is derived by the following reasoning. The result of the GSX should optimally produce zero error, i.e. f*(x) = y. Substituting this into Equation 1 and slightly rearranging it, we obtain

    f′(x) = (y − r f(x)) / (1 − r).                                 (2)

This is, in effect, a requirement on how the outputs of f′ should look, i.e. it is a definition of its target values for the next iteration. The problem is therefore transformed and the next iteration with different training data is started.

It is important to note that the convex combinations are "nested" in depth in the second term of the GSX expression. In the last iteration, the f′ placeholder is not replaced with another GSX result but with the function found in that iteration, terminating the nesting. The complete SSR procedure has the following form:

1. y1 = y, k ← 1

2. Run the base algorithm with yk as target values, producing a component fk.

3. Sample a random number rk ∈ [0, 1).

4. Update the complete model:
   • if k = 1 and it is not the last iteration: f* = rk fk + (1 − rk) f′
   • if k = 1 and it is the last iteration: f* = fk, terminate
   • if k > 1 and it is not the last iteration: f′ ← rk fk + (1 − rk) f′
   • if k > 1 and it is the last iteration: f′ ← fk, terminate

5. Adjust the target values: yk+1 = (yk − rk fk(x)) / (1 − rk)

6. k ← k + 1, repeat from step 2 unless a stopping condition is met.

7. Return f*.

where f′ is the placeholder. The nesting happens in the third case of step 4, where the placeholder is replaced with the GSX of the new function and the placeholder again.
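Putting steps 2-5 together, the SSR loop can be sketched as follows. This is our own simplified illustration, assuming NumPy arrays and ignoring the normalization used by the nSSR variants; `run_base_algorithm` is a hypothetical stand-in for the GP run of each iteration:

```python
import random

def ssr(X, y, run_base_algorithm, n_iterations):
    """Sketch of the SSR loop: incomplete GSX plus target adjustment (Eqs. 1-2).

    run_base_algorithm(X, targets) returns a callable component f with
    f(X) -> predictions; it stands in for the GP run of each iteration.
    """
    targets = y                                            # y_1 = y
    components, rs = [], []

    for k in range(1, n_iterations + 1):
        f_k = run_base_algorithm(X, targets)               # step 2
        components.append(f_k)
        if k == n_iterations:                              # last iteration: the placeholder
            break                                          # f' is replaced by f_k itself
        r_k = random.random()                              # step 3: r_k in [0, 1)
        rs.append(r_k)
        targets = (targets - r_k * f_k(X)) / (1.0 - r_k)   # step 5: new target values

    def model(X_new):
        # Evaluate the nested convex combinations from the innermost term outward:
        # f* = r_1 f_1 + (1 - r_1) * (r_2 f_2 + (1 - r_2) * ( ... f_M ... ))
        out = components[-1](X_new)
        for f_j, r_j in zip(reversed(components[:-1]), reversed(rs)):
            out = r_j * f_j(X_new) + (1.0 - r_j) * out
        return out

    return model
```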
2.4 SSR as Top-Level Linear Combination

In the previous subsection we briefly described the SSR procedure and how it constructs the models via GSX. Now we rewrite the procedure in such a way that the result is a linear combination of expressions, as in MGGP and Boost-GP.

By iterating Equation 1 (with added subscripts indicating the iteration number) and expanding the multiplication at each step, we can see how the model would look if it were finished at that iteration, i.e. in the second and fourth case of step 4 of the SSR procedure from the previous subsection (for the sake of simplicity, let r̄k = 1 − rk):

    k = 1:  f1* = f1
    k = 2:  f2* = r1 f1 + r̄1 f2
    k = 3:  f3* = r1 f1 + r̄1 (r2 f2 + r̄2 f3)
                = r1 f1 + r̄1 r2 f2 + r̄1 r̄2 f3                      (3)
    k = 4:  f4* = r1 f1 + r̄1 r2 f2 + r̄1 r̄2 (r3 f3 + r̄3 f4)
                = r1 f1 + r̄1 r2 f2 + r̄1 r̄2 r3 f3 + r̄1 r̄2 r̄3 f4
    ...

From the above iteration we can see that the model is, in fact, a linear combination of several expressions. We can also see a pattern in the multiplicative coefficients: each time a component fk is added to the model, the coefficient of component fk−1 is multiplied by rk−1, and the new component fk receives the coefficient that fk−1 had before the update, multiplied by r̄k−1. This allows for an efficient implementation using a list of expressions and a corresponding list of their coefficients: adding a component only multiplies one number in the second list and appends an element to the end of both lists. This view also allows further manipulation of the coefficients and model components in a way similar to MGGP, e.g. performing an additional linear regression.
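The list-based view can be written down directly. The sketch below is our own illustration (not the paper's code); it keeps the expanded coefficients of Equation 3 and shows that adding a component only touches the last stored coefficient:

```python
class ExpandedSSRModel:
    """SSR model kept as a flat (top-level) linear combination, as in Eq. 3."""

    def __init__(self):
        self.components = []    # callables f_1, f_2, ...
        self.coefficients = []  # their expanded multiplicative coefficients

    def add(self, f_new, r_prev=None):
        """Append component f_new; r_prev is the random weight drawn for the
        previously added component (None when f_new is the first component)."""
        if not self.components:
            self.components.append(f_new)
            self.coefficients.append(1.0)                     # k = 1: f*_1 = f_1
        else:
            last = self.coefficients[-1]
            self.coefficients[-1] = last * r_prev             # coefficient of f_{k-1} shrinks by r_{k-1}
            self.coefficients.append(last * (1.0 - r_prev))   # f_k gets last * (1 - r_{k-1})
            self.components.append(f_new)

    def predict(self, X):
        return sum(c * f(X) for c, f in zip(self.coefficients, self.components))
```

Because the model is now a plain weighted sum, the coefficients can also be re-fitted afterwards by linear regression, which is exactly the "final fitting" variant used in the experiments below.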
3 Boosting-GP

In the previous section we briefly introduced the Sequential Symbolic Regression (SSR) algorithm, which utilizes Geometric Semantic Crossover to sequentially combine the components of the complete model and to derive modified training data for subsequent iterations. We propose an alternative approach that is almost identical to the boosting scheme (see Section 2.2), hence we call it Boost-GP.

The procedure matches the boosting scheme already described, except for a few details. In Boost-GP we do not start with a constant model; instead we start with no prior model at all, and the first component is also found using GP. The procedure that finds the individual components is a GP-based algorithm. If the underlying algorithm produces a linear combination of several expressions (as MGGP does), those are treated as several components and are added to the model at the same time, summing any constant offsets. Other than these details, the procedure is exactly the same as that of boosting.
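As an illustration of that last detail, the following sketch (our own; the `genes`, `coefficients` and `offset` attributes are a hypothetical interface for an MGGP-style result) adds the result of one Boost-GP iteration to the complete model, keeping MGGP genes as separate components and summing their constant offsets:

```python
def add_boosting_result(components, fitted, offset=0.0):
    """components: list of (coefficient, expression) pairs of the boosted model so far.
    fitted: the component found in the current iteration -- either a plain GP
    expression, or an MGGP-style linear combination of several genes.
    Returns the updated component list and the accumulated constant offset."""
    if hasattr(fitted, "genes"):                      # MGGP-style result (hypothetical interface)
        offset += fitted.offset                       # constant offsets are summed together
        components.extend(zip(fitted.coefficients, fitted.genes))
    else:                                             # ordinary GP result: a single component
        components.append((1.0, fitted))
    return components, offset
```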
4 Experiments

We perform a series of experiments with the aim of comparing a number of algorithms in terms of both model accuracy and complexity. First we describe the algorithms used in the experiments and their settings, then the datasets, and finally the experimental protocol.

4.1 Algorithms

For the experimental evaluation we selected a number of algorithms and their variants based on SSR, boosting with GP, MGGP, and linear regression as a baseline.

LR The first algorithm serves purely as a baseline: ordinary least-squares linear regression on the problem's features. Its purpose is to frame all the other algorithms in a well-established context.

MGGP This algorithm produces models composed of multiple components, which are all evolved simultaneously.

1G The same as MGGP, but the individuals are limited to a single gene, making it effectively a Scaled Symbolic Regression algorithm [11] (not to be confused with Sequential Symbolic Regression discussed in this article, which has the same abbreviation, SSR). In order to provide similar complexity, the tree depth limit is increased to allow for at least the same number of nodes as in MGGP.

SSR-GP, SSR-1G The Sequential Symbolic Regression algorithm (see Section 2.3) with ordinary GP and 1G (see above) as the base algorithm, respectively.

nSSR-GP, nSSR-1G Similar to SSR-GP and SSR-1G but using the normalization and denormalization described in [9].

SSR with final fitting All the SSR-based algorithms mentioned above (coded identically but suffixed with the word "fit", e.g. "SSR-GP fit") are also used in such a way that, after the algorithm completes, the coefficients of the individual components are recalculated using linear regression in the same manner as in MGGP.

Boost-GP, Boost-1G The boosting-type sequential approach (see Section 3) with ordinary GP and 1G (see above) as the base algorithm, respectively.

4.2 Algorithm settings

The settings of the algorithms (except for LR, which has no settings) are as follows. The MGGP, SSR and Boost algorithms are set to produce a model of 10 components with a maximum depth of 10. That means that MGGP has the gene number limit set to 10 and the SSR and Boost algorithms are run for 10 iterations. The stopping criterion for MGGP is a wall-clock time limit of 1800 s; for the SSR and Boost algorithms it is the 10 iterations, each of which has a wall-clock time limit of 180 s, therefore all algorithms have an equal amount of time available to find a good solution. The 1G algorithm produces only one component, and in order to have roughly the same number of nodes available, its maximum depth is set to 16. 1G is also run for 1800 s. Other settings common to the algorithms are described in Table 1.

4.3 Datasets

We use four real-world datasets freely available from the UCI repository [12]. The datasets are described in the following paragraphs. Summary information about these datasets is given in Table 2.
    parameter                         value
    population size                   800
    # of elites                       10 % of population
    tournament size                   4 % of population
    prob. of mutation                 0.2
    prob. of crossover                0.7
    constant/subtree mutation ratio   0.3/0.7
    σ of constant mutation            10
    function set                      a + b, a − b, a · b, a/√(1 + b²), e^a, |a|, sin a, a²

Table 1: Common algorithm settings.

    dataset   # of dimensions   # of training samples   # of testing samples
    ASN       5                 1052                    451
    CCS       8                 721                     309
    ENC       8                 537                     231
    ENH       8                 537                     231

Table 2: Summary information on the datasets used in the experiments. The division into training and testing samples comes from a random 70:30 split of the original dataset.

ASN Airfoil self-noise (ASN) is a 5-dimensional dataset describing acoustic pressure for various airfoils during wind tunnel tests.

CCS Concrete compressive strength (CCS) is an 8-dimensional dataset describing the compressive strength of concrete based on its ingredients.

ENC, ENH Energy efficiency of cooling/heating (ENC, ENH) are 8-dimensional datasets describing the energy efficiency of cooling and heating of buildings.

4.4 Experimental protocol

Each dataset is randomly split into training/testing subsets 25 times. Each algorithm is run once for each split of each dataset, i.e. 25 runs per algorithm per dataset in total. Each of the 25 runs of an algorithm on a dataset uses a different seed for the random number generator used by the algorithm (except for LR, which is fully deterministic).

During the algorithm run, only the training set is used. After the run completes, the models produced by the algorithms are evaluated on the testing set. Performance is measured using the coefficient of determination, or R² score, defined as R² = 1 − ∑ᵢ (yᵢ − ŷᵢ)² / ∑ᵢ (yᵢ − ȳ)². The model complexity is measured as the number of nodes in the model; the coefficients of the linear combination that combines the components of the model are not counted towards this number. These two measures (performance and complexity) are then aggregated over the 25 runs of each algorithm for each dataset.

All runs were performed on machines of identical configuration that were part of the MetaCentrum grid (see Acknowledgements).
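For concreteness, the per-run evaluation can be sketched as follows (our own illustration using scikit-learn utilities, not the paper's code; `run_algorithm` and `count_nodes` are hypothetical stand-ins for training the evolved model and for the node-count measure):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def evaluate(X, y, run_algorithm, count_nodes, n_splits=25):
    """25 random 70:30 splits; report median test-set R^2 and model size."""
    results = []
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed)
        model = run_algorithm(X_tr, y_tr, seed=seed)   # only the training set is used
        r2 = r2_score(y_te, model.predict(X_te))       # coefficient of determination
        results.append((r2, count_nodes(model)))       # performance and complexity
    return (np.median([r for r, _ in results]),
            np.median([n for _, n in results]))
```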
5 Results

In this section we present the results of the experiments. As described in Section 4.4, we focus on the performance (R²) and the complexity (number of nodes) of the models.

Test-set performance along with complexity is depicted in Figure 1. Each result is displayed in the form of two crossing lines whose intersection is at the median R² and the median number of nodes, and which stretch from the 1st to the 3rd quartile in both measures. SSR algorithms of the same configuration, differing only in final fitting, are plotted in the same color for easier visual assessment of the effect of final fitting.

Figure 1: Complexity-performance plots of the final models for all four datasets: (a) ASN, (b) CCS, (c) ENC, (d) ENH (x-axis: no. of nodes, y-axis: R²). The horizontal lines are vertically placed at the median R² value and stretch in the horizontal direction from the 1st to the 3rd quartile of the number of nodes. The vertical lines are horizontally placed at the median number of nodes and stretch in the vertical direction from the 1st to the 3rd quartile of R². The intersections show the point of median number of nodes and median R². The closer the lines are to the upper left corner of the plot, the better (i.e. well-performing and small) the models are. The dotted line shows the median R² of LR and the gray band stretches from the 1st to the 3rd quartile of R² of LR.

5.1 Discussion

From Figure 1 it is clear that MGGP produces the smallest models for all four datasets. We hypothesize that this is because all the components are evolved together, so they can complement each other, while in the other algorithms (except for 1G) each component is forced to evolve on its own. While a component is being evolved, it does not "know" about the other components, so the evolution tries to solve the current problem with that single component and produces a big component in doing so. When it is time for the next component, the previous one is fixed and cannot get smaller anymore.

In terms of test-set performance, MGGP, Boost-GP and Boost-1G are the best performers overall, with (n)SSR-1G fit being close. On the other hand, SSR-GP and nSSR-GP are, overall, the worst performers, performing even worse than LR in some cases. Even though the Boost algorithms are competitive in terms of performance, they are in line with the SSR algorithms regarding the number of nodes, which is much larger than that of MGGP or 1G.

The SSR algorithms, especially the variants without final fitting, perform rather poorly compared to the other algorithms except for 1G and LR. It can be seen that the final fitting improves the performance considerably. We hypothesize that this is caused by the fact that the original coefficients are chosen randomly from a bounded interval. Even though the target values are modified in such a way that the result should be close to the optimum if the new component models the new target values well, the contribution of the previous component can be multiplied by a very small number, which makes the new target values not much "easier" than those for the previous components. Also, looking at Equation 3, each component is multiplied by a number of values, each smaller than or equal to 1, and the later the component is added, the more such multipliers it has, so it contributes less and less to the final model (for example, if every rk were 0.5, the coefficient of the tenth and last component would be 0.5⁹ ≈ 0.002). However, SSR-1G (and its fitted version) performs much better. The reason is that with 1G the component is scaled by a factor, mitigating the just-described issue to some extent.

Another notable pattern is that algorithms using 1G as the base algorithm perform considerably better than those using GP, which is to be expected, since even simple linear scaling can considerably reduce the error, especially if the inner structure of the component is good and it only lacks such scaling.

The final interesting pattern is that the normalization in SSR has almost no effect in terms of performance when the base algorithm is 1G, and a quite inconsistent effect when the base algorithm is GP. The reason for the small effect with 1G is again the fact that 1G introduces its own multiplicative and additive constants, so the normalization and denormalization coefficients become redundant to some extent.
5.2 Overfitting and underfitting

Figure 2 displays the results on the training and testing sets side by side. With a few exceptions, it can be said that no algorithm suffered from overfitting, as the testing performance values are about as much smaller than the training ones as is exhibited by linear regression. The most notable exceptions are the Boost algorithms, especially on the ASN and CCS datasets. Boost-1G produced two significant outliers in both datasets, and its other testing values are also generally smaller than the training ones. It is best seen in the CCS dataset, where Boost-1G is the best performer by training performance but the testing performance is significantly worse than the training performance (Mann-Whitney U test with significance level α = 0.01).

With the exception of nSSR-GP fit on the CCS dataset, all the SSR algorithms have, overall, the smallest difference between training and testing performance. In some cases the testing performance is even better than the training performance. This could indicate potential underfitting, either of the whole algorithm or of the individual components.

6 Conclusions and Future Work

In this article we reviewed the idea of a sequential approach to symbolic regression. We reviewed an existing approach, called Sequential Symbolic Regression, which is unique in using Geometric Semantic Crossover to combine the individual model components and to derive training data for the evolution of subsequent components.

We have shown that the models SSR produces are effectively a linear combination of a number of components, similar to MGGP, but constructed by a different approach. We have also proposed a simple boosting-inspired approach which uses the residuals directly as the training target values for the next component and combines the components just by adding them together.

We have performed a series of experiments with the algorithms, using GP and 1G as the base algorithms, and different variants of SSR, most importantly with final fitting of the coefficients by linear regression. The results have shown that MGGP, a non-sequential algorithm, is the best or one of the best algorithms on each dataset while producing by far the smallest models. Of the sequential algorithms, SSR is, overall, the worst, except for the variants using final fitting and 1G as the base, which are comparable to the boosting-inspired algorithms on two of the four datasets.

Except for the boosting-inspired algorithms in some cases, no algorithm experienced overfitting, which might be useful information for some users, and it also shows that big and complex models do not necessarily mean overfitting.

6.1 Future Work

This article investigated the sequential algorithms in their basic form only, using a predefined number of components and a predefined scheme for the evolution of the individual components. However, the sequential algorithms naturally lend themselves to tuning and tweaking. One possible approach is not to switch components at regular intervals but based on some other scheme, e.g. the performance of the current component. Another approach worth investigating might be using full MGGP as the base algorithm in the sequential algorithms, which would mean that multiple components are produced at each stage, not just one. This would allow for a "continuum" of algorithms, with MGGP on one side as a totally non-sequential algorithm, and SSR or the boosting-inspired algorithms described in this article on the other side, as totally sequential algorithms.

Another area worth exploring is the interplay of the complexity of individual components or limits on their size, the number of components, and the amount of computational resources available for a single component. Classical boosting works with so-called weak learners, i.e. models which by themselves are not capable of solving the problem to any reasonable degree, and uses many such models. A similar approach could be used for the sequential approaches reviewed in this article, by setting tighter limits on the complexity and/or computational resources for the evolution of each component.

Acknowledgements

The reported research was supported by the Czech Science Foundation grant No. 17-01251. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures" (CESNET LM2015042), is greatly appreciated.
Figure 2: Performance box plots of the final models for all four datasets: (a) ASN, (b) CCS, (c) ENC, (d) ENH (y-axis: R²). Each algorithm has two box plots; the left one displays the training-set performance, the right one displays the testing-set performance. In the plots for the ASN and CCS datasets some outlier values are not shown for the sake of clarity: for ASN, two outlier values of testing Boost-1G at −206 and −2.85; for CCS, two outlier values of testing Boost-1G at −127797 and −85.6 and one value of testing nSSR-GP fit at −2.91.
References

[1] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992. ISBN 0-262-11170-5. URL http://mitpress.mit.edu/books/genetic-programming.

[2] Mark Hinchliffe, Hugo Hiden, Ben McKay, Mark Willis, Ming Tham, and Geoffery Barton. Modelling chemical process systems using a multi-gene genetic programming algorithm. In Late Breaking Paper, GP'96, pages 56–65, Stanford, USA, 1996.

[3] Dominic P. Searson, David E. Leahy, and Mark J. Willis. GPTIPS: an open source genetic programming toolbox for multigene symbolic regression. In Proceedings of the International MultiConference of Engineers and Computer Scientists, volume 1, pages 77–80, March 2010.

[4] Dominic P. Searson. GPTIPS 2: An Open-Source Software Platform for Symbolic Data Mining, pages 551–573. Springer International Publishing, Cham, 2015. ISBN 978-3-319-20883-1. doi: 10.1007/978-3-319-20883-1_22. URL http://dx.doi.org/10.1007/978-3-319-20883-1_22.

[5] Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197–227, 1990. ISSN 1573-0565. doi: 10.1007/BF00116037. URL http://dx.doi.org/10.1007/BF00116037.

[6] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997. ISSN 0022-0000. doi: 10.1006/jcss.1997.1504. URL http://dx.doi.org/10.1006/jcss.1997.1504.

[7] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag New York, 2nd edition, 2009. ISBN 978-0-387-84858-7.

[8] Luiz Otávio V. B. Oliveira, Fernando E. B. Otero, Gisele L. Pappa, and Julio Albinati. Sequential Symbolic Regression with Genetic Programming, pages 73–90. Springer International Publishing, Cham, 2015. ISBN 978-3-319-16030-6. doi: 10.1007/978-3-319-16030-6_5. URL https://doi.org/10.1007/978-3-319-16030-6_5.

[9] L. O. V. B. Oliveira, F. E. B. Otero, L. F. Miranda, and G. L. Pappa. Revisiting the sequential symbolic regression genetic programming. In 2016 5th Brazilian Conference on Intelligent Systems (BRACIS), pages 163–168, Oct 2016. doi: 10.1109/BRACIS.2016.039.

[10] Alberto Moraglio, Krzysztof Krawiec, and Colin G. Johnson. Geometric semantic genetic programming. In Carlos A. Coello Coello, Vincenzo Cutello, Kalyanmoy Deb, Stephanie Forrest, Giuseppe Nicosia, and Mario Pavone, editors, Parallel Problem Solving from Nature – PPSN XII, volume 7491 of Lecture Notes in Computer Science, pages 21–31. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32936-4. doi: 10.1007/978-3-642-32937-1_3. URL http://dx.doi.org/10.1007/978-3-642-32937-1_3.

[11] Maarten Keijzer. Scaled symbolic regression. Genetic Programming and Evolvable Machines, 5(3):259–269, 2004. ISSN 1573-7632. doi: 10.1023/B:GENP.0000030195.77571.f9. URL http://dx.doi.org/10.1023/B:GENP.0000030195.77571.f9.

[12] K. Bache and M. Lichman. UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml.