<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Combining Feature and Algorithm Hyperparameter Selection using some Metalearning Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miguel Viana Cachada</string-name>
          <email>mcachada@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Salisu Mamman Abdulrahman</string-name>
          <email>salisu.abdul@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Brazdil</string-name>
          <email>pbrazdil@inesctec.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIAAD - INESC TEC/Faculdade de Ciências da Universidade do Porto and Kano University of Science and Technology Wudil</institution>
          ,
          <addr-line>Kano State</addr-line>
          ,
          <country country="NG">Nigeria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIAAD - INESC TEC/Faculdade de Economia, Universidade do Porto</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine learning users need methods that can help them identify algorithms or even workflows (combinations of algorithms with preprocessing tasks, possibly using hyperparameter configurations that are different from the defaults) that achieve the potentially best performance. Our study was oriented towards average ranking (AR), an algorithm selection method that exploits meta-data obtained on prior datasets. We focused on extending the use of a variant, AR*, that takes A3R as the relevant metric (combining accuracy and run time). The extension is made at the level of the diversity of the portfolio of workflows made available to AR. Our aim was to establish whether feature selection and different hyperparameter configurations improve the process of identifying a good solution. To evaluate our proposal we carried out extensive experiments in a leave-one-out mode. The results show that AR* was able to select workflows that are likely to lead to good results, especially when the portfolio is diverse. We additionally performed a comparison of AR* with Auto-WEKA, running with different time budgets. Our proposed method shows some advantage over Auto-WEKA, particularly when the time budgets are small.</p>
      </abstract>
      <kwd-group>
        <kwd>Average Ranking</kwd>
        <kwd>Selection of Classification Algorithms</kwd>
        <kwd>Combining Feature and Algorithm Selection</kwd>
        <kwd>Hyperparameter Configuration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Users of machine learning systems are facing the problem of how to choose a
combination of data processing tools and algorithms. The goal is usually
defined as maximizing or minimizing some quantitative measure. In classification
problems, the goal could be optimizing the classification accuracy, the lift score
or the area under the ROC curve (AUC). A typical practical data mining task consists of many
sub-tasks, which result in an extremely large search space that could be very time
consuming for humans to explore manually. These sub-tasks correspond, mostly,
to the workflow phases highlighted in Fig. 1: preprocessing (e.g., feature
selection), algorithm selection and parametrization. Therefore, strategies and methods
are needed that can suggest or select an optimized data mining
solution for the user.</p>
      <p>[Fig. 1. Machine learning workflow: data extraction and cleansing, data transformation (preprocessing), model configuration (algorithm selection and hyperparameters), model evaluation, and model deployment.]</p>
      <p>
        Our work aimed to use average ranking, a very simple algorithm selection
method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and study how the performance of this method is affected by a more
diversified portfolio of workflows. These, in addition to using algorithm default
configurations (as in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), include workflows where the application of classification
algorithms is preceded by a feature selection method and/or alternative
hyperparameter configurations are used. It is expected that average ranking would
perform better with a larger portfolio. On the other hand, it can be argued that
this does not come without a cost in terms of resources and time.
      </p>
      <p>
        However, with the emergence of on-line sources such as OpenML [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ], the
results from applying different workflows to a wide range of datasets are
becoming available. Our work can be used as a proof of concept that average ranking
serves as a good baseline recommendation method with which alternative
methods could be compared. With that objective in mind, we performed a small-scale
comparison between average ranking and Auto-WEKA [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] and present those
results here.
      </p>
      <p>
        The remainder of this paper is organized as follows: in Section 2 we present
an overview of existing work in related areas. Section 3 describes the average
ranking method with a focus on the variants of ranking methods that
incorporate both accuracy and run time in the evaluation strategy. Section 4 provides
the experimental results and an empirical comparison with Auto-WEKA [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ].
Section 5 presents conclusions and discusses future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In this paper we address a particular case of the algorithm selection problem,
oriented towards the selection of classification algorithms, which has been
thoroughly investigated over the last 25 years. One approach to algorithm
selection/recommendation relies on metalearning. The simplest method uses just
performance results on different datasets in the form of rankings. Some
commonly used measures of performance are accuracy, AUC and A3R, which combines
accuracy and run time [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The rankings are then aggregated to obtain a single
aggregated ranking. The aggregated ranking can be used as a simple model to
identify the top algorithms to be used. This strategy is sometimes referred to as
the Top-N strategy [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        A more advanced approach, often considered as the classical metalearning
approach, uses, in addition to performance results, a set of measures that
characterize datasets [
        <xref ref-type="bibr" rid="ref28 ref32 ref5">28, 5, 32</xref>
        ]. Other approaches exploit estimates of performance
based on past tests, in the so-called active testing method for algorithm selection
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>
        The main idea of feature subset selection (FS) is to remove redundant or
irrelevant features from the dataset, as they can lead to a reduction of
classification accuracy or clustering quality and to an unnecessary increase of
computational cost. Feature selection and dimensionality reduction approaches are
discussed extensively in the literature [
        <xref ref-type="bibr" rid="ref14 ref27">27, 14</xref>
        ]. Many practical studies were performed
to evaluate these techniques in different fields, for example: e-mail filtering and
drug discovery [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], e-mail spam detection [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], bioinformatics [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] and
healthcare [
        <xref ref-type="bibr" rid="ref7 ref8">8, 7</xref>
        ]. In some of these studies it was observed that dimensionality reduction
techniques improved the classifiers' accuracy, while others concluded the opposite.
It can be observed that the usefulness of such techniques is likely to
depend on the type of data and problem. A question arises as to whether the inclusion
of such workflows in the given portfolio improves the overall quality of the search
for the best solution.
      </p>
      <p>
        The choice of algorithm hyperparameters has typically been formalized as
an optimization problem, with the objective function being the same metric used
to evaluate the performance of the corresponding algorithm. A simple method
is grid search, which consists in exhaustively searching within a predefined set
of hyperparameter values. Another method, random search, was introduced by
Bergstra and Bengio [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to overcome the potentially large computational cost of
grid search. Some classes of algorithms, like neural networks, allow for the use
of gradient descent for hyperparameter optimization [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Classical heuristics for
optimized search have also been suggested, for example, genetic algorithms [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ],
tabu search [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and particle swarm optimisation [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
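        <p>The contrast between the two simple search strategies mentioned above can be sketched as follows. This is a minimal illustration, not any specific tool's implementation: the objective function and the hyperparameter ranges (named C and gamma here, as in an SVM) are hypothetical.</p>

```python
import itertools
import random

def grid_search(objective, grid):
    """Exhaustively evaluate every combination in a predefined grid."""
    candidates = [dict(zip(grid, values))
                  for values in itertools.product(*grid.values())]
    return max(candidates, key=objective)

def random_search(objective, sample, n_iter=20, seed=0):
    """Evaluate n_iter randomly drawn configurations instead."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n_iter)]
    return max(candidates, key=objective)

# Hypothetical objective: performance peaks at C=1.0, gamma=0.1.
def objective(cfg):
    return -((cfg["C"] - 1.0) ** 2 + (cfg["gamma"] - 0.1) ** 2)

best_grid = grid_search(objective, {"C": [0.1, 1.0, 10.0],
                                    "gamma": [0.01, 0.1, 1.0]})
# Random search draws from continuous ranges on a log scale.
best_rand = random_search(
    objective,
    lambda rng: {"C": 10 ** rng.uniform(-1, 1),
                 "gamma": 10 ** rng.uniform(-2, 0)})
```

        <p>In practice the objective would be a cross-validated performance estimate, which is what makes exhaustive grid search expensive.</p>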
      <p>
        An approach that has recently been attracting attention for its good results
is Bayesian optimization. It consists in iteratively fitting a probabilistic model
as each hyperparameter combination is tested. The aim is that this model gives
good suggestions on what combinations should be tried next. Several such
model-based optimizers have been suggested, namely SMAC -
Sequential Model-based Algorithm Configuration [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Spearmint [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] and TPE -
Tree-structured Parzen Estimator [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. An optimizer benchmarking framework and
comparison of the three methods is presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Auto-WEKA [
        <xref ref-type="bibr" rid="ref20 ref34">20, 34</xref>
        ] is a tool designed to help novice users of ML by
automatically searching through the joint space of WEKA learning algorithms and
their respective hyperparameter settings to maximize a given performance
measure (for instance accuracy, AUC, etc.) by using a SMAC [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] optimizer.
      </p>
      <p>
        Other algorithm selection tools include auto-sklearn [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that, apart from
using a SMAC optimizer, uses classical metalearning techniques, namely, dataset
similarity through metafeatures and automatic ensemble construction, ASlib [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
a benchmark library for algorithm selection containing 17 algorithm selection
scenarios from six different areas, with a focus on (but not limited to) constraint
satisfaction problems, AutoFolio [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] that uses SMAC to automatically
determine a well-performing algorithm selection approach and its hyperparameters
for a given algorithm selection dataset, and Leveraging Learning to Automatically
Manage Algorithms (LLAMA) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], an R package for algorithm portfolios and
selection.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Overview of the AR Method</title>
      <p>
        This section presents a brief review of the average ranking method that is often
used in comparative studies in the machine learning literature. This method can
be regarded as a variant of Borda's method [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. For each dataset the algorithms
are ordered according to the performance measure chosen (e.g., predictive
accuracy) and assigned ranks. Among popular ranking criteria we find, for instance,
success rates, AUC, and significant wins [
        <xref ref-type="bibr" rid="ref21 ref6 ref9">6, 9, 21</xref>
        ]. The best algorithm is assigned
rank 1, the runner-up is assigned rank 2, and so on. Let r^j_{Ak} be the rank of
algorithm Ak on dataset j. In this work we use the average rank, obtained as

    r̄_{Ak} = (Σ_{j=1}^{n} r^j_{Ak}) / n    (1)

where n is the number of datasets. The final ranking is obtained by ordering the
average ranks and assigning ranks to the individual algorithms accordingly.</p>
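      <p>As a concrete illustration, the computation of average ranks and the resulting ordering can be sketched as follows. The accuracy values are hypothetical, and the way ties in average rank are broken here (by insertion order) is an assumption:</p>

```python
def average_ranking(performance):
    """Compute average ranks (Eq. 1) and the resulting ordering.

    performance: dict mapping algorithm name -> list of per-dataset
    scores (higher is better); all lists cover the same n datasets.
    """
    algos = list(performance)
    n = len(next(iter(performance.values())))
    # Sum of per-dataset ranks for each algorithm (rank 1 = best).
    rank_sums = {a: 0 for a in algos}
    for j in range(n):
        ordered = sorted(algos, key=lambda a: -performance[a][j])
        for rank, a in enumerate(ordered, start=1):
            rank_sums[a] += rank
    avg_ranks = {a: rank_sums[a] / n for a in algos}
    # Final ranking: order algorithms by their average rank.
    return sorted(algos, key=lambda a: avg_ranks[a]), avg_ranks

# Hypothetical accuracies of three algorithms on three datasets.
scores = {"J48": [0.81, 0.70, 0.88],
          "RandomForest": [0.85, 0.74, 0.90],
          "NaiveBayes": [0.78, 0.76, 0.84]}
ranking, avg = average_ranking(scores)
```

      <p>Here RandomForest is ranked first on two of the three datasets and second on the other, giving it the lowest (best) average rank.</p>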
      <p>Average ranking represents a useful method for deciding which algorithm
should be used. It would normally be followed on a new, target dataset: first the
algorithm with rank 1 is evaluated, then the one with rank 2, and so on. In each
step the better one is maintained as the potentially best option. In this context,
average ranking can be referred to as the recommended ranking.</p>
      <sec id="sec-4-1">
        <title>Average Ranking AR* that gives preference to fast tests</title>
        <p>
          Ranking methods use a particular performance measure to construct an ordering
of algorithms. Some commonly used measures of performance are accuracy, AUC
or A3R that combines accuracy and run time [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Although other measures could
have been used instead, here we focus on the ranking that exploits a combined
measure of accuracy and run time, A3R. As the authors of [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] have shown, this
leads to good results when loss time curves [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] are used in the evaluation. Here
we use the following formulation for this measure:
        </p>
        <p>A3R^{d_i}_{a_ref, a_j} = (SR^{d_i}_{a_j} / SR^{d_i}_{a_ref}) / (T^{d_i}_{a_j} / T^{d_i}_{a_ref})^P    (2)
where SR^{d_i}_{a_j} and SR^{d_i}_{a_ref} represent the success rates (accuracies) of algorithms a_j
and a_ref on dataset d_i, with a_ref a given reference algorithm. Instead
of accuracy, AUC or another measure can be used as well. Similarly, T^{d_i}_{a_j} and T^{d_i}_{a_ref}
represent the run times of the algorithms, in seconds. As the average ranking
method does not require pairwise comparisons, the values of SR^{d_i}_{a_ref} and T^{d_i}_{a_ref}
can be set to 1. Hence, we can use the simplified formula A3R'_{a_j} = SR^{d_i}_{a_j} / (T^{d_i}_{a_j})^P,
as in [<xref ref-type="bibr" rid="ref30">30</xref>].</p>
        <p>
          To trade off the importance of time, the denominator is raised to the power of
P, where P is usually some small number, such as 1/64, representing in effect the
64th root. This is motivated by the observation that run times vary much more
than accuracies. It is not uncommon for one particular algorithm to be several
orders of magnitude slower (or faster) than another. Obviously, we do not want
the time ratios to dominate the equation completely. If we take P = 1/N (i.e.,
the Nth root), the time ratio raised to P approaches 1 as P approaches 0. In our
experiments described in Section 4, we follow [
          <xref ref-type="bibr" rid="ref2 ref30">30, 2</xref>
          ], and choose P = 1/64, as
it was shown to lead to better results than some other settings. This version of
average ranking is referred to as AR*.
        </p>
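        <p>The simplified score A3R' = SR / T^P and the effect of P = 1/64 can be sketched as follows; the success rates and run times are hypothetical values chosen only to illustrate the damping effect:</p>

```python
def a3r_prime(success_rate, run_time, p=1/64):
    """Simplified A3R' score: SR / T^P, with T in seconds.

    With p = 1/64, run times differing by orders of magnitude only
    moderately discount the success rate, as intended.
    """
    return success_rate / (run_time ** p)

# A fast, slightly less accurate workflow vs. a slow, more accurate
# one (hypothetical numbers): the fast one wins under A3R'.
fast = a3r_prime(0.85, 2.0)       # 2 seconds
slow = a3r_prime(0.87, 2000.0)    # 2000 seconds
```

        <p>With a run time of 1 second the denominator is 1, so the score reduces to the success rate itself.</p>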
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation using loss time curves</title>
        <p>
          Our aim was to investigate the impact of using feature selection and different
hyperparameter configurations on the average ranking method. The results are
presented in the form of loss time curves [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] which show how the performance loss
depends on time. The loss is calculated as the difference between the
performance of the algorithm identified using the proposed ranking method and the
ideal choice (the best performance known, identified within the largest
set of performance results available in our study). The individual loss time curves
are aggregated into a mean loss time curve.[1] The mean loss time curve can be
characterized by a number representing the mean loss in a given interval (MIL).
This characteristic is similar to AUC, but there is an important difference: AUC
values fall in the 0-1 interval, whereas our loss time curves span
the interval Tmin - Tmax, which is specified by the user. Typically the user only
worries about run times when they exceed a minimum value. In the experiments
here we have set Tmin to 10 seconds. The value of Tmax was set to 10^4 seconds,
i.e. about 2.78 hours.
[1] Another possibility would be to use median loss time curves. These could be
accompanied by 25% and 75% percentile bands.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>The Methodology Adopted and Experimental Results</title>
      <sec id="sec-5-1">
        <title>Overview of the methodology adopted</title>
        <p>Our first aim is to study the effect of using feature selection with given
classification algorithms when using the A3R-based average ranking method. We
accomplished this by constructing portfolios of algorithms that are or are not preceded
by feature selection. We then performed a similar analysis with portfolios of
workflows that also include different hyperparameter configurations for some of
the algorithms.</p>
        <p>The workflows are evaluated over each of the 37 datasets we use (refer to
Table 4 in the Appendix). In each study, loss time curves and MIL values are
generated using leave-one-out (LOO) cross-validation, where 36 datasets are
used to generate the model (i.e., the average ranking) and the corresponding loss
time curve is computed on the dataset left out.</p>
        <p>The average ranking is followed sequentially. The AR* method uses the
concept of current best (abest). When this method comes to testing algorithm ai in
the ranking, the test is carried out using 10-fold cross validation (CV). If the
performance of algorithm ai is better than that of abest, ai is used as the new
abest. This way we obtain one loss time curve per dataset in every LOO cycle.
The individual loss time curves are used to generate the mean loss time curve.
Using LOO cross-validation helps to gain confidence that the average ranking
can be effectively transferred to new datasets and produce satisfactory outcomes.</p>
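        <p>The sequential procedure just described, which keeps track of the current best workflow while following the ranking, can be sketched as follows. The cross-validation itself is abstracted into a caller-supplied evaluation function, and all names and numbers are illustrative:</p>

```python
def follow_ranking(ranking, evaluate):
    """Follow an average ranking on a new dataset, maintaining the
    current best workflow (abest).

    ranking:  workflows ordered by average rank (rank 1 first).
    evaluate: function workflow -> (accuracy, test_time), e.g. a
              10-fold CV wrapped by the caller.
    Returns the (elapsed_time, best_accuracy) points that make up
    one loss time curve for this dataset.
    """
    elapsed, best_acc, curve = 0.0, 0.0, []
    for workflow in ranking:
        acc, test_time = evaluate(workflow)
        elapsed += test_time
        if acc > best_acc:          # workflow becomes the new abest
            best_acc = acc
        curve.append((elapsed, best_acc))
    return curve

# Hypothetical per-workflow CV results: (accuracy, run time in s).
results = {"w1": (0.80, 5.0), "w2": (0.78, 2.0), "w3": (0.86, 20.0)}
curve = follow_ranking(["w1", "w2", "w3"], lambda w: results[w])
```

        <p>Subtracting each best_accuracy from the ideal accuracy would turn this curve into the loss time curve used in the evaluation.</p>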
        <p>Note that the loss is computed in reference to the best workflow overall for
the dataset left out, which is identified within the largest portfolio of algorithms
available. In our study, this portfolio includes all variants that take different
hyperparameter configurations into consideration, with and without feature
selection.</p>
        <p>
          All experiments were performed using Weka software [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and its algorithm
implementations.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Effect of feature selection</title>
        <p>
          To study the effect of feature selection, we use a total of 124 workflows. The
setting for the experiment uses two different portfolios. The first one contains 62
classification algorithms with their default hyperparameter settings, while the
second includes workflows consisting of Correlation-based Feature Selection (CFS) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]
followed by one of the 62 classification algorithms.
        </p>
        <p>Fig. 2 shows the results of 3 different variants of the A3R-based average
ranking method. The variant AR* uses just our set of classification algorithms
with no feature selection method and default hyperparameter configurations.
AR*+FS+A is the variant where our algorithms are always preceded by CFS.
The variant AR*±FS+A uses all 124 algorithm configurations.</p>
        <p>Overall, the variant AR*±FS+A yields the smallest MIL value (Table 1).
However, the MIL of the variant AR* is very close and, in addition, both loss
time curves are quite similar (Fig. 2).</p>
        <p>Regarding AR*+FS+A, although the algorithms preceded by feature
selection are, in general, faster than their counterparts, as they deal with datasets
with fewer features, their accuracy tends to be negatively affected. In Fig. 2 it
can be observed that the AR*+FS+A loss time curve is able to compete with the
other two variants up to approximately the 6-second mark, but eventually reaches
its limit in accuracy improvement. This happens because the set of algorithms
used does not include the ones with performance closest to the best available.
In our study, AR*+FS+A provides the best accuracy for just 4 datasets, while
AR*±FS+A achieves the best accuracy in 13 datasets.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Effect of diversity of hyperparameter configurations</title>
        <p>We have additionally experimented with several versions of some of the
algorithms in our algorithm portfolio. Each version was associated with a particular
configuration of the corresponding hyperparameters. The extended portfolio
included 3 versions of Multilayer Perceptron, 7 of Support Vector Machines (with
polynomial and radial basis function kernels), 7 of Random Forests, 8 of J48 and
5 of K-Nearest Neighbors, totalling 30 additions (refer to Tables 5 and 6 in the
Appendix for more information regarding the algorithms used).</p>
        <p>[Fig. 3: loss time curves (accuracy loss, in percentage points, versus time) for the variants AR*, AR*+Hyp+A and AR*±FS+Hyp+A.]</p>
        <p>To study the effect of different algorithm parametrizations we again define
AR* as the baseline portfolio and compare it with AR*+Hyp+A, which includes
the 30 extra algorithm versions mentioned above, and the variant AR*±FS+Hyp+A,
which also contemplates feature selection.</p>
        <p>AR*±FS+Hyp+A is the portfolio with the largest number of workflows, 184,
corresponding to 62 algorithms with default configurations, 30 versions with
alternative hyperparameter configurations and the same 92 previous configurations
preceded by feature selection.</p>
        <p>From Table 2 it can be observed that, similarly to the previous study
(Section 4.2), the variant with more diversity, AR*±FS+Hyp+A in this case,
is the one with the lowest MIL. However, when compared with AR*+Hyp+A,
we note that the difference in terms of MIL is not very high and also, when
observing both loss time curves (Fig. 3), we cannot say that one is better than
the other. In fact, up to the 100-second mark AR*±FS+Hyp+A appears to be
worse than AR*+Hyp+A. In our view, this is because, in many cases, average
ranking includes different variants of the same algorithm in close succession.
This may lead to a loss of time spent on testing variants that do not result in
any gain. Nevertheless, the variant AR*±FS+Hyp+A is the most complete set
of algorithm alternatives and includes all best possible accuracies achievable per
dataset, hence its loss time curve eventually reaches 0.</p>
        <p>When compared with AR*, the other two portfolios clearly achieve
better loss time curves and lower MIL, confirming that a portfolio that is richer
in algorithm versions has increased chances of providing better
recommendations. Our results also suggest, however, that it is desirable to provide methods
that construct portfolios including all important variants but excluding
others, in order to avoid losing time testing many "similar" alternatives.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Comparison between A3R-based AR and Auto-WEKA</title>
        <p>
For the purpose of evaluating the effectiveness of our proposed method in
practice, we have decided to carry out similar experiments using Auto-WEKA [
          <xref ref-type="bibr" rid="ref20 ref34">20,
34</xref>
          ]. An empirical study was done by comparing the accuracy obtained by each
system for the same run time. The Auto-WEKA runs were made using
cross-validation with 10 folds: the reference accuracy results from the average of all folds;
reference run time is the sum of training and testing time, measured using all
data to train and test the model.
        </p>
        <p>Regarding the Auto-WEKA experiment, a required parameter is the time limit,
which sets a time budget for the run. The actual run time is, however, somewhat
different from the one defined in the parameter, so it needs to be measured.
After running, Auto-WEKA outputs one or more recommended model
configurations (which always include an algorithm and its hyperparameters, and
may or may not include a feature selection procedure). For the purpose of this
empirical study, we used only the first Auto-WEKA recommendation for each
dataset. The train and test time spent needs to be added to the given budget. In
this study we have used four time budgets: 5, 15, 30 and 60 minutes. The steps
to compare both methods are detailed below. They refer to one dataset, but the
same steps were repeated for all datasets.
1. Auto-WEKA is run for a predefined time budget. Its actual run time (search
time) is measured and the recommended configuration is recorded.
2. The recommended configuration is run within a WEKA Experiment class,
using cross-validation with 10 folds. The predictive accuracy for the
recommended algorithm returned by Auto-WEKA is obtained.
3. The Auto-WEKA model run time is measured from the tasks of training and
testing the recommended configuration over all data.
4. The Auto-WEKA total run time is computed by adding the search time to the
recommended model run time. The sum of the two times is used to retrieve
the actual performance of our system.</p>
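        <p>The four steps above can be sketched as follows for a single dataset. The helper functions are hypothetical stand-ins for the actual tool invocations, not real Auto-WEKA or WEKA APIs:</p>

```python
def compare_on_dataset(budget, run_autoweka, cross_validate,
                       time_train_test, ar_accuracy_at):
    """Steps 1-4 for one dataset. Hypothetical helpers:
      run_autoweka(budget) -> (search_time, configuration)
      cross_validate(configuration) -> accuracy (10-fold CV)
      time_train_test(configuration) -> run time over all data
      ar_accuracy_at(t) -> accuracy reached by AR* at total time t
    """
    # 1. Run Auto-WEKA for the budget; record the actual search time
    #    and the first recommended configuration.
    search_time, config = run_autoweka(budget)
    # 2. Evaluate the recommended configuration with 10-fold CV.
    aw_accuracy = cross_validate(config)
    # 3. Measure the model run time (train + test over all data).
    model_time = time_train_test(config)
    # 4. Compare with AR* at the matching total run time.
    total_time = search_time + model_time
    return ar_accuracy_at(total_time) > aw_accuracy  # True = AR* "Win"

# Illustrative call with dummy stand-ins (budget of 300 s).
win = compare_on_dataset(300,
                         lambda b: (b + 12.0, "cfg"),
                         lambda c: 0.82,
                         lambda c: 8.0,
                         lambda t: 0.85)
```

        <p>Counting such wins over all datasets and budgets yields the tallies reported in Table 3.</p>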
        <p>One way to compare the overall performance of the algorithm recommendation
methods is to count the number of datasets on which a particular method is
the overall winner. Table 3 shows the aggregated results for the four Auto-WEKA
time budgets, where Win refers to the number of cases where AR*±FS+Hyp+A
has higher accuracy than Auto-WEKA. This is a simple comparison that does
not consider the statistical significance of the differences. Still, it can be
observed that AR*±FS+Hyp+A has more wins for lower Auto-WEKA time
budgets.</p>
        <p>In this paper we have presented a study that exploits the average ranking method
AR*, which gives preference to well-performing workflows that are also fast to
test. Our portfolios combine algorithm selection with feature selection and a
set of hyperparameter configurations. The portfolio with the most diverse
set of workflow configurations (AR*±FS+Hyp+A) achieved the best
results. The runner-up was AR*+Hyp+A, with quite similar performance.
These results confirm that it is indeed important to incorporate
hyperparameter configurations into the portfolio, as this can lead to improved performance.
Although the addition of extra variants to the given portfolio could, in
principle, slow down the process of identifying the best or near-best solution, the
usage of AR* mitigates this adverse effect: it opts for fast and well-performing
workflows before the others.</p>
        <p>Additionally, we have compared the A3R-based average ranking against
Auto-WEKA and showed that the proposed method competes quite well with the
latter, especially when smaller time budgets are used.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Future work</title>
        <p>Our future plan is to devise a method of pruning portfolios of workflows with
the aim of avoiding investing time in testing variants of algorithms that are of
a similar nature and have similar performance. Another line that could
be followed is to explore approaches based on active testing (AT) and/or
surrogate models, whose aim is to select the most promising candidate to test.</p>
        <p>
          It would be desirable to use a much larger portfolio to guarantee that the
potentially best solutions are not left out. We could collect more test
results with alternative feature selection and preprocessing methods,
as well as hyperparameter configurations, or even reuse a larger set of results
already available through OpenML [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ].
        </p>
        <p>
          We could also extend the comparison with Auto-WEKA to higher time
budgets, as well as use other currently available methods to perform further
comparisons (e.g., auto-sklearn [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]).
        </p>
        <p>We are also considering investigating approaches that take
into account the costs (run time) of off-line tests used to gather the meta-data
that our proposed method exploits. Their cost could be set to some fraction
of the cost of an on-line test (i.e., a test on a new dataset), rather than being ignored
altogether.</p>
        <p>Acknowledgements. The authors wish to express their gratitude to the following
institutions, which have provided funding to support this work:
- Federal Government of Nigeria Tertiary Education Trust Fund, under the
TETFund 2012 AST&amp;D Intervention for Kano University of Science and
Technology, Wudil, Kano State, Nigeria, for PhD Overseas Training;
- Project NanoSTIMA: Macro-to-Nano Human Sensing: Towards Integrated
Multimodal Health Monitoring and Analytics/NORTE-01-0145-FEDER-000016,
which is financed by the North Portugal Regional Operational Programme
(NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and
through the European Regional Development Fund (ERDF).
</p>
        <p>Tables 5 and 6 (Appendix) list the additional algorithm versions, for example SMO with PolyKernel and with RBFKernel, IBk, and MultilayerPerceptron with H = [a, o]. Parameters in bold are the defaults.</p>
        <p>Parameter intervals and descriptions (adapted from the WEKA documentation):
- P = [80, 100]: size of each bag, as a percentage of the training set size.
- K = [0, 250]: number of attributes to randomly investigate, where 0 = int(log2(#attributes)+1).
- M = [1, 10]: minimum number of instances per leaf.
- C = [0.01, 0.25, 0.5]: confidence threshold for pruning (smaller values incur more pruning).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abdulrahman</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Measures for Combining Accuracy and Time for Meta-learning</article-title>
          .
          <source>In: Meta-Learning and Algorithm Selection Workshop at ECAI 2014</source>
          . pp.
          <volume>49</volume>
          -
          <issue>50</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Abdulrahman</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Rijn</surname>
            ,
            <given-names>J.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanschoren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Speeding up algorithm selection using average ranking and active testing by introducing runtime</article-title>
          .
          <source>Machine Learning, Special Issue on Metalearning and Algorithm Selection</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bergstra</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bardenet</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kegl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Algorithms for hyper-parameter optimization</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>2546</fpage>
          -
          <lpage>2554</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bischl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kerschke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kottho</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lindauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malitsky</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frechette</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leyton-Brown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tierney</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , et al.:
          <article-title>ASlib: A benchmark library for algorithm selection</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>237</volume>
          ,
          <fpage>41</fpage>
          -
          <lpage>58</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giraud-Carrier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilalta</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Metalearning: Applications to data mining</article-title>
          . Springer Science &amp; Business Media
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Da Costa</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results</article-title>
          .
          <source>Machine Learning</source>
          <volume>50</volume>
          (
          <issue>3</issue>
          ),
          <fpage>251</fpage>
          -
          <lpage>277</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chou</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandettini</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Initiative</surname>
            ,
            <given-names>A.D.N.</given-names>
          </string-name>
          :
          <article-title>Does feature selection improve classification accuracy? Impact of sample size and feature selection on classification using anatomical magnetic resonance images</article-title>
          .
          <source>Neuroimage</source>
          <volume>60</volume>
          (
          <issue>1</issue>
          ),
          <fpage>59</fpage>
          -
          <lpage>70</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cuingnet</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chupin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benali</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colliot</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Spatial and anatomical regularization of SVM for brain image analysis</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          pp.
          <fpage>460</fpage>
          -
          <lpage>468</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Demšar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Statistical Comparisons of Classifiers over Multiple Data Sets</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>7</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Eggensperger</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feurer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergstra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snoek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , Leyton-Brown, K.:
          <article-title>Towards an empirical foundation for assessing Bayesian optimization of hyperparameters</article-title>
          .
          <source>In: NIPS workshop on Bayesian Optimization in Theory and Practice</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Feurer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eggensperger</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Springenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blum</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Efficient and robust automated machine learning</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>2962</fpage>
          -
          <lpage>2970</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gansterer</surname>
            ,
            <given-names>W.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janecek</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumayer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Spam filtering based on latent semantic indexing</article-title>
          .
          <source>In: Survey of Text Mining II</source>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>183</lpage>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Gomes</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prudêncio</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carvalho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Combining meta-learning and search techniques to select parameters for support vector machines</article-title>
          .
          <source>Neurocomputing</source>
          <volume>75</volume>
          (
          <issue>1</issue>
          ),
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elisseeff</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An introduction to variable and feature selection</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          . pp.
          <fpage>1157</fpage>
          -
          <lpage>1182</lpage>
          . JMLR. (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfahringer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutemann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>The WEKA Data Mining Software: An Update</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Correlation-based feature selection for machine learning</article-title>
          .
          <source>Ph.D. thesis</source>
          , The University of Waikato (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leyton-Brown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Sequential model-based optimization for general algorithm configuration</article-title>
          .
          <source>International Conference on Learning and Intelligent Optimization</source>
          . pp.
          <fpage>507</fpage>
          -
          <lpage>523</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kohavi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>John</surname>
            ,
            <given-names>G.H.</given-names>
          </string-name>
          :
          <article-title>Wrappers for feature subset selection</article-title>
          .
          <source>Artificial Intelligence</source>
          .
          <volume>97</volume>
          (
          <issue>1</issue>
          ),
          <fpage>273</fpage>
          -
          <lpage>324</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Kotthoff</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Llama: leveraging learning to automatically manage algorithms</article-title>
          .
          <source>arXiv preprint arXiv:1306.1031</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Kotthoff</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thornton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leyton-Brown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>17</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Leite</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Active Testing Strategy to Predict the Best Classification Algorithm via Sampling and Metalearning</article-title>
          .
          <source>In: ECAI</source>
          . pp.
          <fpage>309</fpage>
          -
          <lpage>314</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Leite</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanschoren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Selecting Classification Algorithms with Active Testing</article-title>
          .
          <source>In: Machine Learning and Data Mining in Pattern Recognition</source>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>131</lpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Rank aggregation methods</article-title>
          .
          <source>WIREs Computational Statistics</source>
          <volume>2</volume>
          ,
          <fpage>555</fpage>
          -
          <lpage>570</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Lindauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>AutoFolio: An automatically configured algorithm selector</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          <volume>53</volume>
          ,
          <fpage>745</fpage>
          -
          <lpage>778</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Maclaurin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duvenaud</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>R.P.</given-names>
          </string-name>
          :
          <article-title>Gradient-based hyperparameter optimization through reversible learning</article-title>
          .
          <source>In: Proceedings of the 32nd International Conference on Machine Learning</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26. de Miranda, P.B., Prudêncio, R.B.,
          <string-name>
            <surname>de Carvalho</surname>
            ,
            <given-names>A.C.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Combining a multi-objective optimization approach with meta-learning for svm parameter selection</article-title>
          .
          <source>In: Systems, Man, and Cybernetics (SMC)</source>
          . pp.
          <fpage>2909</fpage>
          -
          <lpage>2914</lpage>
          . IEEE. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Molina</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belanche</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nebot</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Feature selection algorithms: a survey and experimental evaluation</article-title>
          .
          <source>In: Proceedings of the 2002 IEEE International Conference</source>
          . pp.
          <fpage>306</fpage>
          -
          <lpage>313</lpage>
          . IEEE
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Pfahringer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bensusan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giraud-Carrier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Tell me who can learn you and I can tell you who you are: Landmarking various learning algorithms</article-title>
          .
          <source>In: Proceedings of the 17th International Conference on Machine Learning</source>
          . pp.
          <fpage>743</fpage>
          -
          <lpage>750</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Reif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shafait</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dengel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Meta-learning for evolutionary parameter optimization of classifiers</article-title>
          .
          <source>Machine Learning</source>
          <volume>87</volume>
          (
          <issue>3</issue>
          ),
          <fpage>357</fpage>
          -
          <lpage>380</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>van Rijn</surname>
            ,
            <given-names>J.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdulrahman</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanschoren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Fast Algorithm Selection using Learning Curves</article-title>
          .
          <source>In: Advances in Intelligent Data Analysis XIV</source>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Saeys</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inza</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larrañaga</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A review of feature selection techniques in bioinformatics</article-title>
          .
          <source>Bioinformatics</source>
          <volume>23</volume>
          (
          <issue>19</issue>
          ),
          <fpage>2507</fpage>
          -
          <lpage>2517</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Smith-Miles</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          :
          <article-title>Cross-disciplinary Perspectives on Meta-Learning for Algorithm Selection</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>41</volume>
          (
          <issue>1</issue>
          ),
          <fpage>6:1</fpage>
          -
          <lpage>6:25</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Snoek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>R.P.</given-names>
          </string-name>
          :
          <article-title>Practical Bayesian optimization of machine learning algorithms</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>2951</fpage>
          -
          <lpage>2959</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Thornton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leyton-Brown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms</article-title>
          .
          <source>In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>847</fpage>
          -
          <lpage>855</lpage>
          . ACM
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Vanschoren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Rijn</surname>
            ,
            <given-names>J.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bischl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torgo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>OpenML: networked science in machine learning</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>15</volume>
          (
          <issue>2</issue>
          ),
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>