<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The experiment database for machine learning (Demo)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joaquin Vanschoren</string-name>
          <email>joaquin@liacs.nl</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We demonstrate the use of the experiment database for machine learning, a community-based platform for the sharing, reuse, and in-depth investigation of the thousands of machine learning experiments executed every day. It is aimed at researchers and practitioners of data mining techniques, and is publicly available at http://expdb.cs.kuleuven.be. This demo gives a hands-on overview of how to share novel experimental results, how to integrate the database in existing data mining toolboxes, and how to query the database through an intuitive graphical query interface.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Experimentation is the lifeblood of machine learning (ML) research.
A considerable amount of effort and resources are invested in
assessing the usefulness of new algorithms, finding the optimal approach
for new applications or just to gain some insight into, for instance, the
effect of a parameter. Yet in spite of all these efforts, experimental
results are often discarded or forgotten shortly after they are obtained,
or at best averaged out to be published, which again limits their
future use. If we could collect all these ML experiments in a central
resource and make them publicly available in an organized
(searchable) fashion, the combined results would provide a highly detailed
picture of the performance of algorithms on a wide range of data
configurations, speeding up ML research.</p>
      <p>In this paper, we demonstrate a community-based platform
designed to do just this: the experiment database for machine
learning. First, experiments are automatically transcribed in a common
language that captures the exact experiment setup and all details
needed to reproduce them. Then, they are uploaded to pre-designed
databases where they are stored in an organized fashion: the results
of every experiment are linked to the exact underlying components
(such as the algorithm, parameter settings and dataset used) and thus
also integrated with all prior results. Finally, to answer any
question about algorithm behavior, we only have to write a query to the
database to sift through millions of experiments and retrieve all
results of interest. As we shall demonstrate, many kinds of questions
can be answered in one or perhaps a few queries, thus enabling fast
and thorough analysis of large numbers of collected results. The
results can also be interpreted unambiguously, as all conditions under
which they are valid are explicitly stored.</p>
    </sec>
    <sec id="sec-2">
      <title>Meta-learning</title>
      <p>
        Instead of being purely empirical, these experiment databases also
store known or measurable properties of datasets and algorithms.
For datasets, this can include the number of features, statistical and
information-theoretic properties [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and landmarkers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], while
algorithms can be tagged by model properties, the average ratio of bias
or variance error, or their sensitivity to noise [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>As such, all empirical results, past and present, are immediately
linked to all known theoretical properties of algorithms and datasets,
providing new grounds for deeper analysis. For instance, algorithm
designers can include these properties in queries to gain precise
insights on how their algorithms are affected by certain kinds of data
or how they relate to other algorithms.
</p>
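      <p>As a minimal illustration (a sketch, not the platform's actual code; the quality names below are hypothetical), such dataset qualities can be computed directly from the data:</p>

```python
import math
from collections import Counter

def dataset_qualities(instances, labels):
    """Compute a few simple dataset qualities of the kind stored in the
    database: number of instances, number of features, and class entropy."""
    counts = Counter(labels)
    n = len(labels)
    # Shannon entropy of the class distribution
    # (an information-theoretic dataset property)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "n_instances": n,
        "n_features": len(instances[0]),
        "n_classes": len(counts),
        "class_entropy": entropy,
    }

# A balanced two-class toy dataset has class entropy 1.0 bit.
qualities = dataset_qualities([[0.1, 2.0], [0.3, 1.5], [0.2, 1.8], [0.4, 1.1]],
                              ["pos", "pos", "neg", "neg"])
```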
    </sec>
    <sec id="sec-3">
      <title>Overview of benefits</title>
      <p>We can summarize the benefits of this platform as follows:
Reproducibility The database stores all details of the experimental
setup, resulting in truly reproducible research.</p>
      <p>Reference All experiments, including algorithms and datasets, are
automatically organized in one resource, creating an overview of
the state-of-the-art, and a useful ‘map’ of all known approaches,
their properties, and their performance. This also includes
negative results, which usually do not get published.</p>
      <p>Querying When faced with a question on the performance of
learning algorithms, e.g., ‘What is the effect of the training set size on
runtime?’, we can answer it in seconds by writing a query, instead
of spending days (or weeks) setting up new experiments.
Moreover, we can draw upon many more experiments, on many more
algorithms and datasets, than we can afford to run ourselves.
Reuse It saves time and energy, as previous experiments can be
readily reused. For instance, when benchmarking a new algorithm,
there is no need to benchmark the older algorithms over and over
again as well: their evaluations are likely stored online, and can
simply be downloaded.</p>
      <p>Larger studies Studies covering many algorithms, parameter
settings and datasets are very expensive to run, but could become
much more feasible if a large portion of the necessary experiments
are available online. Even when all the experiments have yet to be
run, the automatic storage and organization of experimental
results markedly simplifies conducting such large scale
experimentation and thorough analysis thereof.</p>
      <p>Visibility By using the database, users may learn about (new)
algorithms they were not previously aware of.</p>
      <p>Standardization The formal description of experiments may
catalyze the standardization of experiment design, execution and
exchange across labs and data mining tools.</p>
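      <p>To make the querying benefit concrete, the training-set-size question above could be answered roughly as follows against a relational ExpDB (a sketch over a heavily simplified, hypothetical schema; the actual database model is outlined in Sect. 2.3):</p>

```python
import sqlite3

# Hypothetical, heavily simplified schema: one table of stored runs
# with the size of their input data and the measured runtime.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE run (algorithm TEXT, n_instances INTEGER, runtime REAL)")
db.executemany("INSERT INTO run VALUES (?, ?, ?)", [
    ("J48", 1000, 0.4), ("J48", 10000, 5.1), ("J48", 100000, 71.0),
    ("SMO", 1000, 0.9), ("SMO", 10000, 44.0),
])

# 'What is the effect of the training set size on runtime?' in one query:
rows = db.execute(
    "SELECT algorithm, n_instances, AVG(runtime) "
    "FROM run GROUP BY algorithm, n_instances "
    "ORDER BY algorithm, n_instances").fetchall()
```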
      <p>The remainder of this paper is organized as follows. Sect. 2
outlines how we constructed our pilot experiment database and the
underlying models and languages that enable the free exchange of
experiments. In Sect. 3, we demonstrate how it can be used to quickly
discover new insights into a wide range of research questions and to
verify prior studies. Sect. 4 concludes.
</p>
    </sec>
    <sec id="sec-4">
      <title>Framework description</title>
      <p>In this section, we outline the design of this collaborative framework,
outlined in Fig. 1. We first establish a controlled vocabulary for data
mining experimentation in the form of an open ontology (Exposé),
before mapping it to an experiment description language (called
ExpML) and an experiment database (ExpDB). These three elements
(boxed in Fig. 1) will be discussed in the next three subsections. Full
versions of the ontologies, languages and database models discussed
below will be available on http://expdb.cs.kuleuven.be.</p>
      <p>Experiments are shared (see Fig. 1) by entering all experiment
setup details and results through the framework’s interface (API),
which exports them as ExpML files or directly streams them to an
ExpDB. Any data mining platform or custom algorithm can thus use
this API to add a ‘sharing’ feature that publishes new experiments.
The ExpDB can be set up locally, e.g., for a single person or a single
lab, or globally, a central database open to submissions from all over
the world. Finally, the bottom of the figure shows different ways to
tap into the stored information:
Querying. Querying interfaces allow researchers to formulate
questions about the stored experiments, and immediately get all results
of interest. We currently offer various such interfaces, including
graphical ones (see Sect. 2.3.2).</p>
      <p>
        Mining. A second use is to automatically look for patterns in
algorithm performance by mining the stored evaluation results and
theoretical meta-data. These meta-models can then be used, for
instance, in algorithm recommendation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
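      <p>As an illustration of such algorithm recommendation, a minimal sketch (with illustrative names and numbers, assuming the meta-data has already been retrieved from the database) could recommend the algorithm that performed best on the most similar known dataset:</p>

```python
def recommend(query_qualities, meta_db):
    """Recommend the algorithm that performed best on the most similar
    dataset (1-nearest-neighbour over dataset qualities)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = min(meta_db,
                  key=lambda entry: distance(entry["qualities"], query_qualities))
    return nearest["best_algorithm"]

# Illustrative meta-data: (n_features, class_entropy) per dataset, plus the
# best-performing algorithm found among the stored experiments.
meta_db = [
    {"qualities": (4, 1.0), "best_algorithm": "NaiveBayes"},
    {"qualities": (600, 0.9), "best_algorithm": "SMO"},
]
best = recommend((500, 0.95), meta_db)
```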
    </sec>
    <sec id="sec-5">
      <title>The Exposé Ontology</title>
      <p>The Exposé ontology describes the concepts and the structure of data
mining experiments. It establishes an unambiguous and
machine-interpretable (semantic) vocabulary, through which experiments can
be automatically shared, organized and queried. We will also use it
to define a common experiment description language and database
models, as we shall illustrate below. Ontologies can be easily
extended and refined, which is a key concern since data mining and
machine learning are ever-expanding fields.
</p>
      <sec id="sec-5-1">
        <title>Collaborative Ontology Design</title>
        <p>
          Several other useful ontologies are being developed in parallel:
OntoDM [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is a top-level ontology for data mining concepts, EXPO
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] models scientific experiments, DMOP [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] describes learning
algorithms (including their internal mechanisms and models) and
workflows, and the KD ontology [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and eProPlan ontology [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
describe large arrays of DM operators, including information about
their use to support automatic workflow planning.
        </p>
        <p>To streamline ontology development, a ‘core’ ontology was
defined, and an open ontology development forum was created: the
Data Mining Ontology (DMO) Foundry2. The goal is to make the
ontologies interoperable and orthogonal, each focusing on a
particular aspect of the data mining field. Moreover, following best practices
in ontology engineering, we reuse concepts and relationships from
established top-level scientific ontologies: BFO,3 OBI,4 IAO,5 and
RO.6 We often use subproperties, e.g. implements for concretizes,
and runs for realizes, to reflect common usage in the field. Exposé is
designed to integrate or be similar to the above mentioned ontologies,
but focuses on aspects related to experimental evaluation.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Top-level View</title>
        <p>Fig. 2 shows Exposé’s high-level concepts and relationships. The full
arrows symbolize is-a relationships, meaning that the first concept
is a subclass of the second, and the dashed arrows symbolize other
common relationships. The most top-level concepts are reused from
the aforementioned top-level scientific ontologies, and help to
describe the exact semantics of many data mining concepts. For
instance, when speaking of a ‘data mining algorithm’, we can
semantically distinguish an abstract algorithm (e.g., C4.5 in pseudo-code),
a concrete algorithm implementation (e.g., WEKA’s J48
implementation of C4.5), and a specific algorithm setup, including parameter
settings and subcomponent setups. The latter may include other
algorithm setups, e.g. for base-learners in ensemble algorithms, as well as
mathematical functions such as kernels, distance functions and
evaluation measures. A function setup details the implementation and
parameter settings used to evaluate the function.</p>
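        <p>These semantic distinctions can be sketched as a small data model (illustrative Python, not the ontology's formal encoding):</p>

```python
from dataclasses import dataclass, field

@dataclass
class Algorithm:           # abstract algorithm, e.g. C4.5 in pseudo-code
    name: str

@dataclass
class Implementation:      # concrete implementation of an abstract algorithm
    name: str
    implements: Algorithm

@dataclass
class AlgorithmSetup:      # implementation plus fixed parameters and subcomponents
    implementation: Implementation
    parameters: dict = field(default_factory=dict)
    components: dict = field(default_factory=dict)  # e.g. kernels, base-learners

# WEKA's J48 as an implementation of the abstract C4.5 algorithm,
# configured with specific parameter settings.
c45 = Algorithm("C4.5")
j48 = Implementation("weka.J48", implements=c45)
setup = AlgorithmSetup(j48, parameters={"C": 0.25, "M": 2})
```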
        <p>An algorithm setup thus defines a deterministic function which can
be directly linked to a specific result: it can be run on a machine given
specific input data (e.g., a dataset), and produce specific output data
(e.g., new datasets, models or evaluations). As such, we can trace
any output result back to the inputs and processes that generated it
(data provenance). For instance, we can query for evaluation results,
and link them to the specific algorithm, implementation or individual
parameter settings used, as well as the exact input data.</p>
        <p>Algorithm setups can be combined in workflows, which
additionally describe how data is passed between multiple algorithms.
Workflows are hierarchical: they can contain sub-workflows, and
algorithm setups themselves can contain internal workflows (e.g., a
crossvalidation setup may define a workflow to train and evaluate learning
algorithms). The level of detail is chosen by the author of an
experiment: a simple experiment may require a single algorithm setup,
while others involve complex scientific workflows.</p>
        <p>Tasks cover different data mining (sub)tasks, e.g., supervised
classification. Qualities are known or measurable properties of
algorithms and datasets (see Sect. 1.1), which are useful to interpret
results afterwards. Finally, algorithms, functions or parameters can
play certain roles in a complex setup: an algorithm can sometimes
act as a base-learner in an ensemble algorithm, and a dataset can act
as a training set in one experiment and as a test set in the next.
2 The DMO Foundry: http://dmo-foundry.org
3 The Basic Formal Ontology (BFO): http://www.ifomis.org/bfo
4 The Ontology for Biomedical Investigations (OBI): http://obi-ontology.org
5 The Information Artifact Ontology (IAO): http://bioportal.bioontology.org/ontologies/40642
6 The Relation Ontology (RO): http://www.obofoundry.org/ro</p>
        <p>An experiment tries to answer a question (in exploratory settings) or
test a hypothesis by assigning certain values to these input variables.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Experiments</title>
        <p>It has experimental variables: independent variables with a range of
possible values, controlled variables with a single value, or
dependent variables, i.e., a monitored output. The experiment design (e.g.,
full factorial) defines which combinations of input values are used.</p>
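        <p>A full factorial design can be sketched as follows (with illustrative variable names):</p>

```python
from itertools import product

# Independent variables take a range of values; controlled variables a single value.
independent = {"C": [0.1, 1.0, 10.0], "kernel": ["RBF", "poly"]}
controlled = {"folds": 10, "seed": 1}

# Full factorial design: every combination of independent-variable values,
# each merged with the fixed values of the controlled variables.
names = sorted(independent)
setups = [dict(zip(names, values), **controlled)
          for values in product(*(independent[n] for n in names))]
```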
        <p>One experiment run may generate several workflow runs (with
different input values), and a workflow run may consist of smaller
algorithm runs. Runs are triples consisting of input data, a setup and
output data. Any sub-runs, such as the 10 algorithm runs within a
10-fold CV run, could also be stored with the exact input data (folds)
and output data (predictions). Again, the level of detail is chosen by
the experimenter. Especially for complex workflows, it might be
interesting to query the results of certain sub-runs afterwards.</p>
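        <p>The run triples and their sub-runs can be sketched as nested records (an illustrative sketch):</p>

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """A run is a triple of input data, a setup, and output data;
    sub-runs (e.g. the per-fold runs inside a CV run) are optional."""
    inputs: dict
    setup: str
    outputs: dict
    sub_runs: list = field(default_factory=list)

# A 10-fold CV run whose 10 algorithm sub-runs store per-fold predictions.
fold_runs = [Run({"fold": i}, "learner", {"predictions": f"preds-{i}"})
             for i in range(10)]
cv_run = Run({"data": "letter.arff"}, "crossValidate",
             {"evaluations": "eval-1"}, sub_runs=fold_runs)
```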
      </sec>
    </sec>
    <sec id="sec-6">
      <title>ExpML: A Common Language</title>
      <p>Returning to our framework in Fig. 1, we now use this ontology to
define a common language to describe experiments. The most
straightforward way to do this would be to describe experiments in Expose´,
export them in RDF7 and store everything in RDF databases
(triplestores). However, such databases are still under active development,
and many researchers are more familiar with XML and relational
databases, which are also widely supported by many current data
mining tools. Therefore, we will also map the ontology to a
simple XML-based language, ExpML, and a relational database schema.
Technical details of this mapping are outside the scope of this paper.
Below, we show a small example of ExpML output to illustrate our
modeling of data mining workflows.</p>
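      <p>To give a flavour of such a mapping (a hypothetical, simplified rendering, not the actual ExpML schema), a run and its outputs can be serialized to XML programmatically:</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ExpML-style elements; the real schema is derived
# from the Exposé ontology and is considerably richer than this.
run = ET.Element("Run", setup="crossValidate")
ET.SubElement(run, "InputData", name="dataset", url="http://example.org/letter.arff")
output = ET.SubElement(run, "OutputData")
ET.SubElement(output, "Evaluations")
ET.SubElement(output, "Predictions")

xml_text = ET.tostring(run, encoding="unicode")
```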
      <sec id="sec-6-1">
        <title>Workflow Runs in ExpML</title>
        <p>7 Resource Description Framework: http://www.w3.org/RDF</p>
        <p>
          Fig. 3 shows a workflow run in ExpML, executed in WEKA [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and
exported through the aforementioned API, and a schematic
representation is shown in Fig. 4. The workflow has two inputs: a dataset
URL and parameter settings. It also contains two algorithm setups:
the first loads a dataset from the given URL, and then passes it to
a cross-validation setup (10 folds, random seed 1). The latter
evaluates a Support Vector Machine (SVM) implementation, using the
given parameter settings, and outputs evaluations and predictions.
Note that the workflow is completely concretized: all parameter
settings and implementations are fixed. The bottom of Figure 3 shows
the workflow run and its two algorithm sub-runs, each pointing to the
setup used. Here, we chose not to output the 10 per-fold SVM runs.
        </p>
        <p>The final output consists of Evaluations and Predictions. As
shown in the ExpML code, these have a predefined structure so
that they can be automatically interpreted and organized. Evaluations
contain, for each evaluation function (as defined in Exposé), the
evaluation value and standard deviation. They can also be labeled, as for
the per-class precision results. Predictions can be probabilistic, with
a probability for each class, and a final prediction for each instance.
For storing models, we can use existing formats such as PMML.
</p>
        <p>[Figs. 3 and 4: the ExpML code and a schematic representation of the workflow run, showing the ARFFLoader and cross-validation setups, the Weka.SMO learner with its Weka.RBF kernel (C=0.01, G=0.01), and the resulting Evaluations and Predictions outputs.]</p>
        <p>
The final step in our framework (see Fig. 1) is organizing all this
information in searchable databases such that it can be retrieved,
rearranged, and reused in further studies. This is done by
collecting ExpML descriptions and storing all details in a predefined
database. To design such a database, we mapped Exposé to a
relational database model. In this section, we offer a brief overview of
the model to help interpret the queries in the remainder of this paper.
</p>
        <sec id="sec-6-1-1">
          <title>Anatomy of an Experiment Database</title>
          <p>Fig. 5 shows the most important tables, columns and links of
the database model. Runs are linked to their input- and
output data through the join tables InputData and OutputData,
and data always has a source run, i.e., the run that generated
it. Runs can have parent runs, and a specific Setup: either a
Workflow or AlgorithmSetup, which can also be
hierarchical. AlgorithmSetups and FunctionSetups can have
ParameterSettings, a specific Implementation and a
general Algorithm or Function. Implementations and
Datasets can also have Qualities, stored in Algorithm
Quality and DataQuality, respectively. Data, runs and setups
have unique id’s, while algorithms, functions, parameters and
qualities have unique names defined in Exposé.</p>
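          <p>The provenance links in this model can be traversed with ordinary SQL; below is a sketch against a drastically simplified, hypothetical version of the schema:</p>

```python
import sqlite3

# Drastically simplified, hypothetical tables inspired by the model:
# runs, their output data (a join table), and evaluations of that data.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Run (rid INTEGER PRIMARY KEY, setup TEXT);
CREATE TABLE OutputData (rid INTEGER, did INTEGER);
CREATE TABLE Evaluation (did INTEGER, function TEXT, value REAL);
""")
db.execute("INSERT INTO Run VALUES (1, 'crossValidate')")
db.execute("INSERT INTO OutputData VALUES (1, 10)")
db.execute("INSERT INTO Evaluation VALUES (10, 'predictive_accuracy', 0.97)")

# Trace an evaluation result back to the run (and hence setup) that produced it.
row = db.execute(
    "SELECT Run.setup, Evaluation.value FROM Run "
    "JOIN OutputData ON OutputData.rid = Run.rid "
    "JOIN Evaluation ON Evaluation.did = OutputData.did "
    "WHERE Evaluation.function = 'predictive_accuracy'").fetchone()
```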
        </sec>
        <sec id="sec-6-1-2">
          <title>Accessing the Experiment Database</title>
          <p>The experiment database is available at http://expdb.cs.kuleuven.be.
A graphical query interface is provided (see the
examples below) that hides the complexity of the database, but still
supports most types of queries. In addition, it is possible to run standard
SQL queries (a library of example queries is available). Several video
tutorials help the user to get started quickly. We are currently
updating the database, query interface and submission system, and a public
submission interface for new experiments (described in ExpML) will
be available shortly.
</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Example Queries</title>
      <p>In this section, we illustrate the use of the experiment database.8 In
doing this, we aim to take advantage of the theoretical information
stored with the experiments to gain deeper insights.
</p>
    </sec>
    <sec id="sec-8">
      <title>Comparing Algorithms</title>
      <p>To compare the performance of all algorithms on one specific dataset,
we can plot the outcomes of cross-validation (CV) runs against the
algorithm names. In the graphical query interface, see Fig. 6, this
can be done by starting with the CrossValidation node, which will
be connected to the input Dataset, the outputted Evaluations and the
underlying Learner (algorithm setup). Green nodes represent data,
blue nodes are setups and white nodes are qualities (runs are
hidden). By clicking a node it can be expanded to include other parts of
the workflow setup (see below). For instance, ‘Learner’ expands into
the underlying implementation, parameter settings, base-learners and
sub-functions (e.g. kernels). By clicking a node one can also add a
selection (in green, e.g. the used learning algorithm) or a constraint
(in red, e.g. a preferred evaluation function).</p>
      <sec id="sec-8-1">
        <title>Query Results</title>
        <p>8 See [12] for a much more extensive list of possible queries.</p>
        <p>The user is always given a list of all available options, in this case a list of all evaluation
functions present in the database. Here, we choose a specific input dataset
and a specific evaluation function, and we aim to plot the evaluation
value against the used algorithm.</p>
        <p>Running the query returns all known experiment results, which are
scatterplotted in Fig. 7, ordered by performance. This immediately
provides a complete overview of how each algorithm performed.
Because the results are as general as allowed by the constraints written
in the query, the results on sub-optimal parameter settings are shown
as well (at least for those algorithms whose parameters were varied),
clearly indicating the performance variance they create. As expected,
ensemble and kernel methods are dependent on the selection of the
correct kernel, base-learner, and other parameter settings. Each of
them can be explored by adding further constraints.
For instance, we can examine the effect of the used kernel, or even
the parameters of a given kernel. Building on our first query, we zoom
in on these results by adding two constraints: the algorithm should be
an SVM9 and contain an RBF kernel. Next, we select the value of the
‘gamma’ parameter (kernel width) of that kernel. We also relax the
constraint on the dataset by including three more datasets, and ask
for the number of features in each dataset.</p>
        <p>
          The result is shown in Fig. 10. First, note that much of the
variation seen for SVMs on the ‘letter’ dataset (see Fig. 7) is indeed
explained by the effect of this parameter. We also see that its effect on
other datasets is markedly different: on some datasets, performance
increases until reaching an optimum and then slowly declines, while
on other datasets, performance decreases slowly up to a point, after
which it quickly drops to default accuracy, i.e., the SVM is simply
predicting the majority class. This behavior seems to correlate with
the number of features in each dataset (shown in brackets). Further
study shows that some SVM implementations indeed tend to overfit
on datasets with many attributes [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Preprocessing Effects</title>
      <p>The database also stores workflows with preprocessing methods, and
thus we can investigate their effect on the performance of learning
algorithms.</p>
      <p>9 Alternatively, we could ask for a specific implementation, i.e.,
'implementation=weka.SMO'.</p>
      <p>
        [Figure: Querying the performance of SVMs with different kernel widths on datasets of different dimensionalities.]
        For instance, when querying for workflows that include
a downsampling method, we can draw learning curves by plotting
learning performance against sample size. Fig. 9 shows the query:
a preprocessing step is added and we query for the resulting
number of instances, and the performance of a range of learning
algorithms (with default parameter settings). The result is shown in Fig.
11. From these results, it is clear that the ranking of algorithm
performances depends on the size of the sample: the curves cross. While
logistic regression is initially stronger than C4.5, the latter keeps
improving when given more data, confirming earlier analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Note
that RandomForest performs consistently better for all sample sizes,
that RacedIncrementalLogitBoost crosses two other curves, and that
HyperPipes actually performs worse when given more data, which
suggests that its initially higher score was largely due to chance.
      </p>
    </sec>
    <sec id="sec-10">
      <title>Bias-Variance Profiles</title>
      <p>
        The database also stores a series of algorithm properties, many of
them calculated based on large numbers of experiments. One
interesting algorithm property is its bias-variance profile. Because the
database contains a large number of bias-variance decomposition
experiments, we can give a realistic numerical assessment of how
capable each algorithm is in reducing bias and variance error. Fig. 13
shows, for each algorithm, the proportion of the total error that can
be attributed to bias error, calculated according to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], using default
parameter settings and averaged over all datasets. The simple query
is shown in Fig. 12. The algorithms are ordered from large bias (low
variance), to low bias (high variance). NaiveBayes is, as expected,
one of the algorithms whose error consists primarily of bias error,
whereas RandomTree has relatively good bias management, but
generates more variance error than NaiveBayes. When looking at the
ensemble methods, Fig. 13 shows that bagging is a variance-reduction
method, as it causes REPTree to shift significantly to the left.
Conversely, boosting reduces bias, shifting DecisionStump to the right in
AdaBoost and LogitBoost (additive logistic regression).
      </p>
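      <p>For a single test instance, the decomposition of [6] can be sketched as follows: predictions from models trained on different training samples are summarized into a squared-bias term and a variance term whose sum equals the expected 0/1 loss (when intrinsic noise is zero):</p>

```python
from collections import Counter

def bias_variance(predictions, true_label):
    """Kohavi-Wolpert-style 0/1-loss decomposition for one test instance,
    given predictions from models trained on different samples
    (a sketch; see [6] for the exact estimator)."""
    n = len(predictions)
    p = {label: c / n for label, c in Counter(predictions).items()}
    # variance: how much the prediction fluctuates across training samples
    variance = 0.5 * (1 - sum(q * q for q in p.values()))
    # squared bias: how far the average prediction is from the true class
    labels = set(p) | {true_label}
    bias_sq = 0.5 * sum((1.0 * (y == true_label) - p.get(y, 0.0)) ** 2
                        for y in labels)
    return bias_sq, variance

# A learner that predicts 'a' on 8 of 10 training samples, 'b' on 2,
# while the true class is 'b': bias_sq + variance equals the 0.8 expected loss.
b, v = bias_variance(["a"] * 8 + ["b"] * 2, "b")
```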
    </sec>
    <sec id="sec-11">
      <title>Further queries</title>
      <p>
        These are just a few examples of the queries that can be answered using
the database. Other queries allow algorithm comparisons using
multiple evaluation measures, algorithm rankings, statistical significance
tests, analysis of ensemble learners, and especially the inclusion of
many more dataset properties and algorithm properties to study how
algorithms are affected by certain types of data. Please see [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and
the database website for more examples.
      </p>
    </sec>
    <sec id="sec-12">
      <title>Conclusions</title>
      <p>Experiment databases are databases specifically designed to collect
all the details on large numbers of experiments, performed and shared
by many different researchers, and make them immediately available
to everyone. They ensure that experiments are repeatable and
automatically organize them such that they can be easily reused in future
studies.</p>
      <p>This demo paper gives an overview of the design of the
framework, the underlying ontologies, and the resulting data exchange
formats and database structures. It discusses how these can be used to
share novel experimental results, to integrate the database in
existing data mining toolboxes, and how to query the database through
an intuitive graphical query interface. By design, the database also
calculates and stores a wide range of known or measurable
properties of datasets and algorithms. As such, all empirical results, past
and present, are immediately linked to all known theoretical
properties of algorithms and datasets, providing new grounds for deeper
analysis. This results in a great resource for meta-learning and its
applications.</p>
    </sec>
    <sec id="sec-13">
      <title>Acknowledgements</title>
      <p>We acknowledge the support of BigGrid, the Dutch e-Science Grid,
supported by the Netherlands Organisation for Scientific Research,
NWO. We like to thank Larisa Soldatova and Pance Panov for many
fruitful discussions on ontology design.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P</given-names>
            <surname>Brazdil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Giraud-Carrier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Soares</surname>
          </string-name>
          , and R Vilalta, 'Metalearning: Applications to data mining', Springer, (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>MA</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P</given-names>
            <surname>Reutemann</surname>
          </string-name>
          , and IH Witten, '
          <article-title>The WEKA data mining software: An update'</article-title>
          ,
          <source>SIGKDD Explorations</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M</given-names>
            <surname>Hilario</surname>
          </string-name>
          and
          <string-name>
            <given-names>A</given-names>
            <surname>Kalousis</surname>
          </string-name>
          , '
          <article-title>Building algorithm profiles for prior model selection in knowledge discovery systems'</article-title>
          ,
          <source>Engineering Intelligent Systems</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <fpage>956</fpage>
          -
          <lpage>961</lpage>
          , (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M</given-names>
            <surname>Hilario</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Kalousis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A</given-names>
            <surname>Woznica</surname>
          </string-name>
          ,
          <article-title>'A data mining ontology for algorithm selection and meta-mining'</article-title>
          ,
          <source>Proceedings of the ECML-PKDD'09 Workshop on Service-oriented Knowledge Discovery</source>
          ,
          <fpage>76</fpage>
          -
          <lpage>87</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J</given-names>
            <surname>Kietz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F</given-names>
            <surname>Serban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <article-title>'Towards cooperative planning of data mining workflows'</article-title>
          ,
          <source>Proceedings of the ECML-PKDD'09 Workshop on Service-oriented Knowledge Discovery</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R</given-names>
            <surname>Kohavi</surname>
          </string-name>
          and
          <string-name>
            <given-names>D</given-names>
            <surname>Wolpert</surname>
          </string-name>
          ,
          <article-title>'Bias plus variance decomposition for zero-one loss functions'</article-title>
          ,
          <source>Proceedings of the International Conference on Machine Learning (ICML)</source>
          ,
          <fpage>275</fpage>
          -
          <lpage>283</lpage>
          , (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D</given-names>
            <surname>Michie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Spiegelhalter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <source>'Machine learning, neural and statistical classification'</source>
          ,
          <publisher-name>Ellis Horwood</publisher-name>
          , (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P</given-names>
            <surname>Panov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>LN</given-names>
            <surname>Soldatova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S</given-names>
            <surname>Džeroski</surname>
          </string-name>
          ,
          <article-title>'Towards an ontology of data mining investigations'</article-title>
          ,
          <source>Lecture Notes in Artificial Intelligence</source>
          ,
          <volume>5808</volume>
          ,
          <fpage>257</fpage>
          -
          <lpage>271</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C</given-names>
            <surname>Perlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F</given-names>
            <surname>Provost</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J</given-names>
            <surname>Simonoff</surname>
          </string-name>
          ,
          <article-title>Tree induction vs. logistic regression: A learning-curve analysis'</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>4</volume>
          ,
          <fpage>211</fpage>
          -
          <lpage>255</lpage>
          , (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H</given-names>
            <surname>Bensusan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C</given-names>
            <surname>Giraud-Carrier</surname>
          </string-name>
          ,
          <article-title>'Meta-learning by landmarking various learning algorithms'</article-title>
          ,
          <source>Proceedings of the International Conference on Machine Learning (ICML)</source>
          ,
          <fpage>743</fpage>
          -
          <lpage>750</lpage>
          , (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>LN</given-names>
            <surname>Soldatova</surname>
          </string-name>
          and
          <string-name>
            <given-names>RD</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <article-title>'An ontology of scientific experiments'</article-title>
          ,
          <source>Journal of the Royal Society Interface</source>
          ,
          <volume>3</volume>
          (
          <issue>11</issue>
          ),
          <fpage>795</fpage>
          -
          <lpage>803</lpage>
          , (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J</given-names>
            <surname>Vanschoren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H</given-names>
            <surname>Blockeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <article-title>'Experiment databases: A new way to share, organize and learn from experiments'</article-title>
          ,
          <source>Machine Learning</source>
          ,
          <volume>87</volume>
          (
          <issue>2</issue>
          ), (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M</given-names>
            <surname>Zakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P</given-names>
            <surname>Kremen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F</given-names>
            <surname>Železný</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N</given-names>
            <surname>Lavrač</surname>
          </string-name>
          ,
          <article-title>'Planning to learn with a knowledge discovery ontology'</article-title>
          ,
          <source>Proceedings of the ICML/UAI/COLT'08 Workshop on Planning to Learn</source>
          ,
          <fpage>29</fpage>
          -
          <lpage>34</lpage>
          , (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>