<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating Work ow Graphs Using Typed Genetic Programming</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomas Kren</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Pilat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Klara Peskova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Neruda</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague, Faculty of Mathematics and Physics</institution>
          ,
          <addr-line>Malostranske nam. 25, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          ,
          <addr-line>Pod Vodarenskou vez 2, 18207 Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        In this paper we further develop our research line of chaining several
preprocessing methods with classi ers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], by generating the complete work ow
schemes. These schemes, represented as directed acyclic graphs (DAGs), contain
computational intelligence methods together with preprocessing algorithms and
various methods of combining them into ensembles.
      </p>
      <p>We systematically generate trees representing work ow DAGs using typed
genetic programing initialization designed for polymorphic and parametric types.
Terminal nodes of a tree correspond to the nodes of the DAG, where each node
contains a computational intelligence method. Function nodes of a tree represent
higher-order functions combining several DAGs in a serial or parallel manner.
We use types to distinguish between input data (D) and predictions (P) so the
generated trees represent meaningful work ows. In order to make the method
general enough to handle methods like k-means (where k a ects the topology of
the DAG) correctly, we had to use the polymorphic type \list of s of size n"
([ ]n) with a natural number parameter n and an element type parameter . The
generating method systematically produces work ow DAGs from simple ones to
more complex and larger ones, working e ciently with symmetries.</p>
      <p>
        To demonstrate our rst results we have chosen the winequality-white [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
and wilt [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] datasets from the UCI repository. They both represent medium
size classi cation problems. The nodes of the work ow DAG contain three types
of nodes; they can be preprocessing nodes (type D ! D) { k-Best (it selects k
features most correlated with the target) or principal component analysis (PCA),
or classi er nodes (D ! P ) { gaussian nave Bayes (gaussianNB), support
vector classi cation (SVC), logistic regression (LR) or decision trees (DT). The
last type of nodes implements ensemble methods { there is a copy node and a
k-means node, which divides the data into clusters by the k-means algorithm
(both D ! [D]n), and two aggregating nodes { simple voting to combine the
outputs of several methods, and merging for k-means node ([P ]n ! P ).
      </p>
      <p>To provide a baseline, we tested each of the four classi ers separately on
the two selected datasets. The parameters of the classi ers were set using an
extensive grid search with 5-fold cross-validation; the classi ers were compared
using the quadratic weighted kappa metric. Next, we generated more than 65,000
di erent work ows using the proposed approach, and evaluated all of them.
All computational intelligence methods used the default settings, or the tuned
settings of the individual methods (denoted as `default' or `tuned' in Fig. 1c).</p>
      <p>Tomas Kren et al.
The best work ows for the two datasets are presented in Figs. 1a and 1b, and
their numerical results are presented in Fig.1c.</p>
      <p>
        We have demonstrated how the valid work ow DAGs can be easily generated
by a typed genetic programming initialization method. The generated work ows
beat the baseline obtained by the hyper-parameter tuning of single classi er
by a grid search, which is not surprising as the single method is also among
the generated DAGs. On the other hand the work ows do not use any
hyperparameter tuning. In our future work, we will extend this approach to a full
genetic programming solution, which will also optimize the hyper-parameters of
the work ows and we intend to include the method in our multi-agent system
for meta-learning { Pikater [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Acknowledgment
This work was supported by SVV project no. 260 224, GAUK project no. 187 115,
and Czech Science Foundation project no. P103-15-19877S.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kaz k</surname>
          </string-name>
          , O.,
          <string-name>
            <surname>Neruda</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Data mining process optimization in computational multiagent systems</article-title>
          .
          <source>In: Agents and Data Mining Interaction. Volume 9145 of Lecture Notes in Computer Science</source>
          . Springer (
          <year>2015</year>
          )
          <volume>93</volume>
          {
          <fpage>103</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cortez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cerdeira</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almeida</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matos</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reis</surname>
          </string-name>
          , J.:
          <article-title>Modeling wine preferences by data mining from physicochemical properties</article-title>
          .
          <source>In Decision Support Systems, Elsevier</source>
          <volume>47</volume>
          (
          <issue>4</issue>
          ) (
          <year>2009</year>
          )
          <volume>547</volume>
          {
          <fpage>553</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tateishi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoan</surname>
          </string-name>
          , N.T.:
          <article-title>A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees</article-title>
          .
          <source>Int. J. Remote Sens</source>
          .
          <volume>34</volume>
          (
          <issue>20</issue>
          ) (
          <year>October 2013</year>
          )
          <volume>6969</volume>
          {
          <fpage>6982</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Peskova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sm d</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Pilat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaz k</surname>
          </string-name>
          , O.,
          <string-name>
            <surname>Neruda</surname>
          </string-name>
          , R.:
          <article-title>Hybrid multi-agent system for metalearning in data mining</article-title>
          . In Vanschoren, J.,
          <string-name>
            <surname>Brazdil</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kottho</surname>
          </string-name>
          , L., eds.
          <source>: Proc. of the MetaSel@ECAI 2014. Volume 1201 of CEUR Workshop Proceedings., CEUR-WS.org</source>
          (
          <year>2014</year>
          )
          <volume>53</volume>
          {
          <fpage>54</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>