<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Ontology-based Approach to Adaptive Data Processing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haokun Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiyong Feng</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Technology, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Software, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tianjin Key Laboratory of Cognitive Computing and Application</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present an ontology-based approach to generating workflows, which are at the core of adaptive data processing, by taking advantage of ontologies in characterizing implicit relations among data processing tasks. Moreover, in contrast to the manual configuration required by current approaches, ontology reasoning can automatically infer preference orders of tasks and thus overcome the limitations of manual configuration. Experimental results show that our proposal is effective and efficient.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Ontology Construction for Generating Workflows</title>
      <p>
        The first step of our proposal is constructing ontologies in Protege [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the
following, we introduce the four steps of ontology construction.
      </p>
      <p>Class Definition There are three classes, namely, ML Process, ML Process
Service, and Parameters. Each of them has some subclasses.</p>
      <p>The ML Process class represents the whole procedure of data processing and
it contains four main subclasses, namely, Data Preprocess, Feature Preprocess,
Modeling, and Evaluation.</p>
      <p>The ML Process Service class represents the set of all services and it contains
four subclasses, namely, Data Preprocess Service, Feature Preprocess Service,
Modeling Service, and Evaluation Service.</p>
      <p>The Parameters class represents the set of all parameters and it contains many
subclasses of parameters w.r.t. various models.</p>
      <p>Property Definition There are two kinds of properties, namely, object properties (relations
among classes) and data properties (relations between entities and datatypes). There are 14
object properties as well as 7 data properties.</p>
      <p>Note that getModelRequirement and getParameters contain many subproperties,
which are used to select models and parameters. In addition, three subproperties
of getParameters, namely hasC, hasGamma, and hasClassWeight, are used to select
the parameters that are important to SVM.</p>
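      <p>As a rough illustration of the schema described above, the following Python sketch records the three classes with their named subclasses and the three SVM-related subproperties of getParameters. It is not the OWL ontology itself: the dictionary layout and the helper functions superclass_of and is_subproperty are assumptions introduced here, and the subclasses of Parameters are left empty because the text does not enumerate them.</p>
      <preformat>
```python
# Illustrative only: a plain-Python stand-in for the ontology schema.
# Class and property names come from the text; the layout is an assumption.

# Top-level classes mapped to the subclasses named in the text.
CLASS_HIERARCHY = {
    "ML Process": [
        "Data Preprocess",
        "Feature Preprocess",
        "Modeling",
        "Evaluation",
    ],
    "ML Process Service": [
        "Data Preprocess Service",
        "Feature Preprocess Service",
        "Modeling Service",
        "Evaluation Service",
    ],
    # The text does not enumerate the parameter subclasses.
    "Parameters": [],
}

# Subproperties of getParameters that the text names for SVM.
SUBPROPERTIES = {
    "getParameters": ["hasC", "hasGamma", "hasClassWeight"],
}


def superclass_of(name):
    """Return the top-level class that lists `name` as a subclass, or None."""
    for parent, children in CLASS_HIERARCHY.items():
        if name in children:
            return parent
    return None


def is_subproperty(prop, parent):
    """Check whether `prop` is recorded as a subproperty of `parent`."""
    return prop in SUBPROPERTIES.get(parent, [])
```
      </preformat>
      <p>For instance, superclass_of("Modeling") yields "ML Process", mirroring the subclass relation stated in the class definition.</p>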
      <p>
        Entity Definition According to a specific application scenario, we create
entities as normal [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and these entities can be used with some rules to do
reasoning to get more facts about the specific application scenario.
An example of the entities is shown in Section 3.
      </p>
      <p>[Figure: ontology for Citi-bike prediction.]</p>
      <p>
        Rule Definition We adopt SWRL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which is intended to be the rule language of
the Semantic Web, to express statements that cannot be expressed in OWL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Some SWRL rules generate the workflow according to the user requirements and
the properties of the data set. Some of the rules are shown in Section 3.
      </p>
      <p>
        Ontology Construction Finally, we apply Protege [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to construct the ontology
and employ VOWL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to visualize the ontology schema, as shown in Fig. 1.
      </p>
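      <p>To make the role of such rules concrete, the sketch below imitates in Python how a single SWRL-style rule could derive a new fact from the data properties of an entity. The rule itself, the property requiresStep, and the helper apply_rules are hypothetical and serve only to illustrate forward chaining; the rules actually used are those referred to in Section 3, and the reasoning in our approach is carried out in Protege rather than in Python.</p>
      <preformat>
```python
# Illustrative forward chaining over (subject, property, value) facts.
# The rule below is hypothetical and is not one of the paper's SWRL rules.

def apply_rules(facts, rules):
    """Fire rules until no new fact is derived; return the closed fact set."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Each rule returns a list, so `known` is not mutated while scanned.
            for new_fact in rule(known):
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


def sql_needs_data_preprocess(known):
    """Hypothetical rule: formatOfData(?x, 'SQL') implies requiresStep(?x, 'Data Preprocess')."""
    return [
        (subject, "requiresStep", "Data Preprocess")
        for (subject, prop, value) in known
        if prop == "formatOfData" and value == "SQL"
    ]


# One asserted fact about the example entity; the rule adds a second one.
facts = {("citibike", "formatOfData", "SQL")}
closed = apply_rules(facts, [sql_needs_data_preprocess])
```
      </preformat>
      <p>The loop stops as soon as a full pass over the rules adds nothing new, which is the fixed-point behaviour one expects from rule-based inference.</p>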
    </sec>
    <sec id="sec-3">
      <title>Experiments and evaluations</title>
      <p>New York Citi-bike parking quantity prediction
(www.citibikenyc.com/systemdata): As the rentals and returns of bikes at different stations in different periods are
unbalanced, it is interesting to predict the number of bike parking spots to be
rented from or returned to each station cluster.</p>
      <p>There are three steps to generate an optimal workflow, as follows:</p>
      <p>Step 1 Instantiating a Citi-bike parking quantity prediction entity, named citibike,
which has three properties, namely, isCase (set to the string "citibike"),
formatOfData (set to the string "SQL"), and accuracy in general (set to the boolean value
"true"), as shown in Fig. 2, together with SWRL rules (e.g., the rules in Table 1).</p>
      <p>Step 2 Generating a workflow by ontology reasoning via Protege as follows:
data preprocess → feature preprocess → modeling → evaluation, as shown in Fig. 2.</p>
      <p>Step 3 Executing the workflow through properties in red from data preprocess
to feature preprocess, modeling, and evaluation.</p>
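      <p>The ordering produced in Step 2 can be mimicked outside the reasoner. The following sketch assumes that reasoning has already yielded pairwise "precedes" facts between tasks (a hypothetical relation name) and linearizes them with the standard graphlib module of Python (version 3.9 or later). It is a stand-in for, not a reproduction of, the SWRL-based reasoning in Protege.</p>
      <preformat>
```python
from graphlib import TopologicalSorter

# Hypothetical "precedes" facts, standing in for what the reasoner would infer.
PRECEDES = [
    ("data preprocess", "feature preprocess"),
    ("feature preprocess", "modeling"),
    ("modeling", "evaluation"),
]


def build_workflow(precedes):
    """Order tasks so that every task comes after all of its predecessors."""
    sorter = TopologicalSorter()
    for earlier, later in precedes:
        # add(node, *predecessors): `later` depends on `earlier`.
        sorter.add(later, earlier)
    return list(sorter.static_order())


workflow = build_workflow(PRECEDES)
print(" -> ".join(workflow))
```
      </preformat>
      <p>Because the four tasks form a single chain, the resulting order is unique and matches the workflow of Step 2.</p>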
      <p>As we can see, the results of Citi-bike parking quantity prediction via the
generated workflow are the same as the results via optimal manual
configurations, as shown in Fig. 3.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this paper, we presented an ontology-based approach to generating workflows
so that optimal models and parameters can be automatically selected in data
processing. Our proposal provides a novel way to perform adaptive data processing via
ontologies and facilitates the application of ontology techniques to data processing.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is supported by the Key Technology Research and Development
Program of Tianjin (16YFZCGX00210), the National Natural Science Foundation of
China (61502336), the National Key R&amp;D Program of China (2016YFB1000603),
and the Seed Foundation of Tianjin University (2018XZC-0016).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Fernandez-Delgado</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cernadas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gomes Amorim</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Do we need hundreds of classifiers to solve real world classification problems?</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3133</fpage>
          –
          <lpage>3181</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Horrocks</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            <given-names>P.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boley</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tabet</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grosof</surname>
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dean</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>SWRL: A semantic web rule language combining OWL and RuleML</article-title>
          .
          <source>W3C Member submission, 21 May</source>
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lohmann</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Negru</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haag</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ertl</surname>
            <given-names>T.</given-names>
          </string-name>
          <article-title>Visualizing ontologies with VOWL</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <fpage>399</fpage>
          –
          <lpage>419</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Musen</surname>
            <given-names>M.A.</given-names>
          </string-name>
          <article-title>The Protege project: A look back and a look forward</article-title>
          .
          <source>AI Matters</source>
          . Association of Computing Machinery Specific Interest Group in
          <source>Artificial Intelligence</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ),
          <year>June 2015</year>
          . DOI: 10.1145/2557001.25757003.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Oozie:
          <article-title>Apache workflow scheduler for Hadoop</article-title>
          .
          <source>The Apache Software Foundation, September</source>
          ,
          <year>2010</year>
          . http://oozie.apache.org/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pan</surname>
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vetere</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Perez</surname>
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>H</given-names>
          </string-name>
          . (Eds.)
          <article-title>Exploiting linked data and knowledge graphs in large organisations</article-title>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>