<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhanced Rule-based OWL Reasoning on Spark</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhihui Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weimin Ge</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <email>xiaowangzhang@tju.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiyong Feng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Technology, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Software, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tianjin Key Laboratory of Cognitive Computing and Application</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The rule-based OWL reasoning is to compute the deductive closure of an ontology by applying RDF/RDFS and OWL entailment rules. In previous work, we present an approach to enhancing the performance of the rule-based OWL reasoning on Spark based on a locally optimal executable strategy. However, some key optimizations that based on LUBM do not generalize to more diverse datasets. In this paper, we analyze these problems, and demonstrate the inference engine. We have evaluated the approach using the real-world datasets. The experimental results show that our approach also achieve better performance as compared to Kim &amp; Park's algorithm (2015).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Web Ontology Language [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (OWL) is a semantic web language designed
to represent rich and complex knowledge about things, groups of things, and
relations between things. It provides the means for ontology to verify the
consistency and to make implicit knowledge explicit. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] we already presented
an method to enhancing the performance of the rule-based OWL reasoning on
Spark.
      </p>
      <p>In this paper, we extend our approach to deal with more diverse datasets
and demonstrate the high e ciency. The major contributions and novelties of
our work are summarized as follows: the OWL-Horst rules are classi ed into
four classes and we provide the optimal executable strategies for each class. The
rest of this paper is organized as follows. Section 2 presents our locally optimal
strategies. Section 3 implements our proposed strategy on Spark and Section 4
evaluates our method on the standard dataset. In Section 5, we summarize this
paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Locally optimal strategy</title>
      <p>
        OWL-Horst [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has a powerful expression and reasoning ability. There
exists interdependency among the rules. If the input to a rule Ri depends on the
output of another rule Rj , then the rule Ri is dependent on Rj . The rule Rj
should be executed before the rule Ri. A rule dependency graph can be
constructed based on the interdependency among the rules. There are millions of
executable strategies among the rules that adjacent rules satisfy dependency.
And the longest strategies contain 22 rules that cannot fully execute all rules.
The number is so huge that it is challenge to test every strategy. The OWL
reasoning has close relationship with the characteristic of datasets. Therefore,
we prefer to make a analysis of dataset.
      </p>
      <p>
        LUBM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a widely used standard benchmark for evaluating the
performance of ontology reasoning. We divide the dataset into three classes as follows:
{ Triples whose predicate is rdf:type. We call this class as type.
{ Triples whose predicate is owl:sameAs. We call this class as sameAs .
{ The remainder is classi ed as a class. We call this class as SPO.
The data generator (UBA) uses speci c rules to generate data so that all datasets
generated by it have same data characteristic. Here we use the LUBM-50 as the
sample and analyze the proportion of each class. The result is listed in Table 1.
      </p>
      <p>
        From the Table 1, we can see that the SPO triples account for absolute
proportion, about 80% of the total. The type triples are in the second place,
but the number of sameAs triples is zero. Therefore, the OWL reasoning should
focus on the reasoning of SPO triples and type triples. Based on the statistics
of dataset, we divide the OWL-Horst rules[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] into four classes as follow:
{ Rules whose condition or consequences have triples whose predicates are
rdf:type, including R4, R5, R6, O13, O14, O15, O16. The dependency graph
of type rules is shown in Figure 2.
{ Rules whose condition or consequences have triples whose predicates are
owl:sameAs, including O1, O2, O5, O6, O8, O9, O10. The dependency graph
of sameAs rules is shown in Figure 3.
{ Rules whose condition or consequences have triples of SPO class, including
R3, O3, O4, O7a, O7b. The dependency graph of SPO rules is shown in
Figure 1, which the rule O7a and O7b are classi ed as O7.
{ The remainder is classi ed as a class, including R1, R2, O11a, O11b, O11c,
O12a, O12b, O12c. The dependency graph of schema rules is shown in Figure
4.
      </p>
      <p>O15 O16</p>
      <p>R4 R5 O13
O7</p>
      <p>R3
O1 O2</p>
      <p>O10
O3
O5</p>
      <p>O4</p>
      <p>O6</p>
      <p>For each class, based on the dependency of rules, we use the depth- rst-search
(DFS) algorithm to nd all possible executable orders among the rules. Through
the experiments, there is a large number of orders for each class, but we choose
the longest orders that contain rules as many as possible. We pick out the orders
that spend the shortest time to infer same dataset as the optimal executable
orders. The optimal executable orders for each class are listed in Table 2.</p>
    </sec>
    <sec id="sec-3">
      <title>Distributed rule-based OWL reasoning on Spark</title>
      <p>In this section, we design the overall strategy of the OWL reasoning. The
reasoning of instance triples will use the output of schema triples, so the schema
reasoning should be executed rstly. There are several orders for each class of
instance triples. We can choose any one order and then combine an new
executable strategy. In this paper, we select the rst order for each class to perform
reasoning. The general work ow of the parallel OWL reasoning is described in
gure 5.
schema rules order</p>
      <p>SPO rules order
type rules order</p>
      <p>sameAs rules order</p>
    </sec>
    <sec id="sec-4">
      <title>EVALUATION</title>
      <p>
        In this section, we conduct a series of experiments to compare the performance
of the proposed approach with the KP [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] under the same environment. We set
up a cluster with one master and four worker nodes. Each node has 48 Xeon
E5 4607 2.20GHz processors, 64GB memory and 10TB 7200 RPM SATA hard
disk. The nodes are connected with Gigabit Ethernet. All the nodes run on a
64-bit Ubuntu 12.04 LTS operating system. The version of the Spark is 1.0.2.
And the corresponding Hadoop v2.2 with Java 1.7 is installed on this cluster.
We use both the synthetic and real-world benchmarks.
      </p>
      <p>DBpedia is a widely used standard benchmark for semantic programs. We
used four sets of DBpedia data: DBpedia-10(10 million triples), DBpedia-15(15
million triples), DBpedia-20(20 million triples), DBpedia-25(25 million triples)
in our experiments. All experiments run three times and the average value is
listed as follows.</p>
      <p>Experimental results are shown in gure 6. We can see that the running time
increases almost linearly with the growth of data size, our approach performs
better than KP on DBpedia, which is improved by about 30%.</p>
      <p>5;000
1;000</p>
      <p>Our Approach</p>
      <p>KP
10
25</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper, we have shown an approach to enhancing the performance of the
rule-based OWL reasoning on Spark and demonstrated inference over both
standard and real-world dataset. In terms of reasoning time, Our method performs
much better than KP in the reasoning strategy.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the program of the National Natural Science
Foundation of China (NSFC) under grant number 61502336. Xiaowang Zhang is
supported by Tianjin Thousand Young Talents Program.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. L.
          <string-name>
            <surname>McGuinness D.</surname>
            ,
            <given-names>van Harmelen F.</given-names>
          </string-name>
          (
          <year>2004</year>
          )
          <article-title>Web Ontology Language</article-title>
          . W3C Recommendation.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Liu</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Yu</surname>
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2011</year>
          )
          <article-title>Large scale fuzzy pD* reasoning using MapReduce</article-title>
          .
          <source>In: Proc. of ISWC 2011</source>
          , Springer, pp.
          <volume>405</volume>
          {
          <fpage>420</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Guo</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            <given-names>Z.</given-names>
          </string-name>
          , &amp; He in
          <string-name>
            <surname>J</surname>
          </string-name>
          .(
          <year>2005</year>
          )
          <article-title>LUBM: A benchmark for OWL knowledge base systems</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>3</volume>
          (
          <issue>2</issue>
          -3):
          <volume>158</volume>
          {
          <fpage>182</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kim</surname>
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Park</surname>
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2015</year>
          )
          <article-title>Scalable OWL-Horst ontology reasoning using Spark</article-title>
          .
          <source>In: Proc. of BigComp</source>
          <year>2015</year>
          , pp.
          <volume>79</volume>
          {
          <fpage>86</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Liu
          <string-name>
            <surname>Z.</surname>
          </string-name>
          , Feng
          <string-name>
            <given-names>Z.</given-names>
            ,
            <surname>Zhang</surname>
          </string-name>
          <string-name>
            <given-names>X.</given-names>
            ,
            <surname>Wang</surname>
          </string-name>
          <string-name>
            <given-names>X.</given-names>
            , &amp;
            <surname>Rao</surname>
          </string-name>
          <string-name>
            <surname>G.</surname>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>RORS: Enhanced Rule-</article-title>
          based
          <source>OWL Reasoning on Spark In: Proc. of APWeb</source>
          <year>2016</year>
          , pp.
          <volume>1</volume>
          {
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>