<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology Schema-specific Rule Materialization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Seungwoo Lee</string-name>
          <email>swlee@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chang-Hoo Jeong</string-name>
          <email>chjeong@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jung-Ho Um</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taehong Kim</string-name>
          <email>kimtaehong@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hanmin Jung</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept of Computer Intelligence Research</institution>
          ,
          <addr-line>KISTI 245 Daehak-ro, Yuseong-gu, Daejeon, 305-806</addr-line>
          ,
          <country country="KR">Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The reasoning should tackle big data issues like other domains as the size of ontology grows bigger and bigger. Especially, rule-based reasoning should overcome the following challenges: duplicate elimination and rule matching efficiency. To deal with these challenges, we introduce a new rulebased reasoning method which materializes each generic instance rule into several schema-specific instance rules and combines with Hadoop framework to deal with billions of triples. The experiment shows the materialization remarkably improves the efficiency of rule-based reasoning by reducing the amount of required memory and making it linear to the data size.</p>
      </abstract>
      <kwd-group>
        <kwd>rete reasoning</kwd>
        <kwd>rule materialization</kwd>
        <kwd>RDFS rule</kwd>
        <kwd>OWL Horst rule</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Rule-based reasoning is a process that derives new knowledge – it is represented in
triples composed of subject, predicate and object in ontology reasoning – from given
set of knowledge by matching more than one rules. However, the reasoning process is
also suffering from big data issues like other domains as the size of ontology has
become bigger and bigger. To achieve efficient reasoning with overcoming big data
issues, we have several challenges and two of them are follows: duplicate elimination
and rule matching efficiency.</p>
      <p>
        First, separate input sets of triples may derive same – i.e., duplicate – triples by one
rule. Even different rules may derive duplicate triples. Urbani et al.[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] pointed out that
reasoning might derive 50 times more duplicates than unique derived triples in their
preliminary simulation. So, we need an efficient mechanism for eliminating
duplicates and this challenge should be overcome by all means to achieve the scalability of
reasoning process. Second, some parts of reasoning rules are often so generic to cause
too many matches of triples. Rules are generally defined from the semantics of
vocabularies of ontology description languages such as RDF (Resource Description
Framework), RDFS (RDF Schema) and OWL (Web Ontology Language). Therefore,
these rules are always valid independently of any specific ontology and have generic
triple patterns in their condition. These rules often cause inefficiency in matching
them to given set of triples because the generic patterns could be matched to too many
given triples, most of which are eventually filtered out when joining with triples
matched to remaining patterns. So, we need an efficient mechanism for reducing such
unnecessary triple matches.
      </p>
      <p>In this paper, we present a method that removes unnecessary pattern matches,
eventually reduces join operations in rule-based reasoning selectively fetches input
triples, and efficiently eliminates duplicates derived by reasoning rules.</p>
      <p>The remaining part of this paper is as follows: in section 2, some related works are
explored and in section 3, our main approaches are described. Some experiments
justifying our approaches are given in section 4, which is followed by conclusion in
section 5.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Rule-based reasoning is implemented widely based on rete algorithm[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] due to its
efficiency in pattern matching. This algorithm achieves efficiency of pattern matching
by enabling more than one rule to share triples matched to their common triple
patterns. Most time-consuming operation in rete occurs when joining triples from more
than one pattern because join operation causes repeated search and comparison of
corresponding values to common variables. To perform this efficiently, indexing
mechanisms such as hashing or Pyramid technique are usually adopted[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Rete
has a big advantage in efficiency but also has a severe disadvantage in scalability. It
requires quite large memory spaces because it maintains all triples matched to each
pattern and joined by more than one pattern in main memory. This makes it
impossible for rete-based reasoning to process billions of triples.
      </p>
      <p>
        To achieve scalability of rule-based reasoning, recent research such as WebPIE[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
utilized Hadoop, a distributed and parallelized computing framework. It showed that
the performance of Hadoop-based reasoning is highly dependent on how to design
mappers and reducers for each rule. So, it designed rule-specific mappers and
reducers and succeeded to achieve web-scale reasoning. To eliminate duplicate derivation
in early stage, it tried to get mappers to group input triples by considering the output
of the rule, not joining key. It, in addition, maintained schema triples in main memory
to improve load balances between parallelized computing nodes.
      </p>
      <p>In this paper, we describe a new approach that first removes unnecessary pattern
matches in rete framework by materializing rules based on given ontology schema,
next fetches input triples selectively by implementing rule-specific input formats, and
finally eliminates duplicate derivations by grouping rules having same output forms.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Rule Materialization</title>
      <p>
        In our previous work[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we applied dynamic materialization on some rules having
genric patterns in RDFS and OWL semantics[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ][
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and in this paper, we extend the
materialization to all rules having schema triple patterns in RDFS and OWL
semantics. Triples can be divided into two types: schema and instance triples. Schema
indicates triples defining classes, relationships between them, and attributes related to
them while instance means triples describing individuals, relationship between them,
r
d
f
r
d
f
s
id
1
      </p>
      <p>entailment rules
(u p v)  (p rdf:type rdf:Property)
(u p v) (if v is a XML literal and _:n is a bland node allocated to v)  (_:n rdf:type
2
rdf:XMLLiteral)
(u p v) (if v is a plain literal and _:n is a bland node allocated to v)  (_:n rdf:type
1</p>
      <p>rdfs:Literal)
2 (p rdfs:domain c) (u p v)  (u rdf:type c)
3 (p rdfs:range c) (u p v)  (v rdf:type c)
4a (u p v)  (u rdf:type rdfs:Resource)
4b (u p v)  (v rdf:type rdfs:Resource)
5 (p rdfs:subPropertyOf q) (q rdfs:subPropertyOf r)  (p rdfs:subPropertyOf r)
6 (p rdf:type rdf:Property)  (p rdfs:subPropertyOf p)
7 (p rdfs:subPropertyOf q) (u p v)  (u q v)
8 (c rdf:type rdfs:Class)  (c rdfs:subClassOf rdfs:Resource)
9 (c rdfs:subClassOf d) (u rdf:type c)  (u rdf:type d)
10 (c rdf:type rdfs:Class)  (c rdfs:subClassOf c)
11 (c rdfs:subClassOf d) (d rdfs:subClassOf e)  (c rdfs:subClassOf e)
12 (p rdf:type rdfs:ContainerMembershipProperty)  (p rdfs:subPropertyOf
rdfs:member)
13 (c rdf:type rdfs:Datatype)  (c rdfs:subClassOf rdfs:Literal)
and attributes related to them. Similarly, triple patterns forming rules can also be
divided into two types: schema and instance triple patterns. Schema triples are generally
small and static to a given ontology so as to be maintained in main memory while
instance triples may continue to grow as much as not to be maintained in main
memory. So, schema-only rules could be processed sufficiently on rete framework,
but rules having instance triple patterns could not be processed on rete when the
patterns are too generic.</p>
      <p>
        For example, the generic triple pattern (u p v) of rdfs2 in Table 1 could be
matched to all given triples, but only small part of them could be joined with specific
triples matched to the remaining triple pattern (p rdfs:domain c) due to the common
variable ‘p’. Indexing mechanisms such as hashing are usually applied to check such
consistency efficiently, but they also require large memory spaces. More badly, as the
target ontology grows, such indexing size also grows and may not be maintained in
main memory. To solve this issue, we take following approaches according to types
of rules:
 Schema-only rules (i.e., rdfs 5, 6, 8, 10, 11, 12, and 13, and owl-horst 9, 10, 12a,
12b, 12c, 13a, 13b, and 13c) are processed fully on rete framework.
 Generic-only rules (i.e., rdf 1 and 2, rdfs 1, 4a, and 4b, owl-horst 5a and 5b) are
replaced and processed fully using dictionary which encodes all unique terms of
input triples.
 Rules related to ‘owl:sameAs’ (i.e., owl-horst 6, 7, and 11) are replaced with
sameAs table storing all same terms, defining a canonical term among them, and
replacing all same term occurrences with their canonical ones.
 Remaining rules having combination of schema and instance triple patterns (i.e.,
rdfs 2, 3, 7, and 9, owl-horst 1, 2, 3, 4, 8a, 8b, 14a, 14b, 15, and 16) are processed
first on rete to be materialized into schema-specific rules and then each
materialized rule is processed on distributed and parallelized Hadoop framework.
The first and second ones are straightforward and the third one is similar to the
approach of WebPIE[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. So, the detailed explanation of them is omitted here. For the
last one, our previous work[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] introduced rete-based framework that materializes
some of the rules (i.e., rdfs 2, 3, 7, and 9, owl-horst 4, and 8a) into schema-specific
rules and this paper extends the work into other rules in OWL Horst and incorporates
Hadoop framework[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] additionally to deal with billions of triples.
      </p>
      <p>
        For example, when a given ontology defines n functional properties, p1,…,pn, our
rete-based reasoning framework will materialize the rule owl-horst1 into following n
rules: (u pi v) (u pi w)  (v owl:sameAs w) (here, i = 1,…,n). These rules can be
implemented in one pair of a mapper and a reducer as in WebPIE[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] but having pi as
parameters. In addition, when input triples are stored and indexed with six possible
combinations of subject, predicate and object using Hbase[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], one of column-based,
no-SQL databases, the input format of the mapper can be implemented to selectively
fetch triples only matched to the corresponding pattern. Finally, we can combine rules
having a common conclusion (e.g., rdfs 2, 3, and 9) and implement one pair of a
mapper and a reducer to efficiently eliminate duplicates derived from different rules.
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        To demonstrate the feasibility of the proposed materialization approach, we first
checked memory usages according to materialization. Fig. 1 shows that the memory
without materialization is exhausted quickly even though the size of data is quite
small, while the memory with materialization is consumed smoothly. We also
compared the elapsed time in reasoning with and without materialization using
LUBM[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The result in Fig. 2 shows that materialization improves the reasoning
remarkably and even makes it linear to the size of data.
      </p>
      <p>Especially, rete-reasoning without materialization consumed and exhausted
memories very quickly even for small size of data. However, materialization solved this
issue effectively by maintaining only schema triples in memory and leaving reasoning
of instance rules to Hadoop framework.</p>
      <p>(a) without materialization, LUBM(20)</p>
      <p>(b) with materialization, LUBM(1000)</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This paper explained a rete-based reasoning method that materializes RDFS and
OWL-Horst rules when an ontology schema is given and then can combine with
Hadoop framework to deal with billions of triples. Each generic instance rule is
materialized into several schema-specific rules, which can be implemented in a pair of a
mapper and a reducer. Each mapper can selectively fetch input triples matched to its
pattern using Hbase and rules having a common conclusion can be combined into a
reducer to efficiently eliminate duplicate derivations from different rules.</p>
      <p>
        The combination with Hadoop framework is being implemented and will be tested
to check how much our method could improve reasoning performance, comparing to
WebPIE[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in near future.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This work was supported by the IT R&amp;D program of MSIP/KEIT. [2014044034002,
High performance database solution development for integrated big data monitoring
and analysis]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maassen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bal</surname>
          </string-name>
          , H.:
          <article-title>WebPIE: A Web-scale Parallel Inference Engine using MapReduce</article-title>
          .
          <source>Journal of Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>10</volume>
          ,
          <fpage>59</fpage>
          --
          <lpage>75</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Forgy</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <fpage>17</fpage>
          --
          <lpage>37</lpage>
          (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Doorenbos</surname>
          </string-name>
          , R.B.:
          <article-title>Production Matching for Large Learning Systems</article-title>
          .
          <source>Ph.D Thesis</source>
          , Carnegie Mellon University, Pittsburgh, PA (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Özacar</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Öztürk</surname>
          </string-name>
          , Ö.,
          <string-name>
            <surname>Ünalir</surname>
            ,
            <given-names>M.O.</given-names>
          </string-name>
          :
          <article-title>Optimizing a Rete-based Inference Engine using a Hybrid Heuristic and Pyramid based Indexes on Ontological Data</article-title>
          .
          <source>Journal of Computers</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ),
          <fpage>41</fpage>
          --
          <lpage>48</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jung</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>You</surname>
          </string-name>
          , B.-J.:
          <article-title>Dynamically Materializing Wild Pattern Rules Referring to Ontology Schema in Rete Framework</article-title>
          .
          <source>In Proceedings of the 1st Asian Workshop on Scalable Semantic Data Processing (AS2DP)</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>RDF</given-names>
            <surname>Semantics</surname>
          </string-name>
          , available at: http://www.w3.org/TR/rdf-mt
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>OWL</given-names>
            <surname>Web Ontology Language Semantics</surname>
          </string-name>
          and Abstract Syntax, available at: http://www.w3.org/TR/owl-semantics
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Horst</surname>
          </string-name>
          , H.J.: Completeness,
          <article-title>Decidability and Complexity of Entailment for RDF Schema and a Semantic Extension Involving the OWL Vocabulary</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>3</volume>
          (
          <issue>2- 3</issue>
          )
          <fpage>79</fpage>
          --
          <lpage>115</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Apache</given-names>
            <surname>Hadoop</surname>
          </string-name>
          , available at http://wiki.apache.org/hadoop.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Um</surname>
            ,
            <given-names>J.-H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
          </string-name>
          , T.-H.,
          <string-name>
            <surname>Jeong</surname>
          </string-name>
          , C.-H.,
          <string-name>
            <surname>Seo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jung</surname>
          </string-name>
          , H.:
          <article-title>MapReduce-based Bulk-Loading Algorithm for Fast Search for Billions of Triples</article-title>
          ,
          <source>In Proceedings of the 9th KIPS International Conference on Ubiquitous Information Technologies and Applications (CUTE</source>
          <year>2014</year>
          ), (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heflin</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>LUBM: A Benchmark for OWL Knowledge Base Systems</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <fpage>158</fpage>
          --
          <lpage>182</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>