<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reasoning in RDFS is Inherently Serial, at least in the worst case</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter F. Patel-Schneider</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>There have recently been several papers presenting scalable distributed inference systems for the W3C Resource Description Framework Schema language (RDFS). These papers have made claims to the e ect that they can produce the RDFS closure for billions of RDF triples using parallel hardware. For example, Urbani et al claim \a distributed technique to perform materialization under the RDFS [. . . ] semantics using the MapReduce programming model" [4]. Their initial algorithm used a small number of simple MapReduce passes in sequence to perform distributed materialization. In each pass all all triples are distributed according to the processing performed during that pass, but as well all schema triples (triples with predicate rdfs:domain, rdfs:range, rdfs:subPropertyOf, or rdfs:subClassOf) are sent to all reduce nodes. To handle some of the issues described here, later versions of their algorithm looped through the passes until no new inferences were made. Similarly, Weaver and Hendler claim to be \materializing the complete [emphasis added] nite RDFS closure in a scalable manner" [6]. Their algorithm uses a single MapReduce pass and sends ontological triples (schema triples plus triples with predicate rdf:type and object rdfs:Datatype, rdfs:Class, or rdfs:ContainerMembershipProperty) to all reduce nodes. However, the algorithm distributes the other triples evenly, without regard to contents. Each reduce node then produces the nite RDFS closure of its input and these results are nally combined. They call their algorithm \embarrassingly parallel" because the inference part can be split into many independent pieces.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>These claims seem believable, as on rst glance there appears to be nothing to
prevent the e ective parallelization of RDFS inference. However, RDFS inference
does have some aspects that make it less easy than one might expect.</p>
      <p>
        First, RDFS has a formal model-theoretic semantics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The model-theoretic
semantics provides an unequivocal de nition for inference and related notions
in RDFS. The de nition is not, however, in terms of a simple set of rules of
inference, but is instead in terms of entailment, a model-theoretic notion. Claims
of computing RDFS entailment, inference, closure, or materialization must thus
be compared with this model-theoretic de nition.
      </p>
      <p>
        Second, RDFS has a in nite number of axioms. This trivially makes the
RDFS closure of any RDF graph in nite. Necessarily these axioms are very
simple and ter Horst [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] has discovered how to produce a nite subset of the
RDFS closure that contains all the useful triples, and from which the remaining
triples can be very easily computed.
      </p>
      <p>Third, RDFS inherits the peculiarities of RDF, particularly that its internal
vocabulary for de ning ontologies can be manipulated in the same way as
domain vocabularies can. This provides some power, but also provides considerable
opportunities for mischief. One particularly mischievous way of manipulating the
RDF and RDFS internal vocabulary is to use it to make extra connections within
the internal vocabulary itself, as in</p>
      <p>rdf:type rdfs:subPropertyOf rdfs:subPropertyOf .</p>
      <p>Although this kind of manipulation is allowable, the problems it produces wildly
outweigh its utility. Another mischievous way of manipulating the RDFS internal
vocabulary is to subordinate it to non-RDFS vocabulary, as in</p>
      <p>rdfs:subClassOf rdfs:subPropertyOf ex:interclassRelationship .</p>
      <p>Although it may be interesting to create such \higher-level" vocabulary, such
pursuits are generally best left to philosophers.</p>
      <p>However, some manipulation of the RDF and RDFS vocabulary can be
useful under certain circumstances. For example, one might want to make some
di erentiation between di erent kinds of properties and use a speci c property
for the sub-property relationship for them, as in
ex:partProperty rdfs:subClassOf rdf:Property .
ex:partSubPropertyOf rdfs:subPropertyOf rdfs:subPropertyOf .
ex:partSubPropertyOf rdfs:domain ex:partProperty .</p>
      <p>ex:partSubPropertyOf rdfs:range ex:partProperty .</p>
      <p>Although this ability to extend the RDF and RDFS ontology vocabulary does
not appear to be much used in practice, with only two sub-properties of
rdfs:subPropertyOf and three sub-properties of rdfs:subClassOf in the 2011 Billion Triple
Challenge Dataset, it nonetheless can be useful.</p>
      <p>When one is claiming to perform RDFS inference one needs to be very clear
how one is handling these tricky issues. If one does not explicitly disclaim
completeness then the RDF model-theoretic semantics provides the standard that
must be adhered to. If one is claiming to produce RDF closures or perform
RDFS materialization then one has to provide some way of handing the in nite
axiomatization of RDFS. If one does not explicitly state which RDF graphs are
not permitted as input then all RDF graphs must be allowed.</p>
      <p>
        As noted above Urbani et al prominently claim to be performing RDFS
materialization, and do so without any caveats in several places. It is only on close
examination of their 2009 paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that one nds that they ignore several
inferences where domain classes or properties are made superclasses or
superproperties of RDF or RDFS classes or properties and some inferences on container
membership properties. These are the only quali cations related to RDFS
materialization to be found in this paper. In their 2011 paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], there are some
comments in the body of the paper about ignoring hijacking cases. There is
also a comment about adding one extra step to handle special cases related to
container membership properties and datatypes. Finally, there is a comment
indicating that they also do not handle RDF and RDFS ontology vocabulary
extension, but that it could be somehow handled by adding an extra loop. In
essence in this latter paper Urbani et al eliminate the possibility of modifying
or extending the RDF and RDFS vocabulary and thus excluding the situations
that make RDF and RDFS di erent from a very cut down version of OWL 2 DL
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. What they do end up computing, provided that one ignores several bugs in
the algorithms and lls in a lot of missing information, is fairly close to complete
when their exclusions are taken into account. Their work does point the way to
using a xed number of simple MapReduce passes to materialize the nite RDFS
closure when the RDF and RDFS vocabulary is neither modi ed nor extended.
      </p>
      <p>Weaver and Hendler do state very clearly that they are only producing nite
RDFS closures, but then very clearly state that they are complete. It is only
in the middle of the paper that they state that they assume that there are
no modi cations or extension of the RDF and RDFS vocabulary. So, although
Weaver and Hendler are up-front about the nite closure, counter to their claims
of completeness they only consider the RDFS fragment of OWL 2 DL. The result
of this algorithm appears to be nearly complete when their exclusions are taken
into account, although this is somewhat hard to tell, as there are some bugs in
the arguments in the paper. This work points the way to a simple distributed,
single-pass method to materialize the nite RDFS closure when the RDF and
RDFS vocabulary is neither modi ed nor extended.</p>
      <p>So if the RDF and RDFS ontology vocabulary is neither modi ed nor
extended it is possible to easily compute RDFS closures. However, just extending
the RDF and RDFS ontology vocabulary produces problems. Consider
ex:physicalSubPropertyOf rdfs:subPropertyOf rdfs:subPropertyOf .
ex:physicalPartOf ex:physicalSubPropertyOf ex:partOf .</p>
      <p>ex:wheelOf ex:physicalSubPropertyOf ex:physicalPartOf .</p>
      <p>The closure of this RDF graph includes</p>
      <p>ex:wheelOf rdfs:subPropertyOf ex:partOf .
which is not necessarily produced by Urbani et al or Weaver and Hendler. The
basic problem here is that it is possible to infer new schema or ontological triples,
and these triples can then produce further triples, even further ontological triples.</p>
      <p>It might be the case that these failures are simply due to a bug or limitation
in the algorithms, and that the general approach in these papers, or a slight
generalization of it, can be used to perform nite RDFS materialization. However,
this is not the case. Instead, it turns out that there are proofs in RDFS that are
arbitrarily large but only depend on a xed number of triples that contain any
of the RDF or RDFS vocabularies.</p>
      <p>An example of this comes from RDF graphs of the following form:
sp1 rdfs:subPropertyOf rdfs:subPropertyOf .
sp2 sp1 sp1 .
sp3 sp2 sp2 .
...</p>
      <p>spn spn 1 spn 1 .</p>
      <p>This graph RDFS-entails
spn rdfs:subPropertyOf rdfs:subPropertyOf .</p>
      <p>Any proof process that produces these results will need to perform an
arbitrarily large amount of work involving triples that do not mention the RDF
or RDFS vocabularies. Even worse, there is no commonality between the triples
beyond the chaining from one triple to the next. This means that any
distribution process will not be able to combine any signi cant fraction of the triples
together, except by using this chain. The end result is that it is not possible to
produce these proofs in a xed number of nite parallel steps, even if
determining the closure of all triples that include any RDF or RDFS vocabulary can be
done in a xed amount of time.</p>
      <p>In fact, a simple reduction from the Monotone Circuit Problem shows that
RDFS entailment is inherently serial, so it is unlikely that any signi cant speedup
can be achieved using a polynomial number of processors, at least in the worst
case. In cases like the one above a parallel implementation will have to produce
inferences one by one down the chain. Of course, this example is rather contrived,
but a system that performs RDFS materialization is not allowed to pick and
choose its inputs; it has to handle all valid RDF graphs that have not been
explicitly excluded.</p>
      <p>So what then does this say about the empirical di culty of computing
nite RDFS closures using distributed computing, as opposed to the di culty
of describing just which are the problematic cases? Actually very little, as the
example above demonstrates the worst case, and the general pattern is very
unlikely. (The situation is, however, very di erent for OWL, as chaining of
information along domain properties is quite common in OWL.) The investigations
by Urbani et al and Weaver and Hendler show that computing the RDFS closure
could be usually quite easy, particularly when excluding the mischievous cases
that no sane person should argue for. Further, it would not be hard to detect
cases where the RDF and RDFS vocabulary is being extended and to handle
these cases specially. Although this would be slow in comparison to the normal
processing, this is the price that users of the extension facility must pay.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Hayes</surname>
          </string-name>
          .
          <source>RDF semantics. W3C Recommendation</source>
          , http://www.w3.org/TR/rdf-mt/,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Boris</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <surname>Peter F. Patel-Schneider</surname>
          </string-name>
          ,
          <article-title>and Bijan Parsia. OWL 2 web ontology language: Structural speci cation and functional-style syntax</article-title>
          .
          <source>W3C Recommendation</source>
          , http://www.w3.org/TR/owl2-syntax/,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Herman</surname>
          </string-name>
          J. ter Horst.
          <article-title>Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          -3):
          <volume>79</volume>
          {
          <fpage>115</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Jacopo</given-names>
            <surname>Urbani</surname>
          </string-name>
          , Spyros Kotoulas, Jason Maassen, Frank van Harmelen, and Henri Bal.
          <article-title>WebPIE: A web-scale parallel inference engine using MapReduce</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>10</volume>
          :
          <fpage>59</fpage>
          {
          <fpage>75</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Jacopo</given-names>
            <surname>Urbani</surname>
          </string-name>
          , Spyros Kotoulas, Eyal Oren, and Frank van Harmelen.
          <article-title>Scalable distributed reasoning using MapReduce</article-title>
          .
          <source>ISWC-2009</source>
          , pages
          <fpage>634</fpage>
          {
          <fpage>649</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Jesse</given-names>
            <surname>Weaver</surname>
          </string-name>
          and
          <article-title>James A. Hendler. Parallel materialization of the nite RDFS closure for hundreds of millions of triples</article-title>
          .
          <source>ISWC-2009</source>
          , pages
          <fpage>682</fpage>
          {
          <fpage>697</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>