<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards In-Memory RDFS Entailment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Javier D. Fernandez</string-name>
          <email>jdfernandez@fi.upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Gutierrez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel A. Mart nez-Prieto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Perez</string-name>
          <email>jperezg@dcc.uchile.cl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DataWeb Research, Department of Computer Science</institution>
          ,
          <addr-line>Univ. de Valladolid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science</institution>
          ,
          <addr-line>Univ. de Chile</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ontology Engineering Group (OEG)</institution>
          ,
          <addr-line>Univ. Politecnica de Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>
        Massive publication e orts have enriched the Web with huge amounts of semantic
data represented in RDF [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and reasoning tasks at such scale are a formidable
challenge. RDF Schema (RDFS) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] de nes the most simple inference in RDF
introducing a vocabulary with prede ned semantics to describe relationships
such as typing of entities and hierarchy relations in classes and properties. This
vocabulary allows one to infer new facts, not originally explicit in the RDF
graph, by means of a process called RDFS entailment.
      </p>
      <p>
        Traditionally, two types of solutions tackle RDFS entailment. On the one
hand, all facts which can be inferred from an RDF graph can be materialized
and added to the graph. This is referred to as the graph closure, which allows to
easily check if a triple is inferred from the graph. However, the closure can be of
size quadratic in the size of the initial graph which is not a practical bound from a
database point of view [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Although some approaches can compute huge closures
on the basis of distributed systems [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], the nal (potentially massive) closure
has to be still managed and queried, paying costly latencies. On the other hand,
one could maintain the original graph and check the entailment on-demand.
Unfortunately, these solutions have to pay a potentially large number of I/O
accesses at huge scale [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which disregard a broader adoption whenever a fast
response prevails.
      </p>
      <p>This scenario claims for forms of lightweight entailment which could make
reasoning feasible at Web scale. Solutions will necessarily involve nding space/
time tradeo s that (i) save storage requirements, and (ii) minimize I/O costs.
Some clever form of compression would address both issues which are closely
related. In fact, an in-memory solution enables these two objectives to be achieved
if the compressed data can be directly accessed without prior decompression,
optimizing the memory footprint. This solution would dissuade to maintain graph
closures and allows to design an e cient on demand algorithm which performs
triple checking in main memory.</p>
      <p>
        Our ongoing work, described in the next section, follows the above ideas by
compressing the RDF graph using RDF/HDT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a data structure known to
assure a reduced memory footprint while providing fast triple pattern resolution
in main memory [
        <xref ref-type="bibr" rid="ref4 ref8">8, 4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Ongoing Research</title>
      <p>
        In our work we consider the minimal RDFS subset proposed by Mun~oz, et al.
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which includes all the relevant RDFS keywords rdfs:subPropertyOf (sp),
rdfs:subClassOf (sc), rdf:type (type), rdfs:domain (dom), and rdfs:range
(range). This fragment preserves the original RDFS semantics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and allows
RDFS inference by just checking the existence of some paths in the original
graph, obtaining a good theoretical upper-bound on the cost of RDFS
entailment [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. Although some proposals have implemented this minimal RDFS
inference [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], their performance is far from matching the theoretical complexity
bounds mainly because of the I/O cost aspects. Thus, we address an in-memory
compressed solution for huge graphs, minimizing I/O costs and approaching the
promised theoretical optimal performance.
      </p>
      <p>
        RDFS-HDT. HDT (Header-Dictionary-Triples) is a succinct data structure,and
serialization format, designed for storing, exchanging and basic querying of RDF
compressed data [
        <xref ref-type="bibr" rid="ref2 ref8">8, 2</xref>
        ]. It is based on two main components: the HDT Dictionary
maps each term in an RDF graph to a unique identi er (ID), enabling the HDT
Triples to manage RDF as a more e cient graph of IDs. Both components are
represented and indexed succinctly on the basis of compressed string
dictionaries and compact graph encodings. For instance, the standard corpus of DBpedia
3.8 (which comprises more than 431 million triples and takes more than 60GB
of space in plain format) can be managed in less than 7 GB of memory, and
resolves SPARQL triple patterns at the level of microseconds (see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for further
details).
      </p>
      <p>Our ongoing approach for RDFS entailment is referred to as RDFS-HDT.
Given an RDF graph G, RDFS-HDT maintains the original HDT Dictionary
concept, but it divides the graph encoded in the HDT triples in three di erent
subgraphs:
{ The subproperty subgraph, G(sp) comprises all triples (subject, sp, object).
{ The subclass subgraph, G(sc) comprises all triples (subject, sc, object).
{ All the remaining triples are managed in the general subgraph G0.</p>
      <p>
        First, this decision assures that the space required by RDFS-HDT is, in the
worst case, similar to that required by a general HDT. Then, this data
reorganization enables RDFS entailment to be performed in main memory following the
approach in [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]:
(a) The original HDT retrieval features are used to resolve those triple patterns
involving dom, or range properties. In order to speed up these queries, the
triples involving dom and range can be extracted from G0 and indexed
independently.
(b) The location of elements related by paths of sc or sp properties is performed
over the speci c G(sp) or G(sc) subgraphs respectively; RDFS-HDT resolves
successive patterns of the form (subject, sc, ?subclass) (similar for sp)
until the required triple is entailed, or the corresponding path is nished
without success.
(c) Addressing type properties, and general triples of the form (a; p; b), with p
not an RDFS keyword, needs a combination of both approaches, using the
original HDT retrieval plus sc and sp path inference.
      </p>
      <p>
        The next step in our ongoing project is to leverage compressed tree
representations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for encoding the sp and sc hierarchies. This would allow us to store
G(sp) and G(sc) in minimal space while e ciently supporting ancestor-queries
over these hierarchies, which is the main feature needed in the algorithms
proposed in [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. These advances will report some space/time tradeo s allowing
RDFS-HDT adoption under di erent computational con gurations at Web scale.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>Claudio Gutierrez and Jorge Perez are funded by the Millennium Nucleus Center
for Semantic Web Research, Grant NC120004.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Arroyuelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Canovas</surname>
          </string-name>
          , G. Navarro, and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sadakane</surname>
          </string-name>
          .
          <article-title>Succinct Trees in Practice</article-title>
          .
          <source>In Proc. of the 12th Workshop on Algorithm Engineering and Experiments (ALENEX)</source>
          , pages
          <fpage>84</fpage>
          {
          <fpage>97</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          .
          <article-title>Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data</article-title>
          .
          <source>PhD thesis</source>
          , University of Valladolid, Spain,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Mart</surname>
          </string-name>
          nez-Prieto,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <article-title>Binary RDF Representation for Publication and Exchange (HDT)</article-title>
          .
          <source>W3C Member Submission</source>
          ,
          <year>2011</year>
          . http://www.w3.org/Submission/2011/03/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Mart</surname>
          </string-name>
          nez-Prieto,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Arias</surname>
          </string-name>
          .
          <article-title>Binary RDF Representation for Publication and Exchange</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>19</volume>
          :
          <fpage>22</fpage>
          {
          <fpage>41</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>W.J. Ha mans and G.H.L.</given-names>
            <surname>Fletcher</surname>
          </string-name>
          .
          <article-title>E cient RDFS Entailment in External Memory</article-title>
          .
          <source>In Proc. of the OnTheMove Workshops (OTM)</source>
          , pages
          <fpage>464</fpage>
          {
          <fpage>473</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Hayes</surname>
          </string-name>
          .
          <source>RDF Semantics. W3C Recommendation</source>
          ,
          <year>2004</year>
          . http://www.w3.org/TR/rdf-mt/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>F.</given-names>
            <surname>Manola</surname>
          </string-name>
          and E. Miller, editors.
          <source>RDF Primer. W3C Recommendation</source>
          ,
          <year>2004</year>
          . http://www.w3.org/TR/rdf-primer/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Mart</surname>
          </string-name>
          nez-Prieto,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arias</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          .
          <article-title>Exchange and Consumption of Huge RDF Data</article-title>
          .
          <source>In Proc. of the 9th Extended Semantic Web Conference (ESWC)</source>
          , pages
          <fpage>437</fpage>
          {
          <fpage>452</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. S. Mun~oz, J. Perez, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Simple and E cient Minimal RDFS</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <volume>220</volume>
          {
          <fpage>234</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>J. Perez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Arenas</surname>
            , and
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
          </string-name>
          . nSPARQL:
          <article-title>A navigational language for RDF</article-title>
          .
          <source>Journal of Web Semanticas</source>
          ,
          <volume>8</volume>
          (
          <issue>4</issue>
          ):
          <volume>255</volume>
          {
          <fpage>270</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>J. Urbani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Maassen</surname>
            ,
            <given-names>F. Van Harmelen</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>and H.</given-names>
            <surname>Bal</surname>
          </string-name>
          .
          <article-title>OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples</article-title>
          .
          <source>In Proc. of the 7th Extended Semantic Web Conference (ESWC)</source>
          , pages
          <fpage>213</fpage>
          {
          <fpage>227</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.</given-names>
            <surname>Weaver</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Hendler</surname>
          </string-name>
          .
          <article-title>Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples</article-title>
          .
          <source>In Proc. of the 8th International Semantic Web Conference (ISWC)</source>
          , pages
          <fpage>682</fpage>
          {
          <fpage>697</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>