<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multiple Inheritance of Ontology Concepts in a Semantic-Aware Encoding Scheme</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Weiqin Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Cure</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Calvez</string-name>
          <email>philippe.calvez1@engie.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ENGIE CRIGEN CSAI Lab</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIGM (UMR 8049)</institution>
          ,
          <addr-line>CNRS, UPEM, F-77454, Marne-la-Vallee</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Litemat[1] is semantic-aware encoding scheme that was originally designed to support e cient inferences over the RDFS ontology language. Intuitively, it encodes, with integer values, the elements of the TBox, i.e., concepts and properties, in such a way that reasoning over these hierarchies can be reduced to some computation over identi er intervals. It has recently been extended to support owl:sameAs[4], owl:inverseOf and owl:transitiveProperty[2] by proposing a similar form of encoding for elements of the ABox, i.e., individuals. We can hence consider that Litemat now supports inferences in the ontology language frequently referred to as RDFS++. The main advantages provided by this encoding scheme is to considerably reduce the need for inference materialization and to provide an adapted query rewriting and optimization approach that improve the performance of query answering. Considering materialization, Litemat eliminates the need to store inferred facts since the semantics of attributed identi ers to concepts, properties and individuals are su cient for RDFS++ reasoning services. Moreover, Litemat's query rewriting approach replaces costly union of SPARQL basic graph patterns (BGP) with query lters over integer intervals. Litemat hence proposes a trade-o between the two most popular inference solutions encountered in RDF stores. These nice properties only come with an overhead of maintaining slightly larger dictionaries, due to larger identi er values and additional metadata, than the standard case of most RDF stores. In this paper, we present Litemat's support for multiple inheritance as well as the query processing and optimization that comes with it. 3 In this section, we present LiteMat's encoding scheme using the following ontology concept hierarchy (but this method can also be applied to property hierarchies): A v &gt;, B v &gt;, C v &gt;, D v &gt;, A1 v A, C1 v C, F 1 v F , E1 v E, E2 v E, F v B, F v C, E v A1, E v B and E v C.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>LiteMat's encoding approach and query processing</title>
      <p>We can see that the last 5 concept subsumptions describe some multiple
inheritance, i.e., the concepts E (resp. F ) have 3 (resp. 2) super classes.</p>
      <p>In Figure 1(a), we present LiteMat's encoding for this TBox. In a rst step,
the assignment of an identi er, using a binary representation, for each concept
is performed in a top-down recursive manner, i.e., it starts by setting the top
concept (&gt;) at 1, and proceeds level-wise on the element hierarchy until all leaves
have been processed. Intuitively, for each concept , we count the number N of
direct sub concepts (including ), e.g., in our running example &gt; has 5 direct
sub concepts. At this level, d log(N) e provides the number of bits necessary to
represent each sub concept. Then, these sub concepts (excluding ) are pre xed
by the binary identi er of and uniquely get a binary representation of a value
2 [1; N 1]. For instance, the concept A is pre xed with '1' (&gt;'s identi er and
is assigned the value 1 on 3 bits, i.e., '001', yielding the binary string '1001').</p>
      <p>Finally, a normalization step makes sure that all identi ers are encoded on the
same binary string length. This is performed as follows: once all concepts have
been encoded in the rst step, we get the size L of the longest encoding string
(i.e., 8 in our running example). Then all concept identi ers with an encoding
length lower than L are appended with '0' until their length reaches L. This
normalization step is represented with red '0' in Figure 1(a). The last column of
this gure provides the integer identi er corresponding to each concept.</p>
      <p>With this encoding scheme, it is clear that multiple inheritance poses a
problem since each ontology concept must have a single identi er. For instance, in our
running example, we can provide three di erent identi ers to E: one computed
from A1, another one from B and a last one from C. In the next section, we
propose a solution to this issue.</p>
      <p>LiteMat proposes an e cient query processing approach that takes advantage
of the semantic-aware encoding. In fact, whenever a query requires to reason over
the concept or property hierarchies, the system simply introduces new variables
that are ltered on the identi ers of our encoding scheme.</p>
      <p>Consider the following BGP of a SPARQL query: f?x type Eg. It is generally
rewritten into f?x type Eg UNION f?x type E1g UNION f?x type E2g. In
LiteMat, the system would identify that E has several sub concepts, replace
E with a new variable and add a FILTER clause that restricts the accepted
values of this new variable. This restriction corresponds to an interval of integer
values where the lower bound is the identi er of E and the upper bound is easily
computing (i.e., using 2 bit shift operations and an addition) from the identi er
of E. This computation requires an identi er metadata stating the index on the
bit string where the normalization has started. Considering that the identi ers
of E, E1 and E2 are resp. 156, 157 and 158 (to be explained in the next section),
our LiteMat rewriting would be: f?x type ?y. FILTER (?y 156 &amp;&amp; ?y&lt;157)g.
This rewriting also applies when sub property relationships are used and the
more complex the query, the more e cient our rewriting.</p>
    </sec>
    <sec id="sec-3">
      <title>Multiple inheritance support in LiteMat</title>
      <sec id="sec-3-1">
        <title>Encoding scheme</title>
        <p>Our support of multiple inheritance is based on the notion of a representative.
A representative Cr is selected among the super concepts C1,..,Cn of a concept
SC. That is the integer identi er of SC will be computed following Litemat's
approach based on the Cr identi er. It is obvious that with this approach SC
loses any connection to its non-representative direct and indirect super concepts.
Hence it is necessary to keep track of these super concepts. In the following,
the remaining super classes of SC, i.e., fC1,..,Cngn Cr are considered as
nonrepresentative of SC.</p>
        <p>Let consider that the representative of the concept E is A1. Hence, the
nonrepresentatives of E are the concepts B and C. In Figure 1(b)(where each concept
has its identi er in subscript), a representative is pointed by a dashed arrow and
we can observe that the identi ers of concepts E, E1 and E2 (resp. 156, 157 and
158) are computed from A1's identi er (i.e., 152).</p>
        <p>To support an e cient query processing, we require a key/value data
structure, denoted nonRep. Intuitively, each non representative of the ontology is an
entry in that data structure and the value associated to a key corresponds to a
set containing all the sub concepts involved in a multiple inheritance with the
key concept as one of its super concept. In our running example, nonRep(B) =
fEg and nonRep(C) = fE,Fg.
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Query processing and optimization</title>
        <p>The query processing presented in Section 2 is extended to produce complete and
correct result sets for ontologies involving multiple inheritance. The extension
coincides to the addition of disjunction in the generated FILTER clause of the
rewritten SPARQL queries. Like in the original rewriting approach of Section
2, the disjuncts correspond to interval descriptions for a given query variable.
An interval is computed using the nonRep data structure. Note that queries
involving representatives do not necessarily involve this form of query processing.
Due to space limitation, we present this processing and a simple optimization
through the following example.</p>
        <p>We consider the following BGP of a simple query, denoted Q, which involves
a multiple inheritance: f?x rdf:type Cg. The query rewriting denoted Q0
corresponds to: f ?x rdf:type ?y. lter((?y 176 &amp;&amp; ?y&lt;192) k (?y 156 &amp;&amp; ?y&lt;160)
k (?y 168 &amp;&amp; ?y&lt;176)) where the rst disjunct corresponds to the standard
interval de ned in Section 2 and last two are computed using the nonRep data
structures. That is, we search whether C is involved in a multiple inheritance
by checking its presence as a key in nonRep. If it is the case, it will return a set
of concepts and for each of these concepts a disjunct is added to the FILTER
clause over that variable. In our running example, nonRep(C) returns a set with
concepts E and F , resp. the identi ers 156 and 168. These values correspond to
the lower bounds of the intervals and upper bounds are computer as stated in
LiteMat.</p>
        <p>Based on the intervals present in this FILTER clause, a simple optimization
can be performed. It has the possible e ect of reducing the number of disjuncts
in a query rewriting. The optimized queries are Q00 : f?x 0 ?y. lter((?y 168 &amp;&amp;
?y&lt;192) k (?y 156 &amp;&amp; ?y&lt;160)), i.e., it contains one less disjunct.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        An encoding scheme, based on the semantic-aware approach of LiteMat, has been
extended to support multiple inheritance. We emphasized that it can be used
to e ciently to answer SPARQL queries. The overhead of the encoding scheme
is relatively low and as been used in use case involving static and streaming
data[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Cure</surname>
          </string-name>
          , Hubert Naacke, Tendry Randriamalala, and Bernd Amann.
          <article-title>LiteMat: A scalable, cost-e cient inference encoding scheme for large RDF graphs</article-title>
          .
          <source>In 2015 IEEE International Conference on Big Data, Big Data</source>
          <year>2015</year>
          , pages
          <year>1823</year>
          {
          <year>1830</year>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Cure</surname>
          </string-name>
          , Weiqin Xu,
          <string-name>
            <given-names>Hubert</given-names>
            <surname>Naacke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Philippe</given-names>
            <surname>Calvez</surname>
          </string-name>
          .
          <article-title>Extending LiteMat toward RDFS++</article-title>
          . In Extended Semantic Web Conference, LASCAR workshop,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Lhez</surname>
          </string-name>
          , Xiangnan Ren, Badre Belabbess, and
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Cure</surname>
          </string-name>
          .
          <article-title>A compressed, inference-enabled encoding scheme for RDF stream processing</article-title>
          .
          <source>In The Semantic Web - 14th International Conference, ESWC 2017</source>
          , pages
          <fpage>79</fpage>
          {
          <fpage>93</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Xiangnan</given-names>
            <surname>Ren</surname>
          </string-name>
          , Olivier Cure, Hubert Naacke, Jeremy Lhez,
          <string-name>
            <given-names>and Li</given-names>
            <surname>Ke</surname>
          </string-name>
          .
          <article-title>Striderr: Massive and distributed RDF graph stream reasoning</article-title>
          .
          <source>In 2017 IEEE International Conference on Big Data, BigData</source>
          <year>2017</year>
          , pages
          <fpage>3358</fpage>
          {
          <fpage>3367</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>