<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Query Answering on Expressive Datalog Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mostafa Milani</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Cal</string-name>
          <email>andrea@dcs.bbk.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leopoldo Bertossi</string-name>
          <email>bertossig@scs.carleton.ca</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept of Computer Science Birkbeck, Univ. of London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oxford-Man Institute of Quantitative Finance University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Computer Science Carleton University</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Datalog family of ontology languages [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which extends Datalog with
existential quanti cation, has been gaining importance in the area of Ontology
Querying due to its capability of capturing several prominent Semantic Web
languages as well as o ering e cient query answering services in many variants
relevant for applications. The core feature of Datalog languages are so-called
tuple-generating dependencies (TGDs), which are the main form of rules. Such
rules allow the inference of new atoms from an initial set (a database), which
is captured by the notion of chase procedure. For example, consider the TGD
r(X; Y ) ! 9Z s(X; Z) and a database D constituted by a single atom r(a; b).
A chase step will generate the atom s(a; ), where is a labelled null, that
is, a placeholder for an unknown value; notice that the constant b is lost in
this step as it doesn't appear in the new atom. Conjunctive query answering
under general TGDs is undecidable; languages of the Datalog family impose
therefore restrictions on the form of rules so as to guarantee decidability and good
computational properties. Guarded (extended by weakly-guarded ) Datalog were
the rst form of decidable Datalog languages, inspired by guarded logic and
characterised by the presence of a guard atom in each rule, that contains all
variables of that rule.
      </p>
      <p>
        The language called sticky Datalog [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] was introduced in order to capture
a \proper" notion of join in rules, that is, the occurrence of variables in two
distinct atoms of a rule in the absence of a guard for that rule. Weakly-sticky
Datalog is an extension of sticky Datalog that extends sticky Datalog by
also capturing the well-known class of weakly-acyclic TGDs.
      </p>
      <p>
        Let be the following set of rules, where some body variables are marked
(denoted by a hat sign, e.g. X^ ; see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) as the result of a procedure that identi es
positions (arguments of predicates; for instance, p[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is the position
corresponding to the rst argument of the predicate p) where, in the chase procedure, values
can appear that can eventually be lost in a subsequent chase step.
v(X) ! 9Y r(X; Y )
p(X^ ; Y^ ) ! 9Z p(Y; Z)
r(X; Y^ ); r(Y^ ; Z) ! p(X; Z):
p(X^ ; Y ); p(Y; Z) ! t(Y; Z)
A set of rules is sticky if there is no marked variable in a rule that appears
more than once in the body of the rule. Intuitively, stickiness can be de ned by
means of the following property: during a chase step according to a rule , each
value corresponding to a variable appearing more than once in is not lost in
the chase step, and it is also never lost in any subsequent step involving atoms
where it appears. is not sticky, as easily seen.
      </p>
      <p>
        Weak-acyclicity is de ned using the notion of rank of a position [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A position
has nite (resp., in nite) rank if the number of labelled nulls that can appear in
that position in the chase procedure is nite (in nite). In the Datalog program
above, F = fv[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; r[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; r[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]g is the set of positions with nite rank and
1 = fp[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; p[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]g is the set of positions with in nite rank. A Datalog program
is weakly-acyclic if all positions of predicates have nite rank. The program
is not weakly-acyclic.
      </p>
      <p>
        A set of rules is weakly-sticky if every marked variable that is repeated in
the body of a rule appears at least once in a position with nite rank. This
generalizes stickiness with acyclicity because the stickiness is only applied on
values that appear in the positions with in nite rank. Here, is weakly-sticky.
Speci cally in the second rule, the repeated variable Y appears in the positions
r[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and r[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in F . In the last rule, Y is a repeated variable but not marked.
      </p>
      <p>So far, no non-trivial deterministic algorithm for CQ answering has been
devised for weakly-sticky Datalog . In this paper we set the basis for the
implementation of conjunctive query (CQ) answering by the following contributions.
1. We propose a bottom-up technique, where we ground the rules of
according to the database D in a suitable way. The expansion instance terminates
at a point dependent on the size of the query; the query can be then evaluated
directly on the expanded instance.
2. We propose a hybrid approach to CQ answering as follows. First, we
transform all positions in F into positions of rank 0 (that is, positions where only
constants can appear); then certain variables in such positions are grounded,
that is, the variables are replaced by selected constants of the initial
instance. The obtained program is a sticky program, for which a well-known
query rewriting technique for CQ answering can be applied.
2</p>
    </sec>
    <sec id="sec-2">
      <title>A Grounding Approach</title>
      <p>
        A rst deterministic algorithm for query answering on a weakly-sticky Datalog
program and a database is obtained by grounding the rules of in a
bottomup fashion. The procedure is inspired by the version of the chase procedure
in [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. We illustrate this technique by means of an example. Consider a
database D = fp(a; b); v(b)g, a Boolean conjunctive query (BCQ) q = fu(X)g,
and a set of WS rules as follows: 1 : p(X; Y ) ! 9Z p(Y; Z) and
2 : v(X); p(X; Y ); p(Y; Z) ! u(Y ).
      </p>
      <p>
        We start from D and iteratively generate ground rules by mapping via
homomorphism the body of the rules in into D or the head of the rules in 0
(the current set of ground rules). The basic procedure is as follows: we iteratively
add a ground rule to 0 if its head atom is not homomorphic to the head of a
rule already in 0, until no new rule can be added. This \cautious" procedure,
similar to the chase procedure in [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], guarantees its termination. In our
example, the algorithm stops after adding only one ground rule p(a; b) ! p(b; 1) to
0, where 1 is a labelled null.
      </p>
      <p>In order to complete our grounding, we resume the above basic procedure
Mq times, where Mq is the number of existential variables in q. Before each
resumption, we freeze the labelled nulls, which after having been frozen are
considered as constants. In our example, we have only one more resumption,
which adds the rules, p(b; 1) ! p( 1; 2) and v(b); p(b; 1); p( 1; 2) ! u( 1).
Resumptions are needed, intuitively, to capture applications of rules (in a chase
procedure) where a join variable, appearing in two or more distinct atoms, are
mapped to a labelled null.</p>
      <p>Now, the number of resumptions depends on the query, which makes our
grounding dependent on the query; however, for practical purposes, we could
ground with N resumptions so as to be able to answer queries with up to N
existential variables, and if a query with more than N existential variables is to
be answered, we can incrementally retake the already-computed grounding and
add the required number of resumptions.
3</p>
    </sec>
    <sec id="sec-3">
      <title>A Hybrid Approach</title>
      <p>
        In this section, we propose an algorithm based on a hybrid approach that
combines the grounding of speci c variables in rules with the query rewriting
algorithm from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Given a set of WS rules and a database D, our algorithm
transforms according to D into a sticky set of rules 0. Sticky Datalog
enjoys rst-order rewritability, that is, each CQ can be rewritten into a rst-order
query and directly answered on the database [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], providing the correct answer.
      </p>
      <p>As an example, consider D = fv(a); s(a; b)g and a set of WS rules that
includes 1 and 2 from the example in Section 2 as well as the following rules:
3 : v(X) ! 9Y r(X; Y ):
4 : r(X; Y ); s(X; Z) ! t(Y; Z):
5 : r(X; Y ); t(Y; Z) ! p(X; Z):</p>
      <p>Let us call weak rules the rules in which some repeated marked body variables
appear at least once in a position with nite rank; we call such variables weak
variables. We aim at grounding the weak variables only, thus turning the WS
program into a sticky program.</p>
      <p>In our example, 4 and 5 are weak rules with X and Y as their weak variables
respectively. 4 is partially grounded as a sticky rule r(a; Y ); s(a; Z) ! r(Y; Z)
since X can only take the value a from D. To (partially) ground 5 we would
need Y to be replaced with a null value invented by 3. Notice that the variables
we are to ground can only be in positions with nite rank, which guarantees the
niteness of the partial grounding. Our algorithm consists of two phases.</p>
      <p>
        (Phase 1). We remove the existential variables that are in the positions with
nite rank with a method inspired by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the context of our example, we rst
Skolemize 3, that is the only rule with an existential variable in a position with
nite rank. The result is v(X) ! r(X; f (X)). Next, we replace r(X; f (X))
with an expanded predicate r0(X; f; X) with the higher arity that results into
a rule v(X) ! r0(X; f; X) in which f is a fresh constant that represents the
replaced function. Then, we iteratively propagate the expanded predicate in the
body and head of the rules, possibly expanding other predicates e.g. t, obtaining
the new rules: 30 : v(X) ! r0(X; f; X), 40 : r0(X; Y; Y 0); s(X; Z) ! t0(Y; Y 0; Z),
and 50 : r0(X; Y; Y 0); t0(Y; Y 0; Z) ! p(X; Z). The result is a set of rules 0;1 =
f 1; 2; 30; 40; 50g with positions of rank either zero or in nite.
      </p>
      <p>(Phase 2). Now, we proceed with the grounding step where we replace the
weak variables (X in 40 and Y; Y 0 in 50) with values from D and the new
constant f. The result is a set of sticky rules 0 that includes 1, 2 and 30
as well as the following rules, 400 : r0(a; Y; Y 0); s(a; Z) ! t0(Y; Y 0; Z), and 500 :
r0(X; f; a); t0(f; a; Z) ! p(X; Z).</p>
      <p>
        To compute the answers to the query q, 0 and q are rewritten into a
rstorder query using the rewriting method for sticky rules [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The rst-order query
is then evaluated on D.
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>Weakly-sticky Datalog is an expressive ontology language with good
computational properties and capable of capturing the most prominent Semantic Web
languages. We proposed two deterministic algorithms for answering conjunctive
queries on weakly-sticky Datalog . This, we believe, sets the basis for practical
query answering algorithms for real-world scenarios. We plan to continue our
work by running experiments on large data sets. We also intend to re ne the
hybrid algorithm by devising solutions to reduce the number of constants of the
database used to ground the weak variables; this would improve the e ciency
of our approach.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Cali</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Gottlob, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>Towards more expressive ontology languages: The query answering problem</article-title>
          .
          <source>Arti cial Intelligence</source>
          ,
          <year>2012</year>
          ,
          <volume>193</volume>
          :
          <fpage>87</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Kolaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          .
          <article-title>Data Exchange: Semantics and Query Answering</article-title>
          . TCS,
          <year>2005</year>
          ,
          <volume>336</volume>
          :
          <fpage>89</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Orsi, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>Query Rewriting and Optimization for Ontological Databases</article-title>
          .
          <source>Proc. TODS</source>
          ,
          <year>2014</year>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>25</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Krotzsch, M. and
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Extending Decidable Existential Rules by Joining Acyclicity and Guardedness</article-title>
          .
          <source>Proc. IJCAI</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>963</fpage>
          -
          <lpage>968</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Manna</surname>
          </string-name>
          , G. Terracina, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Veltri. E ciently Computable</surname>
          </string-name>
          <article-title>Datalog Programs</article-title>
          .
          <source>Proc. KR</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Milani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          .
          <article-title>Tractable Query Answering and Optimization for Extensions of Weakly-Sticky Datalog+-</article-title>
          .
          <source>Proc. AMW</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>