=Paper= {{Paper |id=Vol-1644/paper18 |storemode=property |title=Query Answering on Expressive Datalog+/- Ontologies |pdfUrl=https://ceur-ws.org/Vol-1644/paper18.pdf |volume=Vol-1644 |authors=Leopoldo Bertossi,Andrea Calì,Mostafa Milani |dblpUrl=https://dblp.org/rec/conf/amw/BertossiCM16 }} ==Query Answering on Expressive Datalog+/- Ontologies== https://ceur-ws.org/Vol-1644/paper18.pdf
     Query Answering on Expressive Datalog±
                   Ontologies

            Mostafa Milani1 , Andrea Calı̀2,3 , and Leopoldo Bertossi1
        1                                       2
        School of Computer Science                Dept of Computer Science
        Carleton University, Canada           Birkbeck, Univ. of London, UK
                   3
                       Oxford-Man Institute of Quantitative Finance
                               University of Oxford, UK

                         {mmilani, bertossi}@scs.carleton.ca
                                andrea@dcs.bbk.ac.uk




1    Introduction

The Datalog± family of ontology languages [1], which extends Datalog with
existential quantification, has been gaining importance in the area of Ontology
Querying due to its capability of capturing several prominent Semantic Web
languages as well as offering efficient query answering services in many variants
relevant for applications. The core feature of Datalog± languages are so-called
tuple-generating dependencies (TGDs), which are the main form of rules. Such
rules allow the inference of new atoms from an initial set (a database), which
is captured by the notion of chase procedure. For example, consider the TGD
r(X, Y ) → ∃Z s(X, Z) and a database D constituted by a single atom r(a, b).
A chase step will generate the atom s(a, ζ), where ζ is a labelled null, that
is, a placeholder for an unknown value; notice that the constant b is lost in
this step as it doesn’t appear in the new atom. Conjunctive query answering
under general TGDs is undecidable; languages of the Datalog± family impose
therefore restrictions on the form of rules so as to guarantee decidability and good
computational properties. Guarded (extended by weakly-guarded ) Datalog± were
the first form of decidable Datalog± languages, inspired by guarded logic and
characterised by the presence of a guard atom in each rule, that contains all
variables of that rule.
    The language called sticky Datalog± [1] was introduced in order to capture
a “proper” notion of join in rules, that is, the occurrence of variables in two
distinct atoms of a rule in the absence of a guard for that rule. Weakly-sticky
Datalog± is an extension of sticky Datalog± that extends sticky Datalog± by
also capturing the well-known class of weakly-acyclic TGDs.
    Let Σ be the following set of rules, where some body variables are marked
(denoted by a hat sign, e.g. X̂; see [1]) as the result of a procedure that identifies
positions (arguments of predicates; for instance, p[1] is the position correspond-
ing to the first argument of the predicate p) where, in the chase procedure, values
can appear that can eventually be lost in a subsequent chase step.

           v(X) → ∃Y r(X, Y )              r(X, Ŷ ), r(Ŷ , Z) → p(X, Z).
          p(X̂, Ŷ ) → ∃Z p(Y, Z)           p(X̂, Y ), p(Y, Z) → t(Y, Z)
A set of rules is sticky if there is no marked variable in a rule that appears
more than once in the body of the rule. Intuitively, stickiness can be defined by
means of the following property: during a chase step according to a rule σ, each
value corresponding to a variable appearing more than once in σ is not lost in
the chase step, and it is also never lost in any subsequent step involving atoms
where it appears. Σ is not sticky, as easily seen.
    Weak-acyclicity is defined using the notion of rank of a position [2]. A position
has finite (resp., infinite) rank if the number of labelled nulls that can appear in
that position in the chase procedure is finite (infinite). In the Datalog± program
Σ above, ΠF = {v[1], r[1], r[2]} is the set of positions with finite rank and
Π∞ = {p[1], p[2]} is the set of positions with infinite rank. A Datalog± program
is weakly-acyclic if all positions of predicates have finite rank. The program Σ
is not weakly-acyclic.
    A set of rules is weakly-sticky if every marked variable that is repeated in
the body of a rule appears at least once in a position with finite rank. This
generalizes stickiness with acyclicity because the stickiness is only applied on
values that appear in the positions with infinite rank. Here, Σ is weakly-sticky.
Specifically in the second rule, the repeated variable Y appears in the positions
r[1] and r[2] in ΠF . In the last rule, Y is a repeated variable but not marked.
    So far, no non-trivial deterministic algorithm for CQ answering has been
devised for weakly-sticky Datalog± . In this paper we set the basis for the imple-
mentation of conjunctive query (CQ) answering by the following contributions.
 1. We propose a bottom-up technique, where we ground the rules of Σ accord-
    ing to the database D in a suitable way. The expansion instance terminates
    at a point dependent on the size of the query; the query can be then evaluated
    directly on the expanded instance.
 2. We propose a hybrid approach to CQ answering as follows. First, we trans-
    form all positions in ΠF into positions of rank 0 (that is, positions where only
    constants can appear); then certain variables in such positions are grounded,
    that is, the variables are replaced by selected constants of the initial in-
    stance. The obtained program is a sticky program, for which a well-known
    query rewriting technique for CQ answering can be applied.

2   A Grounding Approach
A first deterministic algorithm for query answering on a weakly-sticky Datalog±
program Σ and a database is obtained by grounding the rules of Σ in a bottom-
up fashion. The procedure is inspired by the version of the chase procedure
in [5, 6]. We illustrate this technique by means of an example. Consider a
database D = {p(a, b), v(b)}, a Boolean conjunctive query (BCQ) q = {u(X)},
and a set of WS rules Σ as follows: σ1 : p(X, Y ) → ∃Z p(Y, Z) and
σ2 : v(X), p(X, Y ), p(Y, Z) → u(Y ).
     We start from D and iteratively generate ground rules by mapping via ho-
momorphism the body of the rules in Σ into D or the head of the rules in Σ 0
(the current set of ground rules). The basic procedure is as follows: we iteratively
add a ground rule to Σ 0 if its head atom is not homomorphic to the head of a
rule already in Σ 0 , until no new rule can be added. This “cautious” procedure,
similar to the chase procedure in [5, 6], guarantees its termination. In our exam-
ple, the algorithm stops after adding only one ground rule p(a, b) → p(b, ζ1 ) to
Σ 0 , where ζ1 is a labelled null.
     In order to complete our grounding, we resume the above basic procedure
Mq times, where Mq is the number of existential variables in q. Before each
resumption, we freeze the labelled nulls, which after having been frozen are
considered as constants. In our example, we have only one more resumption,
which adds the rules, p(b, ζ1 ) → p(ζ1 , ζ2 ) and v(b), p(b, ζ1 ), p(ζ1 , ζ2 ) → u(ζ1 ).
Resumptions are needed, intuitively, to capture applications of rules (in a chase
procedure) where a join variable, appearing in two or more distinct atoms, are
mapped to a labelled null.
     Now, the number of resumptions depends on the query, which makes our
grounding dependent on the query; however, for practical purposes, we could
ground with N resumptions so as to be able to answer queries with up to N
existential variables, and if a query with more than N existential variables is to
be answered, we can incrementally retake the already-computed grounding and
add the required number of resumptions.

3    A Hybrid Approach

In this section, we propose an algorithm based on a hybrid approach that com-
bines the grounding of specific variables in rules with the query rewriting al-
gorithm from [3]. Given a set of WS rules Σ and a database D, our algorithm
transforms Σ according to D into a sticky set of rules Σ 0 . Sticky Datalog± en-
joys first-order rewritability, that is, each CQ can be rewritten into a first-order
query and directly answered on the database [3], providing the correct answer.
    As an example, consider D = {v(a), s(a, b)} and a set of WS rules Σ that
includes σ1 and σ2 from the example in Section 2 as well as the following rules:

                          σ3 : v(X) → ∃Y r(X, Y ).
                          σ4 : r(X, Y ), s(X, Z) → t(Y, Z).
                          σ5 : r(X, Y ), t(Y, Z) → p(X, Z).

    Let us call weak rules the rules in which some repeated marked body variables
appear at least once in a position with finite rank; we call such variables weak
variables. We aim at grounding the weak variables only, thus turning the WS
program into a sticky program.
    In our example, σ4 and σ5 are weak rules with X and Y as their weak variables
respectively. σ4 is partially grounded as a sticky rule r(a, Y ), s(a, Z) → r(Y, Z)
since X can only take the value a from D. To (partially) ground σ5 we would
need Y to be replaced with a null value invented by σ3 . Notice that the variables
we are to ground can only be in positions with finite rank, which guarantees the
finiteness of the partial grounding. Our algorithm consists of two phases.
    (Phase 1). We remove the existential variables that are in the positions with
finite rank with a method inspired by [4]. In the context of our example, we first
Skolemize σ3 , that is the only rule with an existential variable in a position with
finite rank. The result is v(X) → r(X, f (X)). Next, we replace r(X, f (X))
with an expanded predicate r0 (X, f, X) with the higher arity that results into
a rule v(X) → r0 (X, f, X) in which f is a fresh constant that represents the
replaced function. Then, we iteratively propagate the expanded predicate in the
body and head of the rules, possibly expanding other predicates e.g. t, obtaining
the new rules: σ30 : v(X) → r0 (X, f, X), σ40 : r0 (X, Y, Y 0 ), s(X, Z) → t0 (Y, Y 0 , Z),
and σ50 : r0 (X, Y, Y 0 ), t0 (Y, Y 0 , Z) → p(X, Z). The result is a set of rules Σ0,∞ =
{σ1 , σ2 , σ30 , σ40 , σ50 } with positions of rank either zero or infinite.
    (Phase 2). Now, we proceed with the grounding step where we replace the
weak variables (X in σ40 and Y, Y 0 in σ50 ) with values from D and the new
constant f. The result is a set of sticky rules Σ 0 that includes σ1 , σ2 and σ30
as well as the following rules, σ400 : r0 (a, Y, Y 0 ), s(a, Z) → t0 (Y, Y 0 , Z), and σ500 :
r0 (X, f, a), t0 (f, a, Z) → p(X, Z).
    To compute the answers to the query q, Σ 0 and q are rewritten into a first-
order query using the rewriting method for sticky rules [3]. The first-order query
is then evaluated on D.

4    Conclusion
Weakly-sticky Datalog± is an expressive ontology language with good compu-
tational properties and capable of capturing the most prominent Semantic Web
languages. We proposed two deterministic algorithms for answering conjunctive
queries on weakly-sticky Datalog± . This, we believe, sets the basis for practical
query answering algorithms for real-world scenarios. We plan to continue our
work by running experiments on large data sets. We also intend to refine the
hybrid algorithm by devising solutions to reduce the number of constants of the
database used to ground the weak variables; this would improve the efficiency
of our approach.

References
 1. A. Cali, G. Gottlob, and A. Pieris. Towards more expressive ontology languages:
    The query answering problem. Artificial Intelligence, 2012, 193:87-128.
 2. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and
    Query Answering. TCS, 2005, 336:89-124.
 3. G. Gottlob, G. Orsi, and A. Pieris. Query Rewriting and Optimization for Onto-
    logical Databases. Proc. TODS, 2014, 39(3):25.
 4. Krötzsch, M. and Rudolph, S. Extending Decidable Existential Rules by Joining
    Acyclicity and Guardedness. Proc. IJCAI, 2011, pp. 963-968.
 5. N. Leone, M. Manna, G. Terracina, and P. Veltri. Efficiently Computable Datalog±
    Programs. Proc. KR, 2012, pp. 13-23.
 6. M. Milani, L. Bertossi. Tractable Query Answering and Optimization for Exten-
    sions of Weakly-Sticky Datalog+-. Proc. AMW, 2015.