=Paper= {{Paper |id=Vol-1193/paper_22 |storemode=property |title=Acyclic Query Answering under Guarded Disjunctive Existential Rules and Consequences to DLs |pdfUrl=https://ceur-ws.org/Vol-1193/paper_22.pdf |volume=Vol-1193 |dblpUrl=https://dblp.org/rec/conf/dlog/BourhisMP14 }} ==Acyclic Query Answering under Guarded Disjunctive Existential Rules and Consequences to DLs== https://ceur-ws.org/Vol-1193/paper_22.pdf
    Acyclic Query Answering Under Guarded Disjunctive
         Existential Rules and Consequences to DLs

                  Pierre Bourhis1 , Michael Morak2 , and Andreas Pieris2
                      1
                       CNRS LIFL Université Lille1/INRIA Lille, France.
                          pierre.bourhis@univ-lille1.fr
          2
            University of Oxford, Department of Computer Science, United Kingdom.
                {michael.morak,andreas.pieris}@cs.ox.ac.uk


        Abstract. The complete picture of the complexity of conjunctive query answer-
        ing under guarded disjunctive existential rules has been recently settled. However,
        in the case of (unions of) acyclic conjunctive queries ((U)ACQs) there are some
        fundamental questions which are still open. It is the precise aim of the present
        paper to close those questions, and to understand whether the acyclicity of the
        query has a positive impact on the complexity of query answering. Our main re-
        sult states that acyclic conjunctive query answering under a fixed set of guarded
        disjunctive existential rules is E XP T IME-hard. This result together with an E XP -
        T IME upper bound obtained by exploiting classical results on guarded first-order
        logic, gives us a complete picture of the complexity of our problem. We also
        show that our results can be used as a generic tool for establishing results on
        (U)ACQ answering under several central DLs. In fact, restricting the query lan-
        guage to UACQs improves the complexity to E XP T IME-complete for any DL
        between DL-Litebool and ALCHI; this holds even for fixed TBoxes.


1     Introduction
Rule-based languages have a prominent presence in the areas of artificial intelligence
and databases. A noticeable formalism, originally intended for expressing complex
queries over relational databases, is Datalog, i.e., function-free first-order Horn logic.
As Datalog itself is not able to infer the existence of new objects that do not occur in
the extensional database, strong interest in enhancing Datalog with existential quantifi-
cation in rule-heads emerged in recent years; see, e.g., [5, 6, 12, 23, 26]. The obtained
rules are known under a variety of names such as existential rules, tuple-generating
dependencies (TGDs), and Datalog± rules. However, these languages are still not ex-
pressive enough for non-deterministic reasoning, e.g., to express simple statements like
“a grandchild is a male or a female”, which can be logically expressed using the rules
    ∀X∀Y (parentOf (X, Y ) ∧ isfather (Y ) → ∃Z(grparentOf (X, Z) ∧ human(Z)))
                         ∀X(human(X) → male(X) ∨ female(X)).
Extending TGDs to allow such a disjunction in the head yields the formalism of disjunc-
tive TGDs (DTGDs) [17]. Such an extension of plain Datalog (a.k.a. full TGDs), called
disjunctive Datalog, has been studied in [18]. However, the addition of existential quan-
tifiers easily leads to undecidability of the main reasoning tasks, including conjunctive
query (CQ) answering [8].
     Several concrete languages which ensure decidability have thus been proposed; see,
e.g., [5, 10, 12, 13, 19, 23, 24]. In this paper, we will focus on a concept called “guard-
edness” [10], which requires that all universally quantified variables in a rule appear
together in an atom in the left-hand side of the implication. Guardedness is a well-
accepted paradigm, giving rise to robust languages that capture important description
logics such as DL-Lite [14], EL [4] and even ALCI. Recently, the complexity picture
for query answering under guarded DTGDs has been investigated and completed for
(unions of) arbitrary CQs [9]. Also for simple atomic queries the complexity has been
investigated (see, e.g., [1, 21]). As acyclic CQs (ACQs) in the literature usually lead to a
reduction in the complexity of the query answering problem, the objective of the present
paper is to investigate whether this positive impact can also be observed in our setting.
To this end, we aim to pinpoint the exact complexity of answering (unions of) acyclic
CQs ((U)ACQs), in order to compare it to existing results for answering (unions of)
arbitrary CQs. More precisely, we concentrate on the following fundamental question:
What is the exact complexity of answering (U)ACQs under guarded DTGDs, and how
is it affected if we consider a signature of bounded arity, or a fixed set of dependencies?
     After establishing the complexity of (U)ACQ answering, we then investigate the
consequences to DLs. We show that our results can be used as a generic tool for es-
tablishing results on (U)ACQ answering under several central DLs that can be found
in the literature such as the key DL ALCHI, that is, the DL ALC extended with role
hierarchies and inverse roles. Our contributions can be summarized as follows:

 1. We show that UACQ answering under guarded DTGDs with predicates of bounded
    arity is in E XP T IME. This upper bound is established by exploiting results from
    guarded first-order logic [22];
 2. We show that ACQ answering under fixed sets of guarded DTGDs is E XP T IME-
    hard. This is a strong and general lower bound obtained by simulating the behavior
    of an alternating linear space Turing machine.
 3. Finally, we show that our results for fixed arity and fixed sets extend to a number
    of expressive DLs, namely ALCHI, ELUHI, DL-LiteH            bool and also DL-Litebool .
    For each of these DLs, and any others between them, answering UACQs, even over
    fixed TBoxes, is E XP T IME-complete. The upper bounds follow from the fact that
    guarded DTGDs are powerful enough to capture each of these DLs, and they thus
    inherit the positive effects of restricting the query language to acyclic queries. The
    lower bounds follow from the construction of our proof for guarded DTGDs.


2   Preliminaries

General. Let C, N and V be pairwise disjoint infinite countable sets of constants, (la-
beled) nulls and variables, respectively. We denote by X sequences (or sets) of variables
X1 , . . . , Xk . Let [n] = {1, . . . , n}, for n > 1. A term is a constant, null or variable.
An atom has the form p(t1 , . . . , tn ), where p is an n-ary predicate, and t1 , . . . , tn are
terms. For an atom a, dom(a) and var (a) are the set of its terms and the set of its vari-
ables, respectively; those notations extend to sets of atoms. Usually conjunctions and
disjunctions of atoms are treated as sets of atoms. An instance I is a (possibly infinite)
set of atoms of the form p(t), where t is a tuple of constants and nulls. A database D
is a finite instance with only∧constants. Whenever an instance I is treated as a logical
formula, is the formula ∃X ( a∈I I), where X contains a variable for each null in I.
    Conjunctive Queries. A conjunctive query (CQ) q is a sentence ∃Xφ(X), where
φ is a conjunction of atoms. If q does not have free variables, then it is called Boolean.
For brevity, we consider only Boolean CQs; however, all the results of the paper can
be easily extended to non-Boolean CQs. A union of conjunctive queries (UCQ) is a
disjunction of a finite number of CQs. By abuse of notation, sometimes we consider
a UCQ as set of CQs. A CQ q = ∃Xφ(X) has a positive answer over an instance I,
written I |= q, if there exists a homomorphism h such that h(φ(X)) ⊆ I. The answer
to a UCQ Q over I is positive, written I |= Q, if there exists q ∈ Q such that I |= q.
    Several subclasses of conjunctive queries have been considered in the literature,
with the aim of reducing the complexity of the CQ evaluation problem. This paper
focuses on the class of acyclic CQs (ACQs); see, e.g., [16]. The acyclicity of a CQ
is defined via the acyclicity of its hypergraph. Informally, a hypergraph is acyclic if it
can be reduced to the empty hypergraph by iteratively eliminating some non-maximal
hyperedge, or some vertex contained in at most one hyperedge; this procedure is known
as the GYO algorithm. The formal definition is as follows. Consider a hypergraph H.
The GYO-reduct of H, denoted GYO(H), is obtained by applying exhaustively the
following two steps: (1) eliminate vertices that are contained in at most one hyperedge;
and (2) eliminate hyperedges that are empty or contained in other hyperedges. We say
that H is acyclic if GYO(H) = ⟨∅, ∅⟩. The hypergraph of a CQ q, denote H(q), is
a hypergraph ⟨V, H⟩, where V = dom(q), and, for each atom a in q, there exists a
hyperedge h ∈ H such that h = dom(a). We say that q is acyclic iff H(q) is acyclic.
    Disjunctive Tuple-Generating Dependencies. A disjunctive      ∨ntuple-generating de-
pendency (DTGD) σ is a first-order formula ∀X (φ(X) → i=1 ∃Yi ψi (X, Yi )),
where n > 1, X ∪ Y1 ∪ . . . ∪ Yn ⊂ V, and φ, ψ1 , . . . , ψn are∨conjunctions     of atoms.
                                                                  n
The formula φ is called the body of σ, denoted body(σ), while i=1 ψi is the head of σ,
denoted head (σ). If n = 1, then σ is called tuple-generating dependency (TGD). The
schema of a set Σ of DTGDs, denoted sch(Σ), is the set of all predicates occurring in
Σ. For brevity, we will omit the universal quantifiers in front of DTGDs, and use the
comma (instead of ∧) for conjoining atoms. An instance I satisfies σ, written I |= σ, if
the following holds: Whenever there exists a homomorphism h such that h(φ(X)) ⊆ I,
then there exists i ∈ [n] and h′ ⊇ h such that h′ (ψi (X, Yi )) ⊆ I; I satisfies a set Σ of
DTGDs, denoted I |= Σ, if I |= σ, for each σ ∈ ∧      Σ. Whenever a set Σ of DTGDs is
treated as a logical formula, in fact is the formula ( σ∈Σ σ).
    As said, (U)CQ answering under DTGDs (which is defined below) is undecidable;
in fact, is undecidable already for TGDs [8], even when the set of TGDs is fixed and the
query is acyclic [11], or even when the set of TGDs is singleton [5]. A well-known prop-
erty which guarantees decidability of query answering is guardedness [11]. A DTGD
σ is guarded if there exists an atom a ∈ body(σ), called guard, which contains all the
variables occurring in body(σ), i.e., var (a) = var (body(σ)).
    Query Answering. The models of a database D and a set Σ of DTGDs, denoted
mods(D, Σ), is the set of instances {I | I ⊇ D and I |= Σ}. The answer to a CQ q
w.r.t. D and Σ is positive, denoted D ∪ Σ |= q, if I |= q, for each I ∈ mods(D, Σ).
The answer to a UCQ w.r.t. D and Σ is defined analogously. The problem tackled in
this work is defined as follows: Given a CQ q, a database D, and a set Σ of DTGDs,
decide whether D ∪ Σ |= q. If the input query is an ACQ, then the above problem
is called ACQ answering. Analogously, the problem UACQ answering for unions of
ACQs is defined. The data complexity of the above problems is calculated taking only
the database as input. For the combined complexity, the query and set of DTGDs count
as part of the input as well.
    Disjunctive Chase. We employ    ∨n the disjunctive chase [17]. Consider an instance I,
and a DTGD σ : φ(X) → i=1 ∃Y ψi (X, Y). We say that σ is applicable to I if
there exists a homomorphism h such that h(φ(X)) ⊆ I, and the result of applying σ
to I with h is the set {I1 , . . . , In }, where Ii = I ∪ h′ (ψi (X, Y)), for each i ∈ [n],
and h′ ⊇ h is such that h′ (Y ) is a “fresh” null not occurring in I, for each Y ∈ Y.
For such an application of a DTGD, which defines a single DTGD chase step, we write
I⟨σ, h⟩{I1 , . . . , In }. A disjunctive chase tree of a database D and a set Σ of DTGDs
is a (possibly infinite) tree such that the root is D, and for every node I, assuming that
{I1 , . . . , In } are the children of I, there exists σ ∈ Σ and a homomorphism h such that
I⟨σ, h⟩{I1 , . . . , In }. The disjunctive chase algorithm for D and Σ consists of an ex-
haustive application of DTGD chase steps in a fair fashion, which leads to a disjunctive
chase tree T of D and Σ; we denote by chase(D, Σ) the set {I | I is a leaf of T }. It is
well-known that, given a UCQ Q, D ∪ Σ |= Q iff I |= Q, for each I ∈ chase(D, Σ).
    The Guarded Fragment of First-Order Logic. The guarded fragment (GFO) has
been introduced in [2]. The set of GFO formulas over a schema R is the smallest set (1)
containing all atomic R-formulas and equalities; (2) closed under the logical connec-
tives ¬, ∧, ∨, →; and (3) if a is an R-atom containing all the variables of X ∪ Y, and
φ is a GFO formula with free variables contained in (X ∪ Y), then ∀X(a → φ) and
∃X(a∧φ) are GFO formulas. It is known that the problem of deciding whether an GFO
sentence is satisfiable is 2E XP T IME-complete, and E XP T IME-complete for predicates
of bounded arity [22].


3   Answering (Unions of) Acyclic Queries

The problem of answering (unions of) arbitrary conjunctive queries has recently been
investigated in [9]. It turns out that even for fixed sets of very simple forms of DTGDs,
the problem is already 2E XP T IME-hard. Restricting the query language to (unions of)
acyclic queries has often proven to be a way to reduce the complexity. Our goal in this
section is thus to investigate if this also holds true for guarded DTGDs, and to study
how exactly the complexity of (U)CQ answering under our guarded DTGDs is affected
if we focus on acyclic queries. We will begin this section by summarizing our results,
and then proceed with our new complexity bounds.
    Table 1 summarizes the complexity results for answering (U)ACQs under guarded
DTGDs. Compared with the results for (U)CQs, as presented in [9], two observa-
tions can be made: Firstly, the combined and the data complexity do not change if
we restrict ourselves to acyclic queries. Secondly, however, the complexity decreases
from 2E XP T IME-completeness to E XP T IME-completeness, in the case of predicates of
bounded arity, and also if we consider a fixed set of DTGDs. The impact of restricting
            Combined Complexity Bounded Arity Fixed Rules Data Complexity
               2E XP T IME       E XP T IME   E XP T IME       coNP
                UB: [9, Thm. 1]      UB: Thm. 1    UB: Thm. 1    UB: [7, Thm. 19]
               LB: [11, Thm. 6.1]    LB: Thm. 2    LB: Thm. 2   LB: [15, Thm. 4.5]

         Table 1: Complexity of answering (U)ACQs under guarded DTGDs.

the query language to acyclic queries is this only noticeable in these two cases. The
novel results of this section follow:

 1. UACQ answering under guarded DTGDs is in E XP T IME if we focus on predicates
    of bounded arity (Theorem 1) – this is shown by a reduction to satisfiability of
    a slight extension of GFO, and then exploit the fact that this problem is in E XP -
    T IME in case of predicates of bounded arity [22]; and
 2. ACQ answering under fixed sets of guarded DTGDs is E XP T IME-hard (Theorem 2)
    – by simulating the behavior of an alternating linear space Turing machine.

    Existing results from the literature, provide us with the missing optimal upper and
lower bounds, which complete the entire picture of the complexity of (U)ACQ answer-
ing. Firstly, the missing upper bounds are immediately inherited from results presented
in [9], in particular the combined and data complexity of guarded DTGDs, which are in
2E XP T IME and coNP, respectively. Secondly, the coNP-hardness of ACQ answering
under guarded DTGDs follows from [15, Theorem 4.5] – ∧      the CQ employed in the con-
struction of the proof of this result is ∃X∃Y1 ∃Y2 ∃Z1 ∃Z2 ( i∈{1,2} (pi (X, Yi ) ∧ s(Yi ) ∧
ri (X, Zi ) ∧ t(Zi ))), which is clearly acyclic.
    From the results of this section, we observe that the acyclicity of the query does not
make our problem easier w.r.t. the combined and data complexity. However, in the cases
of bounded arity and fixed sets of DTGDs, the complexity decreases from 2E XP T IME to
E XP T IME. Let us now proceed with the formal proofs of our results.


3.1 Complexity Results

We start this section by showing that UACQ answering under guarded DTGDs is in E X -
P T IME in case of predicates of bounded arity. We can assume that each guarded DTGD
                                     ∨
σ is of the form α(X) ∧ ϕ(X) → 16i6m ∃Yψi (X, Y), where ϕ and ψi are conjunc-
tions of atoms, and α is the atom that forms the guard of σ. Each
                                                              ∨ such formula can now
in fact be rewritten as the formula ∀X(α(X) → (ϕ(X) → 16i6m ∃Yψi (X, Y))). It
can be verified that, if each ψi is a single atom, then the above formula falls into GFO.
However, if ψi is a conjunction of atoms, then the formula is “almost” guarded since
the existentially quantified variables are not necessarily guarded. Interestingly, the sat-
isfiability algorithm proposed in [22] is able to treat formulas as the one above, even if
the existentially quantified variables are not guarded, without increasing the complex-
ity – this is explicitly stated in a remark on page 8 of [22]. By exploiting the above
observation, it is now easy to reduce our problem to the satisfiability problem of sen-
tences which are “almost” guarded, which is E XP T IME-complete in case of predicates
of bounded arity [22]. Consider a database D, a set Σ of guarded DTGDs, and a UACQ
Q. The database D itself contains only ground atoms and is thus trivially guarded. The
set Σ can be rewritten as a sentence ΦΣ which is “almost” guarded, as explained above.
Finally, the query Q, although is not guarded, it can be rewritten as a GFO sentence ΦQ
due to acyclicity [20]. Thus, D∪Σ |= Q iff the “almost” GFO sentence (D∧ΦΣ ∧¬ΦQ )
is not satisfiable, and we obtain the following:
Theorem 1. UACQ answering under guarded DTGDs is in E XP T IME in case of pred-
icates of bounded arity.
    We continue this section by showing that ACQ answering under fixed sets of guarded
DTGDs is E XP T IME-hard. The construction simulates an alternating L IN S PACE Turing
machine. Using a fixed set of rules, chains of non-deterministic length can be generated,
where the length represents a number. By linking these chains appropriately, configu-
rations of the Turing machine can be guessed. An acyclic query is then used to check
consistency w.r.t. the configuration and the transition function. The desired result fol-
lows since alternating L IN S PACE already coincides with E XP T IME.
Theorem 2. ACQ answering under fixed sets of guarded DTGDs is E XP T IME-hard.
Proof. The proof is by a reduction from the non-acceptance of an alternating linear
space Turing machine M on input I. Let M = (S, Λ, δ, s0 ), where S = S∀ ⊎ S∃ ⊎
{sa } ⊎ {sr } is a finite set of states partitioned into universal states, existential states, an
accepting state and a rejecting state, Λ = {0, 1, ⊔} is the tape alphabet with ⊔ be the
blank symbol, δ : S×Λ → (S×Λ×{−1, 0, +1})2 is the transition function, and s0 ∈ S
is the initial state. We assume that M is well-behaved and never tries to read beyond
its tape boundaries, always halts, and uses exactly n tape cells, where n = O(|I|). Fur-
thermore, we assume that a rejecting configuration does not have a subsequent config-
uration, while an accepting configuration has only itself as a subsequent configuration.
Finally, we assume that s0 ∈ S∃ , and also that every universal configuration is followed
by two existential configurations and vice versa. The above assumptions can be made,
without sacrificing the generality of our proof, since the non-acceptance problem of M
remains E XP T IME-hard. We are going to construct a database D, a set Σ of DTGDs,
and a UACQ Q such that D ∪ Σ |= Q iff M rejects I, and Σ is a fixed set of guarded
DTGDs. By [9, Lemma 4], the obtained lower bound holds even if we focus on ACQs.
The idea of the proof is to construct trees, which encode possible computation trees of
M on the input string I, by chasing D with Σ, and then exploit Q to check their consis-
tency. On each configuration node v, which represents the configuration Cv of M , we
attach a chain of length n, which mimics the tape in Cv , and a chain of length at most
|S|, which encodes the state of Cv . The formal definition of D, Σ and Q follows.
     The Database D. Let D = {conf ∃ (c)}, where c ∈ C is a special constant which
represents the initial configuration (which is, by assumption, existential).
     The Set Σ. The predicates that we are going to use are self-explanatory, and thus
we proceed with the construction of Σ without describing the predicates of sch(Σ).
 – Each existential (universal) configuration has one (two) successor configuration(s)
   which is (are) universal (existential):
          conf ∃ (X) → ∃Y succ(X, Y ), conf ∀ (Y ),
          conf ∀ (X) → ∃Y ∃Z succ 1 (X, Y ), conf ∃ (Y ), succ 2 (X, Z), conf ∃ (Z).
 – Each configuration has a cell- and state-chain which simulates its tape and encodes
   its state, respectively:

                    conf x (X) → ∃Y cell (X, Y ), for each x ∈ {∃, ∀},
                    cell (X, Y ) → ∃Z cell (Y, Z) ∨ end (Y ),
                     conf x (X) → ∃Y state(X, Y ), for each x ∈ {∃, ∀},
                  state(X, Y ) → ∃Z state(Y, Z) ∨ end (Y ).

 – Finally, we guess the content of each cell and the cursor position:

                         cell (X, Y ) → 0(Y ) ∨ 1(Y ) ∨ ⊔(Y ),
                         cell (X, Y ) → cursor (Y ) ∨ notCursor (Y ).

The construction of Σ is now completed. It is easy to see that Σ does not depend on
M , and also that Σ is a set of guarded DTGDs.
    The UACQ Q. We now proceed with the construction of Q which ensures that each
model of D ∪ Σ encodes a valid computation tree. Roughly, Q consists of the following
disjuncts:
 1. Qinitial for checking that in the initial configuration the tape contains the input
    string I = a1 . . . ak , the state is s0 , and the cursor is at the first cell;
 2. Qlength for checking that cell-chains are of length n, and state-chains are of length
    at most |S|;
 3. Qcursor for checking that exactly one cell is pointed by the cursor;
 4. Qtrans for checking the consistency w.r.t. the transition function of M ; and
 5. Qinertia for checking that the tape cells not under the cursor keep their old value
    during a transition.
We assume a fixed order on S. Given a state s ∈ S, let #s be its position in this order.
The length of the state-chain which encodes s is #s . For a predicate p ∈ {cell , state},
pi (X, Y ) is used as a shorthand for a chain of length i along predicate p, namely
∃Z1 . . . ∃Zi−1 (p(X, Z1 ) ∧ . . . ∧ p(Zi−1 , Y )). A useful subquery that we will use is
                                               (                    )
                      length[p]k (X) ≡ ∃Y pk (X, Y ) ∧ end (Y )

stating that there exists a p-chain of length k. Having all the necessary ingredients in
place, we are now ready to formalize the components of Q.
 1. Qinitial is defined as (recall that c ∈ C represents the initial configuration)
       ∨         ∨         (                                 )
                       ∃X conf ∃ (c) ∧ cell i (c, X) ∧ a(X) ∨
       i∈[n] a∈Λ\{ai }
                              ∨
                                       (conf ∃ (c) ∧ length[state]#s (c)) ∨
                           s∈S\{s0 }
                                          ∨       (                                     )
                                                ∃X conf ∃ (c) ∧ cell i (c, X) ∧ head (X) .
                                        1