=Paper=
{{Paper
|id=Vol-2373/paper-4
|storemode=property
|title=Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting
|pdfUrl=https://ceur-ws.org/Vol-2373/paper-4.pdf
|volume=Vol-2373
|authors=Bartosz Bednarczyk,Sebastian Rudolph
|dblpUrl=https://dblp.org/rec/conf/dlog/BednarczykR19
}}
==Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting==
<pdf width="1500px">https://ceur-ws.org/Vol-2373/paper-4.pdf</pdf>
<pre>
                 Worst-Case Optimal Querying of
              Very Expressive Description Logics with
              Path Expressions and Succinct Counting

                      Bartosz Bednarczyk1,2? and Sebastian Rudolph2??
                 1
                     Institute of Computer Science, University of Wrocław, Poland
                       2
                           Computational Logic Group, TU Dresden, Germany
                            firstname.lastname@tu-dresden.de


         Abstract. Among the most expressive knowledge representation formalisms are
         the description logics of the Z family. For well-behaved fragments of ZOIQ, en-
         tailment of positive two-way regular path queries is well known to be 2E XP T IME-
         complete under the proviso of unary encoding of numbers in cardinality con-
         straints. We show that this assumption can be dropped without an increase in com-
         plexity and E XP T IME-completeness can be achieved when bounding the number
         of query atoms, using a novel reduction from query entailment to knowledge base
         satisfiability. These findings allow to strengthen other results regarding query en-
         tailment and query containment problems in very expressive description logics.
         Our results also carry over to GC 2 , the two-variable guarded fragment of first-
         order logic with counting quantifiers, for which hitherto only conjunctive query
         entailment has been investigated.


1     Introduction
Recent years have seen a convergence of the fields of logic-based knowledge represen-
tation (KR) and databases, with querying over knowledge bases (KBs, aka ontologies)
as the archetypical inferencing task considered. Thereby, the query languages of interest
have evolved from plain conjunctive queries to more expressive formalisms with path-
navigational components such as the positive two-way regular path queries (P2RPQs)
considered here. One popular and practically used way of specifying the to-be-queried
knowledge bases is via description logics (DLs), whose most expressive exemplars fea-
ture advanced constructors for roles and path expressions. Certainly, the DLs from the
Z family [6] are among the most powerful KR formalisms on the verge of decidability.
For its most expressive proponent, ZOIQ, featuring nominals (O), role inverses (I),
and number restrictions (Q), querying is undecidable [8,13] and even decidability of KB
satisfiability is open, owing to the intricate interplay of the three mentioned features, but
restricting the interaction of O, I, and Q (or excluding one of the features altogether)
leads to beneficial model-theoretic properties, which give rise to upper bounds of E XP -
T IME for KB satisfiability and 2E XP T IME for querying, established using elaborate
automata techniques [6]. While the first bound holds under the assumption of binary
?
     Supported by “Diamentowy Grant” no. DI2017 006447
??
     Supported by ERC Consolidator Grant 771779 (DeciGUT)
encoding of number restrictions, the proof for the second one requires unary encoding.
While unary encoding of numbers is often silently accepted in the DL community, it is
considered “cheating” in others.
    In our paper, we overcome this restriction. Leveraging said model-theoretic property
of the considered DLs, we provide a novel reduction from the query entailment problem
K |= q to the problem of checking unsatisfiability of some other knowledge base K¬q .
Thereby, the size of the latter is exponential in the total size of q and K (even when as-
suming binary encoding), leading to a 2E XP T IME-complete complexity. Our technique
also yields the new insight that the complexity drops to E XP T IME, when the number of
atoms in q is bounded by a constant. The obtained results allow to correspondingly im-
prove a number of previous results on query containment and can be transferred to DLs
from the SR family. Reaching out to the community researching decidable fragments
of first-order logic, we show that our results also extend to entailment of P2RPQs by
GC 2 databases, i.e. sets of unary and binary ground facts in the presence of a sentence
of the guarded two-variable fragment with counting as defined by Pratt-Hartmann [10].


2   Preliminaries

Description Logics. We will very briefly recap the very expressive DL ZOIQ and its
relevant sublogics following Calvanese et al. [6]. We assume a fixed signature consist-
ing of sets NI of individual names, NC of concept names, and NR of role names. We
let NC contain the special concept names > and ⊥, and NR the special role names >
and ⊥. The following EBNF grammar defines atomic concepts B, concepts C, atomic
roles R, simple roles S and roles T , with o, a, b ∈ NI , A ∈ NC , P ∈ NR \ {>}:

     B ::= A | {o}
     C ::= B | ¬C | C u C | C t C | ∀T.C | ∃T.C | >nS.C | 6nS.C | ∃S.Self
     R ::= P | P −
     S ::= R | S ∩ S | S ∪ S | S \ S
     T ::= > | S | T ∪ T | T ◦ T | T ∗ | id(C)

An assertion is of the form C(a), R(a, b), a ≈ b, or a 6≈ b, a general concept inclusion
(GCI) has the form C1 v C2 . We use C1 ≡ C2 as abbreviation for C1 v C2 , C2 v C1 .
A ZOIQ knowledge base (short: KB) K = (A, T ) consists of a finite set A (called
ABox) of assertions and a finite set T (called TBox) of GCIs.
    The semantics of ZOIQ is defined via interpretations I = (∆I , ·I ) composed
of a non-empty set ∆I called the domain of I and a function ·I mapping individual
names to elements of ∆I , concept names to subsets of ∆I , and role names to subsets
of ∆I ×∆I . This mapping is extended to concepts, simple roles, and roles (cf. Table 1).
Then, a GCI C1 v C2 is satisfied by I if C1I ⊆ C2I , satisfaction of ABox assertions is
defined in the obvious way. We say that an interpretation I satisfies a KB K = (A, T )
(or I is a model of K, written: I |= K) if it satisfies all axioms of A and T . K is
called satisfiable if it has a model and unsatisfiable otherwise. From ZOIQ, we obtain
ZIQ by disallowing nominal concepts {o}, ZOQ, by disallowing role inverses (.)−
and ZOI by disallowing number restrictions.
             Table 1: Semantics of concepts (left) and (simple) roles (right) in ZOIQ .
Name                 Syntax    Semantics                          Name             Syntax    Semantics
top                  >         ∆I                                 universal role   >         ∆I × ∆I
bottom               ⊥         ∅                                  bottom role      ⊥         ∅
nominal              {o}       {oI }                              inverse role     P−        {(x, y) | (y, x) ∈ P I }
negation             ¬C        ∆I \ C I                           intersection     S1 ∩ S2   S1I ∩ S2I
intersection         C1 u C2   C1I ∩ C2I                          union            S1 ∪ S2   S1I ∪ S2I
union                C1 t C2   C1I ∪ C2I                          difference       S1 \ S2   S1I \ S2I
univ. restriction    ∀T.C      {x | ∀y.(x, y) ∈ T I → y ∈ C I }   concatenation    T1 ◦ T2     I     I
                                                                                             S1 ◦ T2 I i
                                                                                             T
exist. restriction   ∃T.C      {x | ∃y.(x, y) ∈ T I ∧ y ∈ C I }   Kleene star      T∗          i≥0 (T )
qualified number     6nS.C     {x | #{y∈C I | (x, y)∈S I } ≤ n}   concept test     id(C)     {(x, x) | x ∈ C I }
 restriction         >nS.C     {x | #{y∈C I | (x, y)∈S I } ≥ n}
Self concept         ∃S.Self   {x | (x, x) ∈ S I }


Queries. In queries, we use variables from a countably infinite set V. A Boolean
positive two-way regular path query (P2RPQ) is an expression q = ∃x.ϕ, where ϕ is a
positive formula (i.e., one using ∧ and ∨) over expressions of the form T (s, t) or C(t),
where T is a role, C is concept, and s and t are elements of x ∪ NI . A P2RPQ is a
conjunctive two-way regular path query (C2RPQ) if it does not use disjunction. It is a
union of conjunctive two-way regular path queries (UC2RPQ) if it is a disjunction of
C2RPQs.3 A variable assignment Z for I is a mapping V → ∆I . For x ∈ V, we set
xI,Z := Z(x); for c ∈ NI , we set cI,Z := cI . T (s, t) evaluates to true under Z and
I if (sI,Z , tI,Z ) ∈ T I . C(t) evaluates to true under Z and I if tI,Z ∈ C I . A P2RPQ
q = ∃x.ϕ is satisfied by I (written: I |= q) if there is a variable assignment Z (called
match) such that ϕ evaluates to true under I and Z. A P2RPQ q is entailed by a KB K
(written: K |= q) if every model of K satisfies q. If q uses only signature elements of K,
we call it a query over K.
Simplifications. A P2RPQ is simplified if all its atoms are of the form T (x, y) where
x and y are variables and T is built from role names using ∪, ◦, and ∗ . For any KB K
and P2RPQ q, we can construct in polynomial time a KB K0 and simplified P2RPQ q 0
such that K |= q iff K0 |= q 0 . Hence from here on, we will focus on simplified queries.
Automata. For each atom T (x, y) of a simplified P2RPQ, T is a regular expression
over NR . Every such expression can be represented by a nondeterministic finite au-
tomaton A = (Σ, Q, I, F, T) with an alphabet Σ ⊆ {P, P − | P ∈ NR } of possibly in-
verted role names, a finite set Q of states, initial states I ⊆ Q, final states F ⊆ Q,
and transitions T ⊆ Q × Σ × Q.4 The size of A is polynomially bounded by T . In
the following, we assume queries in automaton form, where the atoms T (x, y) have
been replaced by the corresponding A(x, y). Given A = (Σ, Q, I, F, T), let A− =
(Σ − , Q, F, I, T− ) with Σ − = {P − | P ∈ Σ} be the corresponding reverse automa-
ton5 with initial and final states swapped and the state transitions flipped: (q0 , R− , q) ∈

 3        W        W
   As ∃x. i ϕi ≡ i ∃x.ϕi , the order of disjunction and existential quantification is irrelevant.
 4
   The correspondence is such that (δ, δ 0 ) ∈ T I iff there is a role path δ −
                                                                              P1
                                                                              −
                                                                                        Pk
                                                                               → · · · −−→ δ 0 in I
   such that A accepts P1 . . . Pk .
 5
   For convenience, we consider (P − )− identical to P .
T− for every (q, R, q0 ) ∈ T. Moreover, we obtain the “state-parameterized” automaton
Aq,q0 = (Σ, Q, {q}, {q0 }, T) from A by setting the initial state to q and the final to q0 .


3    A Little Bit of Model Theory
The “well-behavedness” of certain sublogics of ZOIQ can be conveniently charac-
terised in model-theoretic terms.

Definition 1 (quasi-forest model, [5]). Let K be a KB. A model I of K is a quasi-forest
model if:
– the domain ∆I of I is a forest, i.e., a prefix-closed subset of Roots × N∗ for some
    finite set Roots,
– Roots = {oI | o is an individual name in K}, and
– for every δ, δ 0 ∈ ∆I with (δ, δ 0 ) ∈ P I for some role name P ∈ NR \ {>}, either (i)
    {δ, δ 0 } ∩ Roots 6= ∅, or (ii) δ = δ 0 , or (iii) δ is a child of δ 0 , or (iv) δ 0 is a child of δ.
A KB K has the quasi-forest model property (short: qfmp) if K is either unsatisfiable or
it has a quasi-forest model. A DL L has the qfmp if every L KB K has the qfmp.

For such well-behaved KBs, tight and quite decent complexity bounds for satisfiability
checking have been established.

Theorem 1 ([6]). Satisfiability checking of ZOIQ KBs having the qfmp is E XP T IME-
complete even with binary encoding of number restrictions.

When it comes to query answering, a useful notion for relating interpretations to each
other is via homomorphisms. For two interpretations I = (∆I , ·I ) and J = (∆J , ·J ),
a function h : ∆I → ∆J is a homomorphism from I to J if for all δ, δ 0 ∈ ∆I holds
δ ∈ AI ⇒ h(δ) ∈ AJ for all concept names A, and (δ, δ 0 ) ∈ P I ⇒ (h(δ), h(δ 0 )) ∈
P J for all role names P . If a homomorphism from I to J exists, we write I B J . We
note that I |= q and I B J imply J |= q for every simplified P2RPQ q.

Definition 2 (qfhcp, [3]). A KB K has the quasi-forest homomorphism-cover property
(short: qfhcp) if for every model I of K, there exists a quasi-forest model J of K with
J B I. A description logic L has the qfhcp if every L KB K has the qfhcp. We let tame
ZOIQ denote the set of all ZOIQ KBs having the qfhcp.

   Obviously, the qfhcp implies the qfmp. Note that unrestricted ZOIQ does not even
have the qfmp. The most expressive syntactically defined sublogics of tame ZOIQ
known today are ZIQ, ZOQ, and ZOI [6,2]. From the fact that the set of models for
any P2RPQ is closed under homomorphisms now follows that for any tame ZOIQ KB
with K 6|= q, there must exist a quasi-forest model I, satisfying I |= K and I 6|= q.


4    Annotating (Partial) Query Matches
In the following, we consider a tame ZOIQ KB K and a simplified P2RPQ q over
K in automaton form. We note that q can be rewritten into an equivalent disjunction
W
  1≤i≤n qi of n (also simplified) C2RPQs, where n may be exponential in the size of q,
but every single qi ’s size is polynomial. Let [q] denote {q1 , . . . qn }. Then, testing K 6|= q
amounts to determining if K has a model I which is a simultaneous countermodel (i.e.,
I 6|= qi ) for all qi ∈ [q].
     We now give an intuitive description of our technique to detect (or refute) matches
of q. Given some model I of K, we iteratively, deterministically annotate all its domain
elements δ with fresh concept names QM that indicate which “parts” M of some query
qi match into I and how δ participates in these partial matches. To this end, we employ
descriptions M of query parts which contain information about (i) the query variables
matched (ii) (the existence of) paths realising certain state transitions in the query’s au-
tomata and (iii) optionally, the “position” of the current δ in the query match by use
of a marker • acting like an additional, distinguished query variable. In the annotation
process, the QM s will be assigned to domain elements δ based on their local environ-
ment and known annotations QM 0 for “smaller” partial queries to the same element
δ or its direct (role) neighborhood. As an exception, non-localised query matches M
not containing • will be “broadcast”, i.e., uniformly attached to all domain elements.
This way, in the annotation process, QM s for larger and larger partial queries are as-
signed, until finally (after reaching the unique fixpoint) also the full matches for any qi
are recognised by annotations Qqi . This process will accurately detect partial and full
query matches, if I is a quasi-forest model (which is sufficient for our purposes as K
has the qfhcp). The annotation process is realised by virtue of a KB Kq , the size of
which is exponential in q, but only polynomial in K. In the following, we will stepwise
introduce the KB, interleaved with some necessary definitions.
Definition 3 (partial query). Consider a C2RPQ qi = {A1 (x1 , y1 ),..., Am (xm , ym )}.6
A partial query for qi and a set X ⊆ {x1 , y1 , . . . , xm , ym } of variables from qi is a set
M consisting of
– all atoms A(x, y) from qi for which {x, y} ⊆ X,
– for every A(x, y) from qi with x ∈ X but y 6∈ X, one of the atoms Aq,q0 (x, •) or
  Aq,q0 (x, o) where q is initial in A,
– for every A(x, y) from qi with x 6∈ X but y ∈ X, one of the atoms A−             q,q0 (y, •) or
  A−q,q0 (y, o) where q is final in A,
– in case X = ∅, exactly one atom of the form Aq,q0 (•, o) or A−             q,q0 (•, o) (called
  nominal-anchored path) or Aq,q0 (•, •) or A−      q,q0 (•, •) (called  round  trip) for some
  A from qi .
A partial query is called monodic, if |X| = 1. A partial query is called local if it
contains • and global otherwise.

Nominal-Anchored Paths. First, we want to detect role paths that start in the to-be-
annotated individual, end in named individual oI , and realise state transitions in one of
the query’s automata. Let o be any individual name in K, let A± be either A or A− for
any automaton A occurring in q with states q, q0 , q00 . Then let Kq contain the axiom

                                        Q{A±
                                           q,q (•,o)}
                                                      (o)                                    (1)
 6
     In cases, we may equate a C2RPQ qi with the set of its atoms.
and whenever A± has a transition (q, R, q0 ):

                               ∃R.Q{A±0 00 (•,o)} v Q{A± 00 (•,o)}                     (2)
                                        q ,q                   q,q


Round Trips. Next, we are concerned with paths which start and end in the to-be-
annotated individual. Assuming an automaton A with states q, q0 , q00 , q000 and transition
set T as well as an individual name o, we add the axioms:

                                                      > v Q{Aq,q (•,•)}                (3)
                     Q{Aq,q0 (•,•)} u Q{Aq0 ,q00 (•,•)} v Q{Aq,q00 (•,•)}              (4)
                     Q{Aq,q0 (•,o)} u Q{A−00 0 (•,o)} v Q{Aq,q00 (•,•)}                (5)
                                               q ,q

               for (q, R, q0 ) ∈ T:             ∃R.Self v Q{Aq,q0 (•,•)}               (6)
               for {(q, R, q0 ), (q00 , R0 , q000 )} ⊆ T:
                        ∃(R ∩ R0− ).Q{Aq0 ,q00 (•,•)} v Q{Aq,q000 (•,•)}               (7)

Initialising monodic partial queries. Next we determine domain elements to which
separate query variables could possibly be mapped in a match.

Definition 4. We call a monodic partial query M original, whenever q = q0 for ev-
ery Aq,q0 (x, •) ∈ M and every A−    q,q0 (x, •) ∈ M . For M an original monodic partial
query for q and {x}, the set Prec M of precondition concepts consists of:
– Q{Aq,q0 (•,o)} for each atom Aq,q0 (x, o) in M ,
– Q{A− 0 (•,o)} for each atom A−   q,q0 (x, o) in M ,
   F q,q
– q 0initial Q{Aq,q0 (•,•)} for each atom A(x, x) in M .
     q final


Now we can initialise all original monodic partial queries M by adding the following
axioms to Kq :                     l
                                     Prec M v QM                                 (8)

Updating partial queries. Partial queries can be updated by roundtrips: Let M be a
partial query containing some A±q,q0 (x, •). Then we realise the update by adding the
following axioms to Kq :

                    QM u Q{A±0 00 (•,•)} v QM \{A± 0 (x,•)}∪{A± 00 (x,•)}              (9)
                                 q ,q                    q,q         q,q


Furthermore, a partial query can “progress” when making a “step” in the model, moving
from the considered δ to some role neighbour. In such a step, all the “unready” paths
(corresponding to atoms with •) must be updated in a synchronous manner. Given a
partial query M for qi and nonempty X as well as a set {R1 , . . . , Rk } of (possibly
inverted) role names, assume M 0 can be obtained from M by replacing each atom of
the form A±                            ±                     ±                  0      00
            q,q0 (x, •) by an atom Aq,q00 (x, •) such that A has a transition (q , R, q )
for for any R ∈ {R1 , . . . , Rk }. Then add the following axiom to the KB Kq :

                             ∃(R1− ∩ R2− ∩ . . . ∩ Rm
                                                    −
                                                      ).QM v QM 0                     (10)
Joining partial queries. When two partial queries corresponding to the same query
qi “meet” in a domain element δ, they can – under certain circumstances – be “glued
together” into a “bigger” partial query of qi .

Definition 5 (joinable, join). Let M1 be a partial query for qi and X1 and let M2 be
a partial query for qi and X2 . We call M1 and M2 joinable if
– X1 6= ∅, X2 6= ∅, X1 ∩ X2 = ∅, and
– for each A(x, y) ∈ qi \ (M1 ∪M2 ) with {x, y} ⊆ X1 ∪X2 , there are states q, q0 , q00 of
   A with q initial and q00 final, such that either {Aq,q0 (x, •), A−
                                                                    q00 ,q0 (y, •)} ∈ M1 ∪M2
                      −
   or {Aq,q (x, o), Aq00 ,q0 (y, o)} ∈ M1 ∪ M2 for some o ∈ NI .
            0

For joinable M1 and M2 , their join, denoted M1 ./ M2 , is the partial query obtained
from M1 ∪ M2 by replacing any pair of atoms Aq,q0 (x, ∗) and A−       q00 ,q0 (y, ∗) by A(x, y),
where ∗ is either an individual name o or •.

We implement the join operation for every pair M1 , M2 of joinable partial queries for
a qi ∈ [q] by extending Kq with:

                                   QM1 u QM2 v QM1 ./M2                                     (11)

Broadcasting of global partial queries. Whenever a partial query M does not have any
occurrences of •, which would tie it to a specific element, it will be “broadcast” to all
domain elements:
                                   ∃>.QM v QM                                       (12)
This concludes the definition of Kq . Revisiting our initial intuition of Kq ’s purpose of
deterministically annotating a model I of K, the following lemma formalises this by
singling out one unique annotated model I ∗ for every I.

Lemma 1. Let K be a ZOIQ KB and q a simplified P2RPQ. For every model I of K
                                                                           ∗
there exists a unique model I ∗ of K ∪ Kq which extends I and satisfies QIM ⊆ QJ
                                                                               M for
every QM in Kq and for every model J of K ∪ Kq that extends I.

We will call models I ∗ from Lemma 1 Q-minimal. Note that for every model I of K,
                                ∗
any concept membership δ ∈ QIM holding in the corresponding Q-minimal model I ∗
can be derived from concept and role memberships in I through a finite sequence of
“applications” of axioms from Kq .7


5      Reduction to Satisfiability

Now we establish technical results relating the presence of QM -annotations in models
and the actual semantic satisfaction of the corresponding partial query M . Note that any
partial query M can be seen as a (set representation of a) C2RPQ in automaton form,
assuming • is just an ordinary variable.
 7
     To see this, it may help to realise that all axioms of Kq can be easily expressed in monadic
     Datalog.
Definition 6 ((tight) satisfaction of a partial query). Given an interpretation I, a
domain element δ ∈ ∆I , and a partial query M , we say M is satisfied or holds in
δ, written I, δ |= M , if there is a match Z for M in I with Z(•) = δ. M is satisfied
                                     |= M , if Z is such that every A±
or holds tightly in δ, written I, δ |=                               q1 ,q2 (x, •) ∈ M is
realized by a path Z(x) ; δ not containing oI for any individual name o.

Note that |= and |=|= coincide whenever M is global or does not contain variables. Note
also that, as a consequence of this definition, if M is global, then QM holds (tightly)
everywhere or nowhere throughout the domain. With this notion in place, the next two
lemmas can be seen as soundness and completeness results regarding the deduction
calculus for (partial) query matches embodied by Kq .

Lemma 2 (soundness). Let I be a Q-minimal model of K ∪ Kq and let δ ∈ ∆I . Let
M be any partial query for any C2RPQ qi ∈ [q]. Then δ ∈ QIM implies I, δ |= M .

Lemma 3 (completeness). Let I be a quasi-forest model of K ∪ Kq and let δ ∈ ∆I .
                                                              |= M implies δ ∈ QIM .
Let M be any partial query for any C2RPQ qi ∈ [q]. Then I, δ |=

Soundness is proven by induction over the length of the derivation for δ ∈ QIM . Com-
pleteness is shown by induction over the spread of the match for M , i.e., the number of
variables (except •) plus the sum of the lengths of all paths realising the query atoms.
    Thanks to these correspondences, we can essentially rule out models with matches
of any q1 , . . . , qn by forcing the extensions of Qq1 , . . . , Qqn to be empty. Therefore, let

                          K¬q = K ∪ Kq ∪ {Qqi v ⊥ | qi ∈ [q]}.                              (13)

We establish some syntactic and semantic properties of K¬q : Some easy calculations
yield size bounds for K¬q , tameness is not affected by the reduction, and unsatisfiability
of K¬q indeed coincides with query entailment.

Fact 4 (size of K¬q ) The size of K¬q is single exponential in the total size of K and q.
It is polynomial, if the number of atoms in q is bounded by a constant.

Lemma 5. If K is in tame ZOIQ then so is K¬q .

Theorem 2. Let K be a tame ZOIQ KB and let q be a simplified P2RPQ over K. Then
K |= q if and only if K¬q is unsatisfiable.

Proof. We show the equivalent statement: K 6|= q if and only if K¬q satisfiable.
“only if”: Assume K 6|= q, that is there exists an I with I |= K and I 6|= q. Consider
                                                                ∗
I ∗ , the Q-minimal model of K ∪ Kq extending I. If some QIqi were nonempty, then
there would be a query match of q into I by Lemma 2, contradicting our assumption.
Therefore, I ∗ |= Qqi v ⊥ for all qi ∈ [q] and hence, I ∗ |= K¬q .
“if”: Assume K¬q is satisfiable and hence has a model I. As K¬q has the qfmp thanks
to Lemma 5, we can assume I to be a quasi-forest model. I |= Qqi v ⊥ implies empti-
ness of QIqi for every qi ∈ [q]. If I |= q held true, witnessed by I |= qi for some
qi ∈ [q], Lemma 3 would require δ ∈ QIqi for all δ ∈ ∆I , leading to a contradiction.
Hence I 6|= q. On the other hand, I |= K¬q implies I |= K, therefore K 6|= q.
Based on this theorem, we can now prove our central result:
Theorem 3. Checking entailment of simplified P2RPQs from tame ZOIQ KBs with
binary number encoding is 2E XP T IME-complete. It is E XP T IME-complete if the num-
ber of atoms in the query is bounded.
Proof. By Theorem 2, entailment of a simplified C2RPQ q from a tame ZOIQ KB K
can be reduced to checking unsatisfiability of K¬q , which can be computed in output-
polynomial time and the size of which is exponential (polynomial, if number of atoms in
q is bounded). By Theorem 1, (un)satisfiability of K¬q can be checked in E XP T IME in
the size of K¬q , hence we obtain the desired 2E XP T IME and E XP T IME upper bounds.
The matching lower bounds are well-known even for much weaker logics [9].

6    Querying GC 2 Databases
We now consider the problem of P2RPQ entailment from GC 2 databases, i.e., sets of
unary and binary ground facts in the presence of a sentence in the guarded two-variable
fragment with counting quantifiers as defined by Pratt-Hartmann ([10]). We will show
that there is a polytime query-entailment-preserving transformation from GC 2 databases
to ZIQ KBs. This allows us to transfer the results from Theorem 3. Matching lower
bounds follow from known results for much weaker logics and query formalisms (such
as CQs over ALCI [7]).
Definition 7 (GC 2 , [10]). For a signature S of nullary (N0 ), unary (NC ), and binary
(NR ∪ {≈}) predicates,8 let GC 2 be the smallest set of formulae that contains all atoms
over S using only variables from {x, y}, that is closed under Boolean combinations,
and that contains
– all formulae ∃u.ϕ and ∀u.ϕ with u ∈ {x, y} whenever ϕ is a GC 2 formula with at
   most one free variable, and                Q                   Q
– all formulae ∀u.(γ → ϕ) as well as u.(γ ∧ ϕ) and u.(γ) where γ is a binary
   atom (called “guard”) containing both x and y, ϕ is a GC 2 formula, and is any of
                                                                                      Q
   ∃, ∃6n , ∃=n , or ∃>n .
A GC 2 database DB is a theory {Φ} ∪ A consisting of a GC 2 sentence Φ and a finite set
A of unary and binary ground facts over constants from NI .9 We denote by GC 2qef the
set of GC 2 formulae which are quantifier-free and do not use ≈.
A P2RPQ over a GC 2 database is a P2RPQ where for every atom of the form C(t) holds
C ∈ NC and for every atom of the form T (s, t), every subexpression of the form id(C)
satisfies C ∈ NC . P2RPQ entailment from GC 2 databases is defined in the same way
as for DLs.
    For convenience, we make use of a special normal form for GC 2 introduced in the
following.
 8
   As unary and binary predicates correspond to concept and role names, respectively, we use the
   same set symbols to denote them.
 9
   Note that ABoxes, as defined before, are syntactically and semantically identical to finite sets
   of unary and binary ground facts, so we consider the notions the same and use the same symbol
   A. Note also that by definition, Φ does not contain constants.
Lemma 6 (normal form, [11]). For any GC 2 sentence Φ, one can compute in polyno-
mial time a GC 2 sentence NF(Φ) of the form
                          V
                   ∀xα ∧ 1≤h≤` ∀x∀y (Eh (x, y) → βh ∨ x ≈ y))
                            V                                               (14)
                           ∧ 1≤i≤m ∀x∃=ni y (Fi (x, y) ∧ x 6≈ y) ,

with α ∈ GC 2qef not using y, {E1 , ... , E` , F1 , ... , Fm } ⊆ NR , {β1 , ... , β` } ⊆ GC 2qef ,
and {n1 , ... , nm } ⊆ N, such that (i) NF(Φ) |= Φ and (ii) every model of Φ with at least
maxh nh elements can be extended to a model of NF(Φ).
Given a GC 2 database DB = {Φ} ∪ A, we let NF(DB) = {NF(Φ)} ∪ A. Thanks to the
previous lemma and the known property that models of GC 2 sentences are closed under
disjoint self-union, the normal form can be shown to be query-entailment-preserving.
Lemma 7. For every P2RPQ q over a GC 2 database DB holds DB |= q if and only if
NF(DB) |= q.
We next provide a query-entailment-preserving polynomial translation of GC 2 databases
into ZIQ KBs.
Definition 8. Let Ψ be a GC 2 sentence in normal form as in Lemma 6. We define the
ZIQ TBox TΨ as follows:
                                          [            [
                  TΨ = T≈ ∪ Tps ∪ Tα ∪       TEh ,βh ∪      TFi ,ni           (15)
                                              1≤h≤l            1≤i≤m

Thereby (introducing fresh concept names P ropP and role names R≈ and REh ,at ):
– T≈ = {> v 61R≈ .>, > v ∃R≈ .Self}
– Tps = {P ropP ≡ ∃>.P ropP | P ∈ N0 }
– Tα = {> v tr(α)}, where
                   tr(P ) = P ropP                tr(¬ϕ) = ¬tr(ϕ)
               tr(P (x)) = P                   tr(ϕ ∧ ψ) = tr(ϕ) u tr(ψ)
            tr(P (x, x)) = ∃P.Self             tr(ϕ ∨ ψ) = tr(ϕ) t tr(ψ)
             S
– TEh ,βh = at∈atoms(βh ) (axEh (at) ∪ {> v ∀(REh ,at \Eh ).⊥}) where axEh maps atoms
             ∪ {> v ∀(Eh \(R≈ ∪ trEh (βh ))).⊥}
  in βh to sets of GCIs as follows:
   P        7→ {            ∃REh ,P .> v P ropP ,        ∃(Eh \REh ,P ).> v ¬P ropP }
   P (x) 7→ {          ∃REh ,P (x) .> v P,             ∃(Eh \REh ,P (x) ).> v ¬P    }
   P (x, x) 7→ {     ∃REh ,P (x,x) .> v ∃P.Self, ∃(Eh \REh ,P (x,x) ).> v ¬∃P.Self}
                            −
   P (y) 7→ {           ∃RE   h ,P (y)
                                       .> v P,        ∃(Eh− \RE −
                                                                  h ,P (y)
                                                                           ).> v ¬P }
                          −                              −    −
   P (y, y) 7→ {     ∃REh ,P (y,y) .> v ∃P.Self, ∃(Eh \REh ,P (y,y) ).> v ¬∃P.Self}
   P (x, y) 7→ { ∃(REh ,P (x,y) \P ).> v ⊥,       ∃(Eh ∩P \REh ,P (x,y) ).> v ⊥     }
                    −                                −        −
   P (y, x) 7→ { ∃(RE h ,P (y,x)
                                 \P  ).>  v ⊥,  ∃(E  h ∩P  \R Eh ,P (y,x)  ).> v ⊥  }

   and trEh maps GC 2qef formulae to simple roles:
          trEh (at) = REh ,at                      trEh (ρ ∧ ρ0 ) = trEh (ρ) ∩ trEh (ρ0 )
         trEh (¬ρ) = Eh \tr(ρ)                     trEh (ρ ∨ ρ0 ) = trEh (ρ) ∪ trEh (ρ0 )
– TFi ,ni = { > v 6ni (Fi \R≈ ).> u >ni (Fi \R≈ ).> }
Given a GC 2 database DB = {Φ} ∪ A, we let KDB denote the ZIQ KB (TNF(Φ) , A).

Intuitively, T≈ axiomatises equality, Tps introduces synchronised concept names P ropP
for every propositional symbol P from DB, since ZIQ does not allow for nullary pred-
icates. Tα , TEh ,βh , and TFi ,ni implement the conjuncts from NF(Φ).
    We can now observe that KDB can be computed in polytime wrt. the size of DB and
that it has the qfhcp (since it is a ZIQ KB). Moreover we can show that any model of
KDB can be extended to one of NF(DB) and, conversely, any model of NF(DB) can be
extended to one of KDB . With the help of these correspondences and Lemma 7, we can
then argue that for any P2RPQ q over DB holds DB |= q iff KDB |= q. These findings
allow us to apply Theorem 3 to establish the following result.

Theorem 4. Checking P2RPQ entailment from GC 2 databases is 2E XP T IME-complete.
It is E XP T IME-complete if the number of query atoms is bounded by a constant.


7      Conclusion

We have established tight complexity bounds for expressive querying in very expressive
DLs under the assumption of succinct (i.e. binary) encoding of number restrictions. Ar-
guing along the lines of Calvanese et al. [6], we can leverage our findings to strengthen
their results on query containment as well as the SR family as follows:10

Theorem 5 (query containment). Testing query entailment K |= q 0 ⊆ q is in 2E XP -
T IME with respect to the total size of q 0 , q, and K (with binary encoding of number
restrictions) if (i) K is in ZOQ or ZOI and q 0 and q are P2RPQs over K, or (ii) K is
in ZIQ, q 0 is a conjunctive query and q is a P2RPQ over K. The complexity drops to
E XP T IME if the number of atoms occurring in q is bounded by a constant.

Theorem 6 (SR family). Deciding K |= q and K |= q 0 ⊆ q is in 3E XP T IME in the
total size of q 0 , q, K (with binary encoding of numbers) – and in 2E XP T IME if K’s RBox
is given by defining regular expressions for the non-simple roles – if (i) K is in SROQ
or SROI and q 0 , q are P2RPQs over K, or (ii) K is in SRIQ, q 0 is a conjunctive
query and q is a P2RPQ over K. The complexities are in 2E XP T IME and in E XP T IME,
respectively, if the number of atoms occurring in q is bounded by a constant.

There are plenty of open questions left for future work. First, the results established
here hold for arbitrary models, however, so far, very little is known about finite (model)
query answering in DLs from the Z family and the methods applied here do not seem
to easily carry over to the finite model setting. Second, while the extension of our query
formalism by nestings in the sense of Bienvenu et al. [1] seems straightforward without
impacting complexities, our technique seems not readily extendable to capture more
elaborate query languages [14,4,12].
10
     For space reasons, we assume the reader to be familiar with the notions used in these theorems.
References
 1. Meghyn Bienvenu, Diego Calvanese, Magdalena Ortiz, and Mantas Simkus. Nested regular
    path queries in description logics. In Chitta Baral, Giuseppe De Giacomo, and Thomas Eiter,
    editors, Proc. 14th Int. Conf. on Principles of Knowledge Representation and Reasoning
    (KR’14), pages 218–227. AAAI Press, 2014.
 2. Piero A. Bonatti, Carsten Lutz, Aniello Murano, and Moshe Y. Vardi. The complexity of
    enriched mu-calculi. Logical Methods in Computer Science, 4(3:11):1–27, 2008.
 3. Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. How to best nest regular path
    queries. In Proc. 27th Int. Workshop on Description Logics (DL’14), volume 1193 of CEUR
    Workshop Proceedings, pages 404–415. CEUR-WS.org, 2014.
 4. Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. Reasonable highly expressive
    query languages. In Qiang Yang and Michael Wooldridge, editors, Proc. 24th Int. Joint
    Conf. on Artificial Intelligence (IJCAI’15), pages 2826–2832. AAAI Press, 2015.
 5. Diego Calvanese, Thomas Eiter, and Magdalena Ortiz. Answering regular path queries in
    expressive description logics: An automata-theoretic approach. In Proc. 22nd AAAI Conf. on
    Artificial Intelligence (AAAI’07), pages 391–396. AAAI Press, 2007.
 6. Diego Calvanese, Thomas Eiter, and Magdalena Ortiz. Regular path queries in expressive
    description logics with nominals. In Craig Boutilier, editor, Proc. 21st Int. Joint Conf. on
    Artificial Intelligence (IJCAI’09), pages 714–720. IJCAI, 2009.
 7. Carsten Lutz. The complexity of conjunctive query answering in expressive description
    logics. In Alessandro Armando, Peter Baumgartner, and Gilles Dowek, editors, Proc. 4th
    Int. Joint Conf. on Automated Reasoning (IJCAR’08), number 5195 in LNAI, pages 179–
    193. Springer, 2008.
 8. Magdalena Ortiz, Sebastian Rudolph, and Mantas Simkus. Query answering is undecidable
    in DLs with regular expressions, inverses, nominals, and counting. INFSYS Research Report
    1843-10-03, TU Vienna, 2010.
 9. Magdalena Ortiz and Mantas Simkus. Reasoning and query answering in description logics.
    In Thomas Eiter and Thomas Krennwallner, editors, Proc. 8th Int. Reasoning Web Summer
    School (RW’12), volume 7487 of Lecture Notes in Computer Science, pages 1–53. Springer,
    2012.
10. Ian Pratt-Hartmann. Complexity of the guarded two-variable fragment with counting quan-
    tifiers. J. Log. Comput., 17(1):133–155, 2007.
11. Ian Pratt-Hartmann. Data-complexity of the two-variable fragment with counting quantifiers.
    Inf. Comput., 207(8):867–888, 2009.
12. Juan L. Reutter, Miguel Romero, and Moshe Y. Vardi. Regular queries on graph databases.
    Theor. Comp. Sys., 61(1):31–83, July 2017.
13. Sebastian Rudolph. Undecidability Results for Database-Inspired Reasoning Problems in
    Very Expressive Description Logics. In Chitta Baral, James P. Delgrande, and Frank Wolter,
    editors, Proc. 15th Int. Conf. on Knowledge Representation and Reasoning (KR’16), pages
    247–257. AAAI Press, 2016.
14. Sebastian Rudolph and Markus Krötzsch. Flag & check: Data access with monadically de-
    fined queries. In Richard Hull and Wenfei Fan, editors, Proc. 32nd Symposium on Principles
    of Database Systems (PODS’13), pages 151–162. ACM, 2013.

</pre>