=Paper=
{{Paper
|id=Vol-2373/paper-4
|storemode=property
|title=Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting
|pdfUrl=https://ceur-ws.org/Vol-2373/paper-4.pdf
|volume=Vol-2373
|authors=Bartosz Bednarczyk,Sebastian Rudolph
|dblpUrl=https://dblp.org/rec/conf/dlog/BednarczykR19
}}
==Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting==
Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting Bartosz Bednarczyk1,2? and Sebastian Rudolph2?? 1 Institute of Computer Science, University of Wrocław, Poland 2 Computational Logic Group, TU Dresden, Germany firstname.lastname@tu-dresden.de Abstract. Among the most expressive knowledge representation formalisms are the description logics of the Z family. For well-behaved fragments of ZOIQ, en- tailment of positive two-way regular path queries is well known to be 2E XP T IME- complete under the proviso of unary encoding of numbers in cardinality con- straints. We show that this assumption can be dropped without an increase in com- plexity and E XP T IME-completeness can be achieved when bounding the number of query atoms, using a novel reduction from query entailment to knowledge base satisfiability. These findings allow to strengthen other results regarding query en- tailment and query containment problems in very expressive description logics. Our results also carry over to GC 2 , the two-variable guarded fragment of first- order logic with counting quantifiers, for which hitherto only conjunctive query entailment has been investigated. 1 Introduction Recent years have seen a convergence of the fields of logic-based knowledge represen- tation (KR) and databases, with querying over knowledge bases (KBs, aka ontologies) as the archetypical inferencing task considered. Thereby, the query languages of interest have evolved from plain conjunctive queries to more expressive formalisms with path- navigational components such as the positive two-way regular path queries (P2RPQs) considered here. One popular and practically used way of specifying the to-be-queried knowledge bases is via description logics (DLs), whose most expressive exemplars fea- ture advanced constructors for roles and path expressions. Certainly, the DLs from the Z family [6] are among the most powerful KR formalisms on the verge of decidability. For its most expressive proponent, ZOIQ, featuring nominals (O), role inverses (I), and number restrictions (Q), querying is undecidable [8,13] and even decidability of KB satisfiability is open, owing to the intricate interplay of the three mentioned features, but restricting the interaction of O, I, and Q (or excluding one of the features altogether) leads to beneficial model-theoretic properties, which give rise to upper bounds of E XP - T IME for KB satisfiability and 2E XP T IME for querying, established using elaborate automata techniques [6]. While the first bound holds under the assumption of binary ? Supported by “Diamentowy Grant” no. DI2017 006447 ?? Supported by ERC Consolidator Grant 771779 (DeciGUT) encoding of number restrictions, the proof for the second one requires unary encoding. While unary encoding of numbers is often silently accepted in the DL community, it is considered “cheating” in others. In our paper, we overcome this restriction. Leveraging said model-theoretic property of the considered DLs, we provide a novel reduction from the query entailment problem K |= q to the problem of checking unsatisfiability of some other knowledge base K¬q . Thereby, the size of the latter is exponential in the total size of q and K (even when as- suming binary encoding), leading to a 2E XP T IME-complete complexity. Our technique also yields the new insight that the complexity drops to E XP T IME, when the number of atoms in q is bounded by a constant. The obtained results allow to correspondingly im- prove a number of previous results on query containment and can be transferred to DLs from the SR family. Reaching out to the community researching decidable fragments of first-order logic, we show that our results also extend to entailment of P2RPQs by GC 2 databases, i.e. sets of unary and binary ground facts in the presence of a sentence of the guarded two-variable fragment with counting as defined by Pratt-Hartmann [10]. 2 Preliminaries Description Logics. We will very briefly recap the very expressive DL ZOIQ and its relevant sublogics following Calvanese et al. [6]. We assume a fixed signature consist- ing of sets NI of individual names, NC of concept names, and NR of role names. We let NC contain the special concept names > and ⊥, and NR the special role names > and ⊥. The following EBNF grammar defines atomic concepts B, concepts C, atomic roles R, simple roles S and roles T , with o, a, b ∈ NI , A ∈ NC , P ∈ NR \ {>}: B ::= A | {o} C ::= B | ¬C | C u C | C t C | ∀T.C | ∃T.C | >nS.C | 6nS.C | ∃S.Self R ::= P | P − S ::= R | S ∩ S | S ∪ S | S \ S T ::= > | S | T ∪ T | T ◦ T | T ∗ | id(C) An assertion is of the form C(a), R(a, b), a ≈ b, or a 6≈ b, a general concept inclusion (GCI) has the form C1 v C2 . We use C1 ≡ C2 as abbreviation for C1 v C2 , C2 v C1 . A ZOIQ knowledge base (short: KB) K = (A, T ) consists of a finite set A (called ABox) of assertions and a finite set T (called TBox) of GCIs. The semantics of ZOIQ is defined via interpretations I = (∆I , ·I ) composed of a non-empty set ∆I called the domain of I and a function ·I mapping individual names to elements of ∆I , concept names to subsets of ∆I , and role names to subsets of ∆I ×∆I . This mapping is extended to concepts, simple roles, and roles (cf. Table 1). Then, a GCI C1 v C2 is satisfied by I if C1I ⊆ C2I , satisfaction of ABox assertions is defined in the obvious way. We say that an interpretation I satisfies a KB K = (A, T ) (or I is a model of K, written: I |= K) if it satisfies all axioms of A and T . K is called satisfiable if it has a model and unsatisfiable otherwise. From ZOIQ, we obtain ZIQ by disallowing nominal concepts {o}, ZOQ, by disallowing role inverses (.)− and ZOI by disallowing number restrictions. Table 1: Semantics of concepts (left) and (simple) roles (right) in ZOIQ . Name Syntax Semantics Name Syntax Semantics top > ∆I universal role > ∆I × ∆I bottom ⊥ ∅ bottom role ⊥ ∅ nominal {o} {oI } inverse role P− {(x, y) | (y, x) ∈ P I } negation ¬C ∆I \ C I intersection S1 ∩ S2 S1I ∩ S2I intersection C1 u C2 C1I ∩ C2I union S1 ∪ S2 S1I ∪ S2I union C1 t C2 C1I ∪ C2I difference S1 \ S2 S1I \ S2I univ. restriction ∀T.C {x | ∀y.(x, y) ∈ T I → y ∈ C I } concatenation T1 ◦ T2 I I S1 ◦ T2 I i T exist. restriction ∃T.C {x | ∃y.(x, y) ∈ T I ∧ y ∈ C I } Kleene star T∗ i≥0 (T ) qualified number 6nS.C {x | #{y∈C I | (x, y)∈S I } ≤ n} concept test id(C) {(x, x) | x ∈ C I } restriction >nS.C {x | #{y∈C I | (x, y)∈S I } ≥ n} Self concept ∃S.Self {x | (x, x) ∈ S I } Queries. In queries, we use variables from a countably infinite set V. A Boolean positive two-way regular path query (P2RPQ) is an expression q = ∃x.ϕ, where ϕ is a positive formula (i.e., one using ∧ and ∨) over expressions of the form T (s, t) or C(t), where T is a role, C is concept, and s and t are elements of x ∪ NI . A P2RPQ is a conjunctive two-way regular path query (C2RPQ) if it does not use disjunction. It is a union of conjunctive two-way regular path queries (UC2RPQ) if it is a disjunction of C2RPQs.3 A variable assignment Z for I is a mapping V → ∆I . For x ∈ V, we set xI,Z := Z(x); for c ∈ NI , we set cI,Z := cI . T (s, t) evaluates to true under Z and I if (sI,Z , tI,Z ) ∈ T I . C(t) evaluates to true under Z and I if tI,Z ∈ C I . A P2RPQ q = ∃x.ϕ is satisfied by I (written: I |= q) if there is a variable assignment Z (called match) such that ϕ evaluates to true under I and Z. A P2RPQ q is entailed by a KB K (written: K |= q) if every model of K satisfies q. If q uses only signature elements of K, we call it a query over K. Simplifications. A P2RPQ is simplified if all its atoms are of the form T (x, y) where x and y are variables and T is built from role names using ∪, ◦, and ∗ . For any KB K and P2RPQ q, we can construct in polynomial time a KB K0 and simplified P2RPQ q 0 such that K |= q iff K0 |= q 0 . Hence from here on, we will focus on simplified queries. Automata. For each atom T (x, y) of a simplified P2RPQ, T is a regular expression over NR . Every such expression can be represented by a nondeterministic finite au- tomaton A = (Σ, Q, I, F, T) with an alphabet Σ ⊆ {P, P − | P ∈ NR } of possibly in- verted role names, a finite set Q of states, initial states I ⊆ Q, final states F ⊆ Q, and transitions T ⊆ Q × Σ × Q.4 The size of A is polynomially bounded by T . In the following, we assume queries in automaton form, where the atoms T (x, y) have been replaced by the corresponding A(x, y). Given A = (Σ, Q, I, F, T), let A− = (Σ − , Q, F, I, T− ) with Σ − = {P − | P ∈ Σ} be the corresponding reverse automa- ton5 with initial and final states swapped and the state transitions flipped: (q0 , R− , q) ∈ 3 W W As ∃x. i ϕi ≡ i ∃x.ϕi , the order of disjunction and existential quantification is irrelevant. 4 The correspondence is such that (δ, δ 0 ) ∈ T I iff there is a role path δ − P1 − Pk → · · · −−→ δ 0 in I such that A accepts P1 . . . Pk . 5 For convenience, we consider (P − )− identical to P . T− for every (q, R, q0 ) ∈ T. Moreover, we obtain the “state-parameterized” automaton Aq,q0 = (Σ, Q, {q}, {q0 }, T) from A by setting the initial state to q and the final to q0 . 3 A Little Bit of Model Theory The “well-behavedness” of certain sublogics of ZOIQ can be conveniently charac- terised in model-theoretic terms. Definition 1 (quasi-forest model, [5]). Let K be a KB. A model I of K is a quasi-forest model if: – the domain ∆I of I is a forest, i.e., a prefix-closed subset of Roots × N∗ for some finite set Roots, – Roots = {oI | o is an individual name in K}, and – for every δ, δ 0 ∈ ∆I with (δ, δ 0 ) ∈ P I for some role name P ∈ NR \ {>}, either (i) {δ, δ 0 } ∩ Roots 6= ∅, or (ii) δ = δ 0 , or (iii) δ is a child of δ 0 , or (iv) δ 0 is a child of δ. A KB K has the quasi-forest model property (short: qfmp) if K is either unsatisfiable or it has a quasi-forest model. A DL L has the qfmp if every L KB K has the qfmp. For such well-behaved KBs, tight and quite decent complexity bounds for satisfiability checking have been established. Theorem 1 ([6]). Satisfiability checking of ZOIQ KBs having the qfmp is E XP T IME- complete even with binary encoding of number restrictions. When it comes to query answering, a useful notion for relating interpretations to each other is via homomorphisms. For two interpretations I = (∆I , ·I ) and J = (∆J , ·J ), a function h : ∆I → ∆J is a homomorphism from I to J if for all δ, δ 0 ∈ ∆I holds δ ∈ AI ⇒ h(δ) ∈ AJ for all concept names A, and (δ, δ 0 ) ∈ P I ⇒ (h(δ), h(δ 0 )) ∈ P J for all role names P . If a homomorphism from I to J exists, we write I B J . We note that I |= q and I B J imply J |= q for every simplified P2RPQ q. Definition 2 (qfhcp, [3]). A KB K has the quasi-forest homomorphism-cover property (short: qfhcp) if for every model I of K, there exists a quasi-forest model J of K with J B I. A description logic L has the qfhcp if every L KB K has the qfhcp. We let tame ZOIQ denote the set of all ZOIQ KBs having the qfhcp. Obviously, the qfhcp implies the qfmp. Note that unrestricted ZOIQ does not even have the qfmp. The most expressive syntactically defined sublogics of tame ZOIQ known today are ZIQ, ZOQ, and ZOI [6,2]. From the fact that the set of models for any P2RPQ is closed under homomorphisms now follows that for any tame ZOIQ KB with K 6|= q, there must exist a quasi-forest model I, satisfying I |= K and I 6|= q. 4 Annotating (Partial) Query Matches In the following, we consider a tame ZOIQ KB K and a simplified P2RPQ q over K in automaton form. We note that q can be rewritten into an equivalent disjunction W 1≤i≤n qi of n (also simplified) C2RPQs, where n may be exponential in the size of q, but every single qi ’s size is polynomial. Let [q] denote {q1 , . . . qn }. Then, testing K 6|= q amounts to determining if K has a model I which is a simultaneous countermodel (i.e., I 6|= qi ) for all qi ∈ [q]. We now give an intuitive description of our technique to detect (or refute) matches of q. Given some model I of K, we iteratively, deterministically annotate all its domain elements δ with fresh concept names QM that indicate which “parts” M of some query qi match into I and how δ participates in these partial matches. To this end, we employ descriptions M of query parts which contain information about (i) the query variables matched (ii) (the existence of) paths realising certain state transitions in the query’s au- tomata and (iii) optionally, the “position” of the current δ in the query match by use of a marker • acting like an additional, distinguished query variable. In the annotation process, the QM s will be assigned to domain elements δ based on their local environ- ment and known annotations QM 0 for “smaller” partial queries to the same element δ or its direct (role) neighborhood. As an exception, non-localised query matches M not containing • will be “broadcast”, i.e., uniformly attached to all domain elements. This way, in the annotation process, QM s for larger and larger partial queries are as- signed, until finally (after reaching the unique fixpoint) also the full matches for any qi are recognised by annotations Qqi . This process will accurately detect partial and full query matches, if I is a quasi-forest model (which is sufficient for our purposes as K has the qfhcp). The annotation process is realised by virtue of a KB Kq , the size of which is exponential in q, but only polynomial in K. In the following, we will stepwise introduce the KB, interleaved with some necessary definitions. Definition 3 (partial query). Consider a C2RPQ qi = {A1 (x1 , y1 ),..., Am (xm , ym )}.6 A partial query for qi and a set X ⊆ {x1 , y1 , . . . , xm , ym } of variables from qi is a set M consisting of – all atoms A(x, y) from qi for which {x, y} ⊆ X, – for every A(x, y) from qi with x ∈ X but y 6∈ X, one of the atoms Aq,q0 (x, •) or Aq,q0 (x, o) where q is initial in A, – for every A(x, y) from qi with x 6∈ X but y ∈ X, one of the atoms A− q,q0 (y, •) or A−q,q0 (y, o) where q is final in A, – in case X = ∅, exactly one atom of the form Aq,q0 (•, o) or A− q,q0 (•, o) (called nominal-anchored path) or Aq,q0 (•, •) or A− q,q0 (•, •) (called round trip) for some A from qi . A partial query is called monodic, if |X| = 1. A partial query is called local if it contains • and global otherwise. Nominal-Anchored Paths. First, we want to detect role paths that start in the to-be- annotated individual, end in named individual oI , and realise state transitions in one of the query’s automata. Let o be any individual name in K, let A± be either A or A− for any automaton A occurring in q with states q, q0 , q00 . Then let Kq contain the axiom Q{A± q,q (•,o)} (o) (1) 6 In cases, we may equate a C2RPQ qi with the set of its atoms. and whenever A± has a transition (q, R, q0 ): ∃R.Q{A±0 00 (•,o)} v Q{A± 00 (•,o)} (2) q ,q q,q Round Trips. Next, we are concerned with paths which start and end in the to-be- annotated individual. Assuming an automaton A with states q, q0 , q00 , q000 and transition set T as well as an individual name o, we add the axioms: > v Q{Aq,q (•,•)} (3) Q{Aq,q0 (•,•)} u Q{Aq0 ,q00 (•,•)} v Q{Aq,q00 (•,•)} (4) Q{Aq,q0 (•,o)} u Q{A−00 0 (•,o)} v Q{Aq,q00 (•,•)} (5) q ,q for (q, R, q0 ) ∈ T: ∃R.Self v Q{Aq,q0 (•,•)} (6) for {(q, R, q0 ), (q00 , R0 , q000 )} ⊆ T: ∃(R ∩ R0− ).Q{Aq0 ,q00 (•,•)} v Q{Aq,q000 (•,•)} (7) Initialising monodic partial queries. Next we determine domain elements to which separate query variables could possibly be mapped in a match. Definition 4. We call a monodic partial query M original, whenever q = q0 for ev- ery Aq,q0 (x, •) ∈ M and every A− q,q0 (x, •) ∈ M . For M an original monodic partial query for q and {x}, the set Prec M of precondition concepts consists of: – Q{Aq,q0 (•,o)} for each atom Aq,q0 (x, o) in M , – Q{A− 0 (•,o)} for each atom A− q,q0 (x, o) in M , F q,q – q 0initial Q{Aq,q0 (•,•)} for each atom A(x, x) in M . q final Now we can initialise all original monodic partial queries M by adding the following axioms to Kq : l Prec M v QM (8) Updating partial queries. Partial queries can be updated by roundtrips: Let M be a partial query containing some A±q,q0 (x, •). Then we realise the update by adding the following axioms to Kq : QM u Q{A±0 00 (•,•)} v QM \{A± 0 (x,•)}∪{A± 00 (x,•)} (9) q ,q q,q q,q Furthermore, a partial query can “progress” when making a “step” in the model, moving from the considered δ to some role neighbour. In such a step, all the “unready” paths (corresponding to atoms with •) must be updated in a synchronous manner. Given a partial query M for qi and nonempty X as well as a set {R1 , . . . , Rk } of (possibly inverted) role names, assume M 0 can be obtained from M by replacing each atom of the form A± ± ± 0 00 q,q0 (x, •) by an atom Aq,q00 (x, •) such that A has a transition (q , R, q ) for for any R ∈ {R1 , . . . , Rk }. Then add the following axiom to the KB Kq : ∃(R1− ∩ R2− ∩ . . . ∩ Rm − ).QM v QM 0 (10) Joining partial queries. When two partial queries corresponding to the same query qi “meet” in a domain element δ, they can – under certain circumstances – be “glued together” into a “bigger” partial query of qi . Definition 5 (joinable, join). Let M1 be a partial query for qi and X1 and let M2 be a partial query for qi and X2 . We call M1 and M2 joinable if – X1 6= ∅, X2 6= ∅, X1 ∩ X2 = ∅, and – for each A(x, y) ∈ qi \ (M1 ∪M2 ) with {x, y} ⊆ X1 ∪X2 , there are states q, q0 , q00 of A with q initial and q00 final, such that either {Aq,q0 (x, •), A− q00 ,q0 (y, •)} ∈ M1 ∪M2 − or {Aq,q (x, o), Aq00 ,q0 (y, o)} ∈ M1 ∪ M2 for some o ∈ NI . 0 For joinable M1 and M2 , their join, denoted M1 ./ M2 , is the partial query obtained from M1 ∪ M2 by replacing any pair of atoms Aq,q0 (x, ∗) and A− q00 ,q0 (y, ∗) by A(x, y), where ∗ is either an individual name o or •. We implement the join operation for every pair M1 , M2 of joinable partial queries for a qi ∈ [q] by extending Kq with: QM1 u QM2 v QM1 ./M2 (11) Broadcasting of global partial queries. Whenever a partial query M does not have any occurrences of •, which would tie it to a specific element, it will be “broadcast” to all domain elements: ∃>.QM v QM (12) This concludes the definition of Kq . Revisiting our initial intuition of Kq ’s purpose of deterministically annotating a model I of K, the following lemma formalises this by singling out one unique annotated model I ∗ for every I. Lemma 1. Let K be a ZOIQ KB and q a simplified P2RPQ. For every model I of K ∗ there exists a unique model I ∗ of K ∪ Kq which extends I and satisfies QIM ⊆ QJ M for every QM in Kq and for every model J of K ∪ Kq that extends I. We will call models I ∗ from Lemma 1 Q-minimal. Note that for every model I of K, ∗ any concept membership δ ∈ QIM holding in the corresponding Q-minimal model I ∗ can be derived from concept and role memberships in I through a finite sequence of “applications” of axioms from Kq .7 5 Reduction to Satisfiability Now we establish technical results relating the presence of QM -annotations in models and the actual semantic satisfaction of the corresponding partial query M . Note that any partial query M can be seen as a (set representation of a) C2RPQ in automaton form, assuming • is just an ordinary variable. 7 To see this, it may help to realise that all axioms of Kq can be easily expressed in monadic Datalog. Definition 6 ((tight) satisfaction of a partial query). Given an interpretation I, a domain element δ ∈ ∆I , and a partial query M , we say M is satisfied or holds in δ, written I, δ |= M , if there is a match Z for M in I with Z(•) = δ. M is satisfied |= M , if Z is such that every A± or holds tightly in δ, written I, δ |= q1 ,q2 (x, •) ∈ M is realized by a path Z(x) ; δ not containing oI for any individual name o. Note that |= and |=|= coincide whenever M is global or does not contain variables. Note also that, as a consequence of this definition, if M is global, then QM holds (tightly) everywhere or nowhere throughout the domain. With this notion in place, the next two lemmas can be seen as soundness and completeness results regarding the deduction calculus for (partial) query matches embodied by Kq . Lemma 2 (soundness). Let I be a Q-minimal model of K ∪ Kq and let δ ∈ ∆I . Let M be any partial query for any C2RPQ qi ∈ [q]. Then δ ∈ QIM implies I, δ |= M . Lemma 3 (completeness). Let I be a quasi-forest model of K ∪ Kq and let δ ∈ ∆I . |= M implies δ ∈ QIM . Let M be any partial query for any C2RPQ qi ∈ [q]. Then I, δ |= Soundness is proven by induction over the length of the derivation for δ ∈ QIM . Com- pleteness is shown by induction over the spread of the match for M , i.e., the number of variables (except •) plus the sum of the lengths of all paths realising the query atoms. Thanks to these correspondences, we can essentially rule out models with matches of any q1 , . . . , qn by forcing the extensions of Qq1 , . . . , Qqn to be empty. Therefore, let K¬q = K ∪ Kq ∪ {Qqi v ⊥ | qi ∈ [q]}. (13) We establish some syntactic and semantic properties of K¬q : Some easy calculations yield size bounds for K¬q , tameness is not affected by the reduction, and unsatisfiability of K¬q indeed coincides with query entailment. Fact 4 (size of K¬q ) The size of K¬q is single exponential in the total size of K and q. It is polynomial, if the number of atoms in q is bounded by a constant. Lemma 5. If K is in tame ZOIQ then so is K¬q . Theorem 2. Let K be a tame ZOIQ KB and let q be a simplified P2RPQ over K. Then K |= q if and only if K¬q is unsatisfiable. Proof. We show the equivalent statement: K 6|= q if and only if K¬q satisfiable. “only if”: Assume K 6|= q, that is there exists an I with I |= K and I 6|= q. Consider ∗ I ∗ , the Q-minimal model of K ∪ Kq extending I. If some QIqi were nonempty, then there would be a query match of q into I by Lemma 2, contradicting our assumption. Therefore, I ∗ |= Qqi v ⊥ for all qi ∈ [q] and hence, I ∗ |= K¬q . “if”: Assume K¬q is satisfiable and hence has a model I. As K¬q has the qfmp thanks to Lemma 5, we can assume I to be a quasi-forest model. I |= Qqi v ⊥ implies empti- ness of QIqi for every qi ∈ [q]. If I |= q held true, witnessed by I |= qi for some qi ∈ [q], Lemma 3 would require δ ∈ QIqi for all δ ∈ ∆I , leading to a contradiction. Hence I 6|= q. On the other hand, I |= K¬q implies I |= K, therefore K 6|= q. Based on this theorem, we can now prove our central result: Theorem 3. Checking entailment of simplified P2RPQs from tame ZOIQ KBs with binary number encoding is 2E XP T IME-complete. It is E XP T IME-complete if the num- ber of atoms in the query is bounded. Proof. By Theorem 2, entailment of a simplified C2RPQ q from a tame ZOIQ KB K can be reduced to checking unsatisfiability of K¬q , which can be computed in output- polynomial time and the size of which is exponential (polynomial, if number of atoms in q is bounded). By Theorem 1, (un)satisfiability of K¬q can be checked in E XP T IME in the size of K¬q , hence we obtain the desired 2E XP T IME and E XP T IME upper bounds. The matching lower bounds are well-known even for much weaker logics [9]. 6 Querying GC 2 Databases We now consider the problem of P2RPQ entailment from GC 2 databases, i.e., sets of unary and binary ground facts in the presence of a sentence in the guarded two-variable fragment with counting quantifiers as defined by Pratt-Hartmann ([10]). We will show that there is a polytime query-entailment-preserving transformation from GC 2 databases to ZIQ KBs. This allows us to transfer the results from Theorem 3. Matching lower bounds follow from known results for much weaker logics and query formalisms (such as CQs over ALCI [7]). Definition 7 (GC 2 , [10]). For a signature S of nullary (N0 ), unary (NC ), and binary (NR ∪ {≈}) predicates,8 let GC 2 be the smallest set of formulae that contains all atoms over S using only variables from {x, y}, that is closed under Boolean combinations, and that contains – all formulae ∃u.ϕ and ∀u.ϕ with u ∈ {x, y} whenever ϕ is a GC 2 formula with at most one free variable, and Q Q – all formulae ∀u.(γ → ϕ) as well as u.(γ ∧ ϕ) and u.(γ) where γ is a binary atom (called “guard”) containing both x and y, ϕ is a GC 2 formula, and is any of Q ∃, ∃6n , ∃=n , or ∃>n . A GC 2 database DB is a theory {Φ} ∪ A consisting of a GC 2 sentence Φ and a finite set A of unary and binary ground facts over constants from NI .9 We denote by GC 2qef the set of GC 2 formulae which are quantifier-free and do not use ≈. A P2RPQ over a GC 2 database is a P2RPQ where for every atom of the form C(t) holds C ∈ NC and for every atom of the form T (s, t), every subexpression of the form id(C) satisfies C ∈ NC . P2RPQ entailment from GC 2 databases is defined in the same way as for DLs. For convenience, we make use of a special normal form for GC 2 introduced in the following. 8 As unary and binary predicates correspond to concept and role names, respectively, we use the same set symbols to denote them. 9 Note that ABoxes, as defined before, are syntactically and semantically identical to finite sets of unary and binary ground facts, so we consider the notions the same and use the same symbol A. Note also that by definition, Φ does not contain constants. Lemma 6 (normal form, [11]). For any GC 2 sentence Φ, one can compute in polyno- mial time a GC 2 sentence NF(Φ) of the form V ∀xα ∧ 1≤h≤` ∀x∀y (Eh (x, y) → βh ∨ x ≈ y)) V (14) ∧ 1≤i≤m ∀x∃=ni y (Fi (x, y) ∧ x 6≈ y) , with α ∈ GC 2qef not using y, {E1 , ... , E` , F1 , ... , Fm } ⊆ NR , {β1 , ... , β` } ⊆ GC 2qef , and {n1 , ... , nm } ⊆ N, such that (i) NF(Φ) |= Φ and (ii) every model of Φ with at least maxh nh elements can be extended to a model of NF(Φ). Given a GC 2 database DB = {Φ} ∪ A, we let NF(DB) = {NF(Φ)} ∪ A. Thanks to the previous lemma and the known property that models of GC 2 sentences are closed under disjoint self-union, the normal form can be shown to be query-entailment-preserving. Lemma 7. For every P2RPQ q over a GC 2 database DB holds DB |= q if and only if NF(DB) |= q. We next provide a query-entailment-preserving polynomial translation of GC 2 databases into ZIQ KBs. Definition 8. Let Ψ be a GC 2 sentence in normal form as in Lemma 6. We define the ZIQ TBox TΨ as follows: [ [ TΨ = T≈ ∪ Tps ∪ Tα ∪ TEh ,βh ∪ TFi ,ni (15) 1≤h≤l 1≤i≤m Thereby (introducing fresh concept names P ropP and role names R≈ and REh ,at ): – T≈ = {> v 61R≈ .>, > v ∃R≈ .Self} – Tps = {P ropP ≡ ∃>.P ropP | P ∈ N0 } – Tα = {> v tr(α)}, where tr(P ) = P ropP tr(¬ϕ) = ¬tr(ϕ) tr(P (x)) = P tr(ϕ ∧ ψ) = tr(ϕ) u tr(ψ) tr(P (x, x)) = ∃P.Self tr(ϕ ∨ ψ) = tr(ϕ) t tr(ψ) S – TEh ,βh = at∈atoms(βh ) (axEh (at) ∪ {> v ∀(REh ,at \Eh ).⊥}) where axEh maps atoms ∪ {> v ∀(Eh \(R≈ ∪ trEh (βh ))).⊥} in βh to sets of GCIs as follows: P 7→ { ∃REh ,P .> v P ropP , ∃(Eh \REh ,P ).> v ¬P ropP } P (x) 7→ { ∃REh ,P (x) .> v P, ∃(Eh \REh ,P (x) ).> v ¬P } P (x, x) 7→ { ∃REh ,P (x,x) .> v ∃P.Self, ∃(Eh \REh ,P (x,x) ).> v ¬∃P.Self} − P (y) 7→ { ∃RE h ,P (y) .> v P, ∃(Eh− \RE − h ,P (y) ).> v ¬P } − − − P (y, y) 7→ { ∃REh ,P (y,y) .> v ∃P.Self, ∃(Eh \REh ,P (y,y) ).> v ¬∃P.Self} P (x, y) 7→ { ∃(REh ,P (x,y) \P ).> v ⊥, ∃(Eh ∩P \REh ,P (x,y) ).> v ⊥ } − − − P (y, x) 7→ { ∃(RE h ,P (y,x) \P ).> v ⊥, ∃(E h ∩P \R Eh ,P (y,x) ).> v ⊥ } and trEh maps GC 2qef formulae to simple roles: trEh (at) = REh ,at trEh (ρ ∧ ρ0 ) = trEh (ρ) ∩ trEh (ρ0 ) trEh (¬ρ) = Eh \tr(ρ) trEh (ρ ∨ ρ0 ) = trEh (ρ) ∪ trEh (ρ0 ) – TFi ,ni = { > v 6ni (Fi \R≈ ).> u >ni (Fi \R≈ ).> } Given a GC 2 database DB = {Φ} ∪ A, we let KDB denote the ZIQ KB (TNF(Φ) , A). Intuitively, T≈ axiomatises equality, Tps introduces synchronised concept names P ropP for every propositional symbol P from DB, since ZIQ does not allow for nullary pred- icates. Tα , TEh ,βh , and TFi ,ni implement the conjuncts from NF(Φ). We can now observe that KDB can be computed in polytime wrt. the size of DB and that it has the qfhcp (since it is a ZIQ KB). Moreover we can show that any model of KDB can be extended to one of NF(DB) and, conversely, any model of NF(DB) can be extended to one of KDB . With the help of these correspondences and Lemma 7, we can then argue that for any P2RPQ q over DB holds DB |= q iff KDB |= q. These findings allow us to apply Theorem 3 to establish the following result. Theorem 4. Checking P2RPQ entailment from GC 2 databases is 2E XP T IME-complete. It is E XP T IME-complete if the number of query atoms is bounded by a constant. 7 Conclusion We have established tight complexity bounds for expressive querying in very expressive DLs under the assumption of succinct (i.e. binary) encoding of number restrictions. Ar- guing along the lines of Calvanese et al. [6], we can leverage our findings to strengthen their results on query containment as well as the SR family as follows:10 Theorem 5 (query containment). Testing query entailment K |= q 0 ⊆ q is in 2E XP - T IME with respect to the total size of q 0 , q, and K (with binary encoding of number restrictions) if (i) K is in ZOQ or ZOI and q 0 and q are P2RPQs over K, or (ii) K is in ZIQ, q 0 is a conjunctive query and q is a P2RPQ over K. The complexity drops to E XP T IME if the number of atoms occurring in q is bounded by a constant. Theorem 6 (SR family). Deciding K |= q and K |= q 0 ⊆ q is in 3E XP T IME in the total size of q 0 , q, K (with binary encoding of numbers) – and in 2E XP T IME if K’s RBox is given by defining regular expressions for the non-simple roles – if (i) K is in SROQ or SROI and q 0 , q are P2RPQs over K, or (ii) K is in SRIQ, q 0 is a conjunctive query and q is a P2RPQ over K. The complexities are in 2E XP T IME and in E XP T IME, respectively, if the number of atoms occurring in q is bounded by a constant. There are plenty of open questions left for future work. First, the results established here hold for arbitrary models, however, so far, very little is known about finite (model) query answering in DLs from the Z family and the methods applied here do not seem to easily carry over to the finite model setting. Second, while the extension of our query formalism by nestings in the sense of Bienvenu et al. [1] seems straightforward without impacting complexities, our technique seems not readily extendable to capture more elaborate query languages [14,4,12]. 10 For space reasons, we assume the reader to be familiar with the notions used in these theorems. References 1. Meghyn Bienvenu, Diego Calvanese, Magdalena Ortiz, and Mantas Simkus. Nested regular path queries in description logics. In Chitta Baral, Giuseppe De Giacomo, and Thomas Eiter, editors, Proc. 14th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR’14), pages 218–227. AAAI Press, 2014. 2. Piero A. Bonatti, Carsten Lutz, Aniello Murano, and Moshe Y. Vardi. The complexity of enriched mu-calculi. Logical Methods in Computer Science, 4(3:11):1–27, 2008. 3. Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. How to best nest regular path queries. In Proc. 27th Int. Workshop on Description Logics (DL’14), volume 1193 of CEUR Workshop Proceedings, pages 404–415. CEUR-WS.org, 2014. 4. Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. Reasonable highly expressive query languages. In Qiang Yang and Michael Wooldridge, editors, Proc. 24th Int. Joint Conf. on Artificial Intelligence (IJCAI’15), pages 2826–2832. AAAI Press, 2015. 5. Diego Calvanese, Thomas Eiter, and Magdalena Ortiz. Answering regular path queries in expressive description logics: An automata-theoretic approach. In Proc. 22nd AAAI Conf. on Artificial Intelligence (AAAI’07), pages 391–396. AAAI Press, 2007. 6. Diego Calvanese, Thomas Eiter, and Magdalena Ortiz. Regular path queries in expressive description logics with nominals. In Craig Boutilier, editor, Proc. 21st Int. Joint Conf. on Artificial Intelligence (IJCAI’09), pages 714–720. IJCAI, 2009. 7. Carsten Lutz. The complexity of conjunctive query answering in expressive description logics. In Alessandro Armando, Peter Baumgartner, and Gilles Dowek, editors, Proc. 4th Int. Joint Conf. on Automated Reasoning (IJCAR’08), number 5195 in LNAI, pages 179– 193. Springer, 2008. 8. Magdalena Ortiz, Sebastian Rudolph, and Mantas Simkus. Query answering is undecidable in DLs with regular expressions, inverses, nominals, and counting. INFSYS Research Report 1843-10-03, TU Vienna, 2010. 9. Magdalena Ortiz and Mantas Simkus. Reasoning and query answering in description logics. In Thomas Eiter and Thomas Krennwallner, editors, Proc. 8th Int. Reasoning Web Summer School (RW’12), volume 7487 of Lecture Notes in Computer Science, pages 1–53. Springer, 2012. 10. Ian Pratt-Hartmann. Complexity of the guarded two-variable fragment with counting quan- tifiers. J. Log. Comput., 17(1):133–155, 2007. 11. Ian Pratt-Hartmann. Data-complexity of the two-variable fragment with counting quantifiers. Inf. Comput., 207(8):867–888, 2009. 12. Juan L. Reutter, Miguel Romero, and Moshe Y. Vardi. Regular queries on graph databases. Theor. Comp. Sys., 61(1):31–83, July 2017. 13. Sebastian Rudolph. Undecidability Results for Database-Inspired Reasoning Problems in Very Expressive Description Logics. In Chitta Baral, James P. Delgrande, and Frank Wolter, editors, Proc. 15th Int. Conf. on Knowledge Representation and Reasoning (KR’16), pages 247–257. AAAI Press, 2016. 14. Sebastian Rudolph and Markus Krötzsch. Flag & check: Data access with monadically de- fined queries. In Richard Hull and Wenfei Fan, editors, Proc. 32nd Symposium on Principles of Database Systems (PODS’13), pages 151–162. ACM, 2013.