-

Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting

Bartosz Bednarczyk

0 1

Sebastian Rudolph

0 0 Computational Logic Group, TU Dresden , Germany 1 Institute of Computer Science, University of Wrocław , Poland

Among the most expressive knowledge representation formalisms are the description logics of the Z family. For well-behaved fragments of ZOIQ, entailment of positive two-way regular path queries is well known to be 2EXPTIMEcomplete under the proviso of unary encoding of numbers in cardinality constraints. We show that this assumption can be dropped without an increase in complexity and EXPTIME-completeness can be achieved when bounding the number of query atoms, using a novel reduction from query entailment to knowledge base satisfiability. These findings allow to strengthen other results regarding query entailment and query containment problems in very expressive description logics. Our results also carry over to GC2, the two-variable guarded fragment of firstorder logic with counting quantifiers, for which hitherto only conjunctive query entailment has been investigated.

1 Introduction Recent years have seen a convergence of the fields of logic-based knowledge representation (KR) and databases, with querying over knowledge bases (KBs, aka ontologies) as the archetypical inferencing task considered. Thereby, the query languages of interest have evolved from plain conjunctive queries to more expressive formalisms with pathnavigational components such as the positive two-way regular path queries (P2RPQs) considered here. One popular and practically used way of specifying the to-be-queried knowledge bases is via description logics (DLs), whose most expressive exemplars feature advanced constructors for roles and path expressions. Certainly, the DLs from the Z family [ 6 ] are among the most powerful KR formalisms on the verge of decidability. For its most expressive proponent, ZOIQ, featuring nominals (O), role inverses (I), and number restrictions (Q), querying is undecidable [ 8,13 ] and even decidability of KB satisfiability is open, owing to the intricate interplay of the three mentioned features, but restricting the interaction of O, I, and Q (or excluding one of the features altogether) leads to beneficial model-theoretic properties, which give rise to upper bounds of EXPTIME for KB satisfiability and 2EXPTIME for querying, established using elaborate automata techniques [ 6 ]. While the first bound holds under the assumption of binary ? Supported by “Diamentowy Grant” no. DI2017 006447 ?? Supported by ERC Consolidator Grant 771779 (DeciGUT) encoding of number restrictions, the proof for the second one requires unary encoding. While unary encoding of numbers is often silently accepted in the DL community, it is considered “cheating” in others.

In our paper, we overcome this restriction. Leveraging said model-theoretic property of the considered DLs, we provide a novel reduction from the query entailment problem K j= q to the problem of checking unsatisfiability of some other knowledge base K:q. Thereby, the size of the latter is exponential in the total size of q and K (even when assuming binary encoding), leading to a 2EXPTIME-complete complexity. Our technique also yields the new insight that the complexity drops to EXPTIME, when the number of atoms in q is bounded by a constant. The obtained results allow to correspondingly improve a number of previous results on query containment and can be transferred to DLs from the SR family. Reaching out to the community researching decidable fragments of first-order logic, we show that our results also extend to entailment of P2RPQs by GC2 databases, i.e. sets of unary and binary ground facts in the presence of a sentence of the guarded two-variable fragment with counting as defined by Pratt-Hartmann [ 10 ]. 2

Preliminaries Description Logics. We will very briefly recap the very expressive DL ZOIQ and its relevant sublogics following Calvanese et al. [ 6 ]. We assume a fixed signature consisting of sets NI of individual names, NC of concept names, and NR of role names. We let NC contain the special concept names > and ?, and NR the special role names > and ?. The following EBNF grammar defines atomic concepts B, concepts C, atomic roles R, simple roles S and roles T , with o; a; b 2 NI, A 2 NC, P 2 NR n f>g: B ::= A j fog C ::= B j :C j C u C j C t C j 8T:C j 9T:C j >nS:C j 6nS:C j 9S:Self R ::= P j P S ::= R j S \ S j S [ S j S n S

T ::= > j S j T [ T j T T j T j id(C) An assertion is of the form C(a), R(a; b), a b, or a 6 b, a general concept inclusion (GCI) has the form C1 v C2. We use C1 C2 as abbreviation for C1 v C2, C2 v C1. A ZOIQ knowledge base (short: KB) K = (A; T ) consists of a finite set A (called ABox) of assertions and a finite set T (called TBox) of GCIs.

The semantics of ZOIQ is defined via interpretations I = ( I ; I ) composed of a non-empty set I called the domain of I and a function I mapping individual names to elements of I , concept names to subsets of I , and role names to subsets of I I . This mapping is extended to concepts, simple roles, and roles (cf. Table 1). Then, a GCI C1 v C2 is satisfied by I if C1I C2I , satisfaction of ABox assertions is defined in the obvious way. We say that an interpretation I satisfies a KB K = (A; T ) (or I is a model of K, written: I j= K) if it satisfies all axioms of A and T . K is called satisfiable if it has a model and unsatisfiable otherwise. From ZOIQ, we obtain ZIQ by disallowing nominal concepts fog, ZOQ, by disallowing role inverses (:) and ZOI by disallowing number restrictions. Queries. In queries, we use variables from a countably infinite set V. A Boolean positive two-way regular path query (P2RPQ) is an expression q = 9x:', where ' is a positive formula (i.e., one using ^ and _) over expressions of the form T (s; t) or C(t), where T is a role, C is concept, and s and t are elements of x [ NI. A P2RPQ is a conjunctive two-way regular path query (C2RPQ) if it does not use disjunction. It is a union of conjunctive two-way regular path queries (UC2RPQ) if it is a disjunction of C2RPQs.3 A variable assignment Z for I is a mapping V ! I . For x 2 V, we set xI;Z := Z(x); for c 2 NI, we set cI;Z := cI . T (s; t) evaluates to true under Z and I if (sI;Z ; tI;Z ) 2 T I . C(t) evaluates to true under Z and I if tI;Z 2 CI . A P2RPQ q = 9x:' is satisfied by I (written: I j= q) if there is a variable assignment Z (called match) such that ' evaluates to true under I and Z. A P2RPQ q is entailed by a KB K (written: K j= q) if every model of K satisfies q. If q uses only signature elements of K, we call it a query over K.

Simplifications. A P2RPQ is simplified if all its atoms are of the form T (x; y) where x and y are variables and T is built from role names using [, , and . For any KB K and P2RPQ q, we can construct in polynomial time a KB K0 and simplified P2RPQ q0 such that K j= q iff K0 j= q0. Hence from here on, we will focus on simplified queries. Automata. For each atom T (x; y) of a simplified P2RPQ, T is a regular expression over NR. Every such expression can be represented by a nondeterministic finite automaton A = ( ; Q; I; F; T) with an alphabet fP; P j P 2 NRg of possibly inverted role names, a finite set Q of states, initial states I Q, final states F Q, and transitions T Q Q.4 The size of A is polynomially bounded by T . In the following, we assume queries in automaton form, where the atoms T (x; y) have been replaced by the corresponding A(x; y). Given A = ( ; Q; I; F; T), let A = ( ; Q; F; I; T ) with = fP j P 2 g be the corresponding reverse automaton5 with initial and final states swapped and the state transitions flipped: (q0; R ; q) 2 3 As 9x: Wi 'i Wi 9x:'i, the order of disjunction and existential quantification is irrelevant. 4 The correspondence is such that ( ; 0) 2 T I iff there is a role path P!1 P!k 0 in I such that A accepts P1 : : : Pk. 5 For convenience, we consider (P ) identical to P .

T for every (q; R; q0) 2 T. Moreover, we obtain the “state-parameterized” automaton Aq;q0 = ( ; Q; fqg; fq0g; T) from A by setting the initial state to q and the final to q0. 3

A Little Bit of Model Theory The “well-behavedness” of certain sublogics of ZOIQ can be conveniently characterised in model-theoretic terms.

Definition 1 (quasi-forest model, [ 5 ]). Let K be a KB. A model I of K is a quasi-forest model if: – the domain I of I is a forest, i.e., a prefix-closed subset of Roots N for some finite set Roots, – Roots = foI j o is an individual name in Kg, and – for every ; 0 2 I with ( ; 0) 2 P I for some role name P 2 NR n f>g, either (i) f ; 0g \ Roots 6= ;, or (ii) = 0; or (iii) is a child of 0, or (iv) 0 is a child of . A KB K has the quasi-forest model property (short: qfmp) if K is either unsatisfiable or it has a quasi-forest model. A DL L has the qfmp if every L KB K has the qfmp. For such well-behaved KBs, tight and quite decent complexity bounds for satisfiability checking have been established.

Theorem 1 ([ 6 ]). Satisfiability checking of ZOIQ KBs having the qfmp is EXPTIMEcomplete even with binary encoding of number restrictions.

When it comes to query answering, a useful notion for relating interpretations to each other is via homomorphisms. For two interpretations I = ( I ; I ) and J = ( J ; J ), a function h : I ! J is a homomorphism from I to J if for all ; 0 2 I holds 2 AI ) h( ) 2 AJ for all concept names A, and ( ; 0) 2 P I ) (h( ); h( 0)) 2 P J for all role names P . If a homomorphism from I to J exists, we write I B J . We note that I j= q and I B J imply J j= q for every simplified P2RPQ q.

Definition 2 (qfhcp, [ 3 ]). A KB K has the quasi-forest homomorphism-cover property (short: qfhcp) if for every model I of K, there exists a quasi-forest model J of K with J B I. A description logic L has the qfhcp if every L KB K has the qfhcp. We let tame ZOIQ denote the set of all ZOIQ KBs having the qfhcp.

Obviously, the qfhcp implies the qfmp. Note that unrestricted ZOIQ does not even have the qfmp. The most expressive syntactically defined sublogics of tame ZOIQ known today are ZIQ, ZOQ, and ZOI [ 6,2 ]. From the fact that the set of models for any P2RPQ is closed under homomorphisms now follows that for any tame ZOIQ KB with K 6j= q, there must exist a quasi-forest model I, satisfying I j= K and I 6j= q. 4

Annotating (Partial) Query Matches In the following, we consider a tame ZOIQ KB K and a simplified P2RPQ q over K in automaton form. We note that q can be rewritten into an equivalent disjunction W1 i n qi of n (also simplified) C2RPQs, where n may be exponential in the size of q, but every single qi’s size is polynomial. Let [q] denote fq1; : : : qng. Then, testing K 6j= q amounts to determining if K has a model I which is a simultaneous countermodel (i.e., I 6j= qi) for all qi 2 [q].

We now give an intuitive description of our technique to detect (or refute) matches of q. Given some model I of K, we iteratively, deterministically annotate all its domain elements with fresh concept names QM that indicate which “parts” M of some query qi match into I and how participates in these partial matches. To this end, we employ descriptions M of query parts which contain information about (i) the query variables matched (ii) (the existence of) paths realising certain state transitions in the query’s automata and (iii) optionally, the “position” of the current in the query match by use of a marker acting like an additional, distinguished query variable. In the annotation process, the QM s will be assigned to domain elements based on their local environment and known annotations QM 0 for “smaller” partial queries to the same element or its direct (role) neighborhood. As an exception, non-localised query matches M not containing will be “broadcast”, i.e., uniformly attached to all domain elements. This way, in the annotation process, QM s for larger and larger partial queries are assigned, until finally (after reaching the unique fixpoint) also the full matches for any qi are recognised by annotations Qqi . This process will accurately detect partial and full query matches, if I is a quasi-forest model (which is sufficient for our purposes as K has the qfhcp). The annotation process is realised by virtue of a KB Kq , the size of which is exponential in q, but only polynomial in K. In the following, we will stepwise introduce the KB, interleaved with some necessary definitions.

Definition 3 (partial query). Consider a C2RPQ qi = fA1(x1; y1);:::; Am(xm; ym)g.6 A partial query for qi and a set X fx1; y1; : : : ; xm; ymg of variables from qi is a set M consisting of – all atoms A(x; y) from qi for which fx; yg X , – for every A(x; y) from qi with x 2 X but y 62 X , one of the atoms Aq;q0 (x; ) or

Aq;q0 (x; o) where q is initial in A, – for every A(x; y) from qi with x 62 X but y 2 X , one of the atoms Aq;q0 (y; ) or

Aq;q0 (y; o) where q is final in A, – in case X = ;, exactly one atom of the form Aq;q0 ( ; o) or Aq;q0 ( ; o) (called nominal-anchored path) or Aq;q0 ( ; ) or Aq;q0 ( ; ) (called round trip) for some A from qi.

A partial query is called monodic, if jX j = 1. A partial query is called local if it contains and global otherwise.

Nominal-Anchored Paths. First, we want to detect role paths that start in the to-beannotated individual, end in named individual oI , and realise state transitions in one of the query’s automata. Let o be any individual name in K, let A be either A or A for any automaton A occurring in q with states q; q0; q00. Then let Kq contain the axiom QfAq;q( ;o)g(o) (1) 6 In cases, we may equate a C2RPQ qi with the set of its atoms. and whenever A has a transition (q; R; q0):

9R:QfAq0;q00 ( ;o)g v QfAq;q00 ( ;o)g Round Trips. Next, we are concerned with paths which start and end in the to-beannotated individual. Assuming an automaton A with states q; q0; q00; q000 and transition set T as well as an individual name o, we add the axioms: (2) (3) (4) (5) (6) (7) (8) (9) Initialising monodic partial queries. Next we determine domain elements to which separate query variables could possibly be mapped in a match.

Definition 4. We call a monodic partial query M original, whenever q = q0 for every Aq;q0 (x; ) 2 M and every Aq;q0 (x; ) 2 M . For M an original monodic partial query for q and fxg, the set PrecM of precondition concepts consists of: – QfAq;q0 ( ;o)g for each atom Aq;q0 (x; o) in M , – QfAq;q0 ( ;o)g for each atom Aq;q0 (x; o) in M , – F q initial QfAq;q0 ( ; )g for each atom A(x; x) in M .

q0 final Now we can initialise all original monodic partial queries M by adding the following axioms to Kq:

l PrecM v QM Updating partial queries. Partial queries can be updated by roundtrips: Let M be a partial query containing some Aq;q0 (x; ). Then we realise the update by adding the following axioms to Kq:

QM u QfAq0;q00 ( ; )g v QMnfAq;q0 (x; )g[fAq;q00 (x; )g

> v QfAq;q( ; )g QfAq;q0 ( ; )g u QfAq0;q00 ( ; )g v QfAq;q00 ( ; )g

QfAq;q0 ( ;o)g u QfAq00;q0 ( ;o)g v QfAq;q00 ( ; )g for (q; R; q0) 2 T:

9R:Self v QfAq;q0 ( ; )g for f(q; R; q0); (q00; R0; q000)g

T: 9(R \ R0 ):QfAq0;q00 ( ; )g v QfAq;q000 ( ; )g Furthermore, a partial query can “progress” when making a “step” in the model, moving from the considered to some role neighbour. In such a step, all the “unready” paths (corresponding to atoms with ) must be updated in a synchronous manner. Given a partial query M for qi and nonempty X as well as a set fR1; : : : ; Rkg of (possibly inverted) role names, assume M 0 can be obtained from M by replacing each atom of the form Aq;q0 (x; ) by an atom Aq;q00 (x; ) such that A has a transition (q0; R; q00) for for any R 2 fR1; : : : ; Rkg. Then add the following axiom to the KB Kq: (10) Joining partial queries. When two partial queries corresponding to the same query qi “meet” in a domain element , they can – under certain circumstances – be “glued together” into a “bigger” partial query of qi.

Definition 5 (joinable, join). Let M1 be a partial query for qi and X1 and let M2 be a partial query for qi and X2. We call M1 and M2 joinable if – X1 6= ;, X2 6= ;, X1 \ X2 = ;, and – for each A(x; y) 2 qi n (M1[M2) with fx; yg X1[X2, there are states q; q0; q00 of A with q initial and q00 final, such that either fAq;q0 (x; ); Aq00;q0 (y; )g 2 M1[M2 or fAq;q0 (x; o); Aq00;q0 (y; o)g 2 M1 [ M2 for some o 2 NI.

For joinable M1 and M2, their join, denoted M1 ./ M2, is the partial query obtained from M1 [ M2 by replacing any pair of atoms Aq;q0 (x; ) and Aq00;q0 (y; ) by A(x; y), where is either an individual name o or .

We implement the join operation for every pair M1, M2 of joinable partial queries for a qi 2 [q] by extending Kq with:

QM1 u QM2 v QM1./M2 Broadcasting of global partial queries. Whenever a partial query M does not have any occurrences of , which would tie it to a specific element, it will be “broadcast” to all domain elements: 9>:QM v QM (11) (12) This concludes the definition of Kq. Revisiting our initial intuition of Kq’s purpose of deterministically annotating a model I of K, the following lemma formalises this by singling out one unique annotated model I for every I.

Lemma 1. Let K be a ZOIQ KB and q a simplified P2RPQ. For every model I of K there exists a unique model I of K [ Kq which extends I and satisfies QIM QJM for every QM in Kq and for every model J of K [ Kq that extends I.

We will call models I from Lemma 1 Q-minimal. Note that for every model I of K, any concept membership 2 QIM holding in the corresponding Q-minimal model I can be derived from concept and role memberships in I through a finite sequence of “applications” of axioms from Kq.7 5

Reduction to Satisfiability Now we establish technical results relating the presence of QM -annotations in models and the actual semantic satisfaction of the corresponding partial query M . Note that any partial query M can be seen as a (set representation of a) C2RPQ in automaton form, assuming is just an ordinary variable. 7 To see this, it may help to realise that all axioms of Kq can be easily expressed in monadic

Datalog.

Definition 6 ((tight) satisfaction of a partial query). Given an interpretation I, a domain element 2 I , and a partial query M , we say M is satisfied or holds in , written I; j= M , if there is a match Z for M in I with Z( ) = . M is satisfied or holds tightly in , written I; j=j= M , if Z is such that every Aq1;q2 (x; ) 2 M is realized by a path Z(x) ; not containing oI for any individual name o. Note that j= and j=j= coincide whenever M is global or does not contain variables. Note also that, as a consequence of this definition, if M is global, then QM holds (tightly) everywhere or nowhere throughout the domain. With this notion in place, the next two lemmas can be seen as soundness and completeness results regarding the deduction calculus for (partial) query matches embodied by Kq.

Lemma 2 (soundness). Let I be a Q-minimal model of K [ Kq and let 2 I . Let M be any partial query for any C2RPQ qi 2 [q]. Then 2 QIM implies I; j= M . Lemma 3 (completeness). Let I be a quasi-forest model of K [ Kq and let Let M be any partial query for any C2RPQ qi 2 [q]. Then I; j=j= M implies 2 2 QIM .

I .

Soundness is proven by induction over the length of the derivation for 2 QIM . Completeness is shown by induction over the spread of the match for M , i.e., the number of variables (except ) plus the sum of the lengths of all paths realising the query atoms.

Thanks to these correspondences, we can essentially rule out models with matches of any q1; : : : ; qn by forcing the extensions of Qq1 ; : : : ; Qqn to be empty. Therefore, let K:q = K [ Kq [ fQqi v ? j qi 2 [q]g: (13) We establish some syntactic and semantic properties of K:q: Some easy calculations yield size bounds for K:q, tameness is not affected by the reduction, and unsatisfiability of K:q indeed coincides with query entailment.

Fact 4 (size of K:q) The size of K:q is single exponential in the total size of K and q. It is polynomial, if the number of atoms in q is bounded by a constant.

Lemma 5. If K is in tame ZOIQ then so is K:q.

Theorem 2. Let K be a tame ZOIQ KB and let q be a simplified P2RPQ over K. Then K j= q if and only if K:q is unsatisfiable.

Proof. We show the equivalent statement: K 6j= q if and only if K:q satisfiable. “only if”: Assume K 6j= q, that is there exists an I with I j= K and I 6j= q. Consider I , the Q-minimal model of K [ Kq extending I. If some QqIi were nonempty, then there would be a query match of q into I by Lemma 2, contradicting our assumption. Therefore, I j= Qqi v ? for all qi 2 [q] and hence, I j= K:q. “if”: Assume K:q is satisfiable and hence has a model I. As K:q has the qfmp thanks to Lemma 5, we can assume I to be a quasi-forest model. I j= Qqi v ? implies emptiness of QqIi for every qi 2 [q]. If I j= q held true, witnessed by I j= qi for some qi 2 [q], Lemma 3 would require 2 QqIi for all 2 I , leading to a contradiction. Hence I 6j= q. On the other hand, I j= K:q implies I j= K, therefore K 6j= q. Based on this theorem, we can now prove our central result: Theorem 3. Checking entailment of simplified P2RPQs from tame ZOIQ KBs with binary number encoding is 2EXPTIME-complete. It is EXPTIME-complete if the number of atoms in the query is bounded.

Proof. By Theorem 2, entailment of a simplified C2RPQ q from a tame ZOIQ KB K can be reduced to checking unsatisfiability of K:q, which can be computed in outputpolynomial time and the size of which is exponential (polynomial, if number of atoms in q is bounded). By Theorem 1, (un)satisfiability of K:q can be checked in EXPTIME in the size of K:q, hence we obtain the desired 2EXPTIME and EXPTIME upper bounds. The matching lower bounds are well-known even for much weaker logics [ 9 ]. 6

Querying GC2 Databases We now consider the problem of P2RPQ entailment from GC2 databases, i.e., sets of unary and binary ground facts in the presence of a sentence in the guarded two-variable fragment with counting quantifiers as defined by Pratt-Hartmann ([ 10 ]). We will show that there is a polytime query-entailment-preserving transformation from GC2 databases to ZIQ KBs. This allows us to transfer the results from Theorem 3. Matching lower bounds follow from known results for much weaker logics and query formalisms (such as CQs over ALCI [ 7 ]).

Definition 7 (GC2, [ 10 ]). For a signature S of nullary (N0), unary (NC), and binary (NR [ f g) predicates,8 let GC2 be the smallest set of formulae that contains all atoms over S using only variables from fx; yg, that is closed under Boolean combinations, and that contains – all formulae 9u:' and 8u:' with u 2 fx; yg whenever ' is a GC2 formula with at most one free variable, and – all formulae 8u:( ! ') as well as uQ:( ^ ') and uQ:( ) where is a binary atom (called “guard”) containing both x and y, ' is a GC2 formula, and Qis any of 9, 96n, 9=n, or 9>n.

A GC2 database DB is a theory f g [ A consisting of a GC2 sentence and a finite set 2 A of unary and binary ground facts over constants from NI.9 We denote by GCqef the set of GC2 formulae which are quantifier-free and do not use .

A P2RPQ over a GC2 database is a P2RPQ where for every atom of the form C(t) holds C 2 NC and for every atom of the form T (s; t), every subexpression of the form id(C) satisfies C 2 NC. P2RPQ entailment from GC2 databases is defined in the same way as for DLs.

For convenience, we make use of a special normal form for GC2 introduced in the following. 8 As unary and binary predicates correspond to concept and role names, respectively, we use the same set symbols to denote them. 9 Note that ABoxes, as defined before, are syntactically and semantically identical to finite sets of unary and binary ground facts, so we consider the notions the same and use the same symbol A. Note also that by definition, does not contain constants.

Lemma 6 (normal form, [ 11 ]). For any GC2 sentence , one can compute in polynomial time a GC2 sentence NF( ) of the form 8x ^ V1 h ` 8x8y (Eh(x; y) ! h _ x

y)) ^ V1 i m 8x9=ni y (Fi(x; y) ^ x 6 y) ; (14) with 2 GC2qef not using y, fE1; ::: ; E`; F1; ::: ; Fmg NR, f 1; ::: ; `g GC2qef , and fn1; ::: ; nmg N, such that (i) NF( ) j= and (ii) every model of with at least maxh nh elements can be extended to a model of NF( ).

Given a GC2 database DB = f g [ A, we let NF(DB) = fNF( )g [ A. Thanks to the previous lemma and the known property that models of GC2 sentences are closed under disjoint self-union, the normal form can be shown to be query-entailment-preserving. Lemma 7. For every P2RPQ q over a GC2 database DB holds DB j= q if and only if NF(DB) j= q.

We next provide a query-entailment-preserving polynomial translation of GC2 databases into ZIQ KBs.

Definition 8. Let be a GC2 sentence in normal form as in Lemma 6. We define the ZIQ TBox T as follows:

[ 1 h l

[ 1 i m T = T – T = f> v 61R :>; > v 9R :Selfg – Tps = fP ropP 9>:P ropP j P 2 N0g – T = f> v tr( )g, where

tr(P ) = P ropP tr(P (x)) = P tr(P (x; x)) = 9P:Self

tr(:') = :tr(') tr(' ^ ) = tr(') u tr( ) tr(' _ ) = tr(') t tr( ) [ f> v 8(Ehn(R in h to sets of GCIs as follows: – TEh; h = Sat2atoms( h) (axEh (at) [ f> v 8(REh;atnEh):?g) where axEh maps atoms [ trEh ( h))):?g P 7! f P (x) 7! f P (x; x) 7! f P (y) 7! f P (y; y) 7! f

9REh;P (y;y):> v 9P:Self; P (x; y) 7! f 9(REh;P (x;y)nP ):> v ?; P (y; x) 7! f 9(REh;P (y;x)nP ):> v ?;

2 and trEh maps GCqef formulae to simple roles:

9REh;P :> v P ropP ; 9REh;P (x):> v P; 9REh;P (x;x):> v 9P:Self; 9REh;P (y):> v P;

9(EhnREh;P ):> v :P ropP g 9(EhnREh;P (x)):> v :P g 9(EhnREh;P (x;x)):> v :9P:Selfg 9(Eh nREh;P (y)):> v :P g 9(Eh nREh;P (y;y)):> v :9P:Selfg 9(Eh\P nREh;P (x;y)):> v ? 9(Eh \P nREh;P (y;x)):> v ? g g trEh (at ) = REh;at trEh (: ) = Ehntr( ) trEh ( ^ 0) = trEh ( ) \ trEh ( 0) trEh ( _ 0) = trEh ( ) [ trEh ( 0) – TFi;ni = f > v 6ni(Fi nR ):> u >ni(Fi nR ):> g Given a GC2 database DB = f g [ A, we let KDB denote the ZIQ KB (TNF( ); A). Intuitively, T axiomatises equality, Tps introduces synchronised concept names P ropP for every propositional symbol P from DB, since ZIQ does not allow for nullary predicates. T , TEh; h , and TFi;ni implement the conjuncts from NF( ).

We can now observe that KDB can be computed in polytime wrt. the size of DB and that it has the qfhcp (since it is a ZIQ KB). Moreover we can show that any model of KDB can be extended to one of NF(DB) and, conversely, any model of NF(DB) can be extended to one of KDB. With the help of these correspondences and Lemma 7, we can then argue that for any P2RPQ q over DB holds DB j= q iff KDB j= q. These findings allow us to apply Theorem 3 to establish the following result.

Theorem 4. Checking P2RPQ entailment from GC2 databases is 2EXPTIME-complete. It is EXPTIME-complete if the number of query atoms is bounded by a constant. 7

Conclusion We have established tight complexity bounds for expressive querying in very expressive DLs under the assumption of succinct (i.e. binary) encoding of number restrictions. Arguing along the lines of Calvanese et al. [ 6 ], we can leverage our findings to strengthen their results on query containment as well as the SR family as follows:10 Theorem 5 (query containment). Testing query entailment K j= q0 q is in 2EXPTIME with respect to the total size of q0, q, and K (with binary encoding of number restrictions) if (i) K is in ZOQ or ZOI and q0 and q are P2RPQs over K, or (ii) K is in ZIQ, q0 is a conjunctive query and q is a P2RPQ over K. The complexity drops to EXPTIME if the number of atoms occurring in q is bounded by a constant. Theorem 6 (SR family). Deciding K j= q and K j= q0 q is in 3EXPTIME in the total size of q0, q, K (with binary encoding of numbers) – and in 2EXPTIME if K’s RBox is given by defining regular expressions for the non-simple roles – if (i) K is in SROQ or SROI and q0, q are P2RPQs over K, or (ii) K is in SRIQ, q0 is a conjunctive query and q is a P2RPQ over K. The complexities are in 2EXPTIME and in EXPTIME, respectively, if the number of atoms occurring in q is bounded by a constant. There are plenty of open questions left for future work. First, the results established here hold for arbitrary models, however, so far, very little is known about finite (model) query answering in DLs from the Z family and the methods applied here do not seem to easily carry over to the finite model setting. Second, while the extension of our query formalism by nestings in the sense of Bienvenu et al. [ 1 ] seems straightforward without impacting complexities, our technique seems not readily extendable to capture more elaborate query languages [ 14,4,12 ]. 10 For space reasons, we assume the reader to be familiar with the notions used in these theorems.

Meghyn

Bienvenu , Diego Calvanese, Magdalena Ortiz, and

Mantas

Simkus . Nested regular path queries in description logics . In Chitta Baral, Giuseppe De Giacomo, and Thomas Eiter, editors, Proc. 14th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'14) , pages 218 - 227 . AAAI Press, 2014 .

2. Piero

Bonatti , Carsten Lutz, Aniello Murano, and Moshe

Vardi . The complexity of enriched mu-calculi . Logical Methods in Computer Science , 4 ( 3 :11): 1 - 27 , 2008 .

Pierre

Bourhis , Markus Krötzsch, and

Sebastian

Rudolph . How to best nest regular path queries . In Proc. 27th Int. Workshop on Description Logics (DL'14) , volume 1193 of CEUR Workshop Proceedings , pages 404 - 415 . CEUR-WS.org, 2014 .

Pierre

Bourhis , Markus Krötzsch, and

Sebastian

Rudolph . Reasonable highly expressive query languages . In Qiang Yang and Michael Wooldridge , editors, Proc. 24th Int. Joint Conf. on Artificial Intelligence (IJCAI'15) , pages 2826 - 2832 . AAAI Press, 2015 .

5. Diego Calvanese, Thomas Eiter, and

Magdalena

Ortiz . Answering regular path queries in expressive description logics: An automata-theoretic approach . In Proc. 22nd AAAI Conf. on Artificial Intelligence (AAAI'07) , pages 391 - 396 . AAAI Press, 2007 .

6. Diego Calvanese, Thomas Eiter, and

Magdalena

Ortiz . Regular path queries in expressive description logics with nominals . In Craig Boutilier, editor, Proc. 21st Int. Joint Conf. on Artificial Intelligence (IJCAI'09) , pages 714 - 720 . IJCAI, 2009 .

Carsten

Lutz . The complexity of conjunctive query answering in expressive description logics . In Alessandro Armando,

Peter

Baumgartner , and Gilles Dowek, editors, Proc. 4th Int. Joint Conf. on Automated Reasoning (IJCAR'08) , number 5195 in LNAI, pages 179 - 193 . Springer, 2008 .

Magdalena

Ortiz ,

Sebastian

Rudolph , and

Mantas

Simkus . Query answering is undecidable in DLs with regular expressions, inverses, nominals, and counting . INFSYS Research Report 1843-10-03 , TU Vienna, 2010 .

Magdalena

Ortiz and

Mantas

Simkus . Reasoning and query answering in description logics . In Thomas Eiter and Thomas Krennwallner, editors, Proc. 8th Int. Reasoning Web Summer School (RW'12) , volume 7487 of Lecture Notes in Computer Science, pages 1 - 53 . Springer, 2012 .

10. Ian Pratt-Hartmann. Complexity of the guarded two-variable fragment with counting quantifiers . J. Log. Comput. , 17 ( 1 ): 133 - 155 , 2007 .

11. Ian Pratt-Hartmann. Data-complexity of the two-variable fragment with counting quantifiers . Inf. Comput. , 207 ( 8 ): 867 - 888 , 2009 .

12. Juan L. Reutter , Miguel Romero, and Moshe

Vardi . Regular queries on graph databases . Theor. Comp. Sys. , 61 ( 1 ): 31 - 83 , July 2017 .

13.

Sebastian

Rudolph . Undecidability Results for Database-Inspired Reasoning Problems in Very Expressive Description Logics . In Chitta Baral,

James P.

Delgrande , and Frank Wolter, editors, Proc. 15th Int. Conf. on Knowledge Representation and Reasoning (KR'16) , pages 247 - 257 . AAAI Press, 2016 .

14.

Sebastian

Rudolph and

Markus

Krötzsch . Flag & check: Data access with monadically defined queries . In Richard Hull and Wenfei Fan, editors, Proc. 32nd Symposium on Principles of Database Systems (PODS'13) , pages 151 - 162 . ACM, 2013 .