-

Combined Complexity of Answering Tree-like Queries in OWL 2 QL

Meghyn Bienvenu

Stanislav Kikot

Vladimir Podolskii

2 0 Institute for Information Transmission Problems & MIPT , Moscow , Russia 1 LRI - CNRS & Universite ́ Paris Sud , Orsay , France 2 Steklov Mathematical Institute & Higher School of Economics , Moscow , Russia

Introduction The OWL 2 QL ontology language [11], based upon the description logic DL-LiteR, is considered particularly well suited for applications involving large amounts of data. While the data complexity of querying OWL 2 QL knowledge bases is well understood, far less is known about combined complexity of conjunctive query (CQ) answering for restricted classes of conjunctive queries. By contrast, the combined complexity of CQ answering in the relational setting has been thoroughly investigated. In relational databases, it is well known that CQ answering is NP-complete in the general case. A seminal result by Yannakakis established the tractability of answering tree-shaped (aka acyclic) CQs [14], and this result was later extended to wider classes of queries, most notably to bounded treewidth CQs [5]. Gottlob et al. [6] pinpointed the precise complexity of answering tree-shaped and bounded treewidth CQs, showing both problems to be complete for the class LOGCFL of all languages logspace-reducible to context-free languages [13]. In the presence of arbitrary OWL 2 QL ontologies, the NP upper bound for arbitrary CQs continues to hold [4], but answering tree-shaped queries becomes NP-hard [8]. Interestingly, the latter problem was recently proven tractable in [3] for DL-Litecore (a slightly less expressive logic than OWL 2 QL), raising the hope that other restrictions might also yield tractability. This extended abstract summarizes our investigation [2] into the combined complexity of conjunctive query answering in OWL 2 QL for tree-shaped queries, their restriction to linear and bounded leaf queries and their generalization to bounded treewidth queries. Our complexity analysis reveals that all query-ontology combinations that have not already been shown NP-hard are in fact tractable. Specifically, in the case of bounded depth ontologies, we prove membership in LOGCFL for bounded treewidth queries (generalizing the result in [6]) and membership in NL for bounded leaf queries. We also show LOGCFL-completeness for linear and bounded leaf queries in the presence of arbitrary OWL 2 QL ontologies. This last result is the most interesting technically, as the upper and lower bounds rely on two different characterizations of the class LOGCFL. Preliminaries We assume the reader familiar is OWL 2 QL (or DL-LiteR) knowledge bases (KBs), composed of a TBox T and ABox A built from countably infinite, mutually disjoint sets NC, NR, and NI of concept names, role names, and individual names. Roles R and basic concepts B are defined in a standard way, cf. [4]. We use NR to refer to the set of all roles. We recall that every consistent OWL 2 QL KB (T ; A) possesses a canonical model CT ;A with the property that T ; A j= q(a) iff CT ;A j= q(a) for every CQ q and tuple a inds(A). Thus, CQ answering in OWL 2 QL corresponds to deciding the existence of a homomorphism of the query into the canonical model. Informally,

CT ;A is obtained from A by repeatedly applying the axioms in T , introducing fresh elements as needed to serve as witnesses for the existential quantifiers. According to the standard construction (cf. [ 10 ]), the domain CT ;A of CT ;A consists of inds(A) and all words of the form aR1R2 : : : Rn 1Rn (n 1) with a 2 NC and Ri 2 NR. Intuitively, the element aR1R2 : : : Rn 1Rn is obtained by applying an axiom with right-hand side 9Rn to the element aR1R2 : : : Rn 1 2 CT ;A . A TBox T is of depth ! if there is an ABox A such that the domain of CT ;A is infinite; T is of depth d, 0 d < !, if d is the greatest number such that some CT ;A contains an element of the form aR1 : : : Rd. Contributions In what follows, we briefly formulate our combined complexity results and provide some intuitions about the proof techniques. See [ 2 ] for details. Theorem 1. CQ answering is in LOGCFL for bounded treewidth queries and bounded depth ontologies.

Proof sketch. We exploit the fact that CQ answering over a KB (T ; A) corresponds to evaluating the query over the canonical model CT ;A viewed as a database. If T has depth k (with k a fixed constant), then CT ;A can be computed by a deterministic logspace Turing machine (TM) with access to an NL oracle. Indeed, the depth bound k implies the finiteness of CT ;A and that all domain elements can be described using logarithmically many bits. To complete the argument, we use the fact that answering bounded treewidth queries over databases is in LOGCFL [ 7 ] and that the class LOGCFL is closed under LLOGCFL (and hence LNL) reductions [ 13 ]. For lack of space, we have not specified how to check whether a variable (resp. pair of variables) can be mapped to an element (resp. pair of elements), but this can be done in NL using a small number of entailment checks. Also note that the bound on the number of leaves yields the bound on size of Frontier, and the bound on the TBox depth guarantees that we only need logarithmically many bits per pair in Frontier. Theorem 3. CQ answering is LOGCFL-complete for bounded leaf queries and arbitrary ontologies. The lower bound holds already for linear queries.

Theorem 2. CQ answering is NL-complete for bounded leaf queries and bounded depth ontologies.

Proof sketch. The lower bound is an immediate consequence of the NL-hardness of answering atomic queries in OWL 2 QL. To prove the upper bound, we apply a straightforward non-deterministic procedure for deciding (T ; A) j= q : 1. Fix a directed tree T compatible with q. Let v0 be the root variable. 2. Guess u0 2 CT ;A . Return no if v0 cannot be mapped to u0. 3. Initialize Frontier to f(v0; u0)g. 4. While Frontier 6= ; (a) Remove (v1; u1) from Frontier. (b) For every child v2 of v1 i. Guess an element u2 from CT ;A . ii. Return no if (v1; v2) cannot be mapped to (u1; u2).

iii. Add (v2; u2) to Frontier. 5. Return yes. tu tu Proof sketch. Concerning the upper bound, it is easy to adapt the previous algorithm to handle arbitrary TBoxes: we simply replace CT ;A by faw 2 CT ;A j jwj 2jT j + jqjg. The modified algorithm gives the correct answers, but it does not have the required complexity, because it might need more than logarithmically many bits to store guessed elements aw. To show LOGCFL membership, we further modify the procedure so that it can be implemented by a non-deterministic polytime logspace-bounded Turing machine augmented with a stack (such TMs are known to capture LOGCFL computation [ 12 ]). The stack is used to store the word part w of a domain element aw. The modification is not at all obvious since we need to store several words at a time while the specified machine has only a single stack; the trick is to employ a careful ‘synchronization’ of traversals of different branches of the query.

The lower bound is by reduction from the problem of deciding whether an input of length l is accepted by the lth circuit of a logspace-uniform family of SAC1 circuits (proven LOGCFL-hard in [ 13 ]). This problem was used in [ 7 ] to show LOGCFLhardness of evaluating tree-shaped CQs over databases. We follow a broadly similar approach, but with one crucial difference: the power of OWL 2 QL TBoxes allows us to ‘unravel’ the circuit into a tree and to use linear queries instead of tree-shaped ones. tu Discussion If we compare the new and existing results for OWL 2 QL with those from relational databases, we observe that adding an OWL 2 QL TBox of bounded depth does not change the combined complexity for query answering, while for TBoxes of unbounded depth, the complexity class shifts one ‘step’ higher: from NL to LOGCFL for bounded leaf queries and from LOGCFL to NP for tree-shaped and bounded treewidth CQs. It is also interesting to compare the combined complexity landscape (below right) with the succinctness landscape for query rewriting (below left) from [ 1 ]. :[] 4 :][ 8 c P

N -cL .3

m FLCGO :hT; arb NL=poly NP=poly: no polysize PE or NDL

[ 9 ] [ 9 ] E P A H S Y R E U Q [ 1 ] poly PE, FO and NDL [ 9 ]

SAC1: no poly PE but poly NDL

[ 1 ] NL=poly: no poly PE but poly NDL [ 1 ] ][ 8 y l o p = P N arb n itw.. h eed. r T 2 trees s` e v a l . e f . reo. b u1 m N

NP-complete

: DBs : [ 4 ]

LOGCFL-complete

: [ 7 ] : Thm. 1

NL-complete

: [ 4 ] / DBs : Thm. 2 1 2 3

: : : ONTOLOGY DEPTH d arb 1 2 3

: : : ONTOLOGY DEPTH d Observe that for our newly identified tractable classes, polynomial-size non-recursive datalog (NDL) rewritings are guaranteed to exist, whereas this is not the case for the positive existential (PE) rewritings more typically considered. In future work, we plan to marry these positive succinctness and complexity results by developing concrete NDLrewriting algorithms for OWL 2 QL for which both the rewriting and evaluation phases run in polynomial time (as was done in [ 3 ] for DL-Litecore).

Acknowledgments. Partial support was provided by ANR grant 12-JS02-007-01, Russian Foundation for Basic Research and the programme “Leading Scientific Schools”.

Bienvenu ,

Kikot , and

V. V.

Podolskii . Succinctness of query rewriting in OWL 2 QL: the case of tree-like queries . In Informal Proceedings of the 27th International Workshop on Description Logics , Vienna, Austria, July 17-20 , 2014 ., pages 45 - 57 , 2014 .

Bienvenu ,

Kikot , and

V. V.

Podolskii . Tree-like queries in OWL 2 QL: Succinctness and complexity results . In Proc. of the 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2015 ). IEEE, 2015 .

Bienvenu ,

Ortiz ,

Simkus , and

Xiao . Tractable queries for lightweight description logics . In Proc. of the 23rd Int. Joint Conf. on Artificial Intelligence (IJCAI 2013 ). AAAI Press, 2013 .

Calvanese , G. De Giacomo,

Lembo ,

Lenzerini , and

Rosati . Tractable reasoning and efficient query answering in description logics: The DL-Lite family . J. of Automated Reasoning , 39 ( 3 ): 385 - 429 , 2007 .

Chekuri and

Rajaraman . Conjunctive query containment revisited . Theoretical Computer Science , 239 ( 2 ): 211 - 229 , 2000 .

Gottlob ,

Leone , and

Scarcello . Computing LOGCFL certificates . In ICALP-99 , pages 361 - 371 , 1999 .

Gottlob ,

Leone , and

Scarcello . The complexity of acyclic conjunctive queries . J. ACM , 48 ( 3 ): 431 - 498 , 2001 .

Kikot ,

Kontchakov ,

V. V.

Podolskii , and

Zakharyaschev . Exponential lower bounds and separation for query rewriting . In Proc. of the 39th Int. Colloquium on Automata, Languages, and Programming (ICALP 2012 ), Part

, volume 7392 of LNCS , pages 263 - 274 . Springer, 2012 .

Kikot ,

Kontchakov ,

V. V.

Podolskii , and

Zakharyaschev . On the succinctness of query rewriting over OWL 2 QL ontologies with shallow chases . In Proc. of the 29th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2014 ). ACM Press, 2014 .

10.

Kontchakov ,

Lutz ,

Toman ,

Wolter , and

Zakharyaschev . The combined approach to query answering in DL-Lite . In Proc. of the 10th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2010 ). AAAI Press, 2010 .

11.

Motik ,

B. Cuenca

Grau ,

Horrocks ,

Wu ,

Fokoue , and

Lutz . OWL 2 Web Ontology Language profiles . W3C Recommendation , 11 December 2012 . Available at http://www.w3.org/TR/owl2-profiles/.

12.

I. H.

Sudborough . On the tape complexity of deterministic context-free languages . Journal of the ACM , 25 ( 3 ): 405 - 414 , 1978 .

13.

Venkateswaran . Properties that characterize LOGCFL . J. Computer and System Sciences , 43 ( 2 ): 380 - 404 , 1991 .

14.

Yannakakis . Algorithms for acyclic database schemes . In Proc. of the 7th Int. Conf. on Very Large Data Bases (VLDB'81) , pages 82 - 94 . IEEE Computer Society, 1981 .