XPath for DL-Lite Ontologies Egor V. Kostylev1 , Juan L. Reutter2 , and Domagoj Vrgoč3 1 University of Oxford egor.kostylev@cs.ox.ac.uk 2 PUC Chile jreutter@ing.puc.cl 3 University of Edinburgh domagoj.vrgoc@ed.ac.uk Abstract. Applications of description logics (DLs) such as OWL 2 and ontology- based data access (OBDA) require understanding of how to pose database queries over DL knowledge bases. While there have been many studies regarding tradi- tional relational query formalisms such as conjunctive queries and their exten- sions, little attention has been paid to graph database queries, despite the fact that graph databases share the structure of interpretations with DLs; that is they de- scribe essentially the same objects. In particular, not much is known about the interplay between DLs and XPath. The last is a powerful formalism for querying semistructured data: it is in the core of most practical query languages for XML trees, and it is also gaining popularity in theory and practice of graph databases. In this paper we make a step towards coupling knowledge bases and graph data- bases by studying how to answer powerful XPath-style queries over DL-Lite. We start with adapting the definition of XPath to the DL context, and then proceed to study the complexity of evaluating XPath queries over knowledge bases. Results show that, while query answering is undecidable for the full XPath, by carefully tuning the amount of negation allowed in the queries we can arrive to XPath frag- ments that have a potential to be used in practical applications. 1 Introduction Satisfiability and model checking have long been two central problems in the knowl- edge representation community. But new applications of description logics (DLs for short), such as the Web Ontology Language (OWL) [21] and ontology-based data ac- cess (OBDA), are forcing us to develop algorithms solving more complex data ma- nipulation and extraction tasks, and in particular, require understanding how to answer database-style queries over knowledge bases that are specified by DLs [14]. The literature on knowledge bases usually considers relational queries, mostly fo- cusing on conjunctive queries (CQs) and their extensions with union [1, 7], forms of negation [15, 24], or aggregates [10, 18]. It is also natural to look at queries that are designed to operate over graph databases, since these share the same structure with DL knowledge bases; namely they use unary and binary predicates (that is concepts and roles in DL terminology) to represent graph nodes and edges. While the idea of using graph queries in knowledge bases is not new (see e.g., [8]), we are only starting to un- derstand how such queries can be fine tuned for various ontology languages. So far the specific research on this topic has primarily been concerned with the class of regular path queries (RPQs; see e.g. [2]), one of the most basic graph query languages. We now know how to evaluate such queries over various description logics [9], how to deal with some of their extensions [4, 5] and understand how to solve their containment problem in the presence of DL constraints [11]. There are of course many other languages for querying graph and semi-structured data, most notable amongst them being XPath. Originally designed to extract data from XML trees [27], XPath has recently been adapted to work over graph databases [19] and was shown to retain good evaluation properties while at the same time being more powerful than RPQs and many of their extensions. Moreover, XPath also subsumes navigational graph querying features of SPARQL [17], such as property paths, and other commercial graph query languages (see e.g. Neo4j [23]), and thus can also be seen as a querying primitive for more powerful languages. Given that XPath is capable of expressing virtually all relevant querying primitives for graphs, and was tried and tested by XML practitioners, it is natural to see if it will have the same impact as a language for DL knowledge bases. In this paper we take a decisive step towards the implementation of graph query languages over ontologies and study the problem of evaluating XPath queries over DL. To that extent, we first adapt the XPath query language to work over ontologies, arriving to DLXPath—an expressive language designed specifically for DL knowledge bases. To get a flavour of the interplay between DLXPath and DLs, we then study the problem of evaluating DLXPath queries over DL-Lite R and DL-Lite core ontologies. The choice for this particular family of DLs is twofold: on the one hand, it is simple enough to start such a research; on the other, this family is of the most importance in practice, since it underlies OWL 2 QL profile [21]. As a simple useful example of an DLXPath query, we can give the following: ‘Can I fly from city A to city B making stops only in cities with UNESCO World Heritage Sites which are endangered?’ Here we assume that the considered ontology has binary predicates (roles) HasDirectFlight and HasUNESCOSite, as well as unary predicate (concept) InDanger. Under this ontology this query cannot be answered by RPQs. But note that it can, however, be expressed as a nested RPQ [4], as this language corresponds to DLXPath without negation. However, if we additionally require the site of the last intermediate stop to be included in the list not under cultural criteria (expressed by concept Cultural), then such a query cannot be expressed by any known language for knowledge bases, but can be written as DLXPath query. Regardless of the choice of a particular DL, one cannot expect that the study of DLXPath query answering will come as a straightforward adaptation of known tech- niques. The language of DLXPath contains expressive primitives such as transitive clo- sure and negation, that is, the operators which are known to bring in conceptual difficul- ties even when studied in isolation. For example, it has been shown that adding negation to conjunctive queries immediately leads to undecidability of query answering [15], while adding path operators usually implies jumping one or two exponents in the com- plexity of query evaluation algorithms [9]. Thus, one of the challenges when studying DLXPath is to fine-tune these operators to archive an optimal trade-off between good evaluation properties and expressive power of the query language. To this end, we will distinguish two flavours of the language: the core fragment, denoted by DLXPathcore , and the regular fragment, denoted by DLXPathreg . The two are designed to match the duality present in the standard XPath query language [27], and while the core fragment allows transitive closure only over basic role names, the regular fragment lifts this re- striction, allowing us to post more general path queries. Regarding negation, we will distinguish full fragments, which allow both unary and binary negation, path-positive fragments with only the unary one, and positive fragments without any negation. As usual when gauging the usefulness of a new query language for ontologies, we start by considering data complexity of the query answering problem, where one as- sumes that the query and the terminological knowledge (TBox) are fixed, while the only input is the assertional knowledge (ABox). Although we show that for the most general case the problem is undecidable, by limiting the amount of negation we obtain expres- sive languages whose queries can be answered in CO NP and even NL OG S PACE, both over DL-Lite R and DL-Lite core . We then move to studying the combined complexity, where both the knowledge base and the query form the input. Here we obtain bounds ranging from NP-complete (thus matching the ones for ordinary CQs), to EXPTIME- complete, and undecidable for the full language. We would like to note that some of the results are obtained using the deep connection between DLs, DLXPath and propo- sitional dynamic logic, thus providing some interesting new techniques for answering queries over ontologies. Proofs Due to the space restrictions, we only give sketches of short proofs and ideas of longer ones. 2 Preliminaries The language of DL-Lite R (and DL-Lite core ) [1,7] contains individuals c1 , c2 , . . ., con- cept names A1 , A2 , . . ., and role names P1 , P2 , . . .. Concepts B and roles R are defined by the grammar B ::= Ai | ∃R, R ::= Pj | Pj− . A DL-Lite R TBox is a finite set of concept and role inclusions of the form B1 ⊑ B2 , B1 ⊓ B2 ⊑ ⊥, R1 ⊑ R2 , R1 ⊓ R2 ⊑ ⊥. A DL-Lite core TBox contains only concept inclusions. An ABox is a finite set of asser- tions of the form Ai (ck ) and Pj (ck , cℓ ). A knowledge base (KB) is a pair (T , A), where T is a TBox and A an ABox. An interpretation I = (∆I , ·I ) is a nonempty domain ∆I of elements with an interpretation function ·I that assigns an element cIk ∈ ∆I to each individual ck , a subset AIi of ∆I to each concept name Ai , and a binary relation PjI ⊆ ∆I × ∆I to each role name Pj . When dealing with DL-Lite it is usual to adopt the unique name assumption (UNA), and we do so here by requiring that cIk 6= cIℓ , for all individuals ck 6= cℓ . Our results, however, do not depend on UNA. The interpretation function ·I is extended to roles and concepts in the following standard way: (∃R)I = d ∈ ∆I | there is d′ ∈ ∆I with (d, d′ ) ∈ RI ,  (Pj− )I = {(d′ , d) ∈ ∆I × ∆I | (d, d′ ) ∈ PjI }. The satisfaction relation |= for TBox inclusions and ABox assertions is also standard: I |= B1 ⊑ B2 iff B1I ⊆ B2I , I |= B1 ⊓ B2 ⊑ ⊥ iff B1I ∩ B2I = ∅, I |= R1 ⊑ R2 iff R1I ⊆ R2I , I |= R1 ⊓ R2 ⊑ ⊥ iff R1I ∩ R2I = ∅, I |= Ai (ck ) iff cIk ∈ AIi , I |= Pj (ck , cℓ ) iff (cIk , cIℓ ) ∈ PjI . A KB K = (T , A) is satisfiable if there is an interpretation I satisfying all inclu- sions of T and assertions of A. In this case we write I |= K and say that I is a model of K. 3 DLXPath: XPath for Knowledge Bases As mentioned in the introduction, the family of DL-Lite was designed not only to keep satisfiability and model checking problems simple, but mainly to keep the complexity of conjunctive query answering the same as in the case of relational databases, while si- multaneously maximising the expressive power of the ontological language. This allows to use DL-Lite as a foundation for practical data managing applications, such as OWL 2 QL and OBDA. However, this does not automatically mean that other useful query formalisms with good evaluation properties over databases also have good properties when posed over DL-Lite knowledge bases. Hence, each class of queries which can be useful in knowledge base applications requires a separate research on its computational properties. In this paper we concentrate on an adaptation of XPath query language for XML trees to knowledge bases. Recently it was shown that (a version of) XPath can be suc- cessfully used for querying graph databases [19]. Every interpretation of a DL-Lite vocabulary can be seen as a graph, and hence every DL-Lite KB is an incomplete description of a graph. That is why we expect our adaptation DLXPath for querying knowledge bases to be useful in practical applications. In what follows, we will consider several fragments of DLXPath. We start with DLXPathcore , the fragment which corresponds to core XPath, the theoretical foundation of most practical languages for querying XML trees.4 Definition 1. Node formulas ϕ, ψ and path formulas α, β of DLXPathcore are expres- sions satisfying the grammar ϕ, ψ := A | ¬ϕ | ϕ ∧ ψ |ϕ ∨ ψ | hαi , (1) α, β := ε | R | [ϕ] | α ∪ β | α · β | α | R+ , where A ranges over concept names and R ranges over roles (i.e., role names and their inverses). Semantics J·KI of DLXPathcore for an interpretation I associates subsets of ∆I to node formulas and binary relations on ∆I to path formulas as given in Table 1. 4 The subscript ‘core’ is used in this paper for two unrelated purposes. This matching is histor- ical and accidental, but we decided to stay with conventional notation despite this undesired collision. JAKI = AI J¬ϕKI = ∆I \ JϕKI Jϕ ∧ ψKI = JϕKI ∩ JψKI Jϕ ∨ ψKI = JϕKI ∪ JψKI JhαiKI = {d | there exists d′ such that (d, d′ ) ∈ JαKI } JεKI = {(d, d) | d ∈ ∆I } JRKI = RI J[ϕ]KI = {(d, d) | d ∈ JϕKI } Jα ∪ βKI = JαKI ∪ JβKI Jα · βKI = JαKI ◦ JβKI JαKI = (∆I × ∆I ) \ JαKI JR+ KI is the transitive closure of RI Table 1. Semantics of DLXPathreg . The symbol ‘\’ stands for set-theoretic difference. As usual when dealing with ontologies, our interest is not query answering for a particular interpretation, but computing those answers that are true in all possible mod- els of the knowledge base. Formally, let K = (T , A) be a knowledge base and α a DLXPath path formula. The certain answers of α over K, denoted Certain(α, K), is the set of all pairs (c1 , c2 ) of individuals such that (cI1 , cI2 ) ∈ JαKI for all models I of K. Similarly, one can define certain answers Certain(ϕ, K) for a DLXPath node formula ϕ as the set of all individuals c such that cI ∈ JϕKI , for all I models of K. In the paper all of the results will be stated for path formulas (queries from here on), however, they remain unchanged for node formulas. Example 1. Coming back to the example from the introduction, consider role names HasDirectFlight, which connects cities with a direct flight, and HasUNESCOSite, which connects cities with their UNESCO world heritage sites, as well as concept InDanger denoting that a particular site is endangered. Let KB K represents the flight destination graph, as well as knowledge about heritage sites, partially explicitly in the ABox, and partially implicitly, by means of TBox inclusions. Then checking whether it is possible to fly from Edinburgh to a city that has an endangered UNESCO world heritage site is equivalent to checking if Edinburgh is in Certain(ϕ, K), where ϕ is a node formula hHasDirectFlight+ [hHasUNESCOSite[InDanger]i]i.  As already noted, the formalism of core XPath is in the nutshell of the most practi- cal query languages for XML trees. However, in [19] it was shown that the correspond- ing graph language cannot express certain properties that are deemed essential when querying graphs. Thus, besides DLXPathcore we also consider its generalisation called regular DLXPath, or DLXPathreg , which extends the core fragment by allowing the use of transitive closure operator + over arbitrary path formulas. Formally, path formulas of DLXPathreg satisfy the grammar α, β := ε | R | [ϕ] | α ∪ β | α · β | α | α+ , while node formulas remain the same as for DLXPathcore in grammar (1). As expected, the semantics Jα+ KI over an interpretation I is the transitive closure of JαKI . Example 2. With full transitive closure we are now able to post more complex queries than in the core fragment. Consider again the knowledge base K from Example 1. We can now ask if it is possible to fly from Liverpool to Jerusalem, making stopovers only in places that have an endangered UNESCO world heritage site by checking if the pair (Liverpool, Jerusalem) is in Certain(α, K), where α is a query (HasDirectFlight[hHasUNESCOSite[InDanger]i])+ . Note that here we check if each of the cities along the path has an endangered site. If we want to additionally require that the sites along the route are not included in the UNESCO list under cultural criteria, we need to replace [InDanger] with [InDanger ∧ ¬Cultural] in the query above.  Besides these two query languages, we will consider their fragments, which will be introduced as needed. Before passing on to the complexity of DLXPath query evaluation, we briefly com- pare our languages with other formalisms. First, DLXPath is clearly incomparable with CQs. However, all tree-shaped CQs and unions of CQs with no more than two free vari- ables can be written as DLXPathcore queries. On the other hand, all negation-free and + -free DLXPathreg queries can be written as unions of CQs, though with a cost of ex- ponential blow-up. If we allow negation but only over concept and role names, then a query can be written as a union of CQs with safe negation. Second, DLXPathreg queries without negation and node tests [ϕ] are 2-way regular path queries (2RPQs), the stan- dard formalism for querying graph databases. If, additionally, role inverses are not al- lowed as path formulas, it becomes plain RPQs, that is, essentially, regular expressions. Some fragments of DLXPath can be expressed in other description logic formalisms. For example, it is well-known that a unary tree-shaped CQ with the corresponding tree being rooted and directed, can be written as a concept of EL, a DL underlying OWL 2 EL profile [21]. Finally, DLXPathreg node formulas without path negation are nothing else but propositional dynamic logic with converse (CPDL) formulas; we will discuss this connection in more detail later on. In the following sections we analyse the complexity of evaluating DLXPath queries over DL-Lite knowledge bases. 4 Data Complexity of DLXPath Query Evaluation As it is widely accepted in theory and proved in practice, the size of the query and TBox is usually much smaller than the size of the ABox (see e.g., [25] for discussion in the relational database context and [7] for DLs). This is why one usually considers data complexity of query answering, assuming that the TBox and the query are fixed, and only ABox is part of the input. In this section we study this problem for various fragments of DLXPath. Formally, let T be a TBox and α a DLXPath query. We are interested in the following family of problems. C ERTAIN A NSWERS (α, T ) Input: ABox A and pair (c1 , c2 ) of individuals Question: Is (c1 , c2 ) ∈ Certain(α, (A, T ))? As it was previously mentioned, data complexity of CQ query answering over DL- Lite R knowledge bases is the same as over relational databases, that is, in L OG S PACE [7]. At the first glance, one may expect a similar result in the case of DLXPath, where the complexity is NL OG S PACE-complete over graph databases [19]. However, the com- bination of the open world assumption and allowing negation in queries makes things quite different. In fact, the situation here is more like in the case of CQ with safe nega- tion, where the complexity jump is dramatic: from polynomiality to undecidability [15]. Theorem 1. There exists a DL-Litecore TBox T and a DLXPathcore query α such that the problem C ERTAIN A NSWERS (α, T ) is undecidable. The proof uses similar techniques as the proof of Theorem 1 in [15], that shows the undecidability of the problem of computing certain answers for conjunctive queries with safe negation over DL-Lite R knowledge bases. Having this negative result, a natural direction is to search for DLXPath fragments that have decidable certain answers problem. Based on previous studies for XPath in XML and graph databases (see e.g., [3]), the most problematic primitive seems to be the negation ᾱ in path formulas. Indeed, we now show that removing this primitive from the syntax leads to decidability of the certain answers problem. We denote DLXPathpath-pos core the fragment of DLXPathcore that does not allow the negation ᾱ, for α a path formula, and define the class DLXPathpath-pos reg accordingly. We then have the following theorem. Theorem 2. There exists a DL-Litecore TBox T and a DLXPathpath-pos core query α such that the problem C ERTAIN A NSWERS (α, T ) is CO NP-hard. The problem is in CO NP for any DL-LiteR TBox T and DLXPathpath-pos reg query α. Proof (sketch). We first sketch the CO NP-hardness of the problem. Consider a language with role names P, T1 , T2 , T3 , F1 , F2 , and F3 , as well as concept name A. Fix a query h _ ^ i α= P hSivi [Aui ]i , v1 6=u1 ,v2 6=u2 ,v3 6=u3 i=1,2,3 where all vi and ui range over {⊤, ⊥}; Sivi is Ti if vi = ⊤ and Fi if vi = ⊥; Aui is A if ui = ⊤ and ¬A if vi′ = ⊥. Consider the complement of the NP-complete problem 3CNF-SAT whose input is a conjunction Φ of clauses of the form ℓ1 ∨ ℓ2 ∨ ℓ3 , where the ℓi are literals, that is, propositional variables or their negations, and whose answer is ‘yes’ if Φ is not satisfiable and ‘no’ otherwise. For each variable x in Φ the ABox A uses an individual cx . Also, for each clause γ in Φ it uses an individual cγ . Finally, it uses an individual c. For each clause γ in Φ the ABox A contains the assertion P (c, cγ ). Also, for each literal ℓi in each clause γ = ℓ1 ∨ ℓ2 ∨ ℓ3 of Φ the ABox A contains the assertion Ti (cγ , cx ) if ℓi = x and the assertion Fi (cγ , cx ) if ℓi = x̄. It is a technicality to show that (c, c) is a certain answer to α over the KB (A, ∅) if and only if Φ is not satisfiable. The CO NP upper bound can be obtained by a careful analysis of the proof of Theo- rem 4. ❑ In fact, we can see that the reduction for hardness is done for the empty TBox, that is all the complexity is ‘contained’ in the formulation of the fixed query (which does not use the transitive closure operator + ). While this theorem looks positive in light of the general undecidability result, the complexity might still be too high for practical applications, as it leads to algorithms which run in exponential time in the size of the ABox. To lower the complexity and obtain tractable algorithms, one could also consider fragments of DLXPath that do not allow any form of negation, neither in node nor in path formulas. We denote such frag- ments of DLXPathcore and DLXPathreg by DLXPathpos pos core and DLXPathreg , respectively. This last fragment is nothing else but nested 2RPQs [22], a language that has already been studied for DL-Lite knowledge bases [4], where an NL OG S PACE tight complexity bound was shown for the problem of computing certain answers for both DL-Lite core and DL-Lite R . Furthermore, by adapting known results from graph databases, it is not difficult to extend this result to show that the problem is already NL OG S PACE hard even for DLXPathpos core queries (see e.g., [2] for a survey on such techniques). Note that from this result we conclude that nesting and inverse in regular expres- sions, as well as fixed DL-Lite R TBoxes do not increase the tractable data complexity of query evaluation. 5 Combined complexity of DLXPath Query Evaluation Even if data complexity is the most important measure in practice, combined complex- ity, that is complexity under the assumption that both TBox, ABox and the query are given as input, allows us to get a better understanding of the query answering problem, and often provides a blueprint of how to solve the problem in practice. That is why we continue our study in this direction. Formally, we consider the following family of problems, where X ranges over {core, R}, y over {core, reg}, and z is either nothing, or ‘path-pos’, or ‘pos’. DL-Lite X C ERTAIN A NSWERS OVER DLXPathzy Input: DL-Lite X KB K, DLXPathzy query α, and pair (c1 , c2 ) of individuals Question: Is (c1 , c2 ) ∈ Certain(α, K)? As it immediately follows from Theorem 1, the problem remains undecidable for full DLXPathcore and, hence, for full DLXPathreg . However, the last result also fol- lows from the connection of DLXPath with propositional dynamic logic (PDL) and some well known properties of PDL. Since this connection will be heavily used in the remainder of the paper we now discuss it in more detail. First of all, we note that in PDL community a different terminology is used: for example, interpretations are called Kripke structures, concept names are called propo- sitional letters or variables, role names are atomic programs, and inverse operator is converse. Though, to be consistent with the rest of the paper we stay with the DL ter- minology. As already mentioned, node formulas of DLXPathpath-pos reg are, essentially, PDL with converse (CPDL) formulas (see [16] for a good introduction on the topic). Formally, the syntax of plain propositional dynamic logic (PDL) is the same as DLXPathpath-pos reg (i.e., CPDL), except that it does not allow the inverse Pj− of role names as path expressions. The standard problem in the PDL community is the satisfiability of a node formula ϕ; that is, checking whether there exists an interpretation I and an element d in its domain such that ϕ holds in d (written, I, d |= ϕ). Lutz et al. showed that the satisfiability of PDL formulas extended with arbitrary path negation (such a logic is denoted PDL¬ ) is undecidable [20], which already implies the undecidability of the certain answers problem DLXPathreg : indeed, a PDL¬ formula ϕ is satisfiable if and only if the DLXPathreg query [¬ϕ] has the certain answer (c, c) over the empty KB with individual c in the language. Note, however, that it does not immediately imply undecidability in data complexity shown in Theorem 1. Turning our attention to the certain answers problem for DLXPathpath-pos queries, we again use the connection with the theory of PDL. It is well-known that the satisfia- bility problem for PDL is EXPTIME-complete [16]. It remains in EXPTIME even if these formulas are allowed to use the transitive closure operator + only over role names. The same holds for CPDL [16] and PDL(¬) , that is, the extension of PDL which allows path negation in a limited form—only on role names [20]. Similarly to the previous undecidability result, the EXPTIME lower bounds for these classes of PDL formulas already imply EXPTIME-hardness of the certain answers problem for DLXPathpath-pos core , even for empty KBs. Also, in [4] hardness was established even for DLXPathpos reg . The upper bound is, however, much more challenging. To deal with DL-Lite R knowledge bases, in particular with role inclusions, we join the aforementioned ex- tensions of PDL with inverse and negation on role names, and consider the language CPDL(¬) whose node formulas obey the same grammar (1) as PDL (and DLXPath), and path formulas are defined as follows: α, β := ε | R | [ϕ] | α ∪ β | α · β | R | α+ . We need the following result to establish complexity of the certain answers problem. Theorem 3. Checking satisfiability of a CPDL(¬) node formula can be done in EXP- TIME. The proof makes use of ideas from [20] and [26]. Although it is not strictly related to description logics and knowledge representation, we state the result explicitly as we believe it might be of interest to the PDL community. Of course, satisfiability results do not transfer directly to query answering over KBs. However, widening and recasting the ideas from [12] and [13], we obtain the desired upper bound. Lemma 1. The problem DL-LiteR C ERTAIN A NSWERS OVER DLXPathpath-pos reg is in EXPTIME. Summing up, we obtain the following theorem (the hardness results follows from [16] and [4] and are included just for completeness). Theorem 4. The problem DL-LiteR C ERTAIN A NSWERS OVER DLXPathpath-pos reg is EXPTIME-complete. It remains EXPTIME-hard for DLXPathpos reg queries with DL- Litecore KBs, and for DLXPathpath-pos core even with empty KBs. The only remaining fragment is that of DLXPathpos core queries, that is, the restriction of the DLXPathcore language that use neither binary nor unary negation. In the previous section we saw that data complexity of answering DLXPathpos queries is the same as answering RPQs and 2RPQs, regardless of whether we made use of the regular or the core fragment. For combined complexity, the case is now different. We have already seen that query answering remains EXPTIME-hard for DLXPathpos . We now show that the restriction to the core fragment limits the complexity by almost one exponential. Theorem 5. The problem DL-LiteR C ERTAIN A NSWERS OVER DLXPathpos core is NP- complete. It remains NP-hard for DL-Litecore . Proof (sketch). For designing an NP algorithm we rethink the ideas from the proof of Theorem 4.4 in [3], where a similar problem was solved in the context of XML. First, we show that a pair (c1 , c2 ) is a certain answer to a query α over a satisfiable KB K if and only if (c1 , c2 ) ∈ JαKCan(K) , where Can(K) is the canonical model of K (see e.g., [7] for definition). Second, if (c1 , c2 ) is an answer, then this is witnessed by a sub- interpretation of Can(K) of polynomial size. Hence, the algorithm can just guess this sub-interpretation. To show hardness we give a reduction from the NP-complete problem 3CNF-SAT. Suppose we are given a conjunction Φ of clauses of the form ℓ1 ∨ ℓ2 ∨ ℓ3 , where ℓk are literals, that is propositional variables or their negations (we can assume that all literals in each clause are distinct). Let x1 , . . . , xn be the variables in Φ. Consider the KB K with the ABox A = {A(c)} and the TBox T consisting of the inclusions A ⊑ ∃(X1v )− , for v ∈ {⊤, ⊥}, ∃Xi−1 ⊑ ∃(Xiu )− , for v, u ∈ {⊤, ⊥}, 1 < i ≤ n. v For every variable xi and truth value v ∈ {⊤, ⊥} let ψxvi be the node formula h(Xn⊤ ∪ Xn⊥ ) · · · (Xi+1 ⊤ ⊥ ∪ Xi+1 ⊤ ) · Xiv · (Xi−1 ⊥ ∪ Xi−1 ) · · · (X1⊤ ∪ X1⊥ )i. Let ϕ be the node formula hα[ψ]i, where α = ((X1⊤ )− ∪ (X1⊥ )− ) · . . . · ((Xn⊤ )− ∪ (Xn⊥ )− ), and ψ is a conjunction of node formulas of the following form, for each clause γ in Φ, using variables x, y, z: _ ψxvx ∧ ψyvy ∧ ψzvz . (vx ,vy ,vz ) assignment of (x,y,z) satisfying γ Complexity DLXPathpos DLXPathpath-pos DLXPath Data NL OG S PACE -c CO NP-c undec. Combined NP-c∗ / EXPTIME-c† EXPTIME-c undec. Table 2. A summary of the complexity results. Here “-c” stands for “-complete” and “undec.” for “undecidable”. All the results hold for both DL-Lite core and DL-Lite R , as well as for both DLXPathcore and DLXPathreg ; the only exception is that ∗ holds just for DLXPathcore and † just for DLXPathreg . The new results of this paper are set off in bold. One can show that (c, c) is a certain answer to [ϕ] over K iff Φ is satisfiable. ❑ It is worth to mention, that the result holds even if the queries are not allowed to use the transitive closure operator + at all, being, essentially, very restricted form of unions of CQs but with the same complexity of query answering. Hence, the border between tractable and intractable combined complexity of the problem lies somewhere very close to here, leaving 2RPQs on one side and DLXPathpos core with CQs on the other. 6 Conclusions and Future Work In this paper we conducted a detailed study about using the XPath language to query on- tologies of the DL-Lite family. The results, summarised in Table 2, show that although the problem is generally undecidable, by limiting the amount of negation we can get decidable and even tractable fragments. The deep connection between XPath, DL and PDL allowed us to use ideas developed in other areas and gave insight on how query evaluation can be affected by certain aspects of the language. Although this connection was explored previously (see e.g., discussion in Chapters 13 and 14 in [6]), we believe that the time is ripe for a comprehensive survey describing how techniques from one area can be transferred to another and hope that the results of this paper can motivate such a survey. From a practical point of view, we would like to find and test the classes of queries that are of particular practical interest and have either tractable general algorithms or re- liable heuristics. Here we primarily want to tackle positive and path-positive fragments of XPath, as these queries were implemented and tested by the XML community, and many good heuristics have been developed over the years. Acknowledgements. We thank Diego Figueira for his precious comments. Reutter is funded by FONDECYT grant 11130648 and the Millennium Nucleus Center for Se- mantic Web Research under Grant NC120004. References 1. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. J. Artif. Intell. Res. (JAIR), 36:1–69, 2009. 2. P. Barceló. Querying graph databases. In PODS, pages 175–188, 2013. 3. M. Benedikt, W. Fan, and F. Geerts. XPath satisfiability in the presence of DTDs. J. ACM, 55(2), 2008. 4. M. Bienvenu, D. Calvanese, M. Ortiz, and M. Šimkus. Nested regular path queries in de- scription logics. In KR, 2014. 5. M. Bienvenu, M. Ortiz, and M. Šimkus. Conjunctive regular path queries in lightweight description logics. In IJCAI, 2013. 6. P. Blackburn, J. F. A. K. v. Benthem, and F. Wolter. Handbook of Modal Logic, Volume 3 (Studies in Logic and Practical Reasoning). Elsevier Science Inc., New York, NY, USA, 2006. 7. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 8. D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Vardi. Containment of conjunctive reg- ular path queries with inverse. In 7th International Conference on Principles of Knowledge Representation and Reasoning (KR), pages 176–185, 2000. 9. D. Calvanese, T. Eiter, and M. Ortiz. Answering regular path queries in expressive descrip- tion logics: An automata-theoretic approach. In AAAI, pages 391–396, 2007. 10. D. Calvanese, E. Kharlamov, W. Nutt, and C. Thorne. Aggregate queries over ontologies. In R. Elmasri, M. Doerr, M. Brochhausen, and H. Han, editors, ONISW, pages 97–104. ACM, 2008. 11. D. Calvanese, M. Ortiz, and M. Šimkus. Containment of regular path queries under descrip- tion logic constraints. In IJCAI, pages 805–812, 2011. 12. G. De Giacomo and M. Lenzerini. Boosting the Correspondence between Description Logics and Propositional Dynamic Logics. In AAAI, pages 205–212, 1994. 13. G. De Giacomo and M. Lenzerini. TBox and ABox Reasoning in Expressive Description Logics. In KR, pages 316–327, 1996. 14. B. Glimm, C. Ogbuji, S. Hawke, I. Herman, B. Parsia, A. Polleres, and A. Seaborne. SPARQL 1.1 entailment regimes, 2013. W3C Recommendation 21 March 2013, http://www.w3.org/TR/2013/REC-sparql11- entailment-20130321/. 15. V. Gutiérrez-Basulto, Y. A. Ibáñez-Garcı́a, R. Kontchakov, and E. V. Kostylev. Conjunctive queries with negation over dl-lite: A closer look. In RR, pages 109–122, 2013. 16. D. Harel, D. Kozen, and J. Tiuryn. Dynamic Logic. MIT Press, 2000. 17. S. Harris and A. Seaborne. Sparql 1.1 query language. W3C working draft, 14, 2010. 18. E. V. Kostylev and J. L. Reutter. Answering counting aggregate queries over ontologies of the DL-Lite family. In AAAI, 2013. 19. L. Libkin, W. Martens, and D. Vrgoč. Querying graph databases with XPath. In ICDT, pages 129–140, 2013. 20. C. Lutz and D. Walther. PDL with Negation of Atomic Programs. Journal of Applied Non- Classical Logics, 15(2):189–213, 2005. 21. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web Ontology Language Profiles (2nd Edition), 2012. W3C Recommendation. 22. J. Pérez, M. Arenas, and C. Gutierrez. nSPARQL: A navigational language for RDF. Web Semantics: Science, Services and Agents on the World Wide Web, 8(4):255 – 270, 2010. 23. I. Robinson, J. Webber, and E. Eifrem. Graph databases. ” O’Reilly Media, Inc.”, 2013. 24. R. Rosati. The limits of querying ontologies. In Proc. of the 11th Int. Conf. on Database Theory (ICDT), volume 4353 of LNCS, pages 164–178. Springer, 2007. 25. M. Y. Vardi. The complexity of relational query languages (extended abstract). In STOC, pages 137–146, 1982. 26. M. Y. Vardi and P. Wolper. Automata-Theoretic Techniques for Modal Logics of Programs. J. Comput. Syst. Sci., 32(2):183–221, 1986. 27. XML Path Language (XPath) 2.0 (Second Edition). www.w3.org/TR/xpath20, 2010.