<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>XPath for DL-Lite Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Egor V. Kostylev</string-name>
          <email>egor.kostylev@cs.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan L. Reutter</string-name>
          <email>jreutter@ing.puc.cl</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domagoj Vrgocˇ</string-name>
          <email>domagoj.vrgoc@ed.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Edinburgh</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oxford</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Applications of description logics (DLs) such as OWL 2 and ontologybased data access (OBDA) require understanding of how to pose database queries over DL knowledge bases. While there have been many studies regarding traditional relational query formalisms such as conjunctive queries and their extensions, little attention has been paid to graph database queries, despite the fact that graph databases share the structure of interpretations with DLs; that is they describe essentially the same objects. In particular, not much is known about the interplay between DLs and XPath. The last is a powerful formalism for querying semistructured data: it is in the core of most practical query languages for XML trees, and it is also gaining popularity in theory and practice of graph databases. In this paper we make a step towards coupling knowledge bases and graph databases by studying how to answer powerful XPath-style queries over DL-Lite. We start with adapting the definition of XPath to the DL context, and then proceed to study the complexity of evaluating XPath queries over knowledge bases. Results show that, while query answering is undecidable for the full XPath, by carefully tuning the amount of negation allowed in the queries we can arrive to XPath fragments that have a potential to be used in practical applications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Satisfiability and model checking have long been two central problems in the
knowledge representation community. But new applications of description logics (DLs for
short), such as the Web Ontology Language (OWL) [21] and ontology-based data
access (OBDA), are forcing us to develop algorithms solving more complex data
manipulation and extraction tasks, and in particular, require understanding how to answer
database-style queries over knowledge bases that are specified by DLs [14].</p>
      <p>The literature on knowledge bases usually considers relational queries, mostly
focusing on conjunctive queries (CQs) and their extensions with union [1, 7], forms of
negation [15, 24], or aggregates [10, 18]. It is also natural to look at queries that are
designed to operate over graph databases, since these share the same structure with DL
knowledge bases; namely they use unary and binary predicates (that is concepts and
roles in DL terminology) to represent graph nodes and edges. While the idea of using
graph queries in knowledge bases is not new (see e.g., [8]), we are only starting to
understand how such queries can be fine tuned for various ontology languages. So far the
specific research on this topic has primarily been concerned with the class of regular
path queries (RPQs; see e.g. [2]), one of the most basic graph query languages. We now
know how to evaluate such queries over various description logics [9], how to deal with
some of their extensions [4, 5] and understand how to solve their containment problem
in the presence of DL constraints [11].</p>
      <p>There are of course many other languages for querying graph and semi-structured
data, most notable amongst them being XPath. Originally designed to extract data from
XML trees [27], XPath has recently been adapted to work over graph databases [19]
and was shown to retain good evaluation properties while at the same time being more
powerful than RPQs and many of their extensions. Moreover, XPath also subsumes
navigational graph querying features of SPARQL [17], such as property paths, and other
commercial graph query languages (see e.g. Neo4j [23]), and thus can also be seen
as a querying primitive for more powerful languages. Given that XPath is capable of
expressing virtually all relevant querying primitives for graphs, and was tried and tested
by XML practitioners, it is natural to see if it will have the same impact as a language
for DL knowledge bases.</p>
      <p>In this paper we take a decisive step towards the implementation of graph query
languages over ontologies and study the problem of evaluating XPath queries over DL.
To that extent, we first adapt the XPath query language to work over ontologies, arriving
to DLXPath—an expressive language designed specifically for DL knowledge bases.
To get a flavour of the interplay between DLXPath and DLs, we then study the problem
of evaluating DLXPath queries over DL-Lite R and DL-Lite core ontologies. The choice
for this particular family of DLs is twofold: on the one hand, it is simple enough to start
such a research; on the other, this family is of the most importance in practice, since it
underlies OWL 2 QL profile [21].</p>
      <p>As a simple useful example of an DLXPath query, we can give the following: ‘Can
I fly from city A to city B making stops only in cities with UNESCO World Heritage
Sites which are endangered?’ Here we assume that the considered ontology has binary
predicates (roles) HasDirectFlight and HasUNESCOSite, as well as unary predicate
(concept) InDanger. Under this ontology this query cannot be answered by RPQs. But
note that it can, however, be expressed as a nested RPQ [4], as this language corresponds
to DLXPath without negation. However, if we additionally require the site of the last
intermediate stop to be included in the list not under cultural criteria (expressed by
concept Cultural), then such a query cannot be expressed by any known language for
knowledge bases, but can be written as DLXPath query.</p>
      <p>Regardless of the choice of a particular DL, one cannot expect that the study of
DLXPath query answering will come as a straightforward adaptation of known
techniques. The language of DLXPath contains expressive primitives such as transitive
closure and negation, that is, the operators which are known to bring in conceptual
difficulties even when studied in isolation. For example, it has been shown that adding negation
to conjunctive queries immediately leads to undecidability of query answering [15],
while adding path operators usually implies jumping one or two exponents in the
complexity of query evaluation algorithms [9]. Thus, one of the challenges when studying
DLXPath is to fine-tune these operators to archive an optimal trade-off between good
evaluation properties and expressive power of the query language. To this end, we will
distinguish two flavours of the language: the core fragment, denoted by DLXPathcore,
and the regular fragment, denoted by DLXPathreg. The two are designed to match the
duality present in the standard XPath query language [27], and while the core fragment
allows transitive closure only over basic role names, the regular fragment lifts this
restriction, allowing us to post more general path queries. Regarding negation, we will
distinguish full fragments, which allow both unary and binary negation, path-positive
fragments with only the unary one, and positive fragments without any negation.</p>
      <p>As usual when gauging the usefulness of a new query language for ontologies, we
start by considering data complexity of the query answering problem, where one
assumes that the query and the terminological knowledge (TBox) are fixed, while the only
input is the assertional knowledge (ABox). Although we show that for the most general
case the problem is undecidable, by limiting the amount of negation we obtain
expressive languages whose queries can be answered in CONP and even NLOGSPACE, both
over DL-Lite R and DL-Lite core. We then move to studying the combined complexity,
where both the knowledge base and the query form the input. Here we obtain bounds
ranging from NP-complete (thus matching the ones for ordinary CQs), to
EXPTIMEcomplete, and undecidable for the full language. We would like to note that some of
the results are obtained using the deep connection between DLs, DLXPath and
propositional dynamic logic, thus providing some interesting new techniques for answering
queries over ontologies.</p>
      <p>Proofs Due to the space restrictions, we only give sketches of short proofs and ideas of
longer ones.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>The language of DL-Lite R (and DL-Lite core) [1, 7] contains individuals c1, c2, . . .,
concept names A1, A2, . . ., and role names P1, P2, . . .. Concepts B and roles R are defined
by the grammar</p>
      <p>B ::=</p>
      <p>Ai | ∃R,</p>
      <p>R ::= Pj | Pj−.</p>
      <p>A DL-Lite R TBox is a finite set of concept and role inclusions of the form
B1 ⊑ B2,</p>
      <p>B1 ⊓ B2 ⊑ ⊥,</p>
      <p>R1 ⊑ R2,</p>
      <p>R1 ⊓ R2 ⊑ ⊥.</p>
      <p>A DL-Lite core TBox contains only concept inclusions. An ABox is a finite set of
assertions of the form Ai(ck) and Pj (ck, cℓ). A knowledge base (KB) is a pair (T , A), where
T is a TBox and A an ABox.</p>
      <p>An interpretation I = (∆ I , ·I ) is a nonempty domain ∆ I of elements with an
interpretation function ·I that assigns an element ckI ∈ ∆ I to each individual ck, a
subset AiI of ∆ I to each concept name Ai, and a binary relation PjI ⊆ ∆ I × ∆ I to
each role name Pj . When dealing with DL-Lite it is usual to adopt the unique name
assumption (UNA), and we do so here by requiring that ckI 6= cℓI , for all individuals
ck 6= cℓ. Our results, however, do not depend on UNA. The interpretation function ·I is
extended to roles and concepts in the following standard way:
(∃R)I =</p>
      <p>d ∈ ∆ I | there is d′ ∈ ∆ I with (d, d′) ∈ RI ,
(Pj−)I = {(d′, d) ∈ ∆ I × ∆ I | (d, d′) ∈ PjI }.</p>
      <p>The satisfaction relation |= for TBox inclusions and ABox assertions is also standard:
I |= B1 ⊑ B2
I |= B1 ⊓ B2 ⊑ ⊥
I |= R1 ⊑ R2
I |= R1 ⊓ R2 ⊑ ⊥
I |= Ai(ck)
I |= Pj (ck, cℓ)
iff
iff
iff
iff
iff
iff</p>
      <p>B1I ⊆ BI ,</p>
      <p>2
B1I ∩ B2I = ∅,
R1I ⊆ R2I ,
R1I ∩ R2I = ∅,
ckI ∈ AiI ,
(ckI , cℓI ) ∈ PjI .</p>
      <p>A KB K = (T , A) is satisfiable if there is an interpretation I satisfying all
inclusions of T and assertions of A. In this case we write I |= K and say that I is a model
of K.
3</p>
    </sec>
    <sec id="sec-3">
      <title>DLXPath: XPath for Knowledge Bases</title>
      <p>As mentioned in the introduction, the family of DL-Lite was designed not only to keep
satisfiability and model checking problems simple, but mainly to keep the complexity
of conjunctive query answering the same as in the case of relational databases, while
simultaneously maximising the expressive power of the ontological language. This allows
to use DL-Lite as a foundation for practical data managing applications, such as OWL
2 QL and OBDA. However, this does not automatically mean that other useful query
formalisms with good evaluation properties over databases also have good properties
when posed over DL-Lite knowledge bases. Hence, each class of queries which can be
useful in knowledge base applications requires a separate research on its computational
properties.</p>
      <p>In this paper we concentrate on an adaptation of XPath query language for XML
trees to knowledge bases. Recently it was shown that (a version of) XPath can be
successfully used for querying graph databases [19]. Every interpretation of a DL-Lite
vocabulary can be seen as a graph, and hence every DL-Lite KB is an incomplete
description of a graph. That is why we expect our adaptation DLXPath for querying
knowledge bases to be useful in practical applications.</p>
      <p>
        In what follows, we will consider several fragments of DLXPath. We start with
DLXPathcore, the fragment which corresponds to core XPath, the theoretical foundation
of most practical languages for querying XML trees.4
Definition 1. Node formulas ϕ, ψ and path formulas α, β of DLXPathcore are
expressions satisfying the grammar
ϕ, ψ := A | ¬ϕ | ϕ ∧ ψ |ϕ ∨ ψ | hαi ,
α, β := ε | R | [ϕ] | α ∪ β | α · β | α | R+,
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
where A ranges over concept names and R ranges over roles (i.e., role names and their
inverses).
      </p>
      <p>Semantics J·KI of DLXPathcore for an interpretation I associates subsets of ∆ I to
node formulas and binary relations on ∆ I to path formulas as given in Table 1.
4 The subscript ‘core’ is used in this paper for two unrelated purposes. This matching is
historical and accidental, but we decided to stay with conventional notation despite this undesired
collision.</p>
      <p>JAKI = AI</p>
      <p>J¬ϕKI = ΔI \ JϕKI
Jϕ ∧ ψKI = JϕKI ∩ JψKI
Jϕ ∨ ψKI = JϕKI ∪ JψKI</p>
      <p>JhαiKI = {d | there exists d′ such that (d, d′) ∈ JαKI}</p>
      <p>JεKI = {(d, d) | d ∈ ΔI }
JRKI = RI</p>
      <p>J[ϕ]KI = {(d, d) | d ∈ JϕKI}
Jα ∪ βKI = JαKI ∪ JβKI
Jα · βKI = JαKI ◦ JβKI</p>
      <p>JαKI = (ΔI × ΔI ) \ JαKI</p>
      <p>JR+KI is the transitive closure of RI</p>
      <p>As usual when dealing with ontologies, our interest is not query answering for a
particular interpretation, but computing those answers that are true in all possible
models of the knowledge base. Formally, let K = (T , A) be a knowledge base and α a
DLXPath path formula. The certain answers of α over K, denoted Certain(α, K), is
the set of all pairs (c1, c2) of individuals such that (c1I , c2 ) ∈ JαKI for all models I
I
of K. Similarly, one can define certain answers Certain(ϕ, K) for a DLXPath node
formula ϕ as the set of all individuals c such that cI ∈ JϕKI , for all I models of K.
In the paper all of the results will be stated for path formulas (queries from here on),
however, they remain unchanged for node formulas.</p>
      <p>Example 1. Coming back to the example from the introduction, consider role names
HasDirectFlight, which connects cities with a direct flight, and HasUNESCOSite, which
connects cities with their UNESCO world heritage sites, as well as concept InDanger
denoting that a particular site is endangered. Let KB K represents the flight destination
graph, as well as knowledge about heritage sites, partially explicitly in the ABox, and
partially implicitly, by means of TBox inclusions. Then checking whether it is possible
to fly from Edinburgh to a city that has an endangered UNESCO world heritage site is
equivalent to checking if Edinburgh is in Certain(ϕ, K), where ϕ is a node formula
hHasDirectFlight+[hHasUNESCOSite[InDanger]i]i.</p>
      <p>As already noted, the formalism of core XPath is in the nutshell of the most
practical query languages for XML trees. However, in [19] it was shown that the
corresponding graph language cannot express certain properties that are deemed essential when
querying graphs. Thus, besides DLXPathcore we also consider its generalisation called
regular DLXPath, or DLXPathreg, which extends the core fragment by allowing the use
of transitive closure operator + over arbitrary path formulas. Formally, path formulas
of DLXPathreg satisfy the grammar</p>
      <p>
        α, β := ε | R | [ϕ] | α ∪ β | α · β | α | α+,
while node formulas remain the same as for DLXPathcore in grammar (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ). As expected,
the semantics Jα+KI over an interpretation I is the transitive closure of JαKI .
Example 2. With full transitive closure we are now able to post more complex queries
than in the core fragment. Consider again the knowledge base K from Example 1. We
can now ask if it is possible to fly from Liverpool to Jerusalem, making stopovers only
in places that have an endangered UNESCO world heritage site by checking if the pair
(Liverpool, J erusalem) is in Certain(α, K), where α is a query
      </p>
      <p>(HasDirectFlight[hHasUNESCOSite[InDanger]i])+.</p>
      <p>Note that here we check if each of the cities along the path has an endangered site. If
we want to additionally require that the sites along the route are not included in the
UNESCO list under cultural criteria, we need to replace [InDanger] with [InDanger ∧
¬Cultural] in the query above.</p>
      <p>Besides these two query languages, we will consider their fragments, which will be
introduced as needed.</p>
      <p>Before passing on to the complexity of DLXPath query evaluation, we briefly
compare our languages with other formalisms. First, DLXPath is clearly incomparable with
CQs. However, all tree-shaped CQs and unions of CQs with no more than two free
variables can be written as DLXPathcore queries. On the other hand, all negation-free and
+-free DLXPathreg queries can be written as unions of CQs, though with a cost of
exponential blow-up. If we allow negation but only over concept and role names, then a
query can be written as a union of CQs with safe negation. Second, DLXPathreg queries
without negation and node tests [ϕ] are 2-way regular path queries (2RPQs), the
standard formalism for querying graph databases. If, additionally, role inverses are not
allowed as path formulas, it becomes plain RPQs, that is, essentially, regular expressions.
Some fragments of DLXPath can be expressed in other description logic formalisms.
For example, it is well-known that a unary tree-shaped CQ with the corresponding tree
being rooted and directed, can be written as a concept of E L, a DL underlying OWL 2
EL profile [21]. Finally, DLXPathreg node formulas without path negation are nothing
else but propositional dynamic logic with converse (CPDL) formulas; we will discuss
this connection in more detail later on.</p>
      <p>In the following sections we analyse the complexity of evaluating DLXPath queries
over DL-Lite knowledge bases.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Data Complexity of DLXPath Query Evaluation</title>
      <p>As it is widely accepted in theory and proved in practice, the size of the query and
TBox is usually much smaller than the size of the ABox (see e.g., [25] for discussion
in the relational database context and [7] for DLs). This is why one usually considers
data complexity of query answering, assuming that the TBox and the query are fixed,
and only ABox is part of the input. In this section we study this problem for various
fragments of DLXPath. Formally, let T be a TBox and α a DLXPath query. We are
interested in the following family of problems.</p>
      <p>CERTAIN ANSWERS (α, T )</p>
      <p>Input: ABox A and pair (c1, c2) of individuals</p>
      <p>Question: Is (c1, c2) ∈ Certain(α, (A, T ))?</p>
      <p>As it was previously mentioned, data complexity of CQ query answering over
DLLite R knowledge bases is the same as over relational databases, that is, in LOGSPACE
[7]. At the first glance, one may expect a similar result in the case of DLXPath, where
the complexity is NLOGSPACE-complete over graph databases [19]. However, the
combination of the open world assumption and allowing negation in queries makes things
quite different. In fact, the situation here is more like in the case of CQ with safe
negation, where the complexity jump is dramatic: from polynomiality to undecidability [15].
Theorem 1. There exists a DL-Litecore TBox T and a DLXPathcore query α such that
the problem CERTAIN ANSWERS (α, T ) is undecidable.</p>
      <p>The proof uses similar techniques as the proof of Theorem 1 in [15], that shows
the undecidability of the problem of computing certain answers for conjunctive queries
with safe negation over DL-Lite R knowledge bases.</p>
      <p>Having this negative result, a natural direction is to search for DLXPath fragments
that have decidable certain answers problem. Based on previous studies for XPath in
XML and graph databases (see e.g., [3]), the most problematic primitive seems to be the
negation α¯ in path formulas. Indeed, we now show that removing this primitive from the
syntax leads to decidability of the certain answers problem. We denote DLXPathpath-pos
core
the fragment of DLXPathcore that does not allow the negation α¯, for α a path formula,
and define the class DLXPathpath-pos accordingly. We then have the following theorem.</p>
      <p>reg
Theorem 2. There exists a DL-Litecore TBox T and a DLXPathpath-pos query α such
core
that the problem CERTAIN ANSWERS (α, T ) is CONP-hard. The problem is in CONP
for any DL-LiteR TBox T and DLXPathpath-pos query α.</p>
      <p>reg
Proof (sketch). We first sketch the CONP-hardness of the problem. Consider a language
with role names P, T1, T2, T3, F1, F2, and F3, as well as concept name A. Fix a query
α = hP
_</p>
      <p>^
v16=u1,v26=u2,v36=u3
i=1,2,3
hSivi [Aui ]i
i ,
where all vi and ui range over {⊤, ⊥}; Sivi is Ti if vi = ⊤ and Fi if vi = ⊥; Aui is A
if ui = ⊤ and ¬A if vi′ = ⊥.</p>
      <p>Consider the complement of the NP-complete problem 3CNF-SAT whose input
is a conjunction Φ of clauses of the form ℓ1 ∨ ℓ2 ∨ ℓ3, where the ℓi are literals, that
is, propositional variables or their negations, and whose answer is ‘yes’ if Φ is not
satisfiable and ‘no’ otherwise.</p>
      <p>For each variable x in Φ the ABox A uses an individual cx. Also, for each clause γ
in Φ it uses an individual cγ . Finally, it uses an individual c.</p>
      <p>For each clause γ in Φ the ABox A contains the assertion P (c, cγ ). Also, for each
literal ℓi in each clause γ = ℓ1 ∨ ℓ2 ∨ ℓ3 of Φ the ABox A contains the assertion
Ti(cγ , cx) if ℓi = x and the assertion Fi(cγ , cx) if ℓi = x¯.</p>
      <p>It is a technicality to show that (c, c) is a certain answer to α over the KB (A, ∅) if
and only if Φ is not satisfiable.</p>
      <p>The CONP upper bound can be obtained by a careful analysis of the proof of
Theorem 4. ❑</p>
      <p>In fact, we can see that the reduction for hardness is done for the empty TBox, that
is all the complexity is ‘contained’ in the formulation of the fixed query (which does
not use the transitive closure operator +).</p>
      <p>While this theorem looks positive in light of the general undecidability result, the
complexity might still be too high for practical applications, as it leads to algorithms
which run in exponential time in the size of the ABox. To lower the complexity and
obtain tractable algorithms, one could also consider fragments of DLXPath that do not
allow any form of negation, neither in node nor in path formulas. We denote such
fragments of DLXPathcore and DLXPathreg by DLXPathcpoorse and DLXPathrpeogs, respectively.
This last fragment is nothing else but nested 2RPQs [22], a language that has already
been studied for DL-Lite knowledge bases [4], where an NLOGSPACE tight complexity
bound was shown for the problem of computing certain answers for both DL-Lite core
and DL-Lite R. Furthermore, by adapting known results from graph databases, it is not
difficult to extend this result to show that the problem is already NLOGSPACE hard even
for DLXPathcpoorse queries (see e.g., [2] for a survey on such techniques).</p>
      <p>Note that from this result we conclude that nesting and inverse in regular
expressions, as well as fixed DL-Lite R TBoxes do not increase the tractable data complexity
of query evaluation.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Combined complexity of DLXPath Query Evaluation</title>
      <p>Even if data complexity is the most important measure in practice, combined
complexity, that is complexity under the assumption that both TBox, ABox and the query are
given as input, allows us to get a better understanding of the query answering problem,
and often provides a blueprint of how to solve the problem in practice. That is why
we continue our study in this direction. Formally, we consider the following family of
problems, where X ranges over {core, R}, y over {core, reg}, and z is either nothing,
or ‘path-pos’, or ‘pos’.</p>
      <p>DL-Lite X CERTAIN ANSWERS OVER DLXPathz
y
Input: DL-Lite X KB K, DLXPathzy query α,</p>
      <p>and pair (c1, c2) of individuals</p>
      <p>Question: Is (c1, c2) ∈ Certain(α, K)?</p>
      <p>As it immediately follows from Theorem 1, the problem remains undecidable for
full DLXPathcore and, hence, for full DLXPathreg. However, the last result also
follows from the connection of DLXPath with propositional dynamic logic (PDL) and
some well known properties of PDL. Since this connection will be heavily used in the
remainder of the paper we now discuss it in more detail.</p>
      <p>First of all, we note that in PDL community a different terminology is used: for
example, interpretations are called Kripke structures, concept names are called
propositional letters or variables, role names are atomic programs, and inverse operator is
converse. Though, to be consistent with the rest of the paper we stay with the DL
terminology.</p>
      <p>As already mentioned, node formulas of DLXPathpath-pos are, essentially, PDL with
reg
converse (CPDL) formulas (see [16] for a good introduction on the topic). Formally, the
syntax of plain propositional dynamic logic (PDL) is the same as DLXPathpath-pos (i.e.,
reg
CPDL), except that it does not allow the inverse Pj− of role names as path expressions.
The standard problem in the PDL community is the satisfiability of a node formula ϕ;
that is, checking whether there exists an interpretation I and an element d in its domain
such that ϕ holds in d (written, I, d |= ϕ).</p>
      <p>Lutz et al. showed that the satisfiability of PDL formulas extended with arbitrary
path negation (such a logic is denoted PDL¬) is undecidable [20], which already implies
the undecidability of the certain answers problem DLXPathreg: indeed, a PDL¬ formula
ϕ is satisfiable if and only if the DLXPathreg query [¬ϕ] has the certain answer (c, c)
over the empty KB with individual c in the language. Note, however, that it does not
immediately imply undecidability in data complexity shown in Theorem 1.</p>
      <p>Turning our attention to the certain answers problem for DLXPathpath-pos queries,
we again use the connection with the theory of PDL. It is well-known that the
satisfiability problem for PDL is EXPTIME-complete [16]. It remains in EXPTIME even if
these formulas are allowed to use the transitive closure operator + only over role names.
The same holds for CPDL [16] and PDL(¬), that is, the extension of PDL which allows
path negation in a limited form—only on role names [20]. Similarly to the previous
undecidability result, the EXPTIME lower bounds for these classes of PDL formulas
already imply EXPTIME-hardness of the certain answers problem for DLXPathpath-pos,
core
even for empty KBs. Also, in [4] hardness was established even for DLXPathpos
reg .</p>
      <p>
        The upper bound is, however, much more challenging. To deal with DL-Lite R
knowledge bases, in particular with role inclusions, we join the aforementioned
extensions of PDL with inverse and negation on role names, and consider the language
CPDL(¬) whose node formulas obey the same grammar (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) as PDL (and DLXPath),
and path formulas are defined as follows:
      </p>
      <p>α, β := ε | R | [ϕ] | α ∪ β | α · β | R | α+.</p>
      <p>We need the following result to establish complexity of the certain answers problem.
Theorem 3. Checking satisfiability of a CPDL(¬) node formula can be done in
EXPTIME.</p>
      <p>The proof makes use of ideas from [20] and [26]. Although it is not strictly related
to description logics and knowledge representation, we state the result explicitly as we
believe it might be of interest to the PDL community.</p>
      <p>Of course, satisfiability results do not transfer directly to query answering over KBs.
However, widening and recasting the ideas from [12] and [13], we obtain the desired
upper bound.</p>
      <p>Lemma 1. The problem DL-LiteR CERTAIN ANSWERS OVER DLXPathpath-pos is in
reg
EXPTIME.</p>
      <p>Summing up, we obtain the following theorem (the hardness results follows from
[16] and [4] and are included just for completeness).</p>
      <p>Theorem 4. The problem DL-LiteR CERTAIN ANSWERS OVER DLXPathpath-pos is
reg
EXPTIME-complete. It remains EXPTIME-hard for DLXPathrpeogs queries with
DLLitecore KBs, and for DLXPathpath-pos even with empty KBs.</p>
      <p>core</p>
      <p>The only remaining fragment is that of DLXPathcpoorse queries, that is, the restriction
of the DLXPathcore language that use neither binary nor unary negation. In the previous
section we saw that data complexity of answering DLXPathpos queries is the same as
answering RPQs and 2RPQs, regardless of whether we made use of the regular or the
core fragment. For combined complexity, the case is now different. We have already
seen that query answering remains EXPTIME-hard for DLXPathpos. We now show
that the restriction to the core fragment limits the complexity by almost one exponential.
Theorem 5. The problem DL-LiteR CERTAIN ANSWERS OVER DLXPathcpoorse is
NPcomplete. It remains NP-hard for DL-Litecore.</p>
      <p>Proof (sketch). For designing an NP algorithm we rethink the ideas from the proof of
Theorem 4.4 in [3], where a similar problem was solved in the context of XML. First,
we show that a pair (c1, c2) is a certain answer to a query α over a satisfiable KB K
if and only if (c1, c2) ∈ JαKCan(K), where Can(K) is the canonical model of K (see
e.g., [7] for definition). Second, if (c1, c2) is an answer, then this is witnessed by a
subinterpretation of Can(K) of polynomial size. Hence, the algorithm can just guess this
sub-interpretation.</p>
      <p>To show hardness we give a reduction from the NP-complete problem 3CNF-SAT.
Suppose we are given a conjunction Φ of clauses of the form ℓ1 ∨ ℓ2 ∨ ℓ3, where ℓk are
literals, that is propositional variables or their negations (we can assume that all literals
in each clause are distinct). Let x1, . . . , xn be the variables in Φ.</p>
      <p>Consider the KB K with the ABox A = {A(c)} and the TBox T consisting of the
inclusions</p>
      <p>A ⊑ ∃(X1v)−, for v ∈ {⊤, ⊥},
∃Xiv−1 ⊑ ∃(Xiu)−, for v, u ∈ {⊤, ⊥}, 1 &lt; i ≤ n.</p>
      <p>For every variable xi and truth value v ∈ {⊤, ⊥} let ψxvi be the node formula
h(Xn⊤ ∪ Xn⊥) · · · (Xi⊤+1 ∪ Xi⊥+1) · Xiv · (Xi⊤−1 ∪ Xi⊥−1) · · · (X1⊤ ∪ X1⊥)i.
Let ϕ be the node formula hα[ψ]i, where</p>
      <p>α = ((X1⊤)− ∪ (X1⊥)−) · . . . · ((Xn⊤)− ∪ (Xn⊥)−),
and ψ is a conjunction of node formulas of the following form, for each clause γ in Φ,
using variables x, y, z:</p>
      <p>_
(vx,vy,vz) assignment of (x,y,z)
satisfying γ</p>
      <p>ψxvx ∧ ψyvy ∧ ψzvz .</p>
      <p>Complexity DLXPathpos</p>
      <p>DLXPathpath-pos DLXPath
Data
Combined</p>
      <p>NLOGSPACE-c CONP-c undec.</p>
      <p>NP-c∗ / EXPTIME-c† EXPTIME-c undec.</p>
      <p>One can show that (c, c) is a certain answer to [ϕ] over K iff Φ is satisfiable.
❑</p>
      <p>It is worth to mention, that the result holds even if the queries are not allowed to
use the transitive closure operator + at all, being, essentially, very restricted form of
unions of CQs but with the same complexity of query answering. Hence, the border
between tractable and intractable combined complexity of the problem lies somewhere
very close to here, leaving 2RPQs on one side and DLXPathcpoorse with CQs on the other.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>In this paper we conducted a detailed study about using the XPath language to query
ontologies of the DL-Lite family. The results, summarised in Table 2, show that although
the problem is generally undecidable, by limiting the amount of negation we can get
decidable and even tractable fragments. The deep connection between XPath, DL and
PDL allowed us to use ideas developed in other areas and gave insight on how query
evaluation can be affected by certain aspects of the language. Although this connection
was explored previously (see e.g., discussion in Chapters 13 and 14 in [6]), we believe
that the time is ripe for a comprehensive survey describing how techniques from one
area can be transferred to another and hope that the results of this paper can motivate
such a survey.</p>
      <p>From a practical point of view, we would like to find and test the classes of queries
that are of particular practical interest and have either tractable general algorithms or
reliable heuristics. Here we primarily want to tackle positive and path-positive fragments
of XPath, as these queries were implemented and tested by the XML community, and
many good heuristics have been developed over the years.</p>
      <p>
        Acknowledgements. We thank Diego Figueira for his precious comments. Reutter is
funded by FONDECYT grant 11130648 and the Millennium Nucleus Center for
Semantic Web Research under Grant NC120004.
3. M. Benedikt, W. Fan, and F. Geerts. XPath satisfiability in the presence of DTDs. J. ACM,
55(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ), 2008.
4. M. Bienvenu, D. Calvanese, M. Ortiz, and M. Sˇimkus. Nested regular path queries in
description logics. In KR, 2014.
5. M. Bienvenu, M. Ortiz, and M. Sˇimkus. Conjunctive regular path queries in lightweight
description logics. In IJCAI, 2013.
6. P. Blackburn, J. F. A. K. v. Benthem, and F. Wolter. Handbook of Modal Logic, Volume 3
(Studies in Logic and Practical Reasoning). Elsevier Science Inc., New York, NY, USA,
2006.
7. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning
and efficient query answering in description logics: The DL-Lite family. J. of Automated
Reasoning, 39(3):385–429, 2007.
8. D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Vardi. Containment of conjunctive
regular path queries with inverse. In 7th International Conference on Principles of Knowledge
Representation and Reasoning (KR), pages 176–185, 2000.
9. D. Calvanese, T. Eiter, and M. Ortiz. Answering regular path queries in expressive
description logics: An automata-theoretic approach. In AAAI, pages 391–396, 2007.
10. D. Calvanese, E. Kharlamov, W. Nutt, and C. Thorne. Aggregate queries over ontologies. In
R. Elmasri, M. Doerr, M. Brochhausen, and H. Han, editors, ONISW, pages 97–104. ACM,
2008.
11. D. Calvanese, M. Ortiz, and M. Sˇimkus. Containment of regular path queries under
description logic constraints. In IJCAI, pages 805–812, 2011.
12. G. De Giacomo and M. Lenzerini. Boosting the Correspondence between Description Logics
and Propositional Dynamic Logics. In AAAI, pages 205–212, 1994.
13. G. De Giacomo and M. Lenzerini. TBox and ABox Reasoning in Expressive Description
      </p>
      <p>Logics. In KR, pages 316–327, 1996.
14. B. Glimm, C. Ogbuji, S. Hawke, I. Herman, B. Parsia, A. Polleres, and A. Seaborne.</p>
      <p>SPARQL 1.1 entailment regimes, 2013. W3C Recommendation 21 March 2013,
http://www.w3.org/TR/2013/REC-sparql11- entailment-20130321/.
15. V. Gutie´rrez-Basulto, Y. A. Iba´n˜ez-Garc´ıa, R. Kontchakov, and E. V. Kostylev. Conjunctive
queries with negation over dl-lite: A closer look. In RR, pages 109–122, 2013.
16. D. Harel, D. Kozen, and J. Tiuryn. Dynamic Logic. MIT Press, 2000.
17. S. Harris and A. Seaborne. Sparql 1.1 query language. W3C working draft, 14, 2010.
18. E. V. Kostylev and J. L. Reutter. Answering counting aggregate queries over ontologies of
the DL-Lite family. In AAAI, 2013.
19. L. Libkin, W. Martens, and D. Vrgocˇ. Querying graph databases with XPath. In ICDT, pages
129–140, 2013.
20. C. Lutz and D. Walther. PDL with Negation of Atomic Programs. Journal of Applied
Non</p>
      <p>
        Classical Logics, 15(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ):189–213, 2005.
21. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web
      </p>
      <p>Ontology Language Profiles (2nd Edition), 2012. W3C Recommendation.
22. J. Pe´rez, M. Arenas, and C. Gutierrez. nSPARQL: A navigational language for RDF. Web</p>
      <p>Semantics: Science, Services and Agents on the World Wide Web, 8(4):255 – 270, 2010.
23. I. Robinson, J. Webber, and E. Eifrem. Graph databases. ” O’Reilly Media, Inc.”, 2013.
24. R. Rosati. The limits of querying ontologies. In Proc. of the 11th Int. Conf. on Database</p>
      <p>Theory (ICDT), volume 4353 of LNCS, pages 164–178. Springer, 2007.
25. M. Y. Vardi. The complexity of relational query languages (extended abstract). In STOC,
pages 137–146, 1982.
26. M. Y. Vardi and P. Wolper. Automata-Theoretic Techniques for Modal Logics of Programs.</p>
      <p>
        J. Comput. Syst. Sci., 32(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ):183–221, 1986.
27. XML Path Language (XPath) 2.0 (Second Edition). www.w3.org/TR/xpath20, 2010.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Artale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>The DL-Lite family and relations</article-title>
          .
          <source>J. Artif. Intell. Res. (JAIR)</source>
          ,
          <volume>36</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          <article-title>´. Querying graph databases</article-title>
          .
          <source>In PODS</source>
          , pages
          <fpage>175</fpage>
          -
          <lpage>188</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>