=Paper= {{Paper |id=Vol-1350/paper-34 |storemode=property |title=The Combined Complexity of Reasoning with Closed Predicates in Description Logics |pdfUrl=https://ceur-ws.org/Vol-1350/paper-34.pdf |volume=Vol-1350 |dblpUrl=https://dblp.org/rec/conf/dlog/NgoOS15 }} ==The Combined Complexity of Reasoning with Closed Predicates in Description Logics== https://ceur-ws.org/Vol-1350/paper-34.pdf
            The Combined Complexity of Reasoning
          with Closed Predicates in Description Logics∗

                  Nhung Ngo1 , Magdalena Ortiz2 , and Mantas Šimkus2
                               1
                           Free University of Bozen-Bolzano
                               ngo@inf.unibz.it
                         2
                           Vienna University of Technology
              ortiz@kr.tuwien.ac.at|simkus@dbai.tuwien.ac.at



        Abstract. We investigate the computational cost of combining the open and
        closed world assumptions in Description Logics. Unlike previous works, which
        have considered data complexity and focused mostly on lightweight DLs, we
        study combined complexity for a wide range of DLs, from lightweight to very
        expressive ones. From existing results for the standard setting, where all predicates
        are interpreted under the open world assumption, and the well-known relationship
        between closed predicates and concept constructors like disjunction and nominals,
        we infer bounds on the combined complexity of reasoning in the presence of
        closed predicates. We show that standard reasoning requires exponential time even
        for weak logics like EL, while answering conjunctive queries becomes hard at
        least for coNE XP T IME, and in most cases even for 2E XP T IME. An important
        stepping stone for our results, that is interesting on its own right, is to prove
        that conjunctive query answering in (plain) ALCO is hard for coNE XP T IME
        in combined complexity. This singles out nominals as a previously unidentified
        source of additional complexity when answering queries over expressive DLs.


1     Introduction

As fragments of classical first-order predicate logic, description logics (DLs) have an
open-world semantics. That is, knowledge bases (KBs) are interpreted as the set of
all relational structures that satisfy what is explicitly stated in the KB, and where any
statement whose truth is not directly implied by the knowledge base can be interpreted
in an arbitrary way. However, this open-world view of DLs is not the most adequate in
all cases, and in particular, when DLs are used to describe domain-specific knowledge to
be leveraged when querying data sources, but the sources stem from traditional (closed-
world) databases that fully describe the instances that are to be included in a relation.
For example, when the students enrolled in some course are extracted from a database,
this information should be considered complete, and query answering algorithms should
exploit this to exclude irrelevant models and infer more query answers.
     Combining open and closed world reasoning is not a new topic in DLs [3], but it
has received renewed attention in recent years [20,19,7,30]. A prominent proposal for
achieving partial closed world reasoning is to use DBoxes instead of ABoxes as the
∗
    This work has been supported by the Austrian Science Fund (FWF) projects T515 and P25207.
assertional component of KBs [30]. Syntactically, a DBox looks just like an ABox, but
semantically, it is interpreted like a database: the instances of the concepts and roles in
the DBox are given exactly by the assertions it contains, and the unique name assumption
is made for the active domain of the individuals occurring in it. We follow a recent
generalization of this setting, where instead of replacing ABoxes by DBoxes, we enrich
the terminological component of KBs with a set of concepts and roles that are to be
interpreted as closed predicates [19]. In this way, some ABox assertions are interpreted
under closed semantics, as in DBoxes, while others are considered open, as in ABoxes.
    There are not many works studying the complexity of reasoning with closed pred-
icates. For the DL-Lite family and for EL, the data complexity of ontology mediated
query answering has been considered [7,19]. The authors of these works consider a
conjunctive query together with a terminological component (a TBox and possibly a set
of closed predicates), and study the complexity of answering such a query over an input
data instance (an ABox or a DBox). Under the standard open-world semantics for all
predicates, this is a central problem that has received great attention in the last decade in
the DL community. Most research focuses on the cases where the problem is tractable,
or even first-order rewritable. Unfortunately, the tractability of query answering is lost
in the presence of DBoxes, even for the core fragments of DL-Lite and EL [7]. In a
nutshell, closed predicates cause the convexity property to be lost, allowing a KB to
entail a disjunction of facts without entailing any of the disjuncts. An in-depth analysis
of this and a careful classification of tractable cases can be found in [19].
    In this paper, we take a closer look at the computational complexity of reasoning
in the presence of closed predicates. Unlike previous works, we consider the combined
complexity of reasoning, that is, not only the data is considered as an input, but also
the terminological information, and in the case of query answering, the query. Rather
than focusing on a few lightweight DLs, we consider a range of logics including very
expressive ones, and use existing results in the literature to infer many tight bounds on the
computational complexity of query answering. It was shown already in [30] that closed
predicates can be simulated, under the standard open world semantics, in any extension
of ALC that supports nominals, and conversely, nominals can be simulated by closed
predicates (the latter does not depend on any of the availability of any the constructs
of ALC). It is also easy to show that closed predicates suffice to easily express full
disjunction and atomic negation in any logic supporting qualified existential restrictions,
hence adding closed predicates to plain EL already results in the full expressiveness of
ALCO, and makes standard reasoning require exponential time in the worst case.
    For query answering, we build on the reduction from ALC to EL to show that the
constructors that make query answering 2E XP T IME-hard in extensions of ALC (namely
inverses [17], or transitive roles together with role hierarchies [6]), have the same effect
in the analogous extensions of EL in the presence of closed predicates (i.e., ELI and
ELHtrans ). However, since the precise complexity of query answering in ALCO remains
open, we cannot infer tight bounds for the extensions of EL with closed predicates that do
not support these additional constructs. This leads us to the main technical contribution
of the paper: we show that conjunctive query answering over ALCO (with the standard
open-world semantics) is coNE XP T IME-hard. Hence the same holds in the presence
of closed predicates for EL and its extensions. Although we leave a matching upper
bound open for future work, we exhibit nominals (or closed predicates) as a previously
unidentified source of increased complexity for query answering in expressive DLs.

2    Preliminaries
We assume the reader is familiar with DLs and, in particular, with EL and ALCO.
We use NC and NR for concept names and roles, respectively. The notions of a TBox
T , an ABox A, and an interpretation I = (∆I , ·I ) are defined in the usual way. The
notions of satisfaction I |= T and I |= A are also as usual. We make the standard name
assumption (SNA), i.e., aI = a for all I and individuals a. For combining the open- and
closed-world semantics, we enrich KBs with a set Σ of closed predicates. That is, we
consider knowledge bases (KBs) K = (T , Σ, A), where T is a TBox, Σ ⊆ NC ∪ NR ,
and A is an ABox. We call Σ the set of closed predicates in K. For such a K and an
interpretation I, we write I |= K if
(a) I |= T and I |= A,
(b) for all concept names A ∈ Σ, if e ∈ AI , then A(e) ∈ A, and
(c) for all roles r ∈ Σ, if (e, e0 ) ∈ rI , then r(e, e0 ) ∈ A.
In case Σ = ∅, K is boils down to a usual DL KB and I |= K captures the usual notion
of satisfaction. In this case, we may simply write K = (T , A).
    Note that in this paper KBs with closed predicates have the semantics as in [19],
which relies on the SNA. However, all complexity results of this paper can be recast for
the semantics of [7] that employs a weaker form of unique name assumption.

3    Standard Reasoning
Interpreting some predicates as closed allows one to simulate negation, disjunction, and
nominals in plain EL, making it as expressive as full ALCO.
Theorem 1. Assume a consistent ALCO KB K = (T , A). For every nominal {a} that
appears in K, let Da be a fresh concept name. Then we can construct in linear time an
EL KB K0 = (T 0 , Σ, A0 ) with closed predicates such that:
 1. Every model I of K0 is a model of K. Moreover, DaI = {a}I for every {a} in K.
 2. Every model I of K can be transformed into a model of K0 by modifying the
    interpretation of concept names and roles that do not appear in K.
Proof. The extension of EL with atomic negation (i.e., negation is applied to concept
names only), denoted EL¬ , is known to be a notational variant of ALC [1]. Hence we
simply show how to construct K0 = (T 0 , Σ, A0 ) for a given ELO¬ KB K = (T , A).
We do this in two steps: first we eliminate nominals and obtain from K = (T , A) an
EL¬ KB K1 = (T1 , Σ1 , A1 ). Then we transform K1 = (T1 , Σ1 , A1 ) into the desired
K0 = (T 0 , Σ, A0 ) in EL. Let
    Σ1 = {Da | {a} appears in K},           A1 = A ∪ {Da (a) | {a} appears in K}.
    This ensures that DaI = {a}I in every model of A1 where the predicates in Σ1 are
interpreted as closed. Hence we can simply replace each concept {a} by Da in T to
obtain the desired T1 . Next, for eliminating negation from (T1 , Σ1 , A1 ), we let
Σ = Σ1 ∪ {D⊥ , D1 , D2 , Du }          A0 = A1 ∪ {Du (a1 ), Du (a2 ), D1 (a1 ), D2 (a2 )}
To obtain T 0 from T1 , we replace every negated concept name ¬A by a fresh concept
name Ā, and add the following axioms, where rA is a fresh role name:
        A u Ā v D⊥             (1)           ∃rA .D1 v A         (3)
               > v ∃rA .Du         (2)             ∃rA .D2 v Ā        (4)
Note that, since there are no assertions of the form D⊥ (a) in A0 , we have D⊥   I
                                                                                   = ∅ in
                       0
every model I of K , and hence D⊥ behaves as the special concept ⊥. This and axiom
(1) ensure disjointness of A and Ā, while axioms (2) – (4) together with the assertions
for the closed predicates Du , D1 and D2 ensure that e ∈ AI or e ∈ (Ā)I for every
e ∈ ∆I . Indeed, let I be a model of K0 and let e ∈ ∆I be arbitrary. Then by axiom (2)
there exists e0 ∈ ∆I such that (e, e0 ) ∈ rI and e0 ∈ DuI . But since Du is closed, then
e0 ∈ {a1 , a2 }. If e0 = a1 , then e0 ∈ D1I , so e ∈ (∃rA .D1 )I and e ∈ AI by axiom (3).
Analogously, if e0 = a2 , we can use axiom (4) to infer that e ∈ (Ā)I . This ensures that
Ā has exactly the same extension as ¬A in every model of K, and the claim follows.
    This easy reduction allows us to extend to EL hardness results known for ALC. We
can also infer upper bounds for logics with closed predicates, from known results under
the standard open-world semantics, by exploiting the fact that in DLs that contain ALC,
closed predicates can be simulated using nominals (see [26] and Prop. 1 in [30]):
Theorem 2. For every DL KB (T , Σ, A) there exists a logically equivalent KB of the
form (T ∪ T 0 , ∅, A), where T 0 is a set of ALCO axioms whose size is polynomially
bounded by the size of A.
    This already allows us to obtain an almost complete picture of the landscape for
standard reasoning tasks. The following theorem is stated for KB satisfiability, but note
that it also applies to other traditional reasoning tasks like subsumption and instance
checking since they are mutually reducible to each other in ALC, and hence also in any
logic containing EL with closed predicates.
Corollary 1. The following bounds for deciding satisfiability of (T , Σ, A) hold:
 1. E XP T IME-complete if T and A are in any DL containing EL and contained in
    SHOQ or SHOI.
 2. NE XP T IME-complete if T and A are in any DL containing ELIF and contained in
    SHOIQ.
 3. N2E XP T IME-complete if T and A are in SRIQ or SROIQ, and in 2E XP T IME
    if T and A are in SROQ or SROI.
Proof. The lower bound of item (1) follows from Theorem 1 and the well known
E XP T IME-hardness of KB satisfiability in ALC [29]. Similarly, the hardness of items (2)
and (3) follows from Theorem 1 together with the NE XP T IME-hardness of ALCOIF
[31], and the N2E XP T IME-hardness of SROIQ [13]. For the upper bounds, we use
Theorem 2 and the fact that KB satisfiability is decidable in the mentioned bounds for
the listed logics: SHOQ and SHOI in E XP T IME, SROQ and SROI in 2E XP T IME,
SHOIQ in NE XP T IME, SROIQ in N2E XP T IME (see [4,13] and references therein).
4   Query Answering

In this section we consider the query answering problem over DL KBs. We cannot easily
transfer complexity upper bounds from known KB satisfiability since, in general, queries
are not naturally expressible in the syntax of DLs, and encoding them as part of a KB
usually requires exponential space. We focus in conjunctive queries (CQs), whose syntax
and semantics are defined in the usual way. In a nutshell, a CQ is a conjunction of atoms
of the forms A(x) or r(x, y), for a concept name A or a role name r, and variables x, y.
In what follows, all queries are Boolean queries with all variables existentially quantified.
The decision problem we consider is query entailment: deciding whether a given query
q is true in all the models of a given KB (T , Σ, A).
    It is well known that, under the classical open-world semantics, CQ entailment is
hard for 2E XP T IME in most expressive DLs, but the complexity drops to E XP T IME in
Horn fragments that disallow disjunction. Unfortunately, since the presence of closed
predicates causes disjunction to be expressible, 2E XP T IME-hardness extends to many ex-
tensions of EL. For CQs, 2E XP T IME-hardness can be shown whenever the DL supports
inverse roles [17], or a single left-identity axiom r ◦ t v t, or a transitive super role of
some role [6]. If we consider query languages that extend CQs, like positive queries or
(fragments of) conjunctive (2-way) regular path queries (C(2)RPQs), the same hardness
holds already for plain ALC [25], and hence for EL with closed predicates.
    Below we denote by ELLI the extension of EL with a single left-identity axiom
r ◦ t v t, and by ELHtrans the extension of EL with role inclusions and a transitive role.
Note that both logics are sublogics of EL++ .

Theorem 3. Deciding (T , Σ, A) |= q is hard for 2E XP T IME in all the following cases:

 1. T and A are in ELI and q is a CQ.
 2. T and A are in ELLI or in ELHtrans , and q is a CQ.
 3. T and A are in EL and q is either a positive query, a ∗-free CRPQ, or a ∗-free
    C2RPQ with only two variables.

Proof. We have shown in Theorem 1 that for every ALC KB K there is an EL KB
K0 that has essentially the same models, and may only differ in the interpretation of
symbols not occurring in K. Hence, for every query q, we have K |= q iff K0 |= q. This
translation can be applied to extensions of ALC, and results in a KB with the same
properties in the analogous extension of EL. In particular, an ALCI KB is rewritten into
a ELI one, and an SH KB into an ELHtrans one. From this an existing results for ALC
and its extensions, we obtain the desired lower bounds: item 1 follows from [18], item 2
follows from [6], and item 3 follows from [25].

   Matching upper bounds are known, even for significantly more expressive queries
and logics: in the standard setting, with no closed predicates, entailment of positive
two-way regular path queries (P2RPQs) is in 2E XP T IME for any DL contained in ZIQ,
ZOQ, ZOI, SHIQ, SHOQ, or SHOI [4]. From this and Theorem 2, we get the
same upper bound for ZOQ, ZOI, SHOQ, SHOI and their sublogics.
Corollary 2. Let q be a P2RPQ. Then deciding (T , Σ, A) |= q is 2E XP T IME complete
for T and A in any DL containing EL and contained in ZOQ, ZOI, SHOQ, SHOI.
The same holds for q a CQ if T and A are in a DL containing ELI or ELLI .
    Corollary 2 implies that query entailment in the presence of closed predicates is
almost always 2E XP T IME-complete in combined complexity. But there are some excep-
tions. On the one hand, the interaction of nominals, inverses, and counting makes query
entailment very challenging. In the plain open-world setting, entailment of conjunctive
queries is coN2E XP T IME-hard for ALCOIF [10], and it has been shown to be decid-
able [28], but no elementary upper bounds on its complexity are known. For its extension
with transitive roles, and for the well known SHOIQ, decidability remains open. Hence,
in the presence of closed predicates, we do not get any interesting upper bounds for DLs
that simultaneously support inverses and counting, like ALCIF and SHIQ. Moreover,
obtaining such bounds seems very hard. We remark that the authors of [7] proved that
query entailment in ALCIF with closed predicates and query entailment under the
standard open-world semantics in ALCOIF are mutually reducible.
    On the other hand, the mentioned 2E XP T IME lower bounds for CQs require the
presence of either inverse roles, left identity axioms, or transitivity and role hierarchies.
For EL (with closed predicates), ALC, and their extensions that have neither inverses
nor left identities, we only have the E XP T IME lower bound from KB satisfiability, and
the 2E XP T IME upper bound of Corollary 2. Without closed predicates, CQ entailment
for plain ALC, and even for ALCHQ, is feasible in single exponential time [18,24]. A
natural question is whether nominals, or equivalently, closed predicates, can be added to
ALCHQ without increasing the worst-case complexity of CQ entailment. Unfortunately,
the answer is negative (unless coNE XP T IME = E XP T IME), as we show next.

A coNE XP T IME lower bound for CQ entailment in ALCO
In this section, we show that deciding whether (T , A) 6|= q for a given CQ q and a
given ALCO KB (T , A) (with the standard open-world semantics), is hard for non-
deterministic single exponential time. By Theorem 1, the same applies to EL in the
presence of closed predicates.
     Before we start with the proof, we recall a useful property of ALCO: for query
answering, it is enough to focus on forest-shaped models. A forest is a set F of non-
empty words such that w · c ∈ F with w non-empty implies w ∈ F . An interpretation I
is forest-shaped if there is a bijection f from its domain to a forest, such that
  – f (aI ) has length one for every individual a, and
  – (e, e0 ) ∈ rI implies that either e0 = a for some individual a, or f (e0 ) is of the form
     f (e) · c for some symbol c.
     For many DLs and query languages, it has been shown that query entailment can be
decided over forest shaped interpretations. This applies also to CQs over ALCO KBs:
Lemma 1 ([9,4]). Let K be a given ALCO KB and let q be a CQ. Then K 6|= q iff there
is a forest shaped interpretation I such that I |= K and I 6|= q.
   Now we show our lower bound. The proof is by reduction from the following
coNE XP T IME-complete variant of the tiling problem:
Definition 1 (Domino System). A domino system D is a triple (T, H, V ), where
T = {0, . . . , k − 1}, k ≥ 0, is a finite set of tile types and H, V ⊆ T × T rep-
resent the horizontal and vertical matching conditions. Let D be a domino system
and c = c0 , . . . , cn−1 an initial condition, i.e. an n-tuple of tile types. A mapping
τ : {0, . . . , 2n+1 − 1} × {0, . . . , 2n+1 − 1} → T is a solution for D and c iff for all
x, y < 2n+1 , the following holds (where ⊕i denotes addition modulo i):
  – if τ (x, y) = t and τ (x ⊕2n+1 1, y) = t0 , then (t, t0 ) ∈ H
  – if τ (x, y) = t and τ (x, y ⊕2n+1 1) = t0 , then (t, t0 ) ∈ V
  – τ (i, 0) = ci for i < n.

    For the reduction, we do not need a full ALCO knowledge base, but a simple ABox
with single concept assertion CD,c (a) for a complex ALCO concept CD,c . We show
how to translate a given domino system D and initial condition c = c0 · · · cn−1 into an
assertion CD,c (a) and query qD,c such that each forest-shaped model I of CD,c (a) that
satisfies I 6|= qD,c encodes a solution to D and c, and conversely each solution to D and
c gives rise to a model of CD,c (a) with I 6|= qD,c .
    Our reduction is based on the proof of coNE XP T IME-hardness of rooted query
entailment in ALCI [17], and also resembles the similar proof for S [6]. In fact, the first
part of the concept CD,c , which generates forest models that encode a potential solutions,
is essentially as in [17]. The second part and the query are quite different, since they
exploit nominals to detect errors in potential solutions.
Constructing the ABox. We now define the complex concept CD,c , and the desired
                                                                              1       2
ABox is {CD,c (a)}. We assume that CD,c is a conjunction of the form CD,c         u CD,c  .
                                                                            1
    For convenience, let m = 2n + 2. The purpose of the first conjunct CD,c is to enforce
a binary tree of depth m, whose edges are labeled with a single role r, and whose leaves
are labeled with the numbers 0, . . . , 2m − 1 of a binary counter C, implemented using
concept names B0 , . . . , Bm . Intuitively, each of these leaves ` stores a position in the
2n+1 × 2n+1 -grid to be tiled: the bits B0 , . . . Bn encode the horizontal position x, and
the bits Bn+1 , . . . Bm encode the vertical position y. We also use a concept name Di for
each tile type i ∈ T . Each leaf g storing a position (x, y) has as r-children three ‘grid
nodes’ gh , gright , and gup labeled G, which satisfy all the following conditions:

 1. gh represents the grid node with position (x, y), and stores the same bit address as g
    (that is, gh and g coincide on the interpretation of all Bi ).
 2. gright and gup represent the right- and up-neighbor of g, and respectively store the
    addresses (x ⊕2n+1 1, y) and (x, y ⊕2n+1 1).
 3. gh is labeled Gh , while gright and gup are labeled Gs .
 4. gh (resp., gright , gup ) satisfies exactly one concept Di , representing the assigned tile
    type τ (x, y) (resp., τ (x ⊕2n+1 1, y), τ (x, y ⊕2n+1 1)).
 5. The tiling of the gh nodes respects the initial condition, that is, if `h stores the
    position (i, 0), then it satisfies Dci .
 6. The tiling of gright and gup respect the matching conditions, that is, if gh satisfies Di ,
    gright satisfies Dj , and gup satisfies Dj 0 , then (Di , Dj ) ∈ H and (Di , Dj 0 ) ∈ V .

    The tree we have described almost describes a solution for D, except for the crucial
fact that different copies of the same node in the grid may have different types assigned.
That is, for an address (x, y), the gright and gup nodes with address (x, y) need not
                                                                         1
satisfy the same Di as the gh with address (x, y). We call a model I of CD,c (a) proper
if it satisfies the following condition:
(?) For every pair g ∈ GIh , g 0 ∈ GIs such that g ∈ Bi iff g 0 ∈ Bi for all 0 ≤ i ≤ m,
    there exists some i < k such that {g, g 0 } ⊆ DiI .
                                      1
    We can use an ALC concept CD,c        to enforce a tree as above, such that deciding the
                                                                                  1
existence of a solution for D and c reduces to finding a proper model of CD,c          . Such
constructions exist in the literature, and in fact, the concept we described is just a minor
                                    1              6
modification of the conjunction CD,c    u · · · u CD,c given in [17], hence we omit its rather
technical definition. Instead, we rely on the following claim:

Lemma 2 (implicit in [17]). Let D be a domino system and c an initial condition. Then
                                1
we can build an ALC concept CD,c   such that there exists a solution for D and c iff there
                            1                                1
exists a proper model of CD,c (a). Moreover, the size of CD,c    and the time needed to
construct it are polynomially bounded by the size of D.
                                                                                  1
     We construct below a query qD,c that does not match a forest model of CD,c       (a) iff
(?) is satisfied. By Lemmas 2 and 1, this suffices to decide whether there exists a solution
                                                                           2
for D and c. But before defining qD,c , we define the second conjunct CD,c     of CD,c . Its
                                                               1
purpose is to add nodes and labels to the forest models of CD,c (a) that allow us to test
for (?) using a CQ.
                     2
     For defining CD,c , we use the following additional alphabet symbols:
  – two individual names ai and āi and one concept name Ai for each bit Bi , 0 ≤ i ≤ m,
  – an individual name tj for each tile type j < k,
  – a concept T stating that some individual stands for a tile type.
    Each G node g is linked via r-arcs to the individuals ai , āi that encode its bit address.
We also link g nodes to the individuals that stand for the tile types, but we do it differently
for the Gh nodes and the Gs nodes, as follows:
  – If g is a Gh -node with tile type Di , then g has an r-arc to ti .
  – If g is a Gs -node with tile type Di , then g has an r-arc to each tj with j 6= i.
Finally, we make both ai and āi instances of Ai , for each bit i, and we make all tile
types tj instances of T . Formally, this is all ensured using the conjunction C12 u C22 of
the following two concepts:
                        l
      C12 = ∀rm+1 .            Bi → ∃r.{ai } u ¬Bi → ∃r.{āi } u
                       0≤i≤m
                       l                                               l                   
                             Di → (Gh → ∃r.{ti }) u (Gs → (                     ∃r.tj ))
                     0≤i