Signature-Based ABox Abduction in ALC is Hard?

                                     Patrick Koopmann

      Institute for Theoretical Computer Science, Technische Universität Dresden, Germany


        Abstract. In ABox abduction, we are given a knowledge base together with an
        ABox—the observation—that is not logically entailed by the knowledge base,
        and are looking for another ABox—the hypothesis—that, when added to the
        knowledge base, would make the observation entailed. This has applications in
        explaining negative entailments and missing query answers, as well as in diag-
        nosis. To get useful hypotheses, in signature-based abduction, we additionally
        provide a signature of abducible names, and require the hypothesis to use only
        names from that signature. In the variant we are considering, the hypothesis may
        otherwise use fresh individual names, as well as complex concepts constructed
        in arbitrary way using the names in the signature. It was recently shown that this
        variant of abduction is in N2E XP T IME NP , and that hypotheses may require con-
        cepts that are of triple exponential size. We complement those results by showing
        a matching N2E XP T IME NP lower bound, and show that in the worst case, hy-
        potheses may also use a double exponential number of fresh individual names.


1 Introduction

Since inferences performed by description logic reasoners can be complex, and real on-
tologies often contain 10,000s of axioms, explanations of description logic reasoning
has since longer been in the focus of research [26]. In particular for explaining posi-
tive entailments, that is, entailments that hold for a given knowledge base, there is a
plethora of research, mostly focused on using justifications [33,4,30,17], but recently
also on using proofs [1,2]. Explaining negative entailments, that is, entailments that do
not hold for a given knowledge base, has been less in the focus of attention. Here, com-
mon approaches are showing the user a counter example in form of a description logic
interpretation [5,3], or using abduction [29,23]. In abduction, one is given a knowledge
base and an observation, a formula that is not logically entailed by the knowledge base,
and is interested in finding a missing piece in the knowledge base that would make the
observation logically entailed, called a hypothesis for the observation. Abduction can
be used in different ways to explain negative entailments to ontology users [23,7,8] and
to repair missing entailments [34]. Another application of abduction is diagnosis: here,
the observation describes symptoms, for instance of a medical condition or a faulty ma-
chine [27,8], and the hypothesis is supposed to give a possible explanation for how the
symptoms came into place.
?
    This work was supported by the DFG in grant 389792660 as part of TRR 248.
    Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons Li-
    cense Attribution 4.0 International (CC BY 4.0).
62      Patrick Koopmann

   To illustrate, we reuse a simplified example from [21] from the geology domain. The
observation is that the ground under a street has become unstable: {Unstable(street)},
and the background knowledge involves the following axioms:
     1. EvaporiteFormation u ∃borders.(WaterWay u ¬∃lining.WaterProof)
                v ∃affectedBy.Dissolution
     2. EvaporiteFormation u ∃affectedBy.Dissolution v ∀above.Unstable
     3. (Waterway t Street) u EvaporiteFormation v ⊥
     4. Waterway(canal)          5. Street(street),
These state: 1. an evaporite formation bordering to a waterway without waterproof lin-
ing is affected by dissolution; 2. if an evaporite formation is affected by dissolution,
then everything above it is unstable; 3. waterways and streets are different from evapor-
ite formations; 4. canal is a waterway and 5. street is a street. A meaningful hypothesis
could then be the following, which uses e as a fresh individual referring to a possible
unknown formation under the street, as well as the complex concept ∀lining.⊥:
                  H = { EvaporiteFormation(e), above(e, street),
                          borders(e, canal), ∀lining.⊥(canal) }
    Depending on whether observation and hypothesis consist of concepts, facts, termi-
nological axioms or knowledge bases, one distinguishes between concept abduction [6],
ABox abduction [9,8,32,31,7,12,10,16,19], TBox abduction [11,34,15,14], and knowl-
edge base abduction [24,13]. In this paper, we focus on ABox abduction, where, as in
the above example, both the observation and the hypothesis consist of a set of facts
about some set of individuals, a variant especially suited for diagnosis and for generat-
ing counter examples as in [23].
    To avoid trivial answers (such as the observation itself), usually, the abduction prob-
lem is specified with additional constraints on the hypothesis. Those may be purely of
logical nature [13], or based on syntactic properties. For instance, in [11], the user
specifies syntactical patterns of axioms that the hypothesis must conform to. Another
approach is to fix a finite set of abducible axioms—either directly as part of the in-
put [31,8], or indirectly by posing strict conditions on the shape of axioms, such as
being flat ABox assertions using only names and individuals from the input [32,10].
However, such a restriction might not be feasible in applications where we do not know
the exact shape of the axioms we are looking for, for instance because they involve un-
known individuals or complex concepts as in the example above. Another approach is
to instead specify a set of abducible names or predicates, from which hypotheses can
then be constructed in arbitrary fashion using the constructors of the respective descrip-
tion logic. In the context of diagnosis, this set would contain names that are connected
to causes of symptoms rather than the symptoms themselves, specific terms rather than
generic categories, and possibly describing aspects that can be verified to check the hy-
pothesis. In the example above, examples for names that would not be included would
be Unstable (because it is about the observation, not about the cause), Waterway and
Street (because we know all waterways and streets in the area), affectedBy and Dis-
solution (because they are too unspecific, and we are not looking for the processes, but
the objects that caused the street to become unstable.)
                            Signature-Based ABox Abduction in ALC is Hard              63

    This form of abduction, which we call signature-based abduction, has been theo-
retically investigated for DL-Lite in [7], and for more expressive DLs in [21]. There
is also a practical implementation for signature-based ABox abduction [9] and one for
signature-based KB abduction [24]. These two practical implementation reduce abduc-
tion to uniform interpolation via (extensions of) the tool L ETHE [20], which can lead to
triple exponentially large solutions [25]. As shown in [21], this is indeed in theory un-
avoidable. However, some crucial questions were left open in [21]: for the correspond-
ing decision problem whether there exists a hypothesis for a given abduction problem,
using ALC as description logic and allowing for both fresh individuals and complex
concepts in the hypothesis, only an upper bound of N2E XP T IME NP is provided, while
the lower bound is left open. Furthermore, for the number of fresh individuals one may
need to introduce in the worst case, also only an upper bound was developed. In this
paper, we show that those bounds are indeed tight, by proving that the general version
of the signature-based ABox abduction problem for ALC is N2E XP T IME NP -complete,
and that hypotheses for those problems may require a number of fresh individuals that
is double exponential in the size of the abduction problem.


2 Preliminaries

We recall the description logic ALC and define the signature-based ABox abduction
problem formally.
    Let NC , NR and NI be pair-wise disjoint, countably infinite sets of respectively con-
cept, role and individual names. A signature is a finite set Σ ⊆ NC ∪ NR . Concepts are
then built according to the following syntax rule, where A ∈ NC and r ∈ NR :

                 C := > | ⊥ | A | ¬C | C t C | C u C | ∃r.C | ∀r.C

     A TBox is a finite set of general concept inclusion axioms (GCIs) of the form C v D
and concept equivalence axioms of the form C ≡ D, where C and D are concepts. An
ABox is a finite set of concept assertions of the form C(a) and role assertions of the
form r(a, b), where C is a concept, r ∈ NR and a, b ∈ NI . A knowledge base (KB) K
is a union of a TBox and an ABox, and thus generalises both notions. The contents of a
KB are called axioms. For a concept/axiom/KB E, we denote its signature, that is, the
concept and role names that occur in E, by sig(E).
     The semantics of KBs is defined in terms of interpretations. An interpretation I is
a tuple h∆I , ·I i of a countable but possibly infinite domain ∆I and an interpretation
function ·I that assigns to each individual name a ∈ NI a domain element aI ∈ ∆I ,
to each concept name A ∈ NC a set AI ⊆ ∆I , and to each role r ∈ NR a relation
rI ⊆ ∆I × ∆I . The interpretation function is extended as follows to concepts:

                  >I = ∆ I        ⊥=∅         (¬C)I = ∆I \ C I
           (C u D)I = C I ∩ DI          (C t D)I = C I ∪ DI
             (∃r.C)I = {d ∈ ∆I | there is hd, ei ∈ rI s.t. e ∈ C I }
             (∀r.C)I = {d ∈ ∆I | for every hd, ei ∈ rI we have e ∈ C I }.
64      Patrick Koopmann

Such an interpretation I satisfies an axiom α, in symbols I |= α if α = C v D and
C I ⊆ DI , α = C ≡ D and C I = DI , α = C(a) and aI ∈ C I , or α = r(a, b) and
haI , bI i ∈ rI . If I satisfies all axioms in a KB K, we write I |= K and call I a model
of K. If a KB K has no model, we say it is inconsistent and write K |= ⊥. If an axiom α
is satisfied in every model of a KB K, we write K |= α and say α is entailed by K.
    We are now ready to define the signature-based ABox abduction problem for ALC.
Definition 1. A signature based ABox abduction problem is given by a tuple A =
hK, Φ, Σi of a KB K called the background knowledge, an ABox Φ called the observa-
tion, and a signature Σ of abducible names, for which we want to determine whether
there exists a hypothesis H, which is an ABox that satisfies the following conditions:

A1 K ∪ H 6|= ⊥,                A2 K ∪ H |= Φ, and             A3 sig(H) ⊆ Σ.

Intuitively, the hypothesis should not contradict what we know (Condition A1), it should
be sufficient to explain the observation (Condition A2), and it should only be con-
structed using the abducible names (Condition A3). Apart from that, we put no further
restriction on H: it may use individual names that occur neither in K nor in Φ (fresh in-
dividuals), and it may contain assertions C(a) where C is a complex concept, arbitrarily
composed using the constructors of ALC and the names in Σ.
    Regarding the corresponding decision problem, the following was shown in [21]:
Theorem 1. Existence of hypotheses for signature-based ABox abduction problems is
in N2E XP T IME NP .
However, a corresponding lower bound was not yet provided, which is the contribution
of this paper. In addition, [21] provided the following bounds on the size of hypotheses.
Theorem 2. There exists a family of signature-based ABox abduction problems for
which every hypothesis is of size triple exponential in the size of the problem. Further-
more, for every signature-based ABox abduction problem, if there exists a hypothesis,
then there exist one that is of size at most triple exponential in the size of the problem.
The family of abduction problems used for the lower bound is such that only a single
assertion is needed for the hypothesis. Therefore, an open problem remained how many
fresh individuals may be required in a hypothesis. Inspection of the proof used for the
upper bound of Theorem 2 gives at least an upper bound for this.
Theorem 3. For every signature-based ALC-ABox abduction problem, if there exists
a hypothesis, then there exist one that uses a number of fresh individual names that is
at most double exponential in the size of the abduction problem.
Proof. This is a consequence of the following lemmas from [21]: 1. every hypothesis
can be converted into a so-called hypothesis abstraction (Lemma 1), 2. every hypothesis
abstraction can be restricted to refer to at most double exponentially many individual
names (Lemma 2), and 3. those bounds are preserved when translating the hypothesis
abstraction back into a hypothesis (Lemma 3).                                        t
                                                                                     u
We will also show that this bound is tight: in our reduction, we create a class of
signature-based abduction problems whose hypotheses always need a double exponen-
tial number of fresh individuals.
                             Signature-Based ABox Abduction in ALC is Hard              65

3 Overview of the Reduction
To prove hardness for N2E XP T IME NP , we provide a reduction from the word problem
for non-deterministic, double exponential Turing machines with NP oracle. Specifically,
given such a Turing machine M and a word w, we will construct an abduction problem
AM,w in polynomial time s.t. M accepts w iff there exists a hypothesis for AM,w .
The individual names in this hypothesis will be organised into a N × N × N -cube,
                             n
where N = {0, . . . , 2n · 22 }, and n is polynomial in the size of w. That is, for every
hx, y, zi ∈ N × N × N , there will be a corresponding individual ax,y,z . The individuals
on one side of the cube, that this, those of the form ax,y,0 , will encode an accepting
configuration history for the Turing machine. The remaining individuals supply the
“guessing space” for the oracle via the models of the hypothesis. Specifically, for a
fixed y, the individuals ax,y,z will be such that they cannot possibly encode a successful
computation history for the oracle in any model of the hypothesis and the background
knowledge.
    Our abduction problem is composed of different components, that we define as
separate abduction problems:
 1. we reuse a construction from [21] to obtain three double exponential counters;
 2. those are used to create the different individuals for all coordinates in N × N × N ;
 3. we then enforce that every hypothesis forms a cube in which every coordinate is
    represented by exactly one individual;
 4. finally, we make sure that hypothesis corresponds to a computation history of the
    Turing machine as described above.


4 The Double Exponential Counters
We encode numbers with 2n bits using chains of 2n individuals, where each individual
is an instance of the concept name Bit if it represents a bit with value 1, and otherwise
an instance of ¬Bit. An abduction problem that exactly provides this is constructed
in the proof for Theorem 5 in [21,22] to show the triple exponential lower bound on
the size of hypothesis with complex concepts. This abduction problem is of the form
Ac = hKc , Goal(a), Σc i, where {r, Bit, B} ⊆ Σc . Kc is such that, for any individual
name b, the only way to entail Goal(b) using only names from Σc is by enforcing that
every path of r-successors starting from b encodes a 2n -bit counter as required, starting
                    n
from a value of 22 − 1 and counting down to 0, ending at an individual satisfying B.
(The construction also increments this counter along another role s, which is however
not relevant for our purposes here.) This also means that no such path can contain any
                                                                    n
cycles or shortcuts before reaching the instance of B after 2n · 22 r-successors.
     We will need three such double exponential counters, for which purpose we rename
all names in Ac to obtain the three signature-disjoint variants Ax = hKx , Goalx (a), Σx i,
Ay = hKy , Goaly (a), Σy i and Az = hKz , Goalz (a), Σz i. Specifically, these use now
roles rx , ry and rz to connect the individuals in the counting sequence, the concept
names Bitx , Bity and Bitz for the respective bit values, and Bx , By and Bz to mark
the end points of the counters. All counters count backwards: that is, they start at the
maximal value, and then step-wise decrease along the role-successors.
66       Patrick Koopmann

5 Creating the Coordinates
To generate all coordinates in our cube, we now need to connect those abduction prob-
lems appropriately. Specifically, we need the counter for a coordinate d ∈ {x, y, z}
to start not only at the individual name a (the root of the cube), but also at any other
individual name along the corresponding side of the cube.
    For d ∈ {x, y, z}, we use a concept name Maxd to mark individuals that are at the
beginning of the sequence for the d-counter (specifically: at the last bit of the maximal
counter value). This is established using the following axioms:
                                          l
              Maxd (a)       Maxd v               ∀re .Maxd u ∀rd .¬Maxd
                                        e∈{x,y,z}\d
                                           l
                              ¬Maxd v               ∀re .¬Maxd
                                        e∈{x,y,z}

    Intuitively, the counter for the coordinate d should be initialised at all points where
Maxd is satisfied. However, we cannot initialise the counters directly, but need to use the
respective abduction problem. Here, we use that Goald (b) is only entailed for individual
names b from which every path along rd corresponds to a double exponential counter
for d. To make sure all counters are initialised at the right positions, we use another
concept name AllStarted that requires that all reachable individuals that satisfy Maxd
also satisfy Goald . Recall that every counter for d reaches its end at a domain element
satisfying B.

                                                       Bx t By t Bz v AllStarted
           l         l
                              ∃rd .(AllStarted u (¬Maxe t Goale )) v AllStarted
        d∈{x,y,z} e∈{x,y,z}

     The concept name Coords now requires all counters to have started:

                   AllStarted u Goalx u Goaly u Goalz v Coords.

Set Kcoord to be the union of Kx , Ky , Kz and these new axioms. The abduction problem

                     Acoord = hKcoord , Coords(a), Σx ∪ Σy ∪ Σz i

now produces the required counters in its hypotheses, reachable by the corresponding
paths of rx , ry and rz -successors.


6 Enforcing the Grid Shape
So far, our abduction problem would still allow for a solution with just a single indi-
vidual, which would produce the counters using a complex concept. This changes with
the next extension, which makes sure that hypotheses need to use a number of fresh
individuals that is double exponential in the size of the input, and that those individuals
are organised into a three-dimensional grid via the three roles rx , ry and rz . Rather than
                             Signature-Based ABox Abduction in ALC is Hard              67

enforcing this grid shape directly, we define a concept to detect whether a hypothesis is
not of the desired grid shape, and then require for the observation to be entailed that the
root individual is not an instance of this concept.
    For every two d, e ∈ {x, y, z} st. d 6= e, we add the following axiom

                       NonSquare v ∃rd .∃re .D u ∃re .∃rd .¬D,

NonSquare is only satisfiable by domain elements for which there is a path along rd -re
that ends in a different domain element than a path along re -rd . As in an interpretation,
there may be more rd and re successors than specified in the ABox, we use the follow-
ing axioms to make sure NonSquare cannot be satisfied by an individual in an ABox
in which the next two rd and re -successors form a square.

                 ∃rd .D ≡ ∀rd .D                        ∃re .D ≡ ∀re .D
            ∃rd .∃re .D ≡ ∀rd .∃re .D              ∃re .∃rd .D ≡ ∀re .∃rd .D

The only way to entail ¬NonSquare(b) for an individual b, and without using D or
NonSquare, is now by providing an rd -re -path and an re -rd -path that both end on
the same individual, and thus by creating one cell of a grid. Next, we define the con-
cept NonGrid to detect appearances of NonSquare anywhere on a path between the
root element and the end of our paths marked by the concept names Bx , By and Bz .
Specifically, for all d ∈ {x, y, z}, we add:

              NonGrid v ¬(Bx t By t Bz ) u (NonSquare t ∃rd .NonGrid)
        ∃rd .¬NonGrid v ∀rd .¬NonGrid

The first axiom ensures that, if a domain element is in NonGrid, then there must be a
path to some element in NonSquare that does not go through Bx t By t Bz . Together
with the first axiom, the second axiom makes sure that, once one path fails to satisfy
NonGrid at some point, all predecessors will fail to do so as well, unless NonSquare
is already satisfied. Consequently, to satisfy ¬NonGrid, we have to ensure that on all
paths, ¬NonSquare remains satisfied before the paths reach Bx t By t Bz . The only
way to do so is by organising the role successors into a grid as required.
    The final axiom to be added is the following:

                              Coords v NonGrid t Grid

Every instance of Coords is either in NonGrid or Grid. To entail Grid(a), we need to
entail Coords(a) and ¬NonGrid(a), which is only possible by having all coordinates
organised into a cube.
    Denote the resulting KB with all axioms mentioned until now by Kgrid . Our abduc-
tion problem for this section is defined as Agrid v= {Kgrid , Grid(a), Σx ∪Σy ∪Σz }. As
now every hypothesis needs an individual for every coordinate in the cube, we obtain
the following theorem.
Theorem 4. There exists an unbounded family of ALC ABox abduction problems such
that every hypothesis uses a number of fresh individuals that is at least double expo-
nential in the size of the abduction problem.
68      Patrick Koopmann

7 Encoding the Turing machine
It remains to encode the Turing machine to finish the reduction. Let
                            Mo = hQo , Σo , Γo , ∆o , qo0 , b, Fo i
be a non-deterministic Turing machine with states Qo , input alphabet Σo , tape alpha-
bet Γo , transition relation ∆o ⊆ (Qo × Γo ) × (Qo × Γo × {−1, +1}), initial state qo0 ,
blank symbol b and accepting states Fo . We assume that there exists a polynomial p
such that for input words w, Mo stops after at most p(|w|) steps. Furthermore, let
                            M = hQ, Σ, Γ, δ, q0 , q? , b, F, Mo i
be a non-deterministic Turing machine that uses Mo as oracle, where Q is the set of
states, Σ the input alphabet, Γ the tape alphabet,
                  ∆ ⊆ (Q × Γ × Γo ) → (Q × Γ × Γo × {−1, +1})
the transition relation that now operates on two tapes (the work tape and the oracle tape),
q0 the starting state, q? the oracle query state, q− the oracle answer state, and F the set
of accepting states. The definition of runs and accepting runs is as usual, however, for
convenience, we define it with two differences: 1) our tape head is always on the same
position for the work tape and the oracle tape, and 2) when in the oracle state, the Turing
machine M goes into state q− if the oracle rejects the current content of the oracle
tape, and otherwise M rejects the input (instead of going into a special state reserved
for positive query answers). That 1) is without loss of generality can be seen using a
standard trick for reducing multi tape machines to single tape machines: specifically, to
simulate a different tape head for the oracle tape, we introduce a “marked” tape symbol
γT for every tape symbol γ in the original TM, and use those marked symbols to mark
the virtual position of the head on the oracle tape, and similarly for the work tape. In
order to move the two tapes in different positions, we just move back and forth between
the cells on the tapes that are marked in this way. This is possible with only quadratic
time overhead.
    For Difference 2), we use the following argument: assume we have a Turing ma-
chine that, based on the result of the oracle, would go either into a state q+ (if the oracle
accepts) or into a state q− (if the oracle rejects). We can simulate this behaviour with
our type of Turing machine by first guessing the result of the oracle. If we guess for
a positive answer, we verify it without the oracle (since the Turing machine is non-
deterministic), if we guess for a negative answer, we use the oracle to verify this.
    We assume that there is a polynomial q s.t. for words w on the input, M stops
                         steps, that is, it accepts a language in N2E XP T IME NP . Fix one
                  q(|w|)
after at most 22
such input word w. We choose our n used in the previous sections s.t. n ≥ p(q(|w|)),
this way ensuring that n bounds the length of accepting runs, as well as the maximal
length of the tape content, both for M and the oracle machine Mo . Our reduction will
now proceed in such a way that for the resulting abduction problem, every hypothesis
will encode a successful run on the side of the cube for which the z-coordinate is 0:
specifically, we will represent the content of the two tapes of M along the x-axis, and
succeeding configurations along the y-axis.
                               Signature-Based ABox Abduction in ALC is Hard                     69

   For convenience, we assume Q, Γ , Qo and Γo to be pairwise disjoint, and Q ∪ Γ ∪
Qo ∪ Γo ⊆ NC , so that we can identify states and tape symbols directly with concept
names. We however need to make sure that every domain element can represent at most
one state and tape symbol at a time:
                 q u q0 v ⊥                    for every q, q 0 ∈ Q, q 6= q 0                    (1)
                       0
                 γuγ v⊥                        for every γ, γ 0 ∈ Γ, γ 6= γ 0                    (2)
               γo u γo0 v ⊥                    for every γo , γo0 ∈ Γo , γo 6= γo0               (3)
The input word w = w0 . . . wm is encoded as follows:
   w0 (a), rx (a, a1 ), w1 (a1 ), rx (a1 , a2 ), . . . , wm (am ), rx (am , am+1 ), b(am+1 )    (4)
To make sure the blank symbol is propagated along the available space on the tape,
we use the following TBox axiom. Recall that Maxy and Maxz are satisfied on all
individuals where the y and z-counters are initialised, and Bx at the end of the counting
sequence along rx .
                            b u Maxy u Maxz u ¬Bx v ∀rx .b                                     (5)
We represent the current state on every node in the grid, and mark the position of the
tape head with a special concept name Head:
                                                     Head(a)                                     (6)
                                     Maxy u Maxz v q0                                            (7)
This completes the encoding of the initial configuration.
    We further make sure that also on the other configurations, every individual along
the x-axis gets assigned the same state q ∈ Q:
                                           q ≡ ∀rx .q                                            (8)
and that at most one individual satisfies Head. For the latter, we use the concept
LeftOfHead to mark individuals left of the head. q ∈ Q:
                             q ≡ ∀rx .q                                                          (9)
         (Maxx u ¬Head) v LeftOfHead                                                            (10)
             LeftOfHead v ∀rx (Head t LeaftOfHead)                                              (11)
                        Head v ¬LeftOfHead u ∀rx .(¬Head u ¬LeftOfHead)                         (12)
   ¬LeftOfHead u ¬Head v ∀rx .(¬Head u ¬LeftOfHead)                                             (13)
    The transitions that do not use the oracle are encoded using the following TBox
axioms. For every q ∈ Q \ (F ∪ {q? }), γ ∈ Γ , γo ∈ Γo :
        ∃rx .(q u Head u γ u γo )                                                             (14)
                G
                                                                     0            0   0
             v                                      ∀ry .(Head u q u ∀rx .(γ u γo ))          (15)
                  hhq,γ,γo i,hq 0 ,γ 0 ,γo0 ,−1ii∈∆
                  G
                t                   0    0    0
                                                      ∀ry .∀rx .(γ 0 u γo0 u q 0 u ∀rx .Head) (16)
                      hhq,γ,γo i,hq ,γ ,γo ,+1ii∈∆

   γ u ¬Head v ∀ry .γ                                                                           (17)
  γo u ¬Head v ∀ry .γo                                                                          (18)
70       Patrick Koopmann

   The following axioms now propagate the acceptance back to the root, so that we can
require it for the observation to be entailed. Here, we use a fresh concept name Accept.
                                      G
                                              q v Accept                            (19)
                                          q∈F
                                  ∃ry .Accept v Accept                                    (20)
                                  ∃rx .Accept v Accept                                    (21)

     Our signature ΣT,w for the final abduction problem is defined as

                     ΣT,w = Σx ∪ Σy ∪ Σz ∪ Q ∪ Γ ∪ Γo ∪ Head                              (22)

    It remains to encode the oracle. While we want the run of M to be represented in
the hypothesis, we cannot do the same for the oracle, as we need to quantify over all
possible runs (and ensure that none of them is accepting). We thus use a fresh concept
name γo∗ for every γo ∈ Γo , since Γo is already used in ΣT,w . The following axioms,
for every γo ∈ Γo , copy the information from the oracle tape to the input tape of the
oracle machine, when the oracle is queried:

                                        q? u γo v γo∗ .

Similarly, we use a concept name Tape∗ outside of the signature ΣT,w for the tape
positions of the oracle machine. The following axiom ensures that the head of the oracle
tape is always at the left-most position at the beginning:

                                  Maxx u Maxz v Tape∗

Except for axioms 14–16, the rest is encoded using the same axioms as in 7 – 18,
however with the role ry replaced by rz , Tape replaced by Tape∗ , the tape alphabet
now using the names γo∗ instead of γo , and of course all states and transitions now
referring to the Turing machine Mo .
    To detect whether the oracle is rejecting, instead of the axioms 14–16, we represent
the transitions as follows for every q ∈ Qo \ Fo and γ ∈ Γo .
                                G
    ∃rx .(q u Head∗ u γ ∗ ) v                               ∀rz .(Head∗ u q 0 u ∀rx .(γ 0 )∗ )
                                  hhq,γi,hq 0 ,γ 0 ,−1ii∈∆o
                            G
                          t            0   0
                                                        ∀rz .∀rx .((γ 0 )∗ u q 0 u ∀rx .Head∗ )
                                hhq,γi,hq ,γ ,+1ii∈∆

                             t Reject
                 ∃x.Reject v Reject
                 ∃z.Reject v Reject

Those state that, if every computation path in Mo ends in a dead end that is not an
accepting state, then in every model, Reject will be satisfied by the domain elements
that represent the beginning of the computation.
    Finally, we connect both components, the encoding of the Turing machine M and
the encoding of the Turing machine Mo , by introducing a concept name OracleChecked
that is satisfied if on every configuration between the accepting configuration and the
                              Signature-Based ABox Abduction in ALC is Hard                 71

initial configuration, either there is no oracle call (¬q? ), or there is an oracle call (q? ),
the oracle rejects, and the next state is q− .

                                                   G
                                                             q v OracleChecked
                                                       q∈F
                             ¬q? u ∃ry .OracleChecked v OracleChecked
            q? u Reject u ∃ry .(OracleChecked u q− ) v OracleChecked

    To complete the construction, we add the following GCI:

                   Accept u OracleChecked u Grid v Observation

    Denote the knowledge base consisting of all axioms shown so far by KT,w . The
final abduction problem is now the tuple AT,w = hKT,w , Observation(a), ΣT,w i.

Lemma 1. There is a hypothesis for AT,w iff T accepts w.

    Together with the upper bound from [21], we thus obtain the following theorem.

Theorem 5. Signature-based ABox abduction in ALC is N2E XP T IME NP -complete.


8 Conclusion

One conclusion one could draw from the high complexity is that it may make sense
to look into more restricted variants of ABox abduction, such as flat ABox abduction,
where complex concepts are not allowed in the hypothesis: here hypotheses can get at
most exponentially large, and the decision problem (for ALC) is only CO NE XP T IME-
complete [21]. On the other hand, especially in the context of description logics, it is
often observed that high theoretical complexity results do not mean that there cannot be
implementations that work well in practice: for instance, reasoning in SROIQ, the DL
underlying the web ontology language standard OWL, is N2E XP T IME-complete [18],
while modern optimised DL reasoners process even large OWL ontologies in prac-
tice [28]. An example more related to signature-based abduction is uniform interpola-
tion, which for ALC may also lead to triple exponentially large ontologies [25], while
systems like L ETHE [20] and FAME [35] can often successfully compute uniform inter-
polants of realistic ontologies in practice.
    The current prototypes for signature-based abduction for ALC [9,24] performed
reasonably well in an evaluation on real ontologies, however, they do not introduce
fresh individual names. Instead, the more general approach in [24] uses additional DL
constructors like inverse roles and nominals to express connections between named
individuals. Arguably, the results of abduction would become more user friendly if
they would use fresh individual names instead. An approach could be to first compute
hypotheses with those systems, and afterwards simplify concepts by introduction of
fresh individuals. The results in this paper indicate however that this could turn out
challenging in practice, as the number of those individuals may become large in theory.
72       Patrick Koopmann

References
 1. Alrabbaa, C., Baader, F., Borgwardt, S., Koopmann, P., Kovtunova, A.: Finding small
    proofs for description logic entailments: Theory and practice. In: Albert, E., Kovacs, L.
    (eds.) LPAR-23: 23rd International Conference on Logic for Programming, Artificial In-
    telligence and Reasoning. EPiC Series in Computing, vol. 73, pp. 32–67. EasyChair (2020).
    https://doi.org/10.29007/nhpp
 2. Alrabbaa, C., Baader, F., Borgwardt, S., Koopmann, P., Kovtunova, A.: Finding good proofs
    for description logic entailments using recursive quality measures. In: Platzer, A., Sut-
    cliffe, G. (eds.) Proceedings of the 28th International Conference on Automated Deduction
    (CADE’21). LNCS (2021), to appear.
 3. Alrabbaa, C., Hieke, W., Turhan, A.Y.: Counter model transformation for explaining non-
    subsumption in EL. In: Beierle, C., Ragni, M., Stolzenburg, F., Thimm, M. (eds.) Proceed-
    ings of the 7th Workshop on Formal and Cognitive Reasoning (FCR-2021) co-located with
    the 44th German Conference on Artificial Intelligence (KI-2021). CEUR Workshop Proceed-
    ings, vol. 2961, pp. 9–22. CEUR-WS.org (2021)
 4. Baader, F., Peñaloza, R., Suntisrivaraporn, B.: Pinpointing in the description logic EL+ . In:
    KI 2007: Advances in Artificial Intelligence, 30th Annual German Conference on AI, KI
    2007, Osnabrück, Germany, September 10-13, 2007, Proceedings. pp. 52–67 (2007)
 5. Bauer, J., Sattler, U., Parsia, B.: Explaining by example: Model exploration for ontology
    comprehension. In: Grau, B.C., Horrocks, I., Motik, B., Sattler, U. (eds.) Proceedings of the
    22nd International Workshop on Description Logics (DL 2009), Oxford, UK, July 27-30,
    2009. CEUR Workshop Proceedings, vol. 477. CEUR-WS.org (2009), http://ceur-ws.org/
    Vol-477/paper 37.pdf
 6. Bienvenu, M.: Complexity of abduction in the EL family of lightweight description logics.
    In: Proceedings of KR 2008. pp. 220–230. AAAI Press (2008), http://www.aaai.org/Library/
    KR/2008/kr08-022.php
 7. Calvanese, D., Ortiz, M., Simkus, M., Stefanoni, G.: Reasoning about explanations for neg-
    ative query answers in DL-Lite. J. Artif. Intell. Res. 48, 635–669 (2013). https://doi.org/10.
    1613/jair.3870
 8. Ceylan, İ.İ., Lukasiewicz, T., Malizia, E., Molinaro, C., Vaicenavicius, A.: Explanations for
    negative query answers under existential rules. In: Calvanese, D., Erdem, E., Thielscher, M.
    (eds.) Proceedings of KR 2020. pp. 223–232. AAAI Press (2020). https://doi.org/10.24963/
    kr.2020/23
 9. Del-Pinto, W., Schmidt, R.A.: ABox abduction via forgetting in ALC. In: The Thirty-
    Third AAAI Conference on Artificial Intelligence, AAAI 2019. pp. 2768–2775. AAAI Press
    (2019). https://doi.org/10.1609/aaai.v33i01.33012768
10. Du, J., Qi, G., Shen, Y., Pan, J.Z.: Towards practical ABox abduction in large description
    logic ontologies. Int. J. Semantic Web Inf. Syst. 8(2), 1–33 (2012). https://doi.org/10.4018/
    jswis.2012040101
11. Du, J., Wan, H., Ma, H.: Practical TBox abduction based on justification patterns. In: Pro-
    ceedings of the Thirty-First AAAI Conference on Artificial Intelligence. pp. 1100–1106
    (2017), http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14402
12. Du, J., Wang, K., Shen, Y.: A tractable approach to ABox abduction over description logic
    ontologies. In: Brodley, C.E., Stone, P. (eds.) Proceedings of the Twenty-Eighth AAAI Con-
    ference on Artificial Intelligence. pp. 1034–1040. AAAI Press (2014), http://www.aaai.org/
    ocs/index.php/AAAI/AAAI14/paper/view/8191
13. Elsenbroich, C., Kutz, O., Sattler, U.: A case for abductive reasoning over ontologies. In:
    Proceedings of the OWLED’06 Workshop on OWL: Experiences and Directions (2006),
    http://ceur-ws.org/Vol-216/submission 25.pdf
                               Signature-Based ABox Abduction in ALC is Hard                   73

14. Haifani, F., Koopmann, P., Tourret, S.: Abduction in EL via translation to FOL. In: Schmidt,
    R.A., Wernhard, C., Zhao, Y. (eds.) SOQE 2021: The 2nd Workshop on Second-Order Quan-
    tifier Elimination and Related Topics. CEUR Workshop Proceedings, CEUR-WS.org (2021)
15. Haifani, F., Koopmann, P., Tourret, S.: Introducing connection minimal abduction for EL
    ontologies. In: Explainable Logic-Based Knowledge Representation (XLoKR 2021) (2021)
16. Halland, K., Britz, K.: ABox abduction in ALC using a DL tableau. In: 2012 South African
    Institute of Computer Scientists and Information Technologists Conference, SAICSIT ’12.
    pp. 51–58 (2012). https://doi.org/10.1145/2389836.2389843
17. Horridge, M.: Justification Based Explanation in Ontologies. Ph.D. thesis, University
    of Manchester, UK (2011), https://www.research.manchester.ac.uk/portal/files/54511395/
    FULL TEXT.PDF
18. Kazakov, Y.: RIQ and SROIQ are harder than SHOIQ. In: Brewka, G., Lang, J. (eds.) Princi-
    ples of Knowledge Representation and Reasoning: Proceedings of the Eleventh International
    Conference, KR 2008, Sydney, Australia, September 16-19, 2008. pp. 274–284. AAAI Press
    (2008), http://www.aaai.org/Library/KR/2008/kr08-027.php
19. Klarman, S., Endriss, U., Schlobach, S.: ABox abduction in the description logic
    ALC. Journal of Automated Reasoning 46(1), 43–80 (2011). https://doi.org/10.1007/
    s10817-010-9168-z
20. Koopmann, P.: LETHE: forgetting and uniform interpolation for expressive description log-
    ics. Künstliche Intell. 34(3), 381–387 (2020). https://doi.org/10.1007/s13218-020-00655-w
21. Koopmann, P.: Signature-based abduction with fresh individuals and complex concepts
    for description logics. In: Zhou, Z. (ed.) Proceedings of the Thirtieth International Joint
    Conference on Artificial Intelligence, IJCAI 2021. pp. 1929–1935. ijcai.org (2021), https:
    //doi.org/10.24963/ijcai.2021/266
22. Koopmann, P.: Signature-based abduction with fresh individuals and complex concepts for
    description logics (extended version). CoRR abs/2105.00274 (2021), https://arxiv.org/abs/
    2105.00274
23. Koopmann, P.: Two ways of explaining negative entailments in description logics using ab-
    duction. In: Explainable Logic-Based Knowledge Representation (XLoKR 2021) (2021)
24. Koopmann, P., Del-Pinto, W., Tourret, S., Schmidt, R.A.: Signature-based abduction for ex-
    pressive description logics. In: Calvanese, D., Erdem, E., Thielscher, M. (eds.) Proceedings
    of the 17th International Conference on Principles of Knowledge Representation and Rea-
    soning, KR 2020. pp. 592–602. AAAI Press (2020). https://doi.org/10.24963/kr.2020/59
25. Lutz, C., Wolter, F.: Foundations for uniform interpolation and forgetting in expressive de-
    scription logics. In: Proceedings of IJCAI 2011. pp. 989–995. IJCAI/AAAI (2011). https:
    //doi.org/10.5591/978-1-57735-516-8/IJCAI11-170
26. McGuinness, D.L.: Explaining Reasoning in Description Logics. Ph.D. thesis, Rutgers Uni-
    versity, NJ, USA (1996). https://doi.org/10.7282/t3-q0c6-5305
27. Obeid, M., Obeid, Z., Moubaiddin, A., Obeid, N.: Using description logic and Abox abduc-
    tion to capture medical diagnosis. In: Wotawa, F., Friedrich, G., Pill, I., Koitz-Hristov, R.,
    Ali, M. (eds.) Advances and Trends in Artificial Intelligence. From Theory to Practice. pp.
    376–388. Springer International Publishing, Cham (2019)
28. Parsia, B., Matentzoglu, N., Gonçalves, R.S., Glimm, B., Steigmiller, A.: The OWL reasoner
    evaluation (ORE) 2015 competition report. J. Autom. Reason. 59(4), 455–482 (2017). https:
    //doi.org/10.1007/s10817-017-9406-8
29. Peirce, C.S.: Deduction, induction, and hypothesis. Popular science monthly 13, 470–482
    (1878)
30. Peñaloza, R.: Axiom-Pinpointing in Description Logics and Beyond. Ph.D. thesis, Technis-
    che Universität Dresden, Germany (2009)
74      Patrick Koopmann

31. Pukancová, J., Homola, M.: Tableau-based ABox abduction for the ALCHO description
    logic. In: Proceedings of the 30th International Workshop on Description Logics (2017),
    http://ceur-ws.org/Vol-1879/paper11.pdf
32. Pukancová, J., Homola, M.: The AAA ABox abduction solver. Künstliche Intell. 34(4), 517–
    522 (2020). https://doi.org/10.1007/s13218-020-00685-4
33. Schlobach, S., Cornet, R.: Non-standard reasoning services for the debugging of description
    logic terminologies. In: Gottlob, G., Walsh, T. (eds.) Proc. of the 18th Int. Joint Conf. on
    Artificial Intelligence (IJCAI 2003). pp. 355–362. Morgan Kaufmann, Acapulco, Mexico
    (2003), http://ijcai.org/Proceedings/03/Papers/053.pdf
34. Wei-Kleiner, F., Dragisic, Z., Lambrix, P.: Abduction framework for repairing incomplete EL
    ontologies: Complexity results and algorithms. In: Proceedings of the Twenty-Eighth AAAI
    Conference on Artificial Intelligence. pp. 1120–1127. AAAI Press (2014), http://www.aaai.
    org/ocs/index.php/AAAI/AAAI14/paper/view/8239
35. Zhao, Y., Schmidt, R.A.: FAME(Q): an automated tool for forgetting in description logics
    with qualified number restrictions. In: Fontaine, P. (ed.) Automated Deduction - CADE 27 -
    27th International Conference on Automated Deduction, Natal, Brazil, August 27-30, 2019,
    Proceedings. Lecture Notes in Computer Science, vol. 11716, pp. 568–579. Springer (2019).
    https://doi.org/10.1007/978-3-030-29436-6 34