=Paper=
{{Paper
|id=Vol-3009/paper2
|storemode=property
|title=Abduction in EL via Translation to FOL
|pdfUrl=https://ceur-ws.org/Vol-3009/paper2.pdf
|volume=Vol-3009
|authors=Fajar Haifani,Patrick Koopmann,Sophie Tourret
|dblpUrl=https://dblp.org/rec/conf/kr/HaifaniKT21
}}
==Abduction in EL via Translation to FOL==
<pdf width="1500px">https://ceur-ws.org/Vol-3009/paper2.pdf</pdf>
<pre>
       Abduction in EL via Translation to FOL ?

        Fajar Haifani1 , Patrick Koopmann2 , and Sophie Tourret1,3
1
    Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken
                  Germany ({fhaifani,stourret}@mpi-inf.mpg.de)
    2
      Institute for Theoretical Computer Science, Technische Universität Dresden,
                     Germany (patrick.koopmann@tu-dresden.de)
            3
              Université de Lorraine, CNRS, Inria, LORIA, Nancy, France


       Abstract. We present a technique for performing TBox abduction in
       the description logic EL. The input problem is converted into first-order
       formulas on which a prime implicate generation technique is applied, then
       EL hypotheses are reconstructed by combining the generated positive and
       negative implicates.

       Keywords: Abduction · Description Logic EL· Prime Implicates · First-
       order Logic.


1    Introduction
Description logic (DL) ontologies are used to formalize terminological knowledge
in diverse areas such as medicine, biology, or the semantic web. Common rea-
soning tasks for ontologies are subsumption checking, which is to decide whether
one description of a concept generalizes another, and classification, which is to
compute the entire subsumption hierarchy of an ontology, that is, the set of all
subsumption relationships that hold between atomic concepts in the ontology.
Both problems can nowadays be computed rather efficiently using highly opti-
mized description logic reasoners [24]. Modern ontologies can become very large
and complex: for instance, the medical ontology SNOMED CT4 contains over
350,000 statements, while the Gene Ontology GO defines over 50,000 concepts.
But even for less complex ontologies, it is easy to introduce bugs, and possi-
ble inferences may not always be transparent to ontology users and engineers.
Correspondingly, services for explaining those inferences can be very helpful.
Typical techniques for explaining why subsumption can be inferred include jus-
tifications [4, 16, 27] and proofs [1, 2], both of which have been included in the
popular ontology development tool Protégé [17, 18].
     In order to explain why a subsumption relationship cannot be inferred, one
can use abduction [21]. The general setup in abduction is that we have an
?
  Funded by the DFG grant 389792660 as part of TRR 248 – CPEC,
  see https://perspicuous-computing.science
  Copyright © 2021 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0).
4
  https://www.snomed.org
                                   Abduction in EL via Translation to FOL       47

input consisting of a knowledge base K and some logical statement α—the
observation—that is not logically entailed by K. The aim is then to compute
a hypothesis, a set of axioms whose addition to K would result in the entail-
ment α. This way, the hypothesis not only provides for a possible explanation, it
also offers a fix to repair the entailment in case it is supposed to hold. One way
to avoid trivial hypotheses (such as the axiom α itself) is to restrict solutions
syntactically, for instance by specifying a signature of predicates that can be
used in the solution [20, 22].
    In the context of DLs, depending on the shape of the observation to be
explained, abduction can be classified into different categories:
 – In concept abduction observation and hypothesis are DL concepts. That
   means we are looking for a concept that is subsumed by a user-given concept
   (with respect to an ontology) [5].
 – In ABox abduction, observations and hypotheses have the form of ABoxes,
   that is, they describe facts, which is the more typical scenario for diagno-
   sis [6–9, 11, 15, 19, 20, 25, 26].
 – In TBox abduction, observation and hypothesis consist of TBox axioms,
   which can be used to find fixes for missing subsumption relationships [10,28].
 – Finally, in knowledge base abduction, the most general form, both hypotheses
   and observations can be arbitrarily composed of TBox and ABox axioms [13,
   22].
We are interested in explaining missing subsumption relationships, which is what
is done by TBox abduction, in which background knowledge, observation, and
hypothesis all consist of TBox axioms, that is, DL axioms expressing subsump-
tion relationships between concepts. Optionally, a set of abducible concepts can
be specified as part of the input, of which every hypothesis needs to be composed.
We work in the lightweight DL EL, used in many large scale ontologies [3].
    TBox abduction has been considered before, albeit in different settings.
In [10], instead of a set of abducibles, a set of justification patterns is given,
which is a set of syntactical patterns that hypotheses have to fit in. In contrast,
[28] generalises both this work and the one in [10], in that it uses an arbitary
oracle function to decide whether a solution is admissible or not (which may
use abducibles, justification patterns, or something else). The work in [28] also
considers EL, and abduction under various minimality notions such as subset
minimality and size minimality. It is there shown that deciding the existence
of hypotheses is tractable, while for certain minimality criteria, it can become
coNP-complete. In addition to the theoretical results, the authors of [28] present
practical algorithms, and an evaluation of an implementation. Different to our
approach, they use an external DL reasoner to decide entailment relationships.
In contrast, we present an approach that directly exploits first-order reasoning,
and thus has the potential to be generalisable to more expressive DLs.
    The TBox abduction technique we present proceeds in three steps. The first
step is the translation of the abduction problem into a first-order formula Φ.
Then, we compute the prime implicates of Φ, that is, a minimal set of logical
consequences of Φ that subsume all other consequences of Φ. In the final step, we
48      Fajar Haifani, Patrick Koopmann and Sophie Tourret

       C1 v C2
                                                         PI g+
                                                            Σ (Φ)


          T         translation   Φ      PI generation              recombination   S


                                                         PI g−
                                                            Σ (Φ)
                                  Σ


              Fig. 1. EL abduction using prime implicate generation in FOL.


construct TBoxes that are hypotheses of the original abduction problem, from
the prime implicates of Φ.
    Figure 1 illustrates this process with some more details. The translation to
first-order relies on the assumption that the TBox T is normalized and this
results in a set of definite Horn clauses Φ. Prime implicate (PI) generation in
first-order logic can be done with an off-the-shelf tool [12, 23] or, in our case,
by slightly altering a resolution-based theorem prover, such as included in the
SPASS theorem prover [29]. Since the input is Horn, PI g+ Σ (Φ), the set of ground
positive prime implicates of Φ, contains only unit clauses. The recombination
step looks at the negative ground prime implicates in PI g−   Σ (Φ) one after the
other and attempts to match each literal in one such clause with clauses in
PI g+
    Σ (Φ). If this succeeds, the result is a solution to the original problem. We
denote S the set of all solutions obtained in this way.
    In this paper, we describe in details the translation and recombination steps
and prove that the S obtained in that way contains only solutions to the original
problem.


2     Preliminaries
We begin with an introduction of the relevant notions, first in EL, then in first-
order logic.

2.1   The Description Logic EL
Let NC and NR be pair-wise disjoint, countably infinite sets of respectively atomic
concepts and roles. Generally, we use letters A, B, E, F ,... for atomic concepts,
and r for roles, possibly annotated, letters C, D, possibly annotated, denote EL
concepts, built according to the syntax rule

                             C ::= > | A | C u C | ∃r.C .
        d
We use {C1 , . . . , Cm } as abbreviation for C1 u . . . u Cm , and identify the empty
conjunction (when m = 0) with >. An EL TBox T is a finite set of concept
inclusions (CIs) of the form C v D. We use C ≡ D as abbreviation for {C v D,
D v C}. The semantics of EL is defined as usual using DL interpretations, which
are tuples I = (∆I , ·I ) made of a domain ∆I and an interpretation function
                                        Abduction in EL via Translation to FOL                 49

·I that maps atomic concepts A ∈ NC to sets AI ⊆ ∆I and roles r ∈ NR to
relations rI ⊆ ∆I × ∆I . The interpretation function ·I is extended to complex
concepts as follows:

                         >I = ∆I          (C u D)I = C I ∩ DI
                   (∃r.C)I = {d ∈ ∆I | ∃(d, e) ∈ rI s.t. e ∈ C I }

We say that I satisfies a CI C v D, in symbols I |= C v D, if and only if
C I ⊆ DI , and if I satisfies all axioms in a TBox T , we write I |= T and call I a
model of T . If a CI C v D is satisfied in every model of T , we write T |= C v D
and say that C v D is entailed by T . In this case, we say that C is subsumed
by D and call C a subsumee of D and D a subsumer of C.
    Note that, if we identify NC with the set of unary predicates and NR with the
set of binary predicates, I is a classical first-order structure for those predicates.
We can thus define satisfaction of first-order formulas in DL interpretations as
usual, and also relate first-order formulas with CIs and TBoxes via the usual
entailment relation.

2.2   Abduction in EL
The abduction problem we are concerned with in this paper is the following.
Definition 1. Given a TBox T , a set of atomic concepts Σ ⊆ NC and a concept
inclusion C1 v C2 , called the observation, where C1 and C2 are atomic concepts
and T 6|= C1 v C2 , the corresponding TBox abduction problem, denoted by the
tuple hT , Σ, C1 v C2 i, is to find a TBox

  H ⊆ {Ai1 u · · · u Ain v Bi1 u · · · u Bim | {Ai1 , . . . , Ain , Bi1 , . . . , Bim } ⊆ Σ}

where m > 0, n ≥ 0 and such that T ∪ H |= C1 v C2 and T ∩ H = ∅. Such a
solution H to the abduction problem is called a hypothesis.
As an example, suppose we have a TBox T = {C1 v ∃r.A, ∃r.B v C2 }. For the
abduction problem hT , {A, B}, C1 v C2 i, the set H = {A v B} is a solution
since T ] H |= C1 v C2 . Note that since EL TBoxes are always consistent, the
consistency condition usually required on T ∪ H is not needed in our setting.

2.3   Prime Implicate Generation in First Order Logic
We consider first-order logic with unary and binary predicates, which we identify
with the corresponding elements in NC and NR . Let V be a set of variables, and
ΣFOL be a signature such that ΣFOL = NC ∪ NR , where NC and NR are the sets
of EL atomic concepts and roles, used as unary and binary predicate symbols
respectively. Furthermore, let F be a set of function symbols, which includes a
set C of constants as nullary function symbols. A term t, possibly annotated, is
either a variable x ∈ V, a constant c ∈ C, or of the form f (t1 , . . . , tm ), where
f is an m-ary function symbol and t1 , . . ., tm are terms. Atoms are either of
50     Fajar Haifani, Patrick Koopmann and Sophie Tourret

the form A(t1 ) or r(t1 , t2 ), where t1 and t2 are terms. Literals are either atoms
κ (positive literals) or negated atoms ¬κ (negative literals). Clauses, denoted
ϕ with possible annotations, are disjunctions of literals implicitly universally
quantified. Formulas, denoted Φ, are described using universal and existential
quantification, conjunction, disjunction, and negation. For convenience, by abuse
of notation, we may sometimes identify sets of formulas or clauses with the
conjunction over those.
    Skolemization removes existentially quantified variables from formulas in the
usual way: for each existentially quantified variable, it adds a fresh function
with variable arguments taken from the universally quantified variables outside
the scope of the variable under consideration (if no such variables exist, we
use a nullary function, ie. a constant). For example, ∀x.¬A(x) ∨ ∃y.r(x, y) is
Skolemized into ∀x.¬A(x) ∨ r(x, f (x)). Here, we assume that the formula is in
negation normal form, that is, every negation symbol occurs only in front of
an atom. Furthermore, we assume that such Skolem functions are taken from
a set NS not in the signature. In our context, we only require a single Skolem
constant, namely sk0 , since Skolemized atoms, literals, and clauses range over
Skolem terms either built over a given variable or sk0 instead of ranging over
variables. We let Tx (NS ) be the set of Skolem terms built on x, where x is either
sk0 or a variable in V.
    Since we only consider formulas over unary and binary predicates, they can
be satisfied by DL interpretations as defined above, understood naturally as
first-order structures. We call interpretations I with ∆I = Tsk0 (NS ) Herbrand
interpretations, which for convenience, by abuse of notation, we may treat as
sets of ground atoms s.t. A(t) ∈ I if and only if t ∈ AI and r(t, t0 ) ∈ I if and
only if (t, t0 ) ∈ rI .

Definition 2 (Prime Implicate). Given a set of clauses Φ, a clause ϕ is an
implicate of Φ if Φ |= ϕ. Additionally, ϕ is a prime implicate of Φ if for any
other implicate ϕ0 of Φ s.t. ϕ0 |= ϕ, it also holds that ϕ |= ϕ0 . Given a Skolemized
Φ, the set PI g+
               Σ (Φ) is the set of all positive ground prime implicates of Φ and
the set PI g−
           Σ  (Φ) is the set of all negative ground prime implicates where all of
the predicate symbols belong to Σ.

Example 3. Given Φ as follows:

         Φ = {A(sk0 ), ¬B(sk0 ), ¬A(x) ∨ r(x, sk(x)), ¬A(x) ∨ E(sk(x)),
                ¬G(x) ∨ ¬r(x, y) ∨ ¬E(y) ∨ B(sk0 )},

where sk0 is a constant and sk is a function symbol, the ground prime implicates
of Φ are PI g+                                     g−
             Σ (Φ) = {A(sk0 ), E(sk(sk0 ))} and PI Σ (Φ) = {¬B(sk0 ), ¬G(sk0 ) ∨
¬E(sk(sk0 ))}. They are implicates because all of them are consequences of Φ
(they are entailed by Φ). Moreover, these implicates are also prime implicates
because it is not possible to obtain an implicate of Φ by removing a literal from
one of them.

In some cases, the set of prime implicates can even be infinite.
                                   Abduction in EL via Translation to FOL      51

Example 4. Consider Φ0 , a slight variation on Φ from the previous example,
where E(sk(x)) is replaced by A(sk(x)), so that:

        Φ0 = {A(sk0 ), ¬B(sk0 ), ¬A(x) ∨ r(x, sk(x)), ¬A(x) ∨ A(sk(x)),
                ¬G(x) ∨ ¬r(x, y) ∨ ¬E(y) ∨ B(sk0 )},

Whereas the negative ground prime implicates are unchanged (PI g−        0
                                                                     Σ (Φ ) =
   g−
PI Σ (Φ)), there are now infinitely many positive ground prime implicates, be-
cause PI g+   0
         Σ (Φ ) contains A(sk0 ), A(sk(sk0 )), A(sk(sk(sk0 ))) . . .


3   From EL to FOL


We assume the EL TBox in the input to be in normal form as defined in [3],
which means that every CI is of the form A v B, A1 u A2 v B, ∃r.A v B or
A v ∃r.B, where r ∈ NR and A, A1 , A2 , B ∈ NC ∪ {>}. Every EL TBox can be
transformed in polynomial time into this normal form through the introduction
of fresh concept names. This transformation preserves all entailments in the
signature of the original ontology, even after the addition of CIs (assuming the
same CIs are added also to the input ontology, and they don’t use any of the
introduced concept names). Note that this transformation extends the signature,
and thus potentially the solution space of the abduction problem: to avoid this,
we may simply intersect the set of abducibles Σ with the signature of the original
input ontology.
   At the level of TBoxes, we use the standard, semantics-preserving translation
from EL to FOL [3].


Definition 5 (TBox translation). The function π translates concepts into
first-order formulas:

                           π(>, x) = true
                           π(A, x) = A(x)
                        π(∃r.C, x) = ∃y.(r(x, y) ∧ π(C, y))
                      π(C u D, x) = π(C, x) ∧ π(D, x)

We lift π to concept inclusions with: π(C v D) = ∀x.¬π(C, x)∨π(D, x). Finally,
we lift π to TBoxes with:

                       π(T ) = {π(C v D) | C v D ∈ T }.
52       Fajar Haifani, Patrick Koopmann and Sophie Tourret

     If T is in normal form, π(T ) contains only axioms of the following shapes:

              shapes from CIs with > : ∀x.true
                                         ∀x.B(x)
                                         ∀x.¬A(x) ∨ ∃y.r(x, y),
                                         ∀x.¬(∃y.r(x, y)) ∨ B(x)
           shapes from CIs without > : ∀x.¬A(x) ∨ B(x),
                                         ∀x.¬A1 (x) ∨ ¬A2 (x) ∨ B(x),
                                         ∀x.¬(∃y.r(x, y) ∧ A(y)) ∨ B(x),
                                         ∀x.¬A(x) ∨ ∃y.(r(x, y) ∧ B(y)).

In practice, since the symbol > will be preprocessed, only the four last forms are
relevant in our work.
    The function π is semantics-preserving in the following sense:
Theorem 6 (First-Order Semantics Preservation [3]). For any EL TBox T
and possibly complex concepts C and D, T |= C v D if and only if π(T ) |=
π(C v D).
   The translation of a TBox abduction problem, built on top of the standard
TBox translation, is less straightforward.
   The translation Π(T , C1 v C2 ) of hT , Σ, C1 v C2 i to first-order logic is the
Skolemization of
                            π(T ] T 0 ) ∧ ¬π(C1 v C20 )
where sk0 is used as the unique fresh Skolem constant such that the Skolem-
ization of ¬π(C1 v C20 ) results in {C1 (sk0 ), ¬C20 (sk0 )}, and where several other
things happen that are explained in the following paragraphs.
    Before Skolemization, T is first preprocessed by replacing every occurrence
of the concept > in T with that of the fresh atomic concept A> and we add
∃r.A> v A> and B v A> to T for every role r and atomic concept B. This
simulates for A> the implicit property that C v > holds for any C no matter
what the TBox is. In particular, this ensures that whenever there is a positive
prime implicate B(t) or r(t, t1 ), A> (t) also becomes a prime implicate.
    Finally, we define T 0 by renaming all atomic concepts A from T using fresh
symbols A0 . Notice that this is also done for C2 , renamed to C20 in Π(T , C1 v C2 ).
Thanks to this encoding trick, the inferences can be traced back from C1 (sk0 )
or from ¬C20 (sk0 ) and do not interfere with each other, making it easier to
relate ground prime implicates and concept inclusions and capturing interesting
solutions that would otherwise not be found.
    We now look at a complete encoding example in details.
Example 7. Suppose given T over the signature where NC = {A, B, C1 , C2 , E, F,
G, H} and NR = {r1 , r2 } as follows.

                   T = {C1 v ∃r1 .A, C1 v B, C1 v ∃r2 .G,
                         ∃r2 .E v B, ∃r1 .F v H, B u H v C2 }
                                        Abduction in EL via Translation to FOL        53

Consider the abduction problem hT , NC , C1 v C2 i. Preprocessing the > symbol
turns T into:

                T = {C1 v ∃r1 .A, C1 v B, C1 v ∃r2 .G,
                       ∃r2 .E v B, ∃r1 .F v H, B u H v C2 }
                      ]{∃r1 .A> v A> , ∃r2 .A> v A> }
                      ]{J v A> | J ∈ {A, B, C1 , C2 , E, F, G, H}}

and the renaming of atomic concepts produces:

              T 0 = {C10 v ∃r1 .A0 , C10 v B 0 , C10 v ∃r2 .G0 ,
                     ∃r2 .E 0 v B 0 , ∃r1 .F 0 v H 0 , B 0 u H 0 v C20 }
                    ]{∃r1 .A0> v A0> , ∃r2 .A0> v A0> }
                    ]{J v A0> | J ∈ {A0 , B 0 , C10 , C20 , E 0 , F 0 , G0 , H 0 }}

The translation of T results in ΦT , the Skolemization of π(T ):

       ΦT = {¬C1 (x) ∨ r1 (x, sk1 (x)), ¬C1 (x) ∨ A(sk1 (x)),
               ¬C1 (x) ∨ B(x),
               ¬C1 (x) ∨ r2 (x, sk2 (x)), ¬C1 (x) ∨ G(sk2 (x)),
               ¬r2 (x, y) ∨ ¬E(y) ∨ B(x),
               ¬r1 (x, y) ∨ ¬F (y) ∨ H(x),
               ¬B(x) ∨ ¬H(x) ∨ C2 (x),
              ]{¬r1 (x, y) ∨ A> (y) ∨ A> (x), ¬r2 (x, y) ∨ A> (y) ∨ A> (x)}
              ]{¬J(x) ∨ A> (x) | J ∈ {A, B, C1 , C2 , E, F, G, H}}

ΦT 0 , the Skolemization of T 0 , is similar except that the atomic concepts are
renamed. Finally, Π(T , C1 v C2 ) = ΦT ∪ ΦT 0 ∪ {C1 (sk0 ), ¬C20 (sk0 )}.
    Let us now look in more detail at the reason for replacing > with A> . First,
remember that > is not considered as a part of NC and represents an empty
conjunction. Since it is translated to true in first-order logic, it never appears in
a translated formula, as can be seen in the text following Definition 5. However,
our technique is only looking for the ground prime implicates. There are many
other non-ground prime implicates that are generated and that we only use in
the hope of finally grounding them. It would not be practical to have to sort out
the non-ground PIs due to the > symbol from the rest of them. Consider the
following example.
Example 8. Given Σ = {A, E, F, G} and the TBox

     T = {> v ∃r1 .F, C1 v ∃r2 .A, A v ∃r3 .G, ∃r2 .E v J1 , ∃r1 .J1 v J0 ,
            ∃r2 .> v J3 , J0 u J3 v C2 }

Some of the ground PIs of Π(T , C1 v C2 ) are as follows:
54        Fajar Haifani, Patrick Koopmann and Sophie Tourret

 – A(sk2 (sk0 )), F (sk1 (sk2 (sk0 ))), G(sk3 (sk2 (sk0 ))) are positive PIs,
 – ¬E 0 (sk1 (sk2 (sk0 ))) ∨ ¬A0> (sk3 (sk2 (sk0 ))) is a negative PI,
where the Skolem functions sk1 , sk2 , and sk3 , are from the Skolemization of r1 ,
r2 , and r3 respectively in the first three axioms.
     Let us take a close look at the reason why F (sk1 (sk2 (sk0 ))) is an implicate.
We observe that A(sk2 (sk0 )) is obtained as a consequence of the presence of
C1 v ∃r2 .A in T . Thanks to the addition of A v A> to T , A> (sk2 (sk0 )) is also
an implicate. Finally due to the presence of the renaming of > v ∃r1 .F in T ,
namely A> v ∃r1 .F , the clause F (sk1 (sk2 (sk0 ))) is also an implicate. With a
direct translation of >, we would only be able to derive F (sk1 (x)) and there
would be no way to obtain a ground implicate from this clause.
     Similarly, it is only thanks to the presence of ∃r3 .A> v A> in T that the
prime implicate ¬E 0 (sk1 (sk2 (sk0 ))) ∨ ¬A0> (sk3 (sk2 (sk0 ))) can be deduced.


4      Building EL hypotheses from prime implicates
We use the ground prime implicates of Φ = Π(T , C1 v C2 ) to produce solutions
to the abduction problem hT , Σ, C1 v C2 i in EL. To do this, we identify sub-
sumers of C1 and subsumees of C2 that can be generated from the ground prime
implicates in Φ, and that can be matched with each other based on the Skolem
terms they contain.
Definition 9 (Solution reconstruction). Let hT , Σ, C1 v C2 i be an abduc-
tion problem and let Φ = Π(T , C1 v C2 ), and assume there are prime implicates

     {A1 (t1 ), . . . , An (tn )} ⊆ PI g+          0 0                 0   0        g−
                                       Σ (Φ) and ¬B1 (t1 ) ∨ . . . ∨ ¬Bm (tm ) ∈ PI Σ (Φ)

such that {ti }i∈[1,n] = {t0i }i∈[1,m] . We can then obtain a hypothesis H for the
abduction problem

                   H = {l1 (t) v l2 (t) | t ∈ L and T 6|= l1 (t) v l2 (t)},

such that
 – L = {ti }di∈[1,n] = {t0i }i∈[1,m] ,
 – l1 (t) = d{A | A(t) ∈ {A1 (t1 ), . . . , An (tn )}}, and
 – l2 (t) = {B | B 0 (t) ∈ {B10 (t01 ), . . . , Bm
                                                 0
                                                   (t0m )}}.
We denote by S(Φ, Σ) the set containing all such H.
Example 10. From Example 7,

                    T = { C1 v ∃r1 .A, C1 v B,
                           C1 v ∃r2 .F, B u ∃r1 .E v C2 ,
                           ∃r1 .A> v A> , ∃r2 .A> v A> }
                           ]{J v A> | J ∈ {C1 , C2 , A, B, E, F, G}}.

Consider the prime implicates from Π(T , C1 v C2 )
                                      Abduction in EL via Translation to FOL           55

 – {B(sk0 ), A(sk1 (sk0 ))} ⊆ PI g+
                                 Σ (Φ), and
 – ¬B (sk0 ) ∨ C (sk1 (sk0 )) ∈ PI g−
     0           0
                                   Σ (Φ).

From the Skolem term sk0 and sk1 (sk0 ), we have l1 (sk0 ) v l2 (sk0 ) = B v B
and l1 (sk1 (sk0 )) v l2 (sk1 (sk0 )) = A v C respectively. However, B v B is a
tautology. Consequently, H = {A v C} is one possible hypothesis.

Theorem 11. Given a TBox abduction problem hT , Σ, C1 v C2 i, and denot-
ing Φ the set Π(T , C1 v C2 ), the set S(Φ, Σ) contains only hypotheses for the
abduction problem.

Proof. Let ΦT denote the Skolemization of π(T ). If ¬B10 (t01 ) ∨ . . . ∨ ¬Bm   0
                                                                                  (t0m ) ∈
    g−
PI Σ (Φ), then ΦT ∪ {C1 (sk0 ), ¬C2 (sk0 )} |= ¬B1 (t1 ) ∨ . . . ∨ ¬Bm (tm ) because
using, e.g., resolution, any inference to derive the primed disjunction can be
imitated without primes. Moreover, A(t) ∈ PI g+                                 g+
                                                      Σ (Φ) implies A(t) ∈ PI Σ (ΦT ∪
{C1 (sk0 )}) since only non-primed clauses are relevant to entail positive literals.
Thus ΦT ∪ {C1 (sk0 )} |= A1 (t1 ) ∧ . . . ∧ An (tn ).
    By combining the two results for compatible positive and negative clauses,
we obtain ΦT ∪ {C1 (sk0 ), ¬C2 (sk0 )} |= π(l1 (t), t) ∧ ¬π(l2 (t), t) for all t ∈ L
where L, l1 (t) and l2 (t) are as defined in Definition 9. In turn, this implies
ΦT ∪ {C1 (sk0 ), ¬C2 (sk0 )} |= ∃x.(π(l1 (t), x) ∧ ¬π(l2 (t), x)) for all t ∈ L. Thus
ΦT ∪ {C1 (sk0 ), ¬C2 (sk0 )} ∪ {∀x.(¬π(l1 (t), x) ∨ π(l2 (t), x)) for all t ∈ L}. Hence,
for any hypothesis H ∈ S(Φ, Σ), we know that ΦT ∪ {C1 (sk0 ), ¬C2 (sk0 )} ∪ π(H)
is unsatisfiable. Consequently, because Skolemization is satisfiability-preserving,
π(T ) ∪ ¬π(C1 v C2 ) ∪ π(H) is also unsatisfiable, hence π(T ) ∪ π(H) |= π(C1 v
C2 ). By Theorem 6, this in turn means that T ∪ H |= C1 v C2 .                           t
                                                                                         u


5    Conclusion

We have introduced a technique to compute hypotheses to solve TBox abduction
problems in EL. This technique turns the input TBox into a first-order formula,
of which the ground prime implicates can be used to build hypotheses answering
the problem.
    We have not shown the prime implicate computation step itself for two rea-
sons. First, existing tools such as SOLAR [23] and GPiD [12] can compute prime
implicates automatically in first-order logic. Second, there is an issue with the
termination of such tools for problems with cyclic dependencies in the CIs, e.g.,
such that T |= A v ∃r.A, because in that case, the set of prime implicates may
be unbounded. However, there exists only a finite number of hypotheses to be
found and we conjecture that prime implicates with polynomial nesting depth
of Skolem terms are sufficient for finding all relevant hypotheses. We plan to
verify this conjecture and implement a prime implicate generation tool on top
of SPASS [29] as the next step of this work.
    An orthogonal concern that we are currently addressing is to relate the ob-
tained hypotheses with connection-minimality [14], a notion we recently intro-
duced to characterize solutions to the abduction problem in EL that we find
56      Fajar Haifani, Patrick Koopmann and Sophie Tourret

particularly interesting, because they connect the subsumers of C1 to the sub-
sumees of C2 in a less contrived way than other solutions. We believe that the
present technique is sound and complete with respect to finding connection-
minimal hypotheses, but that remains to be formally proved.
    Finally, as our approach works by generating ground prime implicates start-
ing from a set of ground literals, it should also be possible to use it for a mix
of ABox and TBox abduction, where our background knowledge consists of a
DL knowledge base containing both a TBox and ground facts, the observation
is another set of ground facts, and we are looking for hypotheses in the form of
a TBox. This would have applications for TBox learning from data.


References

 1. Alrabbaa, C., Baader, F., Borgwardt, S., Koopmann, P., Kovtunova, A.: Finding
    small proofs for description logic entailments: Theory and practice. In: Albert, E.,
    Kovacs, L. (eds.) LPAR-23: 23rd International Conference on Logic for Program-
    ming, Artificial Intelligence and Reasoning. EPiC Series in Computing, vol. 73, pp.
    32–67. EasyChair (2020). https://doi.org/10.29007/nhpp
 2. Alrabbaa, C., Baader, F., Borgwardt, S., Koopmann, P., Kovtunova, A.: Find-
    ing good proofs for description logic entailments using recursive quality measures.
    In: Platzer, A., Sutcliffe, G. (eds.) Automated Deduction - CADE 28 - 28th In-
    ternational Conference on Automated Deduction, Virtual Event, July 12-15, 2021,
    Proceedings. Lecture Notes in Computer Science, vol. 12699, pp. 291–308. Springer
    (2021). https://doi.org/10.1007/978-3-030-79876-5 17
 3. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description
    Logic. Cambridge University Press (2017). https://doi.org/10.1017/9781139025355
 4. Baader, F., Suntisrivaraporn, B.: Debugging SNOMED CT using axiom pinpoint-
    ing in the description logic EL+ . In: Proc. of the 3rd Conference on Knowledge Rep-
    resentation in Medicine (KR-MED’08): Representing and Sharing Knowledge Us-
    ing SNOMED. CEUR-WS, vol. 410 (2008), http://ceur-ws.org/Vol-410/Paper01.
    pdf
 5. Bienvenu, M.: Complexity of abduction in the EL family of lightweight description
    logics. In: Proceedings of KR 2008. pp. 220–230. AAAI Press (2008), http://www.
    aaai.org/Library/KR/2008/kr08-022.php
 6. Calvanese, D., Ortiz, M., Simkus, M., Stefanoni, G.: Reasoning about explanations
    for negative query answers in DL-Lite. J. Artif. Intell. Res. 48, 635–669 (2013).
    https://doi.org/10.1613/jair.3870
 7. Ceylan, İ.İ., Lukasiewicz, T., Malizia, E., Molinaro, C., Vaicenavicius, A.: Explana-
    tions for negative query answers under existential rules. In: Calvanese, D., Erdem,
    E., Thielscher, M. (eds.) Proceedings of KR 2020. pp. 223–232. AAAI Press (2020).
    https://doi.org/10.24963/kr.2020/23
 8. Del-Pinto, W., Schmidt, R.A.: ABox abduction via forgetting in ALC. In: The
    Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019. pp. 2768–
    2775. AAAI Press (2019). https://doi.org/10.1609/aaai.v33i01.33012768
 9. Du, J., Qi, G., Shen, Y., Pan, J.Z.: Towards practical ABox abduction in large
    description logic ontologies. Int. J. Semantic Web Inf. Syst. 8(2), 1–33 (2012).
    https://doi.org/10.4018/jswis.2012040101
                                      Abduction in EL via Translation to FOL          57

10. Du, J., Wan, H., Ma, H.: Practical TBox abduction based on justification pat-
    terns. In: Proceedings of the Thirty-First AAAI Conference on Artificial In-
    telligence. pp. 1100–1106 (2017), http://aaai.org/ocs/index.php/AAAI/AAAI17/
    paper/view/14402
11. Du, J., Wang, K., Shen, Y.: A tractable approach to ABox abduction over descrip-
    tion logic ontologies. In: Brodley, C.E., Stone, P. (eds.) Proceedings of the Twenty-
    Eighth AAAI Conference on Artificial Intelligence. pp. 1034–1040. AAAI Press
    (2014), http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8191
12. Echenim, M., Peltier, N., Sellami, Y.: A generic framework for implicate gener-
    ation modulo theories. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) Au-
    tomated Reasoning - 9th International Joint Conference, IJCAR 2018, Held as
    Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17,
    2018, Proceedings. Lecture Notes in Computer Science, vol. 10900, pp. 279–294.
    Springer (2018). https://doi.org/10.1007/978-3-319-94205-6 19, https://doi.org/
    10.1007/978-3-319-94205-6 19
13. Elsenbroich, C., Kutz, O., Sattler, U.: A case for abductive reasoning over on-
    tologies. In: Proceedings of the OWLED’06 Workshop on OWL: Experiences and
    Directions (2006), http://ceur-ws.org/Vol-216/submission 25.pdf
14. Haifani, F., Koopmann, P., Tourret, S.: Introducing connection-minimal abduction
    for EL ontologies. XLoKR workshop (2021)
15. Halland, K., Britz, K.: ABox abduction in ALC using a DL tableau. In: 2012 South
    African Institute of Computer Scientists and Information Technologists Confer-
    ence, SAICSIT ’12. pp. 51–58 (2012). https://doi.org/10.1145/2389836.2389843
16. Horridge, M.: Justification Based Explanation in Ontologies. Ph.D. thesis, Uni-
    versity of Manchester, UK (2011), https://www.research.manchester.ac.uk/portal/
    files/54511395/FULL TEXT.PDF
17. Horridge, M., Parsia, B., Sattler, U.: Explanation of OWL entailments in protege 4.
    In: Bizer, C., Joshi, A. (eds.) Proceedings of the Poster and Demonstration Session
    at the 7th International Semantic Web Conference (ISWC2008), Karlsruhe, Ger-
    many, October 28, 2008. CEUR Workshop Proceedings, vol. 401. CEUR-WS.org
    (2008), http://ceur-ws.org/Vol-401/iswc2008pd submission 47.pdf
18. Kazakov, Y., Klinov, P., Stupnikov, A.: Towards reusable explanation services
    in protege. In: Artale, A., Glimm, B., Kontchakov, R. (eds.) Proceedings of the
    30th International Workshop on Description Logics, Montpellier, France, July 18-
    21, 2017. CEUR Workshop Proceedings, vol. 1879. CEUR-WS.org (2017), http:
    //ceur-ws.org/Vol-1879/paper31.pdf
19. Klarman, S., Endriss, U., Schlobach, S.: ABox abduction in the descrip-
    tion logic ALC. Journal of Automated Reasoning 46(1), 43–80 (2011).
    https://doi.org/10.1007/s10817-010-9168-z
20. Koopmann, P.: Signature-based abduction with fresh individuals and complex
    concepts for description logics. In: Zhou, Z. (ed.) Proceedings of the Thirti-
    eth International Joint Conference on Artificial Intelligence, IJCAI 2021, Virtual
    Event / Montreal, Canada, 19-27 August 2021. pp. 1929–1935. ijcai.org (2021).
    https://doi.org/10.24963/ijcai.2021/266
21. Koopmann, P.: Two ways of explaining negative entailments in description logics
    using abduction. In: Explainable Logic-Based Knowledge Representation (XLoKR
    2021) (2021)
22. Koopmann, P., Del-Pinto, W., Tourret, S., Schmidt, R.A.: Signature-based abduc-
    tion for expressive description logics. In: Calvanese, D., Erdem, E., Thielscher, M.
58      Fajar Haifani, Patrick Koopmann and Sophie Tourret

    (eds.) Proceedings of the 17th International Conference on Principles of Knowl-
    edge Representation and Reasoning, KR 2020. pp. 592–602. AAAI Press (2020).
    https://doi.org/10.24963/kr.2020/59
23. Nabeshima, H., Iwanuma, K., Inoue, K., Ray, O.: SOLAR: an automated de-
    duction system for consequence finding. AI Commun. 23(2-3), 183–203 (2010).
    https://doi.org/10.3233/AIC-2010-0465, https://doi.org/10.3233/AIC-2010-0465
24. Parsia, B., Matentzoglu, N., Gonçalves, R.S., Glimm, B., Steigmiller, A.: The OWL
    reasoner evaluation (ORE) 2015 competition report. J. Autom. Reason. 59(4),
    455–482 (2017). https://doi.org/10.1007/s10817-017-9406-8
25. Pukancová, J., Homola, M.: Tableau-based ABox abduction for the ALCHO de-
    scription logic. In: Proceedings of the 30th International Workshop on Description
    Logics (2017), http://ceur-ws.org/Vol-1879/paper11.pdf
26. Pukancová, J., Homola, M.: The AAA ABox abduction solver. Künstliche Intell.
    34(4), 517–522 (2020). https://doi.org/10.1007/s13218-020-00685-4
27. Schlobach, S., Cornet, R.: Non-standard reasoning services for the debugging of
    description logic terminologies. In: Gottlob, G., Walsh, T. (eds.) Proc. of the 18th
    Int. Joint Conf. on Artificial Intelligence (IJCAI 2003). pp. 355–362. Morgan Kauf-
    mann, Acapulco, Mexico (2003), http://ijcai.org/Proceedings/03/Papers/053.pdf
28. Wei-Kleiner, F., Dragisic, Z., Lambrix, P.: Abduction framework for repairing in-
    complete EL ontologies: Complexity results and algorithms. In: Proceedings of
    the Twenty-Eighth AAAI Conference on Artificial Intelligence. pp. 1120–1127.
    AAAI Press (2014), http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/
    view/8239
29. Weidenbach, C., Schmidt, R.A., Hillenbrand, T., Rusev, R., Topic, D.: Sys-
    tem description: Spass Version 3.0. In: Pfenning, F. (ed.) Automated Deduc-
    tion - CADE-21, 21st International Conference on Automated Deduction, Bre-
    men, Germany, July 17-20, 2007, Proceedings. Lecture Notes in Computer Science,
    vol. 4603, pp. 514–520. Springer (2007). https://doi.org/10.1007/978-3-540-73595-
    3 38, https://doi.org/10.1007/978-3-540-73595-3 38

</pre>