The Concept Difference for EL-Terminologies
                            using Hypergraphs∗

                    Andreas Ecke                                  Michel Ludwig                      Dirk Walther
           Theoretical Computer Science               Theoretical Computer Science          Theoretical Computer Science
              TU Dresden, Germany                        TU Dresden, Germany                   TU Dresden, Germany
                  ecke@tcs.inf.tu-                        Center for Advancing                  Center for Advancing
                    dresden.de                            Electronics Dresden                   Electronics Dresden
                                                            michel@tcs.inf.tu-                     dirk@tcs.inf.tu-
                                                               dresden.de                            dresden.de

ABSTRACT                                                                  comes more challenging as well. For instance, the ontology
Ontologies are used to represent and share knowledge. Nu-                 SNOMED CT contains now definitions for about 400 000
merous ontologies have been developed so far, especially in               terms, and the ‘NCBI organismal classification’ ontology
knowledge intensive areas such as the biomedical domain.                  even for about 850 000 terms. In particular, the need to
As the size of ontologies increases, their continued devel-               have automated tool support for detecting and represent-
opment and maintenance is becoming more challenging as                    ing differences between versions of an ontology is growing in
well. Detecting and representing semantic differences be-                 importance for ontology engineering. Current support from
tween versions of ontologies is an important task for which               ontology editors, such as Protegé, SWOOP, OBO-Edit, and
automated tool support is needed. In this paper we investi-               OntoView, is mostly based on syntactic differences and does
gate the logical difference problem using a hypergraph rep-               not capture the semantic differences between ontologies. An
resentation of EL-terminologies. We focus solely on the con-              early detection of possibly unwanted semantic changes can
cept difference wrt. a signature. For computing this differ-              contribute to an error-resilient authoring process of ontolo-
ence it suffices to check the existence of simulations between            gies.
hypergraphs whereas previous approaches required a combi-
nation of different methods.                                              The aim of this paper is to propose and investigate the
                                                                          logical difference problem using a hypergraph representa-
1.    INTRODUCTION                                                        tion of ontologies. The logical difference problem was intro-
Ontologies are widely used to represent domain knowledge.                 duced in [7], where the logical difference is taken to be the
They contain specifications of objects, concepts and relation-            set of queries formulated in a vocabulary of interest, called
ships that are often formalised using a logic-based language              signature, that produce different answers when evaluated
over a vocabulary that is particular to an application do-                over ontologies that are to be compared. In this paper we
main. Ontology languages based on description logics [2]                  concentrate on ontologies expressed as terminologies in the
have been widely adopted, e.g., description logics are under-             lightweight description logic EL [1, 3] and on queries that
lying the Web Ontology Language (OWL) and its profiles.1                  are concept inclusions formulated in EL. Even though EL-
Numerous ontologies have already been developed, in par-                  terminologies merely serve as a starting point for this inves-
ticular, in knowledge intensive areas such as the biomedical              tigation, we can illustrate the elegance of the hypergraph-
domain.2 Ontologies constantly evolve, they are regularly                 based approach and the advantages over existing approaches
extended, corrected and refined. As the size of ontologies                to computing the logical difference. The relevance of EL is
increases, their continued development and maintenance be-                emphasised by the fact that many ontologies are largely for-
                                                                          mulated in EL, notable examples being SNOMED CT and
∗We thank the reviewers of the workshop DChanges 2013                     NCI.
for their comments. The authors acknowledge the support
of the German Research Foundation (DFG), Andreas Ecke                     An EL-terminology can easily be translated into a directed
within GRK 1763 (QuantLA), and Michel Ludwig and Dirk                     hypergraph by taking the signature symbols as nodes and
Walther within the Resilience and Bio Path of the Cluster
of Excellence ‘Center for Advancing Electronics Dresden’.                 treating the axioms as hyperedges. For instance, the axiom
1
  http://www.w3.org/TR/owl2-overview/                                     A v ∃r.B is translated into the hyperedge ({xA }, {xr , xB }),
2
  http://bioportal.bioontology.org                                        and the axiom A ≡ B1 u B2 into the three hyperedges
                                                                          ({xA }, {xB1 }), ({xA }, {xB2 }) and ({xB1 , xB2 }, {xA }), where
                                                                          each node xY corresponds to the signature symbol Y , respec-
                                                                          tively. A feature of the translation of axioms into hyperedges
                                                                          is that all information about the axiom and the logical oper-
                                                                          ators in it is preserved. We can actually treat the ontology
This work is licensed under the Creative Commons Attribution-             and its hypergraph interchangeably. The existence of cer-
ShareAlike 3.0 Unported License (CC BY-SA 3.0). To view a copy            tain simulations between hypergraphs characterises the fact
of the license, visit http://creativecommons.org/licenses/by-sa/3.0/.     that the corresponding terminologies are logically equivalent
DChanges 2013, September 10th, 2013, Florence, Italy.                     and, thus, no logical difference exists. If no simulation ex-
ceur-ws.org Volume 1008, http://ceur-ws.org/Vol-1008/paper3.pdf
ists, we can directly extract the axioms responsible for the      C I 6= ∅, C I ⊆ DI , or C I = DI . We write I |= α if I
concept inclusion that witnesses the logical difference from      satisfies the axiom α. An interpretation I satisfies a TBox T
the hypergraph.                                                   if I satisfies all axioms in T ; in this case, we say that I is
                                                                  a model of T . An axiom α follows from a TBox T , written
The main advantages of the hypergraph-based approach to           T |= α, if for all models I of T , we have that I |= α.
logical difference are: (i) an elegant algorithm for detecting    Checking that T |= α can be done in polynomial time in the
the existence of concept differences (solely involving check-     size of T and α [1, 3].
ing for simulations in hypergraphs), even for large or cyclic
terminologies; (ii) a straightforward way to construct con-       A signature Σ is a finite set of symbols from NC and NR . The
cept inclusions that witness the logical difference between       signature sig(C), sig(α) or sig(T ) of the concept C, axiom α
two terminologies, even for cyclic terminologies; and (iii) a     or TBox T is the set of concept and role names occurring in
simple computation of explanations, i.e., sets of axioms that     C, α or T , respectively. An ELΣ -concept C is an EL-concept
entail such concept inclusions. Currently, the algorithms im-     such that sig(C) ⊆ Σ.
plemented for detecting the logical difference work for large
but acyclic terminologies such as SNOMED CT [5–7]. The            Two TBoxes T and T 0 are logically equivalent wrt. a sig-
algorithm in [6] can also handle “small” cyclic terminologies,    nature Σ, written T ≡Σ T 0 , if for all EL-axioms α with
but the concept inclusions witnessing a difference cannot         sig(α) ⊆ Σ: T |= α iff T 0 |= α. In other words, two TBoxes
easily be constructed using that algorithm.                       are logically equivalent wrt. a signature if the same axioms
                                                                  formulated in the signature follow from them. In this case,
The paper is organised as follows. We start by reviewing          the TBoxes are also said to be Σ-inseparable. Conserva-
some notions regarding the description logic EL, the logical      tive extensions are a special case of logical equivalence: for
difference problem, and ontology hypergraphs. In Section 3,       T ⊆ T 0 and Σ = sig(T ), T 0 is a conservative extension of
we introduce two simulation notions, a forward and a back-        T wrt. Σ iff T ≡Σ T 0 . Deciding the logical equivalence of
ward simulation, one for each type of concept inclusion that      EL-TBoxes wrt. a signature is ExpTime-complete [9].
may witness the logical difference between two terminolo-
gies. In each case we show that the existence of a simula-        To be able to better deal with complex concepts in a TBox,
tion between two terminologies corresponds to the absence         we assume that there are no nested existential restrictions.
of difference witnesses. We analyse the computational com-        We say that a TBox T is flattened if all conjunctions C u D
plexity of checking for simulations, and we sketch how to         and existential restrictions ∃r.E in T are such that C, D are
construct counter-examples. In Section 4, we discuss previ-       concept names or conjunctions, and E is a concept name.
ous approaches to computing the logical difference in [5] and     We ignore the nesting of binary conjunctions and treat them
explain the advantages of the hypergraph-based approach           as n-ary conjunctions of n concept names, where n ≥ 2.
introduced in this paper. Finally we conclude the paper.          The axioms of a flattened TBox are of the form X ./ Y ,
                                                                  where X, Y ∈ {>} ∪ {B1 u · · · u Bn | n > 0, Bi ∈ NC } ∪
2.    PRELIMINARIES                                               {∃r.A | r ∈ NR , A ∈ NC } and ./ ∈ {v, ≡}. Any EL-TBox
We start by briefly reviewing the lightweight description         can be flattened by appropriately replacing nested complex
logic EL and some notions related to the logical difference,      concepts C by fresh concept names XC and adding concept
together with some basic results.                                 equations XC ≡ C to the TBox that define the new symbols.
                                                                  It can be readily seen that this transformation is tractable
                                                                  and that it does not change the meaning of the original
2.1 The Logic EL                                                  TBox. The following lemma makes this precise.
Let NC and NR be mutually disjoint sets of concept names
and role names. We assume these sets to be countably infi-
nite. We typically use A, B to denote concept names and r            Lemma 1. For every EL-TBox T , there is a flattened EL-
to denote role names. The set of EL-concepts C is defined         TBox T ’ of polynomial size in the size of T such that T ≡Σ
inductively as:                                                   T 0 with Σ = sig(T ).

     • > and all concept names in NC are EL-concepts,
                                                                  For the remainder of the paper we assume that TBoxes are
     • if C, D are EL-concepts, then C u D and ∃r.C are EL-       flattened.
       concepts, where r ∈ NR .
                                                                  2.2    Terminologies in Normal Form
An EL-TBox T is a finite set of axioms, where an axiom            An important motivating feature of EL is that it exhibits
can be a concept inclusion C v D, or a concept equation           a low complexity for standard reasoning tasks. However,
C ≡ D, where C, D range over EL-concepts.                         as we have seen above, deciding the logical equivalence of
                                                                  EL-TBoxes wrt. a signature already requires exponential
The semantics of EL is defined using interpretations I =          time.3 To gain tractability for deciding the logical equiva-
(∆I , ·I ), where the domain ∆I is a non-empty set, and ·I is a   lence, TBoxes are restricted to a particular form as in [5, 7].
function mapping each concept name A to a subset AI of ∆I
and every role name r to a binary relation rI over ∆I . The
extension C I of a concept C is defined inductively as follows:       Definition 1. An EL-TBox T is called an EL-terminology
>I := ∆I , (C u D)I := C I ∩ DI and (∃r.C)I := {x ∈ ∆I |          if it satisfies the following conditions:
∃y ∈ C I : (x, y) ∈ rI }. An interpretation I satisfies a         3
                                                                    Note that it is tractable to check the logical equivalence of
concept C, an axiom C v D or C ≡ D if, respectively,              two EL-TBoxes without restricting the signature [1, 3].
   • all concept inclusions and equations in T are of the           As the set Diff Σ (T1 , T2 ) is infinite in general, we make use
     form A v C, A ≡ C, where A is a concept name, and              of the following “primitive witnesses” theorem from [5] that
                                                                    states that we only have to consider two specific types of
   • no concept name A occurs more than once on the left-           concept differences.
     hand side of an axiom in T .

                                                                       Theorem 1 (Primitive witnesses). Let T1 and T2 be
The restriction to EL-terminologies yields that deciding log-       EL-terminologies and Σ a signature. If α ∈ Diff Σ (T1 , T2 ),
ical equivalence wrt. a signature becomes tractable [5, 7].         then either C v A or A v D is a member of Diff Σ (T1 , T2 ),
                                                                    where A ∈ sig(α) is a concept name and C, D are EL-
Definitions in terminologies can be cyclic, which may cause         concepts occurring in α.
difficulties for reasoning algorithms. A terminology is cyclic
if a concept name refers to itself along concept inclusions
and equations. To be precise, for a terminology T , let ≺T          We define cWtnlhs    Σ (T1 , T2 ) as the set of all concept names
be a binary relation over NC such that A ≺T B if there is           A from Σ such that there exists an ELΣ -concept C with
an axiom of the form A v C or A ≡ C in T such that                  A v C ∈ Diff Σ (T1 , T2 ). Similarly, cWtnrhs     Σ (T1 , T2 ) is the set
B ∈ sig(C). A terminology T is acyclic if the transitive            of all concept names A ∈ Σ such that there exists an ELΣ -
closure of ≺T is irreflexive; otherwise T is cyclic. An acyclic     concept C with C v A ∈ Diff Σ (T1 , T2 ). The concept names
terminology can be unfolded (i.e. the process of substituting       in cWtnlhsΣ (T1 , T2 ) are called left-hand side witnesses and the
concept names by their definitions stops).                          concept names in cWtnrhs      Σ (T1 , T2 ) right-hand side witnesses.
                                                                    Note that these sets are subsets of Σ, and by Theorem 1
In this paper we do not restrict terminologies to be acyclic.       their union is a finite and succinct representation of the set
However, we have to take care of certain cycles. In our             Diff Σ (T1 , T2 ), which is typically infinite.
approach we want all conjunctions to be unfolded. That is,
for any conjunction A1 u · · · u Am in T , we substitute any        Checking for the concept difference between two terminolo-
Ai with B1 u · · · u Bn if Ai ≡ B1 u · · · u Bn ∈ T . To this end   gies equals checking for the existence of left- and right-hand
we need to handle the cycles along such concept equations.          side witnesses. As a corollary of Theorem 1, we have that:
Formally, a terminology T has unfoldable conjunctions if it         Diff Σ (T1 ,T2 ) = ∅iff cWtnlhs                     rhs
                                                                                                Σ (T1 ,T2 ) = ∅ and cWtnΣ (T1 ,T2 ) = ∅.
does not contain any concept equations A1 ≡ F1 , . . . , An ≡
Fn , where F1 , . . . , Fn are conjunctions of concept names such   2.4      Ontology Hypergraphs
that Ai+1 ∈ sig(Fi ) for every 1 ≤ i < n, and A1 ∈ sig(Fn ).        Hypergraphs are a generalisation of graphs with many ap-
Any terminology can be rewritten such that it has unfoldable        plications in computer science and discrete mathematics.
conjunctions without changing the logical consequences (cf.         In knowledge representation hypergraphs have been used
proof of Lemma 1 in [5]). We say that a concept name A is           implicitly to define reachability-based modules of ontolo-
conjunctive in T iff there exist concept names B1 , . . . , Bn ,    gies [11], and explicitly to define locality-based modules [10].
n > 0, such that A ≡ B1 u . . . u Bn ∈ T ; otherwise A is said      In this paper we also make the notion of a hypergraph ex-
to be non-conjunctive in T . Note that after the unfolding          plicit by transforming terminologies into hypergraphs in or-
of conjunctions (and removing of cycles) in a terminology T         der to be able to define simulations on the graphs.
no concept name that appears as a conjunct is defined as a
conjunction in T .                                                  A directed hypergraph is a tuple G = (V, E), where V is a non-
                                                                    empty set of nodes (or vertices), and E is a set of directed
To simplify the presentation we assume that terminologies           hyperedges of the form e = (S, S 0 ), where S, S 0 ⊆ V. We use
do not contain trivial axioms of the form A ≡ > or A ≡ B,           hypergraphs to represent terminologies as follows.
where A and B are concept names.

An EL-terminology T is normalised if it consists of EL-               Definition 3. For a normalised terminology T and a sig-
concept inclusions and equations of the following forms:            nature Σ, the ontology hypergraph GTΣ of T for Σ is a directed
                                                                    hypergraph GTΣ = (V, E) defined as follows:
   • A ≡ ∃r.B, A ≡ ∃r.>, A ≡ B1 u . . . u Bm , and                   V = { xA | A ∈ NC ∩ (Σ ∪ sig(T )) }
   • A v ∃r.B, A v ∃r.>, A v B1 u . . . u Bn ,                            ∪ { xr | r ∈ NR ∩ (Σ ∪ sig(T )) }
                                                                          ∪ {x> }
where m ≥ 2, n ≥ 1, and A, B, Bi are concept names such
that every conjunct Bi is non-conjunctive in T .                    and

                                                                     E = { ({xA }, {xBi }) | A v B1 u . . . u Bn ∈ T , 1 ≤ i ≤ n }
2.3    Logical Difference                                                 ∪ { ({xA }, {xBi }) | A ≡ B1 u . . . u Bn v T , 1 ≤ i ≤ n }
The logical difference between two TBoxes witnessed by con-
cept inclusions over a signature Σ is defined as follows.                 ∪ { ({xA }, {xr , xY }) | A v ∃r.Y ∈ T , Y ∈ NC ∪ {>} }
                                                                          ∪ { ({xA }, {xr , xY }) | A ≡ ∃r.Y ∈ T , Y ∈ NC ∪ {>} }
   Definition 2. The Σ-concept difference between two EL-                 ∪ { ({xr , xY }, {xA }) | A ≡ ∃r.Y ∈ T , Y ∈ NC ∪ {>} }
TBoxes T1 and T2 for a signature Σ is the set Diff Σ (T1 , T2 )           ∪ { ({xB1 , . . . , xBn }, {xA }) | A ≡ B1 u . . . u Bn ∈ T }
of all EL-concept inclusions α such that sig(α) ⊆ Σ, T1 |= α,
and T2 6|= α.
An ontology hypergraph GTΣ contains a node for > and for            (vi) {x} →T {z} if {x} →T {yi } for all i ∈ {1, . . . , n},
every role and concept name in Σ or T . Hyperedges in                    ({y1 , . . . , yn }, {z}) ∈ E.
GTΣ represent axioms in T . Every hyperedge is directed
and can be understood as an implication, i.e., ({xA }, {xB })
represents T |= A v B. The complex hyperedges are of                Note that the relation →T associates nodes xA that repre-
the form ({xA }, {xr , xB }) and ({xr , xB }, {xA }) represent-     sent concept names A either with nodes xB that stand for
ing T |= A v ∃r.B and T |= ∃r.B v A, and of the form                concept names B or with pairs of nodes {xr , z} representing
({xB1 , ..., xBn }, {xA }) standing for T |= B1 u . . . u Bn v A.   concepts of the form ∃r.A or ∃r.>. The binary relation →T
Note that due to the normalisation of T , conjunctions al-          is reflexive and transitive on single nodes by Conditions (i)
ways have more than one conjunct (i.e. n ≥ 2).                      and (ii). Moreover, in Condition (vi) transitivity of →T is
                                                                    extended to hyperedges with complex left-hand sides, repre-
                                                                    senting axioms of the form A ≡ B1 u. . .uBn . The other con-
   Example 1. Let T = {A ≡ B1 uB2 uB3 , B3 v ∃r.B4 , B4 v           ditions handle pairs of nodes. Condition (iii) states that any
B1 } and Σ = {B5 }. Then the ontology hypergraph GTΣ of T           indirectly reachable pair {xr , z} via an intermediate node
for Σ can be depicted as follows:                                   is also directly reachable via →T , while Condition (iv) en-
                                                                    sures the same property for indirectly reachable nodes via
                                                                    intermediate pairs. Condition (v) is a special case of (iv)
        xB5    x>                                                   for handling pairs involving > as ontology hypergraphs for
                         xB1                                        normalised terminologies T do not contain hyperedges from
         xB4                                                        nodes xA representing concept names to x> representing >
                        xB2                                         (T does not contain any axioms of the form A v > or
                                               xA                   A ≡ >).
          xr
                          xB3                                       It can be readily seen that the relation →T can be computed
                                                                    in polynomial time.

                                                                    We emphasise here that the relation →T does not coincide
3.   LOGICAL DIFFERENCE USING                                       with the usual reachability notion in a hypergraph. The
     HYPERGRAPHS                                                    following example shows that →T connects reachable nodes
Our approach for detecting logical differences wrt. Σ is based      but not all reachable nodes are connected via →T . This
on finding appropriate simulations between the hypergraphs          means that the usual reachability relation does not correctly
GTΣ1 and GTΣ2 such that every node xA in GTΣ1 with A ∈ Σ is         mimic logical consequences entailed by T .
simulated by the node xA in GTΣ2 . It is well known that the
existence of a simulation between two graph structures can
be used to characterise some notion of equivalence between            Example 2. Let T = {A v ∃r.B 0 , ∃r.B 0 v B, ∃r.B v
                                                                     0
the graphs [4], for example reachability. In this paper we aim      A }. It holds that {xA } →T {xB }, i.e. T |= A v B, and
to capture logical entailment wrt. a signature by defining the      the node xB is reachable from xA (in terms of standard
simulation relations appropriately.                                 graph reachability). However, xA0 is also reachable from xA
                                                                    whereas {xA } 6→T {xA0 } and T 6|= A v A0 .
We first introduce an auxiliary relation →T over the nodes
of the ontology hypergraph GTΣ of the terminology T . The
relation →T is a special reachability notion in GTΣ that mim-       The notion of reachability induced by the relation →T can
ics reasoning wrt. T . The definition of →T is related to           be characterised in terms of entailment.
the completion algorithm for classification in EL [1] and
OWL 2 QL [8]. Afterwards we define two types of sim-                   Lemma 2. Let GTΣ = (V, E) be the ontology hypergraph
ulations between the hypergraphs of two terminologies T1            of a normalised terminology T for a signature Σ. Then we
and T2 , one type of simulation for each type of witness.           have for every A, B, r ∈ Σ ∪ sig(T ):


   Definition 4. Let GTΣ = (V, E) be the ontology hypergraph         (i) T |= A v B iff {xA } →T {xB };
of a normalised terminology T for a signature Σ. The re-
                                                                     (ii) T |= A v ∃r.B iff {xA } →T {xr , xB 0 } and {xB 0 } →T
lation →T ⊆ V(1) × V(2) is inductively defined as follows,
                                                                          {xB } for some B 0 ∈ Σ ∪ sig(T );
where V(k) = { S ⊆ V | 0 < |S| ≤ k }:
                                                                    (iii) T |= A v ∃r.> iff {xA } →T {xr , xY } for some Y ∈
                                                                          Σ ∪ sig(T ) ∪ {>}.
  (i) {x} →T {x} for every x ∈ V;

 (ii) {x} →T {z} if {x} →T {y}, ({y}, {z}) ∈ E;                     As described above, we want to check for every concept
                                                                    name A ∈ Σ whether A belongs to cWtnlhs       Σ (T1 , T2 ) or to
(iii) {x} →T {xr , z} if {x} →T {y}, ({y}, {xr , z}) ∈ E;
                                                                    cWtnrhs
                                                                          Σ (T1 , T2 ). For the former, we check for the existence
(iv) {x} →T {z} if {x} →T {xr , y}, {y} →T {y 0 }, and              of a forward simulation, and for the latter, for the existence
     ({xr , y 0 }, {z}) ∈ E;                                        of a backward simulation between the ontology hypergraphs
                                                                    GTΣ1 and GTΣ2 . We define the simulations in the following
 (v) {x} →T {z} if {x} →T {xr , y}, ({xr , x> }, {z}) ∈ E;          subsections.
3.1      Forward Simulation                                              We now prove that the existence of a forward simulation
Based on the relation →T we can now give the definition of               between a node xA1 in GT1 and a node xA2 in GT2 exactly
the forward simulation, which connects nodes in GTΣ1 with                captures the property that T1 |= A1 v C entails that T2 |=
nodes in GTΣ2 that are reachable via →T1 and →T2 , respec-               A2 v C for every Σ-concept C.
tively.
                                                                            Lemma 3. Let T1 , T2 be normalised terminologies, and
                                                                         let Σ be a signature such that GT1 ,→fΣ GT2 . Then for ev-
   Definition 5.    Let GTΣ1 = (V1 , E1 ), GTΣ2 = (V2 , E2 ) be ontol-
ogy hypergraphs of two normalised terminologies T1 and T2                ery ELΣ -concept C and for every (xA1 , xA2 ) ∈ ,→fΣ with
for a signature Σ. A relation ,→fΣ ⊆ V1 × V2 is a forward Σ-             T1 |= A1 v C it holds that T2 |= A2 v C.
simulation between GTΣ1 and GTΣ2 if the following conditions
hold:                                                                       Lemma 4. Let T1 , T2 be normalised terminologies, and
                                                                         let Σ be a signature such that cWtnlhs
                                                                                                             Σ (T1 , T2 ) = ∅. Then
(if )   if xA ,→fΣ xA0 , then for every B ∈ Σ with {xA } →T1             GT1 ,→fΣ GT2 .
        {xB } it holds that {xA0 } →T2 {xB };
(iif ) if xA ,→fΣ xA0 , then for every r ∈ Σ such that {xA } →T1
       {xr , xX } there is a xX 0 ∈ V2 such that {xA0 } →T2              We obtain Theorem 2 as a consequence of the previous two
                                                                         lemmas.
       {xr , xX 0 } and xX ,→fΣ xX 0 .

We write GTΣ1 ,→fΣ GTΣ2 iff there exists a forward Σ-simulation             Theorem 2. Let T1 , T2 be normalised terminologies, and
                                                                                                                                    f
,→fΣ ⊆ V1 × V2 such that (xA , xA ) ∈ ,→fΣ for every A ∈ Σ.              let Σ be a signature. Then cWtnlhs
                                                                                                         Σ (T1 , T2 ) = ∅ iff GT1 ,→Σ
                                                                         GT2 .

For a node xA in GTΣ1 to be forward simulated by xA0 in GTΣ2 ,
Condition (if ) enforces that every Σ-concept name B that                3.2    Backward Simulation
is entailed by A in T1 must also be entailed by A0 in T2 .               We now turn to right-hand side witnesses, i.e. we want to
Condition (iif ) ensures a similar requirement for concepts              devise an algorithm that checks whether cWtnrhs  Σ (T1 , T2 ) = ∅.
of the form ∃r.X with X ∈ sig(T1 ) ∪ {>} such that T1 |=                 Analogously as for the left-hand side witnesses, we introduce
A v ∃r.X while propagating the simulation to the successor               a backward simulation which has the property that a node
node xX .                                                                xA1 in GTΣ1 is simulated by a node xA2 in GTΣ2 iff T1 |= C v A1
                                                                         entails T2 |= C v A2 for every Σ-concept C. Intuitively, the
                                                                         hypergraph has to be traversed backwards to identify all
  Example 3. Let T1 = {A v ∃r.A}, T2 = {A v ∃r.X, X v                    essential concepts C for which T1 |= C v A1 . In particular,
A u Y, Y v ∃r.X}, and Σ = {A, r}. Then one can see                       concept names A1 for which there does not exist an ELΣ -
that Diff Σ (T1 , T2 ) = ∅. Furthermore, wrt. GTΣ1 it only holds         concept C with T1 |= C v A1 do not have to be simulated by
that {xA } →T1 {xA }, {xA } →T1 {xr , xA }. Regarding GTΣ2 ,             a node in GTΣ2 since such concept names cannot become right-
we have {xA } →T2 {xA }, {xA } →T2 {xr , xX }, {xX } →T2                 hand side witnesses. We identify such concept names A1 by
{xA }, {xX } →T2 {xr , xX }. Hence, one can see that S =                 checking whether the node xA1 is Σ-entailed in the following
{(xA , xA ), (xA , xX )} is a forward Σ-simulation between GTΣ1          sense.
and GTΣ2 with (xA , xA ) ∈ S. A graphical representation of
the ontology hypergraphs GTΣ1 , GTΣ2 and of the simulation S                Definition 6. Let GTΣ = (V, E) be the ontology hypergraph
can be found below.                                                      of a normalised terminology T for a signature Σ. Moreover,
                                                                         let VΣ ⊆ V be the smallest set of nodes defined inductively
                                                                         as follows:
                                      xY
             GTΣ1                                          GTΣ2
                                                                           (i) x> ∈ VΣ ;
                                      xX
               xr                                     xr                  (ii) if xA ∈ V such that there exists B ∈ Σ with {xB } →T
                                                                               {xA }, then xA ∈ VΣ ;

                                                                         (iii) if xB ∈ VΣ with B ∈ NC ∪ {>}, ({xB , xr }, {xA }) ∈ E,
        x>          xA                           xA          x>                and r ∈ Σ, then xA ∈ VΣ ;

   Example 4. Let T1 = {A v ∃r.X, X v AuB}, T2 = {A v                    (iv) if xB1 , . . . , xBn ∈ VΣ with ({xB1 , . . . , xBn }, {xA }) ∈ E,
X u Y, X v ∃r.A, Y v ∃r.B}, and Σ = {A, B, r}. Then,                          then xA ∈ VΣ .
for instance, A v ∃r.(A u B) ∈ Diff Σ (T1 , T2 ). It holds that          We then say that a node x ∈ V is Σ-entailed in GTΣ iff x ∈ VΣ .
{xA } →T1 {xr , xX }, {xX } →T1 {xA }, {xX } →T1 {xB },
{xA } →T2 {xr , xA }, {xA } →T2 {xr , xB }. However, for
x = xA or x = xB it does not hold that {x} →T2 {xA }                     The node x> is always Σ-entailed for every signature Σ. A
and {x} →T2 {xB }, i.e. the node xX in GTΣ1 cannot be sim-               node x is Σ-entailed if it is reachable via →T from a node xB
ulated by xA or xB in GTΣ2 as Condition (if ) cannot be sat-             with B ∈ Σ, or if its direct predecessors in the ontology
isfied. Thus, one can see that there cannot exist a forward              hypergraph are Σ-entailed. In particular, every node xA
Σ-simulation S between GTΣ1 and GTΣ2 with (xA , xA ) ∈ S.                with A ∈ Σ is Σ-entailed.
   Example 5. Let T = {A ≡ ∃r.X, X ≡ B1 uB2 }. For Σ1 =              In the following, we write GTΣ1 ,→bΣ GTΣ2 iff there exists a
{B1 , B2 , r}, all the nodes are Σ1 -entailed in GTΣ1 . However,     backward Σ-simulation ,→bΣ ⊆ V1 × V2 with (xA , xA ) ∈ ,→bΣ
for Σ2 = {B1 , B2 } only the nodes xB1 , xB2 xX , and x> are         for every A ∈ Σ.
Σ2 -entailed in GTΣ2 , whereas for Σ3 = {B1 , r} only the node
x> is Σ3 -entailed in GTΣ3 . Note that T |= C v A holds for          For a node xA in GTΣ1 to be backward simulated by xA0
C = ∃r.(B1 u B2 ) and sig(C) ⊆ Σ1 but sig(C) 6⊆ Σ2 and               in GTΣ2 , Conditions (ib ) and (iib ) are the equivalent of the
sig(C) 6⊆ Σ3 .                                                       Conditions (if ) and (iif ), respectively, for forward simu-
                                                                     lations. Condition (iiib ) handles axioms of the form A ≡
   Lemma 5. Let GTΣ = (V, E) be the ontology hypergraph              B1 u. . .uBn in T1 . Note that we quantify over the conjuncts
of a normalised terminology T for a signature Σ, and let             of A0 in T2 since, intuitively speaking, fewer conjuncts suffice
xA ∈ V. Then the node xA is Σ-entailed in GTΣ iff there              to preserve logical entailments. Take, for instance, the two
exists an ELΣ -concept C such that T |= C v A.                       normalised terminologies T1 = {A ≡ B1 u B2 }, T2 = {A v
                                                                     B1 u B2 , B1 v A} and the signature Σ = {A, B1 , B2 }; then
To compute all the nodes in a given graph GT that are Σ-             cWtnrhs
                                                                           Σ (T1 , T2 ) = ∅ and, in particular, T2 |= B1 u B2 v A
entailed, we can proceed as follows. In a first step identify        holds as well.
all the nodes x that fulfill conditions (i) and (ii) by using the
relation →T . Subsequently, propagate the Σ-entailed status
to other nodes using conditions (iii) and (iv). It can be               Example 6. Let T1 = {A ≡ ∃r.X, X ≡ B1 u B2 }, T2 =
readily seen that these computation steps can be performed           {A ≡ XuY, X ≡ ∃r.B1 , Y ≡ ∃r.B2 }, and Σ = {A, B1 , B2 , r}.
in polynomial time.                                                  First we observe that the nodes xB1 , xB2 , xX , and xA are
                                                                     Σ-entailed in GTΣ1 . As only {xBi } →T1 {xBi } for i ∈ {1, 2},
Before we can give the definition of the backward simula-            one can see that the node xBi in GTΣ1 can be simulated by the
tion, we have to introduce the following notion: we associate        node xBi in GTΣ2 for i ∈ {1, 2}. Due to non-conjT2 (xBi ) =
with every node xA in a hypergraph GT a set of concept               {xBi } for i ∈ {1, 2} and non-conjT1 (xX ) = {xB1 , xB2 }, we
names non-conj(xA ) which are “essential” to entail A in T           can infer that the node xX in GTΣ1 can be simulated both
(also see [5] for a similar notion).                                 by xB1 and xB2 in GTΣ2 (there does not exist X 0 ∈ Σ with
                                                                     {xX 0 } →T1 {xX }). Finally, as non-conjT2 (xA ) = {xX , xY },
  Definition 7. Let GTΣ = (V, E) be an ontology hypergraph.          we conclude that the node xA in GTΣ1 can be simulated by xA
For xA ∈ V, let non-conj(xA ) be defined as follows                  in GTΣ2 due to Condition (iib ) (Condition (ib ) is trivially sat-
                                                                     isfied). Overall,
    • if ({xB1 , . . . , xBn }, {xA }) ∈ E, let                      S = {(xA , xA ), (xX , xB1 ), (xX , xB2 ), (xB1 , xB1 ), (xB2 , xB2 )}
                  non-conjT (xA ) = {xB1 , . . . , xBn };
                                                                     is a backward Σ-simulation between GTΣ1 and GTΣ2 such that
    • otherwise, let non-conjT (xA ) = {xA }.                        (Z, Z) ∈ S for every Z ∈ NC ∩ Σ. A graphical representation
                                                                     of the ontology hypergraphs GTΣ1 , GTΣ2 and of the simulation S
                                                                     can be found below.
For a graph GTΣ = (V, E) we have ({xB1 , . . . , xBn }, {xA }) ∈ E
iff A ≡ B1 u . . . u Bn ∈ T . Hence, it holds for every ELΣ -                                                           xA
concept C that T |= C v A iff T |= C v X for every                                  xA
X ∈ { X | xX ∈ non-conjT (xA ) }.                                       GTΣ1                                                       GTΣ2

                                                                                            xX
We can now give the definition of a backward simulation.
                                                                               xr                                 xY         xX
  Definition 8.    Let GTΣ1
                        = (V1 , E1 ),       GTΣ2
                                        = (V2 , E2 ) be the             x>                                                              x>
ontology hypergraphs of the normalised terminologies T1
and T2 for a signature Σ. A relation ,→bΣ ⊆ V1 × V2 is a                            xB1    xB2              xB2        xr         xB1
backward Σ-simulation between GTΣ1 and GTΣ2 if the following
conditions hold:
                                                                         Example 7. Let T1 = {A ≡ B1 u B2 }, T2 = {A ≡ B1 u
                                                                     B 0 }, and Σ = {A, B1 , B2 }. First we observe that there
 (ib )   if xA ,→bΣ xA0 , then for every B ∈ Σ with {xB } →T1
                                                                     does not exist a concept name Z ∈ Σ with {xZ } →T2
         {xA } it holds that {xB } →T2 {x0A };
                                                                     {xB 0 }, i.e. the nodes xB1 , xB2 in GTΣ1 cannot be simulated
(iib ) if xA ,→bΣ xA0 and ({xX , xr }, {xA }) ∈ E1 such that         by xB 0 in GTΣ2 as Condition (ib ) would be violated. Hence,
       r ∈ Σ and xX is Σ-entailed in GTΣ1 , then for every xBi0 ∈    as non-conjT1 (xA ) = {xB1 , xB2 } and as non-conjT2 (xA ) =
       non-conjT2 (xA0 ) there exists ({xXi0 , xr }, {xBi0 }) ∈ E2   {xB1 , xB 0 }, we can conclude that there cannot exist a back-
       such that xX ,→bΣ xXi0 ;                                      ward Σ-simulation such that xA in GTΣ1 is simulated by xA
                                                                     in GTΣ2 as Condition (iiib ) cannot be fulfilled.
(iiib ) if xA ,→bΣ xA0 and ({xB1 , . . . xBn }, {xA }) ∈ E1 where
        xBi are Σ-entailed in GTΣ1 for every 1 ≤ i ≤ n, then
        for every x0 ∈ non-conjT2 (xA ) there exists an x ∈          We can now establish the correctness and completeness prop-
        non-conjT1 (xA ) with x ,→bΣ x0 .                            erties regarding backward simulations.
   Lemma 6. Let T1 , T2 be normalised terminologies, and               for users to have a concrete concept inclusion C v A or
let Σ be a signature such that GT1 ,→bΣ GT2 . Then for ev-             A v D in Diff Σ (T1 , T2 ) that corresponds to a witness A. We
ery ELΣ -concept C and for every (xA1 , xA2 ) ∈ ,→bΣ with              now sketch how to read such concept inclusions directly off
T1 |= C v A1 it holds that T2 |= C v A2 .                              a hypergraph using Example 7.

                                                                       Recall that xB1 , xB2 in GTΣ1 cannot be simulated by xB 0
   Lemma 7. Let T1 , T2 be normalised terminologies, and               in GTΣ2 as T2 6|= B1 v B 0 and T2 6|= B2 v B 0 , i.e. for the
let Σ be a signature such that cWtnrhs
                                   Σ (T1 , T2 ) = ∅. Then              Σ-concept C = B1 u B2 it holds that T1 |= C v B1 u B2 ,
GT1 ,→bΣ GT2 .                                                         but T2 6|= C v B1 u B 0 . Hence, we have T1 |= C v A but
                                                                       T2 6|= C v A.

We obtain Theorem 3 as a consequence of the previous two               In general, if a node xA in GTΣ1 cannot be simulated by xA
lemmas.                                                                in GTΣ2 , there exists a node x in GTΣ2 which is the main cause
                                                                       for the failure to find a simulation (x = xB 0 in the example
   Theorem 3. Let T1 , T2 be normalised terminologies, and             above). By following the path from that node to the node xA
let Σ be a signature with A ∈ Σ. Then cWtnrhs
                                            Σ (T1 , T2 ) = ∅           in GTΣ2 and by constructing conjunctions over all the failing
iff GT1 ,→bΣ GT2 .                                                     possibilities to fulfill the simulation conditions (B1 u B2 in
                                                                       the example above) one can construct an example inclusion
3.3    Computational Complexity                                        C v A (or A v C) that matches the difference witness A.
Given two hypergraphs GTΣ1 = (V1 , E1 ) and GTΣ2 = (V2 , E2 ),         The correctness of the algorithm described above can be seen
one can proceed as follows to check whether GTΣ1 ,→fΣ GTΣ2             by using Lemma 2. It is known that such concepts C can be
holds. First, let S0f ⊆ V1 × V2 be the set of all the pairs            of exponential size [5], and consequently, we cannot hope to
that fulfill Conditions (if ). Subsequently, iterate over the          devise an algorithm that is guaranteed to run in polynomial
                                                                       time.
elements contained in the set Sif and remove those pairs
                                                           f
which do not satisfy Conditions (iif ) to obtain the set Si+1 .
                           f     f                                     4.   COMPARISON OF APPROACHES
Eventually we will have Sj = Sj+1 for some index j and one
                                                                       We now compare the hypergraph-based approach with the
can conclude that GTΣ1 ,→fΣ GTΣ2 holds iff (xA , xA ) ∈ Sjf for        previous method for detecting logical differences that is de-
every A ∈ Σ.                                                           veloped in [5]. The previous approach also makes use of the
                                                                       fact that it is sufficient to search for left- and right-hand side
It is easy to see that the simulation Conditions (if ) and (iif )      witnesses to decide whether a logical difference exists. For
can be checked in polynomial time. Thus, as the procedure              computing left-hand side witnesses, the method described
described above terminates in at most |V1 ×V2 | iterations, we         in [5] is similar to checking for the existence of a forward
can infer that it can be checked in polynomial time whether            simulation. The two simulation notions are virtually iden-
GT1 ,→fΣ GT2 holds.                                                    tical with the difference that we work with hypergraphs,
                                                                       whereas canonical models are used in [5].
Similar arguments show that the existence of a backward Σ-
simulation can be checked in polynomial time as well, which            Fundamental differences can be found regarding the compu-
gives us the following result.                                         tation of right-hand side witnesses. Recall from Section 2.3
                                                                       that A ∈ cWtnrhs Σ (T1 , T2 ) iff there exists a Σ-concept C such
   Theorem 4. Let GTΣ1 = (V1 , E1 ), GTΣ2 = (V2 , E2 ) be ontol-       that T1 |= C v A but T2 6|= C v A. The general aim of [5] is
ogy hypergraphs of two normalised terminologies T1 and T2              to find a complete representation of all Σ-concepts C with
for a signature Σ. Then it can be checked in polynomial time           T2 6|= C v A. Note that typically infinitely many such con-
whether GTΣ1 ,→fΣ GTΣ2 and GTΣ1 ,→bΣ GTΣ2 holds.                       cepts C exist. For every n ≥ 0, finite sets noimplyn    T2 ,Σ (A) of
                                                                       ELΣ -concepts are inductively defined which have the prop-
Note that in a practical implementation it would not be re-            erty that there exists an ELΣ -concept C with T1 |= C v
quired to take the complete ontology graphs GTΣ1 and GTΣ2 into         A and T2 6|= C v A iff there exists n ≥ 0 and a D ∈
account if one wants to check whether a concept name A ∈ Σ             noimplynT2 ,Σ (A) such that T1 |= D v A. The parameter n
is a difference witness. It is sufficient to consider the sub-         represents the maximal number of nestings of existential re-
graph only which is induced by the →T1 and →T2 either in               strictions in C.
the “forward” or “backward” direction depending on the type
of witnesses that should be computed. For a typical (prac-             Two different algorithms are then presented in [5] for han-
tical) terminology T , S →T S 0 only holds for relatively few          dling the depth parameter n. Algorithm 1 makes use of
sets of nodes S, S 0 , which suggests that the number of nodes         reasoning on ABoxes, i.e. finite sets of assertions of the
that have to be considered for a simulation check should               form A(c) or r(c1 , c2 ), where A is a concept name, r a
remain fairly small as well.                                           role name, and c, c1 , c2 are constants. For a TBox T , an
                                                                       ABox A and a constant c we write (T , A) |= A(c) iff every
3.4    Computing Difference Examples                                   model I of T and A fulfills cI ∈ AI . The infinite sequence
So far we have focused on finding difference witnesses, i.e.           noimplyn
                                                                              T2 ,Σ (A), n ≥ 0, is now encoded into a polynomial-
concept names A belonging either to the set cWtnlhs     Σ (T1 , T2 )   size ABox AT2 ,Σ . In this way a reduction of the original
or the set cWtnrhsΣ (T1 , T2 ), which is sufficient to decide the      problem to an instance checking problem for the knowledge
existence of a logical difference between T1 and T2 . However,         base (T1 , AT2 ,Σ ) can be obtained. It can be shown that
in practical applications of logical difference it can be helpful      A ∈ cWtnrhs Σ (T1 , T2 ) iff (T1 , AT2 ,Σ ) |= A(ξ) for some con-
stant ξ which occurs in AT2 ,Σ and which is connected to A           also discussed how the hypergraph-based approach simpli-
(in some specific sense). The ABox AT2 ,Σ can be seen as an          fies previous approaches to computing the logical difference
encoding of the infinite sequence noimplyn  T2 ,Σ (A) for n ≥ 0;     that required a combination of different methods.
Algorithm 1 also works for cyclic terminologies, but one of
its drawbacks is that for typical terminologies and large Σ,         In this paper we have considered EL-terminologies only. This
the ABox AT2 ,Σ is of quadratic size in T2 , which makes             serves to illustrate the approach to the logical difference
it more challenging to obtain an implementation that can             problem based on hypergraphs, but extensions to richer log-
compare very large terminologies together with large sig-            ics are possible. For instance, dealing with the bottom con-
natures Σ. Also, it is not straightforward to extract ex-            cept, role inclusions and domain and range restrictions of
amples of Diff Σ (T1 , T2 ) which correspond to right-hand side      roles should not pose any problem. An extension to general
witnesses from an instance checking algorithm.                       EL-TBoxes and even to Horn SHIQ ontologies would be in-
                                                                     teresting. It remains to be seen whether and in how far the
Algorithm 2 uses a dynamic programming approach to de-               form and the number of concepts witnessing a logical dif-
rive conditions that allow us to identify which concepts in          ference can be restricted, analogous to the primitive witness
noimplyn T2 ,Σ (A) are relevant for deciding whether A is a right-   theorem (cf. Theorem 1). In any case the hypergraph and
hand side witness. This approach has been implemented in             the simulation notion would need to be adapted to the richer
the logical difference tool CEX [6], which can compare large         logic, but checking for the existence of a simulation may not
terminologies like Snomed ct on large signatures Σ in rea-           be tractable anymore. We leave this for future work as well
sonable time (cf. [5] for further details). Additionally, it is      as a performance evaluation of the current approach and any
possible to extend Algorithm 2 in such a way that it becomes         of its extensions on real-life ontologies. We also envision to
possible to construct examples of differences that correspond        integrate our approach for detecting logical differences into
to right-hand side witnesses (which is also implemented in           the OWL-API and into popular ontology editors such as
version 2.5 of CEX). As drawbacks, however, we have to note          Protégé.
that this approach only works for acyclic terminologies and
that possible extensions to more expressive description log-         6.   REFERENCES
ics are rather challenging as the complexity and the number           [1] F. Baader, S. Brandt, and C. Lutz. Pushing the EL
of the conditions that have to be checked to find right-hand              envelope. In Proc. of IJCAI-05. Morgan-Kaufmann
side witnesses for EL extended with role inclusions and do-               Publishers, 2005.
main/range restrictions is already rather involved.                   [2] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi,
                                                                          and P. F. Patel-Schneider, editors. The description
On the other hand, the approach presented in this paper                   logic handbook: theory, implementation, and
works for cyclic TBoxes, and it benefits from the fact that               applications. Cambridge University Press, 2007.
the same technique, i.e. checking for the existence of certain
                                                                      [3] S. Brandt. Polynomial time reasoning in a description
simulations, can be used both for finding left- and right-
                                                                          logic with existential restrictions, GCI axioms,
hand side witnesses. The structures that are simulated im-
                                                                          and—what else? In Proc. of ECAI-04, pages 298–302.
mediately correspond to the TBoxes involved (hyperedges
                                                                          IOS Press, 2004.
correspond to axioms). Moreover, the conditions that have
to be fulfilled for a node to simulate another node are fairly        [4] E. Clarke and H. Schlingloff. Model checking. In
straightforward in the sense that they only depend either                 Handbook of Automated Reasoning, volume II,
on the structure of the graph, or on the logical entailment               chapter 24, pages 1635–1790. Elsevier, 2001.
of Σ-concept names. Note that such conditions on the en-              [5] B. Konev, M. Ludwig, D. Walther, and F. Wolter. The
tailment of concept names are also present in Algorithm 1                 logical difference for the lightweight description logic
and 2. However, the practical usefulness of our approach will             EL. JAIR, 44:633–708, 2012.
still have to be demonstrated in an experimental evaluation.          [6] B. Konev, M. Ludwig, and F. Wolter. Logical
                                                                          difference computation with CEX2.5. In Proc. of
                                                                          IJCAR-12, pages 371–377. Springer, 2012.
5.   CONCLUSION                                                       [7] B. Konev, D. Walther, and F. Wolter. The logical
We have presented a novel approach to the logical difference              difference problem for description logic terminologies.
problem using a hypergraph representation of ontologies. As               In Proc. of IJCAR-08, pages 259–274. Springer, 2008.
ontologies we consider (possibly cyclic) terminologies given
                                                                      [8] D. Lembo, V. Santarelli, and D. F. Savo. Graph-based
in the description logic EL. As differences between termi-
                                                                          ontology classification in OWL 2 QL. In Proc. of
nologies we only consider EL-concept inclusions formulated
                                                                          ESWC 2013, volume 7882 of LNCS, pages 320–334.
in a given signature. A terminology is transformed into a
                                                                          Springer, 2013.
hypergraph by taking the signature symbols as nodes and
treating the axioms as hyperedges. We have devised two                [9] C. Lutz and F. Wolter. Deciding inseparability and
simulation notions between hypergraphs. The existence of                  conservative extensions in the description logic EL.
the simulations is equivalent to the fact that every concept              JoSC, 45(2):194–228, Feb. 2010.
inclusion which is formulated in the considered signature            [10] R. Nortje, A. Britz, and T. Meyer. Module-theoretic
and which follows from the first corresponding terminology                properties of reachability modules for SRIQ. In Proc.
also follows from the second terminology. Checking for the                of DL-13, pages 868–884. CEUR-WS.org, 2013.
existence of simulations is tractable, confirming the estab-         [11] B. Suntisrivaraporn. Polynomial time reasoning
lished complexity bounds in [7]. If a simulation does not                 support for design and maintenance of large-scale
exist, we have sketched how to construct a concept inclu-                 biomedical ontologies. PhD thesis, TU Dresden,
sion witnessing a difference using the hypergraph. We have                Germany, 2009.