The Concept Difference for EL-Terminologies using Hypergraphs∗ Andreas Ecke Michel Ludwig Dirk Walther Theoretical Computer Science Theoretical Computer Science Theoretical Computer Science TU Dresden, Germany TU Dresden, Germany TU Dresden, Germany ecke@tcs.inf.tu- Center for Advancing Center for Advancing dresden.de Electronics Dresden Electronics Dresden michel@tcs.inf.tu- dirk@tcs.inf.tu- dresden.de dresden.de ABSTRACT comes more challenging as well. For instance, the ontology Ontologies are used to represent and share knowledge. Nu- SNOMED CT contains now definitions for about 400 000 merous ontologies have been developed so far, especially in terms, and the ‘NCBI organismal classification’ ontology knowledge intensive areas such as the biomedical domain. even for about 850 000 terms. In particular, the need to As the size of ontologies increases, their continued devel- have automated tool support for detecting and represent- opment and maintenance is becoming more challenging as ing differences between versions of an ontology is growing in well. Detecting and representing semantic differences be- importance for ontology engineering. Current support from tween versions of ontologies is an important task for which ontology editors, such as Protegé, SWOOP, OBO-Edit, and automated tool support is needed. In this paper we investi- OntoView, is mostly based on syntactic differences and does gate the logical difference problem using a hypergraph rep- not capture the semantic differences between ontologies. An resentation of EL-terminologies. We focus solely on the con- early detection of possibly unwanted semantic changes can cept difference wrt. a signature. For computing this differ- contribute to an error-resilient authoring process of ontolo- ence it suffices to check the existence of simulations between gies. hypergraphs whereas previous approaches required a combi- nation of different methods. The aim of this paper is to propose and investigate the logical difference problem using a hypergraph representa- 1. INTRODUCTION tion of ontologies. The logical difference problem was intro- Ontologies are widely used to represent domain knowledge. duced in [7], where the logical difference is taken to be the They contain specifications of objects, concepts and relation- set of queries formulated in a vocabulary of interest, called ships that are often formalised using a logic-based language signature, that produce different answers when evaluated over a vocabulary that is particular to an application do- over ontologies that are to be compared. In this paper we main. Ontology languages based on description logics [2] concentrate on ontologies expressed as terminologies in the have been widely adopted, e.g., description logics are under- lightweight description logic EL [1, 3] and on queries that lying the Web Ontology Language (OWL) and its profiles.1 are concept inclusions formulated in EL. Even though EL- Numerous ontologies have already been developed, in par- terminologies merely serve as a starting point for this inves- ticular, in knowledge intensive areas such as the biomedical tigation, we can illustrate the elegance of the hypergraph- domain.2 Ontologies constantly evolve, they are regularly based approach and the advantages over existing approaches extended, corrected and refined. As the size of ontologies to computing the logical difference. The relevance of EL is increases, their continued development and maintenance be- emphasised by the fact that many ontologies are largely for- mulated in EL, notable examples being SNOMED CT and ∗We thank the reviewers of the workshop DChanges 2013 NCI. for their comments. The authors acknowledge the support of the German Research Foundation (DFG), Andreas Ecke An EL-terminology can easily be translated into a directed within GRK 1763 (QuantLA), and Michel Ludwig and Dirk hypergraph by taking the signature symbols as nodes and Walther within the Resilience and Bio Path of the Cluster of Excellence ‘Center for Advancing Electronics Dresden’. treating the axioms as hyperedges. For instance, the axiom 1 http://www.w3.org/TR/owl2-overview/ A v ∃r.B is translated into the hyperedge ({xA }, {xr , xB }), 2 http://bioportal.bioontology.org and the axiom A ≡ B1 u B2 into the three hyperedges ({xA }, {xB1 }), ({xA }, {xB2 }) and ({xB1 , xB2 }, {xA }), where each node xY corresponds to the signature symbol Y , respec- tively. A feature of the translation of axioms into hyperedges is that all information about the axiom and the logical oper- ators in it is preserved. We can actually treat the ontology This work is licensed under the Creative Commons Attribution- and its hypergraph interchangeably. The existence of cer- ShareAlike 3.0 Unported License (CC BY-SA 3.0). To view a copy tain simulations between hypergraphs characterises the fact of the license, visit http://creativecommons.org/licenses/by-sa/3.0/. that the corresponding terminologies are logically equivalent DChanges 2013, September 10th, 2013, Florence, Italy. and, thus, no logical difference exists. If no simulation ex- ceur-ws.org Volume 1008, http://ceur-ws.org/Vol-1008/paper3.pdf ists, we can directly extract the axioms responsible for the C I 6= ∅, C I ⊆ DI , or C I = DI . We write I |= α if I concept inclusion that witnesses the logical difference from satisfies the axiom α. An interpretation I satisfies a TBox T the hypergraph. if I satisfies all axioms in T ; in this case, we say that I is a model of T . An axiom α follows from a TBox T , written The main advantages of the hypergraph-based approach to T |= α, if for all models I of T , we have that I |= α. logical difference are: (i) an elegant algorithm for detecting Checking that T |= α can be done in polynomial time in the the existence of concept differences (solely involving check- size of T and α [1, 3]. ing for simulations in hypergraphs), even for large or cyclic terminologies; (ii) a straightforward way to construct con- A signature Σ is a finite set of symbols from NC and NR . The cept inclusions that witness the logical difference between signature sig(C), sig(α) or sig(T ) of the concept C, axiom α two terminologies, even for cyclic terminologies; and (iii) a or TBox T is the set of concept and role names occurring in simple computation of explanations, i.e., sets of axioms that C, α or T , respectively. An ELΣ -concept C is an EL-concept entail such concept inclusions. Currently, the algorithms im- such that sig(C) ⊆ Σ. plemented for detecting the logical difference work for large but acyclic terminologies such as SNOMED CT [5–7]. The Two TBoxes T and T 0 are logically equivalent wrt. a sig- algorithm in [6] can also handle “small” cyclic terminologies, nature Σ, written T ≡Σ T 0 , if for all EL-axioms α with but the concept inclusions witnessing a difference cannot sig(α) ⊆ Σ: T |= α iff T 0 |= α. In other words, two TBoxes easily be constructed using that algorithm. are logically equivalent wrt. a signature if the same axioms formulated in the signature follow from them. In this case, The paper is organised as follows. We start by reviewing the TBoxes are also said to be Σ-inseparable. Conserva- some notions regarding the description logic EL, the logical tive extensions are a special case of logical equivalence: for difference problem, and ontology hypergraphs. In Section 3, T ⊆ T 0 and Σ = sig(T ), T 0 is a conservative extension of we introduce two simulation notions, a forward and a back- T wrt. Σ iff T ≡Σ T 0 . Deciding the logical equivalence of ward simulation, one for each type of concept inclusion that EL-TBoxes wrt. a signature is ExpTime-complete [9]. may witness the logical difference between two terminolo- gies. In each case we show that the existence of a simula- To be able to better deal with complex concepts in a TBox, tion between two terminologies corresponds to the absence we assume that there are no nested existential restrictions. of difference witnesses. We analyse the computational com- We say that a TBox T is flattened if all conjunctions C u D plexity of checking for simulations, and we sketch how to and existential restrictions ∃r.E in T are such that C, D are construct counter-examples. In Section 4, we discuss previ- concept names or conjunctions, and E is a concept name. ous approaches to computing the logical difference in [5] and We ignore the nesting of binary conjunctions and treat them explain the advantages of the hypergraph-based approach as n-ary conjunctions of n concept names, where n ≥ 2. introduced in this paper. Finally we conclude the paper. The axioms of a flattened TBox are of the form X ./ Y , where X, Y ∈ {>} ∪ {B1 u · · · u Bn | n > 0, Bi ∈ NC } ∪ 2. PRELIMINARIES {∃r.A | r ∈ NR , A ∈ NC } and ./ ∈ {v, ≡}. Any EL-TBox We start by briefly reviewing the lightweight description can be flattened by appropriately replacing nested complex logic EL and some notions related to the logical difference, concepts C by fresh concept names XC and adding concept together with some basic results. equations XC ≡ C to the TBox that define the new symbols. It can be readily seen that this transformation is tractable and that it does not change the meaning of the original 2.1 The Logic EL TBox. The following lemma makes this precise. Let NC and NR be mutually disjoint sets of concept names and role names. We assume these sets to be countably infi- nite. We typically use A, B to denote concept names and r Lemma 1. For every EL-TBox T , there is a flattened EL- to denote role names. The set of EL-concepts C is defined TBox T ’ of polynomial size in the size of T such that T ≡Σ inductively as: T 0 with Σ = sig(T ). • > and all concept names in NC are EL-concepts, For the remainder of the paper we assume that TBoxes are • if C, D are EL-concepts, then C u D and ∃r.C are EL- flattened. concepts, where r ∈ NR . 2.2 Terminologies in Normal Form An EL-TBox T is a finite set of axioms, where an axiom An important motivating feature of EL is that it exhibits can be a concept inclusion C v D, or a concept equation a low complexity for standard reasoning tasks. However, C ≡ D, where C, D range over EL-concepts. as we have seen above, deciding the logical equivalence of EL-TBoxes wrt. a signature already requires exponential The semantics of EL is defined using interpretations I = time.3 To gain tractability for deciding the logical equiva- (∆I , ·I ), where the domain ∆I is a non-empty set, and ·I is a lence, TBoxes are restricted to a particular form as in [5, 7]. function mapping each concept name A to a subset AI of ∆I and every role name r to a binary relation rI over ∆I . The extension C I of a concept C is defined inductively as follows: Definition 1. An EL-TBox T is called an EL-terminology >I := ∆I , (C u D)I := C I ∩ DI and (∃r.C)I := {x ∈ ∆I | if it satisfies the following conditions: ∃y ∈ C I : (x, y) ∈ rI }. An interpretation I satisfies a 3 Note that it is tractable to check the logical equivalence of concept C, an axiom C v D or C ≡ D if, respectively, two EL-TBoxes without restricting the signature [1, 3]. • all concept inclusions and equations in T are of the As the set Diff Σ (T1 , T2 ) is infinite in general, we make use form A v C, A ≡ C, where A is a concept name, and of the following “primitive witnesses” theorem from [5] that states that we only have to consider two specific types of • no concept name A occurs more than once on the left- concept differences. hand side of an axiom in T . Theorem 1 (Primitive witnesses). Let T1 and T2 be The restriction to EL-terminologies yields that deciding log- EL-terminologies and Σ a signature. If α ∈ Diff Σ (T1 , T2 ), ical equivalence wrt. a signature becomes tractable [5, 7]. then either C v A or A v D is a member of Diff Σ (T1 , T2 ), where A ∈ sig(α) is a concept name and C, D are EL- Definitions in terminologies can be cyclic, which may cause concepts occurring in α. difficulties for reasoning algorithms. A terminology is cyclic if a concept name refers to itself along concept inclusions and equations. To be precise, for a terminology T , let ≺T We define cWtnlhs Σ (T1 , T2 ) as the set of all concept names be a binary relation over NC such that A ≺T B if there is A from Σ such that there exists an ELΣ -concept C with an axiom of the form A v C or A ≡ C in T such that A v C ∈ Diff Σ (T1 , T2 ). Similarly, cWtnrhs Σ (T1 , T2 ) is the set B ∈ sig(C). A terminology T is acyclic if the transitive of all concept names A ∈ Σ such that there exists an ELΣ - closure of ≺T is irreflexive; otherwise T is cyclic. An acyclic concept C with C v A ∈ Diff Σ (T1 , T2 ). The concept names terminology can be unfolded (i.e. the process of substituting in cWtnlhsΣ (T1 , T2 ) are called left-hand side witnesses and the concept names by their definitions stops). concept names in cWtnrhs Σ (T1 , T2 ) right-hand side witnesses. Note that these sets are subsets of Σ, and by Theorem 1 In this paper we do not restrict terminologies to be acyclic. their union is a finite and succinct representation of the set However, we have to take care of certain cycles. In our Diff Σ (T1 , T2 ), which is typically infinite. approach we want all conjunctions to be unfolded. That is, for any conjunction A1 u · · · u Am in T , we substitute any Checking for the concept difference between two terminolo- Ai with B1 u · · · u Bn if Ai ≡ B1 u · · · u Bn ∈ T . To this end gies equals checking for the existence of left- and right-hand we need to handle the cycles along such concept equations. side witnesses. As a corollary of Theorem 1, we have that: Formally, a terminology T has unfoldable conjunctions if it Diff Σ (T1 ,T2 ) = ∅iff cWtnlhs rhs Σ (T1 ,T2 ) = ∅ and cWtnΣ (T1 ,T2 ) = ∅. does not contain any concept equations A1 ≡ F1 , . . . , An ≡ Fn , where F1 , . . . , Fn are conjunctions of concept names such 2.4 Ontology Hypergraphs that Ai+1 ∈ sig(Fi ) for every 1 ≤ i < n, and A1 ∈ sig(Fn ). Hypergraphs are a generalisation of graphs with many ap- Any terminology can be rewritten such that it has unfoldable plications in computer science and discrete mathematics. conjunctions without changing the logical consequences (cf. In knowledge representation hypergraphs have been used proof of Lemma 1 in [5]). We say that a concept name A is implicitly to define reachability-based modules of ontolo- conjunctive in T iff there exist concept names B1 , . . . , Bn , gies [11], and explicitly to define locality-based modules [10]. n > 0, such that A ≡ B1 u . . . u Bn ∈ T ; otherwise A is said In this paper we also make the notion of a hypergraph ex- to be non-conjunctive in T . Note that after the unfolding plicit by transforming terminologies into hypergraphs in or- of conjunctions (and removing of cycles) in a terminology T der to be able to define simulations on the graphs. no concept name that appears as a conjunct is defined as a conjunction in T . A directed hypergraph is a tuple G = (V, E), where V is a non- empty set of nodes (or vertices), and E is a set of directed To simplify the presentation we assume that terminologies hyperedges of the form e = (S, S 0 ), where S, S 0 ⊆ V. We use do not contain trivial axioms of the form A ≡ > or A ≡ B, hypergraphs to represent terminologies as follows. where A and B are concept names. An EL-terminology T is normalised if it consists of EL- Definition 3. For a normalised terminology T and a sig- concept inclusions and equations of the following forms: nature Σ, the ontology hypergraph GTΣ of T for Σ is a directed hypergraph GTΣ = (V, E) defined as follows: • A ≡ ∃r.B, A ≡ ∃r.>, A ≡ B1 u . . . u Bm , and V = { xA | A ∈ NC ∩ (Σ ∪ sig(T )) } • A v ∃r.B, A v ∃r.>, A v B1 u . . . u Bn , ∪ { xr | r ∈ NR ∩ (Σ ∪ sig(T )) } ∪ {x> } where m ≥ 2, n ≥ 1, and A, B, Bi are concept names such that every conjunct Bi is non-conjunctive in T . and E = { ({xA }, {xBi }) | A v B1 u . . . u Bn ∈ T , 1 ≤ i ≤ n } 2.3 Logical Difference ∪ { ({xA }, {xBi }) | A ≡ B1 u . . . u Bn v T , 1 ≤ i ≤ n } The logical difference between two TBoxes witnessed by con- cept inclusions over a signature Σ is defined as follows. ∪ { ({xA }, {xr , xY }) | A v ∃r.Y ∈ T , Y ∈ NC ∪ {>} } ∪ { ({xA }, {xr , xY }) | A ≡ ∃r.Y ∈ T , Y ∈ NC ∪ {>} } Definition 2. The Σ-concept difference between two EL- ∪ { ({xr , xY }, {xA }) | A ≡ ∃r.Y ∈ T , Y ∈ NC ∪ {>} } TBoxes T1 and T2 for a signature Σ is the set Diff Σ (T1 , T2 ) ∪ { ({xB1 , . . . , xBn }, {xA }) | A ≡ B1 u . . . u Bn ∈ T } of all EL-concept inclusions α such that sig(α) ⊆ Σ, T1 |= α, and T2 6|= α. An ontology hypergraph GTΣ contains a node for > and for (vi) {x} →T {z} if {x} →T {yi } for all i ∈ {1, . . . , n}, every role and concept name in Σ or T . Hyperedges in ({y1 , . . . , yn }, {z}) ∈ E. GTΣ represent axioms in T . Every hyperedge is directed and can be understood as an implication, i.e., ({xA }, {xB }) represents T |= A v B. The complex hyperedges are of Note that the relation →T associates nodes xA that repre- the form ({xA }, {xr , xB }) and ({xr , xB }, {xA }) represent- sent concept names A either with nodes xB that stand for ing T |= A v ∃r.B and T |= ∃r.B v A, and of the form concept names B or with pairs of nodes {xr , z} representing ({xB1 , ..., xBn }, {xA }) standing for T |= B1 u . . . u Bn v A. concepts of the form ∃r.A or ∃r.>. The binary relation →T Note that due to the normalisation of T , conjunctions al- is reflexive and transitive on single nodes by Conditions (i) ways have more than one conjunct (i.e. n ≥ 2). and (ii). Moreover, in Condition (vi) transitivity of →T is extended to hyperedges with complex left-hand sides, repre- senting axioms of the form A ≡ B1 u. . .uBn . The other con- Example 1. Let T = {A ≡ B1 uB2 uB3 , B3 v ∃r.B4 , B4 v ditions handle pairs of nodes. Condition (iii) states that any B1 } and Σ = {B5 }. Then the ontology hypergraph GTΣ of T indirectly reachable pair {xr , z} via an intermediate node for Σ can be depicted as follows: is also directly reachable via →T , while Condition (iv) en- sures the same property for indirectly reachable nodes via intermediate pairs. Condition (v) is a special case of (iv) xB5 x> for handling pairs involving > as ontology hypergraphs for xB1 normalised terminologies T do not contain hyperedges from xB4 nodes xA representing concept names to x> representing > xB2 (T does not contain any axioms of the form A v > or xA A ≡ >). xr xB3 It can be readily seen that the relation →T can be computed in polynomial time. We emphasise here that the relation →T does not coincide 3. LOGICAL DIFFERENCE USING with the usual reachability notion in a hypergraph. The HYPERGRAPHS following example shows that →T connects reachable nodes Our approach for detecting logical differences wrt. Σ is based but not all reachable nodes are connected via →T . This on finding appropriate simulations between the hypergraphs means that the usual reachability relation does not correctly GTΣ1 and GTΣ2 such that every node xA in GTΣ1 with A ∈ Σ is mimic logical consequences entailed by T . simulated by the node xA in GTΣ2 . It is well known that the existence of a simulation between two graph structures can be used to characterise some notion of equivalence between Example 2. Let T = {A v ∃r.B 0 , ∃r.B 0 v B, ∃r.B v 0 the graphs [4], for example reachability. In this paper we aim A }. It holds that {xA } →T {xB }, i.e. T |= A v B, and to capture logical entailment wrt. a signature by defining the the node xB is reachable from xA (in terms of standard simulation relations appropriately. graph reachability). However, xA0 is also reachable from xA whereas {xA } 6→T {xA0 } and T 6|= A v A0 . We first introduce an auxiliary relation →T over the nodes of the ontology hypergraph GTΣ of the terminology T . The relation →T is a special reachability notion in GTΣ that mim- The notion of reachability induced by the relation →T can ics reasoning wrt. T . The definition of →T is related to be characterised in terms of entailment. the completion algorithm for classification in EL [1] and OWL 2 QL [8]. Afterwards we define two types of sim- Lemma 2. Let GTΣ = (V, E) be the ontology hypergraph ulations between the hypergraphs of two terminologies T1 of a normalised terminology T for a signature Σ. Then we and T2 , one type of simulation for each type of witness. have for every A, B, r ∈ Σ ∪ sig(T ): Definition 4. Let GTΣ = (V, E) be the ontology hypergraph (i) T |= A v B iff {xA } →T {xB }; of a normalised terminology T for a signature Σ. The re- (ii) T |= A v ∃r.B iff {xA } →T {xr , xB 0 } and {xB 0 } →T lation →T ⊆ V(1) × V(2) is inductively defined as follows, {xB } for some B 0 ∈ Σ ∪ sig(T ); where V(k) = { S ⊆ V | 0 < |S| ≤ k }: (iii) T |= A v ∃r.> iff {xA } →T {xr , xY } for some Y ∈ Σ ∪ sig(T ) ∪ {>}. (i) {x} →T {x} for every x ∈ V; (ii) {x} →T {z} if {x} →T {y}, ({y}, {z}) ∈ E; As described above, we want to check for every concept name A ∈ Σ whether A belongs to cWtnlhs Σ (T1 , T2 ) or to (iii) {x} →T {xr , z} if {x} →T {y}, ({y}, {xr , z}) ∈ E; cWtnrhs Σ (T1 , T2 ). For the former, we check for the existence (iv) {x} →T {z} if {x} →T {xr , y}, {y} →T {y 0 }, and of a forward simulation, and for the latter, for the existence ({xr , y 0 }, {z}) ∈ E; of a backward simulation between the ontology hypergraphs GTΣ1 and GTΣ2 . We define the simulations in the following (v) {x} →T {z} if {x} →T {xr , y}, ({xr , x> }, {z}) ∈ E; subsections. 3.1 Forward Simulation We now prove that the existence of a forward simulation Based on the relation →T we can now give the definition of between a node xA1 in GT1 and a node xA2 in GT2 exactly the forward simulation, which connects nodes in GTΣ1 with captures the property that T1 |= A1 v C entails that T2 |= nodes in GTΣ2 that are reachable via →T1 and →T2 , respec- A2 v C for every Σ-concept C. tively. Lemma 3. Let T1 , T2 be normalised terminologies, and let Σ be a signature such that GT1 ,→fΣ GT2 . Then for ev- Definition 5. Let GTΣ1 = (V1 , E1 ), GTΣ2 = (V2 , E2 ) be ontol- ogy hypergraphs of two normalised terminologies T1 and T2 ery ELΣ -concept C and for every (xA1 , xA2 ) ∈ ,→fΣ with for a signature Σ. A relation ,→fΣ ⊆ V1 × V2 is a forward Σ- T1 |= A1 v C it holds that T2 |= A2 v C. simulation between GTΣ1 and GTΣ2 if the following conditions hold: Lemma 4. Let T1 , T2 be normalised terminologies, and let Σ be a signature such that cWtnlhs Σ (T1 , T2 ) = ∅. Then (if ) if xA ,→fΣ xA0 , then for every B ∈ Σ with {xA } →T1 GT1 ,→fΣ GT2 . {xB } it holds that {xA0 } →T2 {xB }; (iif ) if xA ,→fΣ xA0 , then for every r ∈ Σ such that {xA } →T1 {xr , xX } there is a xX 0 ∈ V2 such that {xA0 } →T2 We obtain Theorem 2 as a consequence of the previous two lemmas. {xr , xX 0 } and xX ,→fΣ xX 0 . We write GTΣ1 ,→fΣ GTΣ2 iff there exists a forward Σ-simulation Theorem 2. Let T1 , T2 be normalised terminologies, and f ,→fΣ ⊆ V1 × V2 such that (xA , xA ) ∈ ,→fΣ for every A ∈ Σ. let Σ be a signature. Then cWtnlhs Σ (T1 , T2 ) = ∅ iff GT1 ,→Σ GT2 . For a node xA in GTΣ1 to be forward simulated by xA0 in GTΣ2 , Condition (if ) enforces that every Σ-concept name B that 3.2 Backward Simulation is entailed by A in T1 must also be entailed by A0 in T2 . We now turn to right-hand side witnesses, i.e. we want to Condition (iif ) ensures a similar requirement for concepts devise an algorithm that checks whether cWtnrhs Σ (T1 , T2 ) = ∅. of the form ∃r.X with X ∈ sig(T1 ) ∪ {>} such that T1 |= Analogously as for the left-hand side witnesses, we introduce A v ∃r.X while propagating the simulation to the successor a backward simulation which has the property that a node node xX . xA1 in GTΣ1 is simulated by a node xA2 in GTΣ2 iff T1 |= C v A1 entails T2 |= C v A2 for every Σ-concept C. Intuitively, the hypergraph has to be traversed backwards to identify all Example 3. Let T1 = {A v ∃r.A}, T2 = {A v ∃r.X, X v essential concepts C for which T1 |= C v A1 . In particular, A u Y, Y v ∃r.X}, and Σ = {A, r}. Then one can see concept names A1 for which there does not exist an ELΣ - that Diff Σ (T1 , T2 ) = ∅. Furthermore, wrt. GTΣ1 it only holds concept C with T1 |= C v A1 do not have to be simulated by that {xA } →T1 {xA }, {xA } →T1 {xr , xA }. Regarding GTΣ2 , a node in GTΣ2 since such concept names cannot become right- we have {xA } →T2 {xA }, {xA } →T2 {xr , xX }, {xX } →T2 hand side witnesses. We identify such concept names A1 by {xA }, {xX } →T2 {xr , xX }. Hence, one can see that S = checking whether the node xA1 is Σ-entailed in the following {(xA , xA ), (xA , xX )} is a forward Σ-simulation between GTΣ1 sense. and GTΣ2 with (xA , xA ) ∈ S. A graphical representation of the ontology hypergraphs GTΣ1 , GTΣ2 and of the simulation S Definition 6. Let GTΣ = (V, E) be the ontology hypergraph can be found below. of a normalised terminology T for a signature Σ. Moreover, let VΣ ⊆ V be the smallest set of nodes defined inductively as follows: xY GTΣ1 GTΣ2 (i) x> ∈ VΣ ; xX xr xr (ii) if xA ∈ V such that there exists B ∈ Σ with {xB } →T {xA }, then xA ∈ VΣ ; (iii) if xB ∈ VΣ with B ∈ NC ∪ {>}, ({xB , xr }, {xA }) ∈ E, x> xA xA x> and r ∈ Σ, then xA ∈ VΣ ; Example 4. Let T1 = {A v ∃r.X, X v AuB}, T2 = {A v (iv) if xB1 , . . . , xBn ∈ VΣ with ({xB1 , . . . , xBn }, {xA }) ∈ E, X u Y, X v ∃r.A, Y v ∃r.B}, and Σ = {A, B, r}. Then, then xA ∈ VΣ . for instance, A v ∃r.(A u B) ∈ Diff Σ (T1 , T2 ). It holds that We then say that a node x ∈ V is Σ-entailed in GTΣ iff x ∈ VΣ . {xA } →T1 {xr , xX }, {xX } →T1 {xA }, {xX } →T1 {xB }, {xA } →T2 {xr , xA }, {xA } →T2 {xr , xB }. However, for x = xA or x = xB it does not hold that {x} →T2 {xA } The node x> is always Σ-entailed for every signature Σ. A and {x} →T2 {xB }, i.e. the node xX in GTΣ1 cannot be sim- node x is Σ-entailed if it is reachable via →T from a node xB ulated by xA or xB in GTΣ2 as Condition (if ) cannot be sat- with B ∈ Σ, or if its direct predecessors in the ontology isfied. Thus, one can see that there cannot exist a forward hypergraph are Σ-entailed. In particular, every node xA Σ-simulation S between GTΣ1 and GTΣ2 with (xA , xA ) ∈ S. with A ∈ Σ is Σ-entailed. Example 5. Let T = {A ≡ ∃r.X, X ≡ B1 uB2 }. For Σ1 = In the following, we write GTΣ1 ,→bΣ GTΣ2 iff there exists a {B1 , B2 , r}, all the nodes are Σ1 -entailed in GTΣ1 . However, backward Σ-simulation ,→bΣ ⊆ V1 × V2 with (xA , xA ) ∈ ,→bΣ for Σ2 = {B1 , B2 } only the nodes xB1 , xB2 xX , and x> are for every A ∈ Σ. Σ2 -entailed in GTΣ2 , whereas for Σ3 = {B1 , r} only the node x> is Σ3 -entailed in GTΣ3 . Note that T |= C v A holds for For a node xA in GTΣ1 to be backward simulated by xA0 C = ∃r.(B1 u B2 ) and sig(C) ⊆ Σ1 but sig(C) 6⊆ Σ2 and in GTΣ2 , Conditions (ib ) and (iib ) are the equivalent of the sig(C) 6⊆ Σ3 . Conditions (if ) and (iif ), respectively, for forward simu- lations. Condition (iiib ) handles axioms of the form A ≡ Lemma 5. Let GTΣ = (V, E) be the ontology hypergraph B1 u. . .uBn in T1 . Note that we quantify over the conjuncts of a normalised terminology T for a signature Σ, and let of A0 in T2 since, intuitively speaking, fewer conjuncts suffice xA ∈ V. Then the node xA is Σ-entailed in GTΣ iff there to preserve logical entailments. Take, for instance, the two exists an ELΣ -concept C such that T |= C v A. normalised terminologies T1 = {A ≡ B1 u B2 }, T2 = {A v B1 u B2 , B1 v A} and the signature Σ = {A, B1 , B2 }; then To compute all the nodes in a given graph GT that are Σ- cWtnrhs Σ (T1 , T2 ) = ∅ and, in particular, T2 |= B1 u B2 v A entailed, we can proceed as follows. In a first step identify holds as well. all the nodes x that fulfill conditions (i) and (ii) by using the relation →T . Subsequently, propagate the Σ-entailed status to other nodes using conditions (iii) and (iv). It can be Example 6. Let T1 = {A ≡ ∃r.X, X ≡ B1 u B2 }, T2 = readily seen that these computation steps can be performed {A ≡ XuY, X ≡ ∃r.B1 , Y ≡ ∃r.B2 }, and Σ = {A, B1 , B2 , r}. in polynomial time. First we observe that the nodes xB1 , xB2 , xX , and xA are Σ-entailed in GTΣ1 . As only {xBi } →T1 {xBi } for i ∈ {1, 2}, Before we can give the definition of the backward simula- one can see that the node xBi in GTΣ1 can be simulated by the tion, we have to introduce the following notion: we associate node xBi in GTΣ2 for i ∈ {1, 2}. Due to non-conjT2 (xBi ) = with every node xA in a hypergraph GT a set of concept {xBi } for i ∈ {1, 2} and non-conjT1 (xX ) = {xB1 , xB2 }, we names non-conj(xA ) which are “essential” to entail A in T can infer that the node xX in GTΣ1 can be simulated both (also see [5] for a similar notion). by xB1 and xB2 in GTΣ2 (there does not exist X 0 ∈ Σ with {xX 0 } →T1 {xX }). Finally, as non-conjT2 (xA ) = {xX , xY }, Definition 7. Let GTΣ = (V, E) be an ontology hypergraph. we conclude that the node xA in GTΣ1 can be simulated by xA For xA ∈ V, let non-conj(xA ) be defined as follows in GTΣ2 due to Condition (iib ) (Condition (ib ) is trivially sat- isfied). Overall, • if ({xB1 , . . . , xBn }, {xA }) ∈ E, let S = {(xA , xA ), (xX , xB1 ), (xX , xB2 ), (xB1 , xB1 ), (xB2 , xB2 )} non-conjT (xA ) = {xB1 , . . . , xBn }; is a backward Σ-simulation between GTΣ1 and GTΣ2 such that • otherwise, let non-conjT (xA ) = {xA }. (Z, Z) ∈ S for every Z ∈ NC ∩ Σ. A graphical representation of the ontology hypergraphs GTΣ1 , GTΣ2 and of the simulation S can be found below. For a graph GTΣ = (V, E) we have ({xB1 , . . . , xBn }, {xA }) ∈ E iff A ≡ B1 u . . . u Bn ∈ T . Hence, it holds for every ELΣ - xA concept C that T |= C v A iff T |= C v X for every xA X ∈ { X | xX ∈ non-conjT (xA ) }. GTΣ1 GTΣ2 xX We can now give the definition of a backward simulation. xr xY xX Definition 8. Let GTΣ1 = (V1 , E1 ), GTΣ2 = (V2 , E2 ) be the x> x> ontology hypergraphs of the normalised terminologies T1 and T2 for a signature Σ. A relation ,→bΣ ⊆ V1 × V2 is a xB1 xB2 xB2 xr xB1 backward Σ-simulation between GTΣ1 and GTΣ2 if the following conditions hold: Example 7. Let T1 = {A ≡ B1 u B2 }, T2 = {A ≡ B1 u B 0 }, and Σ = {A, B1 , B2 }. First we observe that there (ib ) if xA ,→bΣ xA0 , then for every B ∈ Σ with {xB } →T1 does not exist a concept name Z ∈ Σ with {xZ } →T2 {xA } it holds that {xB } →T2 {x0A }; {xB 0 }, i.e. the nodes xB1 , xB2 in GTΣ1 cannot be simulated (iib ) if xA ,→bΣ xA0 and ({xX , xr }, {xA }) ∈ E1 such that by xB 0 in GTΣ2 as Condition (ib ) would be violated. Hence, r ∈ Σ and xX is Σ-entailed in GTΣ1 , then for every xBi0 ∈ as non-conjT1 (xA ) = {xB1 , xB2 } and as non-conjT2 (xA ) = non-conjT2 (xA0 ) there exists ({xXi0 , xr }, {xBi0 }) ∈ E2 {xB1 , xB 0 }, we can conclude that there cannot exist a back- such that xX ,→bΣ xXi0 ; ward Σ-simulation such that xA in GTΣ1 is simulated by xA in GTΣ2 as Condition (iiib ) cannot be fulfilled. (iiib ) if xA ,→bΣ xA0 and ({xB1 , . . . xBn }, {xA }) ∈ E1 where xBi are Σ-entailed in GTΣ1 for every 1 ≤ i ≤ n, then for every x0 ∈ non-conjT2 (xA ) there exists an x ∈ We can now establish the correctness and completeness prop- non-conjT1 (xA ) with x ,→bΣ x0 . erties regarding backward simulations. Lemma 6. Let T1 , T2 be normalised terminologies, and for users to have a concrete concept inclusion C v A or let Σ be a signature such that GT1 ,→bΣ GT2 . Then for ev- A v D in Diff Σ (T1 , T2 ) that corresponds to a witness A. We ery ELΣ -concept C and for every (xA1 , xA2 ) ∈ ,→bΣ with now sketch how to read such concept inclusions directly off T1 |= C v A1 it holds that T2 |= C v A2 . a hypergraph using Example 7. Recall that xB1 , xB2 in GTΣ1 cannot be simulated by xB 0 Lemma 7. Let T1 , T2 be normalised terminologies, and in GTΣ2 as T2 6|= B1 v B 0 and T2 6|= B2 v B 0 , i.e. for the let Σ be a signature such that cWtnrhs Σ (T1 , T2 ) = ∅. Then Σ-concept C = B1 u B2 it holds that T1 |= C v B1 u B2 , GT1 ,→bΣ GT2 . but T2 6|= C v B1 u B 0 . Hence, we have T1 |= C v A but T2 6|= C v A. We obtain Theorem 3 as a consequence of the previous two In general, if a node xA in GTΣ1 cannot be simulated by xA lemmas. in GTΣ2 , there exists a node x in GTΣ2 which is the main cause for the failure to find a simulation (x = xB 0 in the example Theorem 3. Let T1 , T2 be normalised terminologies, and above). By following the path from that node to the node xA let Σ be a signature with A ∈ Σ. Then cWtnrhs Σ (T1 , T2 ) = ∅ in GTΣ2 and by constructing conjunctions over all the failing iff GT1 ,→bΣ GT2 . possibilities to fulfill the simulation conditions (B1 u B2 in the example above) one can construct an example inclusion 3.3 Computational Complexity C v A (or A v C) that matches the difference witness A. Given two hypergraphs GTΣ1 = (V1 , E1 ) and GTΣ2 = (V2 , E2 ), The correctness of the algorithm described above can be seen one can proceed as follows to check whether GTΣ1 ,→fΣ GTΣ2 by using Lemma 2. It is known that such concepts C can be holds. First, let S0f ⊆ V1 × V2 be the set of all the pairs of exponential size [5], and consequently, we cannot hope to that fulfill Conditions (if ). Subsequently, iterate over the devise an algorithm that is guaranteed to run in polynomial time. elements contained in the set Sif and remove those pairs f which do not satisfy Conditions (iif ) to obtain the set Si+1 . f f 4. COMPARISON OF APPROACHES Eventually we will have Sj = Sj+1 for some index j and one We now compare the hypergraph-based approach with the can conclude that GTΣ1 ,→fΣ GTΣ2 holds iff (xA , xA ) ∈ Sjf for previous method for detecting logical differences that is de- every A ∈ Σ. veloped in [5]. The previous approach also makes use of the fact that it is sufficient to search for left- and right-hand side It is easy to see that the simulation Conditions (if ) and (iif ) witnesses to decide whether a logical difference exists. For can be checked in polynomial time. Thus, as the procedure computing left-hand side witnesses, the method described described above terminates in at most |V1 ×V2 | iterations, we in [5] is similar to checking for the existence of a forward can infer that it can be checked in polynomial time whether simulation. The two simulation notions are virtually iden- GT1 ,→fΣ GT2 holds. tical with the difference that we work with hypergraphs, whereas canonical models are used in [5]. Similar arguments show that the existence of a backward Σ- simulation can be checked in polynomial time as well, which Fundamental differences can be found regarding the compu- gives us the following result. tation of right-hand side witnesses. Recall from Section 2.3 that A ∈ cWtnrhs Σ (T1 , T2 ) iff there exists a Σ-concept C such Theorem 4. Let GTΣ1 = (V1 , E1 ), GTΣ2 = (V2 , E2 ) be ontol- that T1 |= C v A but T2 6|= C v A. The general aim of [5] is ogy hypergraphs of two normalised terminologies T1 and T2 to find a complete representation of all Σ-concepts C with for a signature Σ. Then it can be checked in polynomial time T2 6|= C v A. Note that typically infinitely many such con- whether GTΣ1 ,→fΣ GTΣ2 and GTΣ1 ,→bΣ GTΣ2 holds. cepts C exist. For every n ≥ 0, finite sets noimplyn T2 ,Σ (A) of ELΣ -concepts are inductively defined which have the prop- Note that in a practical implementation it would not be re- erty that there exists an ELΣ -concept C with T1 |= C v quired to take the complete ontology graphs GTΣ1 and GTΣ2 into A and T2 6|= C v A iff there exists n ≥ 0 and a D ∈ account if one wants to check whether a concept name A ∈ Σ noimplynT2 ,Σ (A) such that T1 |= D v A. The parameter n is a difference witness. It is sufficient to consider the sub- represents the maximal number of nestings of existential re- graph only which is induced by the →T1 and →T2 either in strictions in C. the “forward” or “backward” direction depending on the type of witnesses that should be computed. For a typical (prac- Two different algorithms are then presented in [5] for han- tical) terminology T , S →T S 0 only holds for relatively few dling the depth parameter n. Algorithm 1 makes use of sets of nodes S, S 0 , which suggests that the number of nodes reasoning on ABoxes, i.e. finite sets of assertions of the that have to be considered for a simulation check should form A(c) or r(c1 , c2 ), where A is a concept name, r a remain fairly small as well. role name, and c, c1 , c2 are constants. For a TBox T , an ABox A and a constant c we write (T , A) |= A(c) iff every 3.4 Computing Difference Examples model I of T and A fulfills cI ∈ AI . The infinite sequence So far we have focused on finding difference witnesses, i.e. noimplyn T2 ,Σ (A), n ≥ 0, is now encoded into a polynomial- concept names A belonging either to the set cWtnlhs Σ (T1 , T2 ) size ABox AT2 ,Σ . In this way a reduction of the original or the set cWtnrhsΣ (T1 , T2 ), which is sufficient to decide the problem to an instance checking problem for the knowledge existence of a logical difference between T1 and T2 . However, base (T1 , AT2 ,Σ ) can be obtained. It can be shown that in practical applications of logical difference it can be helpful A ∈ cWtnrhs Σ (T1 , T2 ) iff (T1 , AT2 ,Σ ) |= A(ξ) for some con- stant ξ which occurs in AT2 ,Σ and which is connected to A also discussed how the hypergraph-based approach simpli- (in some specific sense). The ABox AT2 ,Σ can be seen as an fies previous approaches to computing the logical difference encoding of the infinite sequence noimplyn T2 ,Σ (A) for n ≥ 0; that required a combination of different methods. Algorithm 1 also works for cyclic terminologies, but one of its drawbacks is that for typical terminologies and large Σ, In this paper we have considered EL-terminologies only. This the ABox AT2 ,Σ is of quadratic size in T2 , which makes serves to illustrate the approach to the logical difference it more challenging to obtain an implementation that can problem based on hypergraphs, but extensions to richer log- compare very large terminologies together with large sig- ics are possible. For instance, dealing with the bottom con- natures Σ. Also, it is not straightforward to extract ex- cept, role inclusions and domain and range restrictions of amples of Diff Σ (T1 , T2 ) which correspond to right-hand side roles should not pose any problem. An extension to general witnesses from an instance checking algorithm. EL-TBoxes and even to Horn SHIQ ontologies would be in- teresting. It remains to be seen whether and in how far the Algorithm 2 uses a dynamic programming approach to de- form and the number of concepts witnessing a logical dif- rive conditions that allow us to identify which concepts in ference can be restricted, analogous to the primitive witness noimplyn T2 ,Σ (A) are relevant for deciding whether A is a right- theorem (cf. Theorem 1). In any case the hypergraph and hand side witness. This approach has been implemented in the simulation notion would need to be adapted to the richer the logical difference tool CEX [6], which can compare large logic, but checking for the existence of a simulation may not terminologies like Snomed ct on large signatures Σ in rea- be tractable anymore. We leave this for future work as well sonable time (cf. [5] for further details). Additionally, it is as a performance evaluation of the current approach and any possible to extend Algorithm 2 in such a way that it becomes of its extensions on real-life ontologies. We also envision to possible to construct examples of differences that correspond integrate our approach for detecting logical differences into to right-hand side witnesses (which is also implemented in the OWL-API and into popular ontology editors such as version 2.5 of CEX). As drawbacks, however, we have to note Protégé. that this approach only works for acyclic terminologies and that possible extensions to more expressive description log- 6. REFERENCES ics are rather challenging as the complexity and the number [1] F. Baader, S. Brandt, and C. Lutz. Pushing the EL of the conditions that have to be checked to find right-hand envelope. In Proc. of IJCAI-05. Morgan-Kaufmann side witnesses for EL extended with role inclusions and do- Publishers, 2005. main/range restrictions is already rather involved. [2] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The description On the other hand, the approach presented in this paper logic handbook: theory, implementation, and works for cyclic TBoxes, and it benefits from the fact that applications. Cambridge University Press, 2007. the same technique, i.e. checking for the existence of certain [3] S. Brandt. Polynomial time reasoning in a description simulations, can be used both for finding left- and right- logic with existential restrictions, GCI axioms, hand side witnesses. The structures that are simulated im- and—what else? In Proc. of ECAI-04, pages 298–302. mediately correspond to the TBoxes involved (hyperedges IOS Press, 2004. correspond to axioms). Moreover, the conditions that have to be fulfilled for a node to simulate another node are fairly [4] E. Clarke and H. Schlingloff. Model checking. In straightforward in the sense that they only depend either Handbook of Automated Reasoning, volume II, on the structure of the graph, or on the logical entailment chapter 24, pages 1635–1790. Elsevier, 2001. of Σ-concept names. Note that such conditions on the en- [5] B. Konev, M. Ludwig, D. Walther, and F. Wolter. The tailment of concept names are also present in Algorithm 1 logical difference for the lightweight description logic and 2. However, the practical usefulness of our approach will EL. JAIR, 44:633–708, 2012. still have to be demonstrated in an experimental evaluation. [6] B. Konev, M. Ludwig, and F. Wolter. Logical difference computation with CEX2.5. In Proc. of IJCAR-12, pages 371–377. Springer, 2012. 5. CONCLUSION [7] B. Konev, D. Walther, and F. Wolter. The logical We have presented a novel approach to the logical difference difference problem for description logic terminologies. problem using a hypergraph representation of ontologies. As In Proc. of IJCAR-08, pages 259–274. Springer, 2008. ontologies we consider (possibly cyclic) terminologies given [8] D. Lembo, V. Santarelli, and D. F. Savo. Graph-based in the description logic EL. As differences between termi- ontology classification in OWL 2 QL. In Proc. of nologies we only consider EL-concept inclusions formulated ESWC 2013, volume 7882 of LNCS, pages 320–334. in a given signature. A terminology is transformed into a Springer, 2013. hypergraph by taking the signature symbols as nodes and treating the axioms as hyperedges. We have devised two [9] C. Lutz and F. Wolter. Deciding inseparability and simulation notions between hypergraphs. The existence of conservative extensions in the description logic EL. the simulations is equivalent to the fact that every concept JoSC, 45(2):194–228, Feb. 2010. inclusion which is formulated in the considered signature [10] R. Nortje, A. Britz, and T. Meyer. Module-theoretic and which follows from the first corresponding terminology properties of reachability modules for SRIQ. In Proc. also follows from the second terminology. Checking for the of DL-13, pages 868–884. CEUR-WS.org, 2013. existence of simulations is tractable, confirming the estab- [11] B. Suntisrivaraporn. Polynomial time reasoning lished complexity bounds in [7]. If a simulation does not support for design and maintenance of large-scale exist, we have sketched how to construct a concept inclu- biomedical ontologies. PhD thesis, TU Dresden, sion witnessing a difference using the hypergraph. We have Germany, 2009.