=Paper=
{{Paper
|id=None
|storemode=property
|title=More is Sometimes Less: Succinctness in EL
|pdfUrl=https://ceur-ws.org/Vol-1014/paper_49.pdf
|volume=Vol-1014
|dblpUrl=https://dblp.org/rec/conf/dlog/NikitinaS13
}}
==More is Sometimes Less: Succinctness in EL==
More is Sometimes Less: Succinctness in EL Nadeschda Nikitina1 and Sven Schewe2 1 University of Oxford, UK 2 University of Liverpool, UK Abstract. In logics, there are many ways to represent same facts. With respect to both reasoning and cognitive complexity, some representations are significantly less efficient than others. In this paper, we investigate different means of improv- ing the succinctness of TBoxes expressed in the lightweight description logic EL that forms a basis of some large ontologies used in practice. As a measure of size, we consider the number of references to signature elements. We investigate the problem of finding minimal equivalent representations and show that this task is NP-complete. A significant (up to triple-exponential) further improvement can be achieved by the introduction of auxiliary concept symbols. Thus, we additionally investigate the task of finding minimal representations for an ontology by extending its sig- nature. Since arbitrary extension of the ontology with concept symbols can make the ontology unreadable, we only allow for auxiliary concepts acting as shortcuts for other concepts (EL concepts and disjunctions thereof) expressed by means of terms of the original ontology. We show that this task is also NP-complete if shortcuts represent only EL concepts, and between NP and Σ2P , otherwise. 1 Introduction It is well-known that same facts can be represented in many ways, and that the size of these representations can vary significantly. Determining and increasing the degree of succinctness of a particular syntactic representation is an important, but also a very difficult task: for the average ontology, it is almost impossible to obtain the minimal representation without tool support. Thus, automated methods that help to assess the current succinctness of an ontology and generate suggestions on how to increase it would be highly valued by ontology engineers. In description logics [1], only few results in this direction were obtained so far. Baader, Küster, and Molitor [2] investigate rewriting concepts using terminologies in the narrow sense (sets of equivalence axioms where each defined atomic concept has exactly one definition). The investigated problem is a special case of minimizing a knowledge base by computing a minimal equivalent knowledge base. Grimm et al. [3] propose an algorithm for eliminating semantically redundant axioms from ontologies. In the above approach, axioms are considered as atoms that cannot be split into parts or changed in any other way. Bienvenu [4] proposes a normal form called prime im- plicates normal form for ALC ontologies, which enables fast reasoning. However, as a side-effect of this transformation, a doubly-exponential blowup in concept size can occur. In this paper, we investigate the succinctness for the lightweight description logic EL [5], which is the logical underpinning of one of the tractable sub-languages (the so-called profiles [6]) of the W3C-specified OWL Web Ontology Language [7]. First, we consider the problem of finding a minimal equivalent EL representation for a given ontology. We show that the related decision problem (is there an equivalent ontology of size ≤ k?) is NP-complete. Inspired by recent results on uniform interpolation in EL [8], we additionally con- sider an extended version of the problem. The above results imply that, even for the minimal equivalent representation of an ontology, an up to triple-exponentially more succinct representation can be obtained by extending its signature. Auxiliary concept symbols are therefore important contributors towards the succinctness of ontologies. It is easy to envision scenarios that demonstrate the usefulness of auxiliary concept sym- bols for improving succinctness. For instance, when a complex concept C is frequently used in the axioms of an ontology, the ontology will diminish in size when all occur- rences of C are replaced by a fresh atomic concept AC , and an axiom AC ≡ C is added to the ontology. However, an arbitrary extension of the ontology with concept symbols whose meaning is not obvious can certainly make the ontology unreadable. In order to preserve comprehensiveness, we only allow for auxiliary concepts acting as shortcuts – concepts that are defined using only terms of the original ontology. Presented with such a shortcut concept, an ontology engineer could find an appropriate comprehensive name for it. Otherwise, the ontology engineer has to guess the meaning of an auxiliary concept and the chance that he approves the extension suggested by the tool would be low. We demonstrate that auxiliary concept symbols acting as shortcuts for EL concepts expressed only by means of original ontology terms can lead to an exponential im- provement of succinctness and that the corresponding decision problem (is there such a representation of size ≤ k?) is NP-complete. Further, we show that, if we additionally allow for auxiliary concept symbols that act as shortcuts for disjunctions of EL concepts on the left-hand side of axioms (en- codable in EL using several axioms), we can reduce the size of the representation by a further exponent, thereby obtaining doubly-exponentially more succinct representa- tions.We show that the corresponding decision problem (is there such a representation of size ≤ k?) is NP-hard and included in Σ2P . The paper is organized as follows: In Section 2, we recall the necessary prelimi- naries on description logics. Section 3 demonstrates the potential of auxiliary concept symbols acting as shortcuts for achieving a higher succinctness. In the same section, we also introduce the basic definitions of the size of ontologies as well as the investigated notions of equivalents with and without signature extension. In Sections 4,5, we derive the complexity bounds for the corresponding decision problems. Finally, we conclude and outline future work in Section 6. Further details and proofs can be found in the extended version [9] of this paper. 2 Preliminaries We recall the basic notions in description logics [1] required in this paper. Let NC and NR be countably infinite and mutually disjoint sets of concept symbols and role symbols. An EL concept C is defined as C ::= A|>|C u C|∃r.C, where A and r range over NC and NR , respectively. In the following, we use symbols A, B to denote atomic concepts and C, D, E to denote arbitrary concepts. A terminol- ogy or TBox consists of concept inclusion axioms C v D and concept equivalence axioms C ≡ D used as a shorthand for C v D and D v C. The signature of an EL concept C or an axiom α, denoted by sig(C) or sig(α), respectively, is the set of con- cept and role symbols occurring in it. To distinguish between the set of concept symbols and the set of role symbols, we use sigC (C) and sigR (C), respectively. The signature of a TBox T , in symbols sig(T ) (correspondingly, sigC (T ) and sigR (T )), is defined anal- ogously. Next, we recall the semantics of the above introduced DL constructs, which is defined by means of interpretations. An interpretation I is given by the domain ∆I and a function ·I assigning each concept A ∈ NC a subset AI of ∆I and each role r ∈ NR a subset rI of ∆I × ∆I . The interpretation of > is fixed to ∆I . The interpre- tation of an arbitrary EL concept is defined inductively, i.e., (C u D)I = C I ∩ DI and (∃r.C)I = {x | (x, y) ∈ rI , y ∈ C I }. An interpretation I satisfies an axiom C v D if C I ⊆ DI . I is a model of a TBox, if it satisfies all of its axioms. We say that a TBox T entails an axiom α (in symbols, T |= α), if α is satisfied by all models of T . A TBox T entails another TBox T 0 , in symbols T |= T 0 , if T |= α for all α ∈ T 0 . T ≡ T 0 is a shortcut for T |= T 0 and T 0 |= T . In addition to EL, we will use disjunction on the left-hand side of axioms to obtain more succinct representations of EL TBoxes. Note that this extension is of a notational nature, i.e., does not give us the expressive power to represent more TBoxes than stan- dard EL. We define an ELD concept C as C ::= A|>|C u C|C t C|∃r.C, where A and r range over NC and NR , respectively. The interpretation of an arbitrary ELD concept is defined analogously to the interpretation of EL concepts with the ex- tension (C t D)I = C I ∪ DI . An ELD TBox consists of axioms that are either EL axioms or have the form C v D, where C is an ELD concept and D is an EL concept. Note that equivalence axioms (C ≡ D) do not contain ELD concepts, since they are a shortcut for C v D and D v C. 3 Achieving Succinctness in EL The size of a TBox is often measured by the number of axioms contained in it. This is, however, a very simplified view of the size in terms of both, cognitive complexity and reasoning. In this paper, we measure the size of a concept, an axiom, or a TBox by the number of references to signature elements. Definition 1. The size of an EL concept D is defined as follows: – for D ∈ sig(T ), ∫ (D) = 1; – for D = ∃r.C, ∫ (D) = ∫ (C) + 1 where r ∈ sigR (T ) and C is an arbitrary concept; – for D = C1 u C2 , ∫ (D) = ∫ (C1 ) + ∫ (C2 ) where C1 , C2 are arbitrary concepts; The size of an EL axiom or a TBox is accordingly defined as follows: – ∫ (C1 v C2 ) = ∫ (C1 ) + ∫ (C2 ) for concepts C1 , C2 ; – ∫ (C1 ≡ C P2 ) = ∫ (C1 ) + ∫ (C2 ) for concepts C1 , C2 . – ∫ (T ) = α∈T ∫ (α) for a TBox T . In practice, the suitable means that can be used to obtain a compact representation can differ depending on the scenario. To address cases, in which a signature extension is not feasible, we first consider the problem of finding the minimal equivalent EL representation for a given TBox among representations that use the same signature. Popular examples for avoidable non-succinctness are axioms that follow from other axioms and sub-concepts that can be removed from axioms without losing any logical consequences. While non-succinctness is easy to detect in these simple cases, non- succinctness can occur in many other forms. The ontology T = {C v ∃r.C, ∃r.C v ∃r.D, ∃r.D v D}, for instance, does neither contain any axioms that are entailed by the remainder of the ontology, nor are there any sub-expressions that can be removed. However, there exists a smaller representation {C v ∃r.C, C v D, ∃r.D v D} of T . The general version of the corresponding decision problem can be formulated as follows: Definition 2 (P1). Given an EL TBox T and a natural number k, is there an EL TBox T 0 with ∫ (T 0 ) ≤ k such that T 0 ≡ T . We denote the set {T 0 | T 0 ≡ T } by [T ]. We will show that this decision problem, which does not involve any signature extensions, is already NP-complete. Extending the Signature From the user’s point of view as well as with respect to reasoning, it sometimes makes sense to introduce fresh concept symbols, for instance, used as shortcuts for complex concepts that occur frequently in the ontology. It can be a tedious task for an ontol- ogy engineer to do it in an advantageous way, since, as we will show later on, the corresponding decision problem is NP-hard. To account for scenarios, in which an in- troduction of auxiliary concept symbols is desirable, in addition to the decision problem introduced above we consider the problem of finding succinct representations contain- ing shortcuts. We demonstrate by means of the following example the theoretical po- tential of such an extension of the signature with shortcuts: we show that it can lead to a doubly-exponentially more succinct representation of TBoxes. Example 1. Let the sets Ci of concept descriptions be inductively defined by C0 = {A1 , A2 }, Ci+1 = {∃r.C1 u ∃s.C2 | C1 , C2 ∈ Ci }. For a natural number n, consider the TBox Tn = {C v B | C ∈ Cn−1 }. Intuitively, the sets Ci of concepts have the shape of binary trees with exponentially many leaves, each of which can be A1 or A2 . Clearly, the concepts grow exponentially i with i. Further, it holds that |Ci+1 | = |Ci |2 and consequently |Ci | = 2(2 ) . Thus, Tn contains doubly exponentially many axioms, each of which has exponential size. While there is no smaller equivalent representation of Tn , this TBox can easily be represented in a more compact way using auxiliary concept symbols as shortcuts for complex EL or ELD concept expressions. First, combining several axioms into a single axiom with a disjunction on the left- hand side would allow us to reduce the size of Tn from double-exponential to single- exponential: we can define C0 = {A1 tA2 } and thus express all elements of the set Cn−1 by means of a single concept Cn−1 that has the shape of a binary tree with the concept A1 t A2 as leaves. The corresponding EL TBox Tn0 can be obtained by introducing the concept B0 that represents the disjunction A1 t A2 by means of the axioms A1 v B0 and A2 v B0 . Second, by using fresh concept symbols as shortcuts for complex EL concepts, Tn0 can be reduced by a further exponential as follows: we introduce concept symbols Bi with i ∈ {1, ..., n − 1} to represent each Ci and obtain the following TBox Tn00 : A1 v B 0 (1) A2 v B 0 (2) Bi+1 ≡ ∃r.Bi u ∃s.Bi i1 do • for all S ∈ S 0 , choose an s ∈ S and join S 0 with S r {s} • decrement l by one. After this, we join S with {{s} | s ∈ S}, and remove the empty set from S if applicable. Note that S 0 can easily be constructed in polynomial time. Now we show that there is a cover C of size ≤ k of S exactly if there is a cover C 0 of size ≤ k of S 0 . W.l.o.g., we can assume that ∅ 6∈ C, since we always obtain a cover from any cover C by removing ∅ from it. Since S ⊆ S 0 ∪ {∅}, any cover of S is a cover of S 0 . Let C 0 be a cover of size ≤ k of S 0 . We can construct a cover C of S by replacing each S 0 ∈ C 0 by the corresponding superset S ∈ S. t u Given the above NP-completeness result, we show that the size of minimal equiva- lents specified in P1 through P4 is a linear function of the size of the minimal cover. To this end, we use the lemma below to obtain a lower bound on the size of equivalents. Intuitively, it states that for each entailed non-trivial equivalence C ≡ A, the TBox must contain at least one axiom that is at least as large as C 0 ≡ A for some C 0 with T |= C ≡ C 0 : Lemma 4. Let T be an EL TBox, A ∈ sig(T ) and C, D EL concepts such that T |= C ≡ A, T |= A v D (the latter is required for induction). Then, one of the following is true: 1. A is a conjunct of C (including the case C = A); 2. there exists an EL concept C 0 such that T |= C ≡ C 0 and C 0 ./ A ∈ T or C 0 ./ A u D0 ∈ T for some ./∈ {≡, v} and some concept D0 . Proof Sketch. For the full version of the proof, see extended version of the paper. We use the sound and complete proof system for general subsumption in EL terminologies introduced in [8] and prove the lemma by induction on the depth of the derivation of C v A u D. We assume that the proof has minimal depth and consider the possible rules that could have been applied last to derive C v A u D. In each case the lemma holds. t u The encoding of the dense set cover problem as P1-P4 is as follows. Consider an instance of the dense set cover problem with the carrier set A = {B1 , . . . , Bn }, the set S = {A1 , . . . , Am , {B1 }, . . . , {Bn }} of subsets that can be used to form a cover. By interpreting the set and element names as atomic concepts, we can construct TSbase as follows: TSbase = {A00 ≡ A0 u B | A00 , A0 ∈ S, B ∈ A, A00 = A0 ∪ {B}, A00 6= A0 }. d Observe that the size ofdTSbase is at least 3m. Clearly, TSbase |= Ai ≡ B∈Ai B. Let TS = TSbase ∪ {A ≡ B∈A B}. We establish the connection between the size of TS equivalents and the size of the cover of S as follows: Lemma 5. TS has an equivalent (as specified in P1-P4) of size ∫ (TSbase ) + k + 1 if, and only if, S has a cover of size k. Proof. For the if-direction, assume that S has a cover d of size k. We construct TS0 of size ∫ (TSbase ) + k + 1 as follows: TS = TSbase ∪ {A ≡ A0 ∈C A0 }. Clearly, TS0 ≡ TS . 0 Note that TS0 ∈ [TS ] and, therefore, also TS0 ∈ [TS ]t , [TS ]EL , [TS ]ELD . For the only-if-direction, we assume that k is minimal and argue that no equivalent T 0 ∈ [TS ]ELD of size ≤ ∫ (TSbase ) + k can exist. Assume that T is a minimal TBox with T ∈ [TS ]ELD . With the observation, that the m + n atomic concepts that repre- sent elements of S are pairwise not equivalent with each other or the concept A that represents the carrier set, we can conclude that no two atomic concepts are equivalent. From Lemma 4 it follows that, for each Ai with i ∈ {1, . . . , m}, there is an axiom Ci ≡ Ci0 ∈ T or Ci v Ci0 ∈ T such that T |= Ci ≡ Ai and Ai is a conjunct of Ci0 or Ai = Ci0 . Since there are no equivalent atomic concepts and Ci 6= Ai due to the minimality of T , the size of each such axiom is at least 3 and none of these axioms coincide. We will later make use of two obvious properties (*) of these axioms: 1. since TS 6|= Ai v A, A cannot occur as a conjunct of Ci or as a conjunct of Ci0 ; 2. these axioms cannot be (parts of) the definitions of atomic concepts representing disjunctions (as Ai is a conjunct of Ci0 ) or shortcuts (T |= Ci ≡ Ai ). Finally, we estimate the size of the remaining axioms and show that their cumulative size is > k. It also follows from Lemma 4 that there exists an axiom C ≡ C 0 ∈ T or C v C 0 ∈ T such 0 d that T |= C ≡ A and A is a conjunct of C or0 A = C . It holds 0 that Td|= C ≡ B∈A B. We also know that for no proper subset S ( A it holds that T |= B∈S 0 B v C. If C does not contain any shortcuts or disjunction replacements, then we have found a cover of S and the size of the axiom must be ≥ k+1. Assume that it contains auxiliary shortcut and disjunction concepts and let C 0 be the concept obtained by replacing all these concepts recursively in C until sig(C 0 ) ⊆ sig(TS ). It is clear that the cumulative size of the corresponding definitions for these auxiliary concept symbols cannot be smaller than the size of C 0 , which does not contain any concept symbols twice. Since T |= C 0 ≡ C, we have once more found a cover of S and the size of this axiom plus the size of definition axioms must be ≥ k + 1. From the two properties (*) of the axioms definition Ai we can conclude that none of these axioms can coincide. Thus, the overall size of T must be ≥ ∫ (TSbase ) + k + 1. t u Theorem 3. P1 through P4 are NP-hard. Proof. The theorem is an immediate consequence of Lemma 5. It establishes that all four problems can be used to solve the dense set cover problem, which is NP-complete according to Lemma 3. t u Thus, we establish completeness of the first two problems: Theorem 4. P1 and P2 are NP-complete. t u 6 Summary and Outlook In this paper, we have considered the problem of finding minimal equivalent representa- tions for ontologies expressed in the lightweight description logic EL that forms a basis of some large ontologies used in practice. We have shown that the task of finding such a representation (or rather: its related decision problem) is NP-complete. In addition to studying the problem of computing minimal equivalent TBoxes, we investigated the task of finding minimal representations for ontologies under signature extension. We considered scenarios, where auxiliary concepts are allowed to be used as shortcuts for complex EL concepts. We showed that this task is also NP-complete. For the corresponding decision problem with auxiliary concepts acting as shortcuts for a disjunction of EL concepts, we have established NP-hardness and inclusion in Σ2P . The same bounds hold for the combination of the two ways of extending the signature. There are various natural extensions of this work. The results obtained within this paper can easily be transferred to the context of ontology reuse, where a sub-signature becomes obsolete in a new context and a compact representation of the facts about the remaining terms is sought-after. Recent results on ontology reuse show that neither uniform interpolation nor standard module extraction guarantee the optimality of the extracted ontology [16]. Further, a question that naturally arises is that of tight complexity bounds when shortcuts for disjunctions are allowed for. Another target would be the complexity of identifying minimal TBoxes by the means of an arbitrary inseparable TBox, where we waive the requirement of explicitly defining the meaning of new concepts. An E XP - T IME upper bound for this problem is implied from the fact that the set of candidate TBoxes is exponential, and so is the general test for inseparability in EL. Minimizing representations is, of course, an interesting problem for all logics, and similar questions can (and should) be asked for more expressive ontology languages. While the concern of this paper is the complexity of the above problems, a natural follow-up task would be to develop efficient algorithms and tools that support ontology engineers in the development of succinct representations of their ontologies. Natural targets would be good heuristics and efficient approximations. For the latter, our proofs contain the bad news that there is no linear approximation scheme, as the set cover problem has no logarithmic approximations unless P equals NP. Finally, from practical point of view, it would be very interesting to investigate the potential improvement of succinctness in existing medical ontologies. Such a case study can be carried out after the corresponding tool support becomes available. Acknowledgments This work is supported by the EPSRC grant EP/H046623/1 ’Synthesis and Verification in Markov Game Structures’ and the University of Oxford. References 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.: The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press (2003) 2. Baader, F., Küsters, R., Molitor, R.: Rewriting concepts using terminologies. In: Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR 2000). (2000) 297–308 3. Grimm, S., Wissmann, J.: Elimination of redundancy in ontologies. In: Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011). (2011) 260–274 4. Bienvenu, M.: Prime implicates and prime implicants: From propositional to modal logic. Journal of Artificial Intelligence Research (JAIR) 36 (2009) 71–128 5. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005). (2005) 364–369 6. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C., eds.: OWL 2 Web Ontology Language: Profiles. W3C Recommendation (27 October 2009) Available at http: //www.w3.org/TR/owl2-profiles/. 7. OWL Working Group, W.: OWL 2 Web Ontology Language: Document Overview. W3C Recommendation (27 October 2009) Available at http://www.w3.org/TR/owl2-overview/. 8. Nikitina, N., Rudolph, S.: ExpExpExplosion: Uniform interpolation in general EL termi- nologies. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012). (2012) 618–623 9. Nikitina, N., Schewe, S.: More is Sometimes Less: Succinctness in EL. Techreport, Depart- ment of Computer Science, University of Oxford, Oxford (Mai 2013) 10. Ghilardi, S., Lutz, C., Wolter, F.: Did I Damage my Ontology? A Case for Conservative Extensions in Description Logics. In: Proceedings of the 10th International Conference on the Principles of Knowledge Representation and Reasoning (KR 2006). (2006) 187–197 11. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive description logics. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007). (2007) 453–458 12. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in description logics. In: Proceedings of the 18th European Conference on Artificial Intelli- gence (ECAI 2008). (2008) 55–59 13. Konev, B., Lutz, C., Walther, D., Wolter, F.: Formal properties of modularisation. In Stucken- schmidt, H., Parent, C., Spaccapietra, S., eds.: Modular Ontologies. Springer-Verlag (2009) 25–66 14. Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the description logic EL. Journal of Symbolic Computation 45(2) (2010) pp.194–228 15. Kontchakov, R., Wolter, F., Zakharyaschev, M.: Can you tell the difference between dl-lite ontologies? In: Proceedings of the 11th International Conference on Principles of Knowledge Representation and Reasoning (KR 2008). (2008) 285–295 16. Nikitina, N., Glimm, B.: Hitting the sweetspot: Economic rewriting of knowledge bases. In: Proceedings of the 11th International Semantic Web Conference (ISWC 2012). (2012) 394–409