
More is Sometimes Less: Succinctness in $\mathcal{EL}$

Nadeschda Nikitina

Sven Schewe

University of Liverpool, UK; University of Oxford, UK

In logics, there are many ways to represent the same facts. With respect to both reasoning and cognitive complexity, some representations are significantly less efficient than others. In this paper, we investigate different means of improving the succinctness of TBoxes expressed in the lightweight description logic $\mathcal{EL}$, which forms the basis of some large ontologies used in practice. As a measure of size, we consider the number of references to signature elements. We investigate the problem of finding minimal equivalent representations and show that this task is NP-complete. A significant (up to triple-exponential) further improvement can be achieved by introducing auxiliary concept symbols. Thus, we additionally investigate the task of finding minimal representations for an ontology by extending its signature. Since an arbitrary extension of the ontology with concept symbols can make the ontology unreadable, we only allow auxiliary concepts that act as shortcuts for other concepts ($\mathcal{EL}$ concepts and disjunctions thereof) expressed by means of terms of the original ontology. We show that this task is also NP-complete if shortcuts represent only $\mathcal{EL}$ concepts, and between NP and $\Sigma_2^P$ otherwise.

1 Introduction

It is well known that the same facts can be represented in many ways, and that the size of these representations can vary significantly. Determining and increasing the degree of succinctness of a particular syntactic representation is an important, but also a very difficult task: for the average ontology, it is almost impossible to obtain the minimal representation without tool support. Thus, automated methods that help to assess the current succinctness of an ontology and generate suggestions on how to increase it would be highly valued by ontology engineers.

In description logics [1], only a few results in this direction have been obtained so far. Baader, Küsters, and Molitor [2] investigate rewriting concepts using terminologies in the narrow sense (sets of equivalence axioms where each defined atomic concept has exactly one definition). The investigated problem is a special case of minimizing a knowledge base by computing a minimal equivalent knowledge base. Grimm et al. [3] propose an algorithm for eliminating semantically redundant axioms from ontologies. In that approach, axioms are considered atoms that cannot be split into parts or changed in any other way. Bienvenu [4] proposes a normal form called prime implicates normal form for $\mathcal{ALC}$ ontologies, which enables fast reasoning. However, as a side effect of this transformation, a doubly-exponential blowup in concept size can occur.

In this paper, we investigate succinctness for the lightweight description logic $\mathcal{EL}$ [5], which is the logical underpinning of one of the tractable sub-languages (the so-called profiles [6]) of the W3C-specified OWL Web Ontology Language [7].

First, we consider the problem of finding a minimal equivalent $\mathcal{EL}$ representation for a given ontology. We show that the related decision problem (is there an equivalent ontology of size at most $k$?) is NP-complete.

Inspired by recent results on uniform interpolation in $\mathcal{EL}$ [8], we additionally consider an extended version of the problem. These results imply that, even for the minimal equivalent representation of an ontology, an up to triple-exponentially more succinct representation can be obtained by extending its signature. Auxiliary concept symbols are therefore important contributors to the succinctness of ontologies. It is easy to envision scenarios that demonstrate their usefulness. For instance, when a complex concept $C$ is frequently used in the axioms of an ontology, the ontology shrinks when all occurrences of $C$ are replaced by a fresh atomic concept $A_C$ and an axiom $A_C \equiv C$ is added to the ontology. However, an arbitrary extension of the ontology with concept symbols whose meaning is not obvious can certainly make the ontology unreadable. In order to preserve comprehensibility, we only allow auxiliary concepts that act as shortcuts, that is, concepts that are defined using only terms of the original ontology. Presented with such a shortcut concept, an ontology engineer could find an appropriate, comprehensible name for it. Otherwise, the ontology engineer would have to guess the meaning of an auxiliary concept, and the chance that they approve the extension suggested by the tool would be low.

We demonstrate that auxiliary concept symbols acting as shortcuts for $\mathcal{EL}$ concepts expressed only by means of original ontology terms can lead to an exponential improvement of succinctness, and that the corresponding decision problem (is there such a representation of size at most $k$?) is NP-complete.

Further, we show that, if we additionally allow auxiliary concept symbols that act as shortcuts for disjunctions of $\mathcal{EL}$ concepts on the left-hand side of axioms (encodable in $\mathcal{EL}$ using several axioms), we can reduce the size of the representation by a further exponent, thereby obtaining doubly-exponentially more succinct representations. We show that the corresponding decision problem (is there such a representation of size at most $k$?) is NP-hard and included in $\Sigma_2^P$.

The paper is organized as follows: In Section 2, we recall the necessary preliminaries on description logics. Section 3 demonstrates the potential of auxiliary concept symbols acting as shortcuts for achieving a higher succinctness. In the same section, we also introduce the basic definitions of the size of ontologies as well as the investigated notions of equivalents with and without signature extension. In Sections 4 and 5, we derive the complexity bounds for the corresponding decision problems. Finally, we conclude and outline future work in Section 6. Further details and proofs can be found in the extended version [9] of this paper.

2 Preliminaries

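We recall the basic notions in description logics [1] required in this paper. Let $N_C$ and $N_R$ be countably infinite and mutually disjoint sets of concept symbols and role symbols. An $\mathcal{EL}$ concept $C$ is defined as

$$C ::= A \mid \top \mid C \sqcap C \mid \exists r.C,$$

where $A$ and $r$ range over $N_C$ and $N_R$, respectively. In the following, we use symbols $A, B$ to denote atomic concepts and $C, D, E$ to denote arbitrary concepts. A terminology or TBox consists of concept inclusion axioms $C \sqsubseteq D$ and concept equivalence axioms $C \equiv D$, used as a shorthand for $C \sqsubseteq D$ and $D \sqsubseteq C$. The signature of an $\mathcal{EL}$ concept $C$ or an axiom $\alpha$, denoted by $\mathrm{sig}(C)$ or $\mathrm{sig}(\alpha)$, respectively, is the set of concept and role symbols occurring in it. To distinguish between the set of concept symbols and the set of role symbols, we use $\mathrm{sig}_C(C)$ and $\mathrm{sig}_R(C)$, respectively. The signature of a TBox $\mathcal{T}$, in symbols $\mathrm{sig}(\mathcal{T})$ (correspondingly, $\mathrm{sig}_C(\mathcal{T})$ and $\mathrm{sig}_R(\mathcal{T})$), is defined analogously.

Next, we recall the semantics of the DL constructs introduced above, which is defined by means of interpretations. An interpretation $\mathcal{I}$ is given by the domain $\Delta^{\mathcal{I}}$ and a function $\cdot^{\mathcal{I}}$ assigning each concept $A \in N_C$ a subset $A^{\mathcal{I}}$ of $\Delta^{\mathcal{I}}$ and each role $r \in N_R$ a subset $r^{\mathcal{I}}$ of $\Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. The interpretation of $\top$ is fixed to $\Delta^{\mathcal{I}}$. The interpretation of an arbitrary $\mathcal{EL}$ concept is defined inductively, i.e., $(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$ and $(\exists r.C)^{\mathcal{I}} = \{x \mid (x, y) \in r^{\mathcal{I}}, y \in C^{\mathcal{I}}\}$. An interpretation $\mathcal{I}$ satisfies an axiom $C \sqsubseteq D$ if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$. $\mathcal{I}$ is a model of a TBox if it satisfies all of its axioms. We say that a TBox $\mathcal{T}$ entails an axiom $\alpha$ (in symbols, $\mathcal{T} \models \alpha$) if $\alpha$ is satisfied by all models of $\mathcal{T}$. A TBox $\mathcal{T}$ entails another TBox $\mathcal{T}'$, in symbols $\mathcal{T} \models \mathcal{T}'$, if $\mathcal{T} \models \alpha$ for all $\alpha \in \mathcal{T}'$. $\mathcal{T} \equiv \mathcal{T}'$ is a shorthand for $\mathcal{T} \models \mathcal{T}'$ and $\mathcal{T}' \models \mathcal{T}$.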
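The inductive semantics above can be illustrated on a finite interpretation. The following is a minimal sketch, not part of the original formalization: the tuple encoding of concepts (`("and", C, D)`, `("some", r, C)`), the name `TOP`, and the function names are all assumptions made for this illustration.

```python
TOP = "TOP"  # hypothetical name standing for the top concept


def extension(concept, domain, concepts, roles):
    """Return the set of domain elements that belong to `concept`.

    concepts: dict mapping an atomic concept name to a set of elements
    roles:    dict mapping a role name to a set of (x, y) pairs
    """
    if concept == TOP:
        return set(domain)
    if isinstance(concept, str):                  # atomic concept A
        return set(concepts.get(concept, set()))
    op = concept[0]
    if op == "and":                               # C ⊓ D: intersection
        return (extension(concept[1], domain, concepts, roles)
                & extension(concept[2], domain, concepts, roles))
    if op == "some":                              # ∃r.C: r-predecessors of C
        fillers = extension(concept[2], domain, concepts, roles)
        return {x for (x, y) in roles.get(concept[1], set()) if y in fillers}
    raise ValueError(f"not an EL concept: {concept!r}")


def satisfies(axiom, domain, concepts, roles):
    """Check whether the interpretation satisfies the inclusion lhs ⊑ rhs."""
    lhs, rhs = axiom
    return (extension(lhs, domain, concepts, roles)
            <= extension(rhs, domain, concepts, roles))


domain = {1, 2, 3}
concepts = {"A": {1, 2}, "B": {2}}
roles = {"r": {(1, 2), (3, 1)}}

ext = extension(("some", "r", "B"), domain, concepts, roles)
holds = satisfies((("some", "r", "B"), "A"), domain, concepts, roles)
fails = satisfies(("A", "B"), domain, concepts, roles)
```

Checking a single interpretation only refutes or fails to refute an axiom; entailment quantifies over all models and needs a reasoner, which this sketch does not attempt.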

In addition to $\mathcal{EL}$, we will use disjunction on the left-hand side of axioms to obtain more succinct representations of $\mathcal{EL}$ TBoxes. Note that this extension is of a notational nature, i.e., it does not give us the expressive power to represent more TBoxes than standard $\mathcal{EL}$. We define an $\mathcal{ELD}$ concept $C$ as

$$C ::= A \mid \top \mid C \sqcap C \mid C \sqcup C \mid \exists r.C,$$

where $A$ and $r$ range over $N_C$ and $N_R$, respectively. The interpretation of an arbitrary $\mathcal{ELD}$ concept is defined analogously to the interpretation of $\mathcal{EL}$ concepts with the extension $(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$. An $\mathcal{ELD}$ TBox consists of axioms that are either $\mathcal{EL}$ axioms or have the form $C \sqsubseteq D$, where $C$ is an $\mathcal{ELD}$ concept and $D$ is an $\mathcal{EL}$ concept. Note that equivalence axioms ($C \equiv D$) do not contain $\mathcal{ELD}$ concepts, since they are a shorthand for $C \sqsubseteq D$ and $D \sqsubseteq C$.

3 Achieving Succinctness in $\mathcal{EL}$

The size of a TBox is often measured by the number of axioms contained in it. This is, however, a very simplified view of size in terms of both cognitive complexity and reasoning. In this paper, we measure the size of a concept, an axiom, or a TBox by the number of references to signature elements.

Definition 1. The size of an $\mathcal{EL}$ concept $D$ is defined as follows:

– for $D \in \mathrm{sig}(\mathcal{T})$, $s(D) = 1$;
– for $D = \exists r.C$, $s(D) = s(C) + 1$, where $r \in \mathrm{sig}_R(\mathcal{T})$ and $C$ is an arbitrary concept;
– for $D = C_1 \sqcap C_2$, $s(D) = s(C_1) + s(C_2)$, where $C_1, C_2$ are arbitrary concepts.

The size of an $\mathcal{EL}$ axiom or a TBox is accordingly defined as follows:

– $s(C_1 \sqsubseteq C_2) = s(C_1) + s(C_2)$ for concepts $C_1, C_2$;
– $s(C_1 \equiv C_2) = s(C_1) + s(C_2)$ for concepts $C_1, C_2$;
– $s(\mathcal{T}) = \sum_{\alpha \in \mathcal{T}} s(\alpha)$ for a TBox $\mathcal{T}$.

In practice, the suitable means of obtaining a compact representation can differ depending on the scenario. To address cases in which a signature extension is not feasible, we first consider the problem of finding the minimal equivalent $\mathcal{EL}$ representation for a given TBox among representations that use the same signature. Popular examples of avoidable non-succinctness are axioms that follow from other axioms and sub-concepts that can be removed from axioms without losing any logical consequences. While non-succinctness is easy to detect in these simple cases, it can occur in many other forms. The ontology $\mathcal{T} = \{C \sqsubseteq \exists r.C,\ \exists r.C \sqsubseteq \exists r.D,\ \exists r.D \sqsubseteq D\}$, for instance, neither contains any axioms that are entailed by the remainder of the ontology, nor are there any sub-expressions that can be removed. However, there exists a smaller representation $\{C \sqsubseteq \exists r.C,\ C \sqsubseteq D,\ \exists r.D \sqsubseteq D\}$ of $\mathcal{T}$. The general version of the corresponding decision problem can be formulated as follows:

Definition 2 (P1). Given an $\mathcal{EL}$ TBox $\mathcal{T}$ and a natural number $k$, is there an $\mathcal{EL}$ TBox $\mathcal{T}'$ with $s(\mathcal{T}') \le k$ such that $\mathcal{T}' \equiv \mathcal{T}$?

We denote the set $\{\mathcal{T}' \mid \mathcal{T}' \equiv \mathcal{T}\}$ by $[\mathcal{T}]$. We will show that this decision problem, which does not involve any signature extensions, is already NP-complete.
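The size measure of Definition 1 is easy to compute mechanically. The sketch below applies it to the three-axiom ontology discussed above and to its smaller equivalent. It rests on assumptions made for illustration only: concepts are encoded as nested tuples, `C` and `D` are treated as atomic concept names, and the top concept is not handled because Definition 1 assigns it no size.

```python
def size(concept):
    """Number of references to signature elements in an EL concept."""
    if isinstance(concept, str):          # atomic concept: one reference
        return 1
    op = concept[0]
    if op == "some":                      # ∃r.C: one reference for r, plus s(C)
        return 1 + size(concept[2])
    if op == "and":                       # C1 ⊓ C2: s(C1) + s(C2)
        return size(concept[1]) + size(concept[2])
    raise ValueError(f"not an EL concept: {concept!r}")


def axiom_size(axiom):
    """Inclusions and equivalences are both measured as s(lhs) + s(rhs)."""
    lhs, rhs = axiom
    return size(lhs) + size(rhs)


def tbox_size(tbox):
    return sum(axiom_size(axiom) for axiom in tbox)


some_r_C = ("some", "r", "C")
some_r_D = ("some", "r", "D")

# T = {C ⊑ ∃r.C, ∃r.C ⊑ ∃r.D, ∃r.D ⊑ D}
T = [("C", some_r_C), (some_r_C, some_r_D), (some_r_D, "D")]
# Smaller equivalent {C ⊑ ∃r.C, C ⊑ D, ∃r.D ⊑ D}
T_small = [("C", some_r_C), ("C", "D"), (some_r_D, "D")]

original_size = tbox_size(T)
reduced_size = tbox_size(T_small)
```

Under this encoding the original ontology has size 10 and the rewritten one has size 8, although both contain three axioms, which is why the axiom count alone is a poor size measure here.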

Extending the Signature

From the user's point of view as well as with respect to reasoning, it sometimes makes sense to introduce fresh concept symbols, for instance as shortcuts for complex concepts that occur frequently in the ontology. Doing this in an advantageous way can be a tedious task for an ontology engineer since, as we will show later on, the corresponding decision problem is NP-hard. To account for scenarios in which an introduction of auxiliary concept symbols is desirable, we consider, in addition to the decision problem introduced above, the problem of finding succinct representations containing shortcuts. The following example demonstrates the theoretical potential of such an extension of the signature with shortcuts: it can lead to a doubly-exponentially more succinct representation of TBoxes.

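Example 1. Let the sets $\mathcal{C}_i$ of concept descriptions be inductively defined by $\mathcal{C}_0 = \{A_1, A_2\}$, $\mathcal{C}_{i+1} = \{\exists r.C_1 \sqcap \exists s.C_2 \mid C_1, C_2 \in \mathcal{C}_i\}$. For a natural number $n$, consider the TBox $\mathcal{T}_n = \{C \sqsubseteq B \mid C \in \mathcal{C}_{n-1}\}$.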

Intuitively, the concepts in the sets $\mathcal{C}_i$ have the shape of binary trees with exponentially many leaves, each of which can be $A_1$ or $A_2$. Clearly, the concepts grow exponentially with $i$. Further, it holds that $|\mathcal{C}_{i+1}| = |\mathcal{C}_i|^2$ and consequently $|\mathcal{C}_i| = 2^{(2^i)}$. Thus, $\mathcal{T}_n$ contains doubly exponentially many axioms, each of which has exponential size. While there is no smaller equivalent representation of $\mathcal{T}_n$, this TBox can easily be represented in a more compact way using auxiliary concept symbols as shortcuts for complex $\mathcal{EL}$ or $\mathcal{ELD}$ concept expressions.
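The growth claimed for Example 1 can be checked by brute-force enumeration for small $i$. This is a sketch under an assumed tuple encoding of concepts; the helper names `level` and `size` are hypothetical, and `size` follows Definition 1.

```python
from itertools import product


def level(i):
    """Enumerate the concepts of C_i as nested tuples."""
    if i == 0:
        return ["A1", "A2"]
    previous = level(i - 1)
    # C_{i+1} = { ∃r.C1 ⊓ ∃s.C2 | C1, C2 ∈ C_i }
    return [("and", ("some", "r", c1), ("some", "s", c2))
            for c1, c2 in product(previous, repeat=2)]


def size(concept):
    """Size as in Definition 1 (references to signature elements)."""
    if isinstance(concept, str):
        return 1
    if concept[0] == "some":
        return 1 + size(concept[2])
    return size(concept[1]) + size(concept[2])    # conjunction


counts = [len(level(i)) for i in range(4)]        # |C_0| .. |C_3|
sizes = [size(level(i)[0]) for i in range(4)]     # size of one concept per level
t3_size = sum(size(c) + 1 for c in level(2))      # s(T_3): one axiom C ⊑ B per C in C_2
```

The counts square at every level while each individual concept roughly doubles in size, so enumeration beyond the first few levels is already impractical.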

First, combining several axioms into a single axiom with a disjunction on the left-hand side would allow us to reduce the size of $\mathcal{T}_n$ from double-exponential to single-exponential: we can define $\mathcal{C}_0 = \{A_1 \sqcup A_2\}$ and thus express all elements of the set $\mathcal{C}_{n-1}$ by means of a single concept $C_{n-1}$ that has the shape of a binary tree with the concept $A_1 \sqcup A_2$ as leaves. The corresponding $\mathcal{EL}$ TBox $\mathcal{T}_n'$ can be obtained by introducing the concept $B_0$ that represents the disjunction $A_1 \sqcup A_2$ by means of the axioms $A_1 \sqsubseteq B_0$ and $A_2 \sqsubseteq B_0$.

Second, by using fresh concept symbols as shortcuts for complex $\mathcal{EL}$ concepts, $\mathcal{T}_n'$ can be reduced by a further exponential as follows: we introduce concept symbols $B_i$ with $i \in \{1, \dots, n-1\}$ to represent each $\mathcal{C}_i$ and obtain the following TBox $\mathcal{T}_n''$:

(1) $A_1 \sqsubseteq B_0$
(2) $A_2 \sqsubseteq B_0$
(3) $B_{i+1} \equiv \exists r.B_i \sqcap \exists s.B_i$, $i < n-1$
(4) $B_{n-1} \sqsubseteq B$

As a result, the binary tree contracts into a chain of $n+3$ axioms $\alpha_j$ with $s(\alpha_j) \le 5$.

In general, an extension of the signature has to be meaning-preserving in the sense that the logical consequences expressed using only the originally given signature remain unchanged. Formally, the corresponding "equivalence" between TBoxes with different signatures is captured by the notion of inseparability, as investigated by various authors [10-15] in different variations. We base this work on the deductive notion of inseparability for $\mathcal{EL}$. Two $\mathcal{EL}$ TBoxes, $\mathcal{T}_1$ and $\mathcal{T}_2$, are inseparable w.r.t. a signature $\Sigma$ if they have the same $\mathcal{EL}$ consequences whose signature is a subset of $\Sigma$:

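Definition 3. Let $\mathcal{T}_1$ and $\mathcal{T}_2$ be two general $\mathcal{EL}$ TBoxes and $\Sigma$ a signature. $\mathcal{T}_1$ and $\mathcal{T}_2$ are $\Sigma$-inseparable, in symbols $\mathcal{T}_1 \equiv^{\mathcal{EL}}_{\Sigma} \mathcal{T}_2$, if for all $\mathcal{EL}$ concepts $C, D$ with $\mathrm{sig}(C) \cup \mathrm{sig}(D) \subseteq \Sigma$ it holds that $\mathcal{T}_1 \models C \sqsubseteq D$ iff $\mathcal{T}_2 \models C \sqsubseteq D$.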

Thus, the formal requirement for any TBox $\mathcal{T}'$ obtained from $\mathcal{T}$ by means of a signature extension is that it remains $\Sigma$-inseparable from $\mathcal{T}$, where $\Sigma = \mathrm{sig}(\mathcal{T})$. We take this into account in the subsequent definitions.
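Returning to Example 1, the sizes of the three representations can be compared numerically. The sketch below derives them from Definition 1 under our reading of the example: the original TBox has one axiom per concept in $\mathcal{C}_{n-1}$, the disjunction variant has a single tree-shaped axiom plus the two axioms defining $B_0$, and the shortcut variant consists of axioms (1)-(4). The function names are hypothetical.

```python
def tree_size(depth):
    """Size of one concept of C_depth: s_0 = 1 and s_{i+1} = 2 * s_i + 2."""
    s = 1
    for _ in range(depth):
        s = 2 * s + 2
    return s


def size_original(n):
    """s(T_n): 2^(2^(n-1)) axioms C ⊑ B, each of size tree_size(n-1) + 1."""
    return 2 ** (2 ** (n - 1)) * (tree_size(n - 1) + 1)


def size_with_disjunction(n):
    """One axiom (tree with B0 leaves) ⊑ B, plus A1 ⊑ B0 and A2 ⊑ B0."""
    return (tree_size(n - 1) + 1) + 2 + 2


def size_with_shortcuts(n):
    """Axioms (1), (2), (4) have size 2; each instance of (3) has size 5."""
    return 2 + 2 + 5 * (n - 1) + 2


comparison = {n: (size_original(n), size_with_disjunction(n), size_with_shortcuts(n))
              for n in (3, 6)}
```

For very small $n$ the shortcut chain brings no gain over the disjunction variant, since every shortcut costs a definition axiom; the exponential saving only becomes visible as $n$ grows.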

EL-Shortcuts

We now consider the problem of finding small TBoxes that are $\Sigma$-inseparable from $\mathcal{T}$ (with $\Sigma = \mathrm{sig}(\mathcal{T})$) and use explicitly defined $\mathcal{EL}$ shortcuts. From Example 1, we can observe that a significantly higher effect can be achieved if shortcuts are introduced gradually such that previously introduced shortcuts can be used to define new ones. The definition below allows for a hierarchy of shortcuts. To ensure that shortcuts form a hierarchy, we impose an acyclicity condition on the syntactic references within the definitions of shortcuts.

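Definition 4 ($\mathcal{EL}$-Shortcuts). Let $\mathcal{T}$ be an $\mathcal{EL}$ TBox with $\mathrm{sig}(\mathcal{T}) = \Sigma$. Then an $\mathcal{EL}$ TBox $\mathcal{T}'$ is an equivalent with $\mathcal{EL}$-shortcuts, in symbols $\mathcal{T}' \in [\mathcal{T}]_{\mathcal{EL}}$, iff

1. $\mathcal{T}' \equiv^{\mathcal{EL}}_{\Sigma} \mathcal{T}$;
2. $\mathrm{sig}_R(\mathcal{T}') = \mathrm{sig}_R(\mathcal{T})$;
3. for all $A_i \in \{A_1, \dots, A_n\} = \mathrm{sig}_C(\mathcal{T}') \setminus \mathrm{sig}_C(\mathcal{T})$ there exists exactly one concept $C_i$ (called definition of $A_i$) such that $A_i \equiv C_i \in \mathcal{T}'$;
4. for all $i \in \{1, \dots, n\}$ it holds that $\mathrm{sig}(C_i) \subseteq \mathrm{sig}(\mathcal{T}) \cup \{A_j \mid j < i\}$.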

The introduction of $\mathcal{EL}$-shortcuts corresponds to the second transformation of the TBox given in Example 1. The corresponding decision problem can be stated as follows:

Definition 5 (P2). Given an $\mathcal{EL}$ TBox $\mathcal{T}$ and a natural number $k$, is there an $\mathcal{EL}$ TBox $\mathcal{T}'$ with $s(\mathcal{T}') \le k$ such that $\mathcal{T}' \in [\mathcal{T}]_{\mathcal{EL}}$?
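Condition 4 of Definition 4 asks for an enumeration of the shortcuts in which every definition refers only to original symbols and earlier shortcuts. Such an enumeration exists exactly when the reference graph between shortcuts is acyclic, so it can be found by a topological sort. The sketch below uses Python's standard `graphlib` module; the input format (a dict from shortcut name to the set of symbols in its definition) and the function name are assumptions for illustration. It covers only the syntactic part of the definition, not the inseparability test of Condition 1.

```python
from graphlib import CycleError, TopologicalSorter


def shortcut_order(definitions, original_signature):
    """Return an ordering of the shortcuts satisfying Condition 4, or None.

    definitions:        dict mapping each shortcut name to the set of
                        symbols occurring in its (unique) definition
    original_signature: set of symbols of the original TBox
    """
    shortcuts = set(definitions)
    graph = {}
    for name, used in definitions.items():
        if not used <= (original_signature | shortcuts):
            return None                     # definition uses an unknown symbol
        graph[name] = used & shortcuts      # shortcuts this definition depends on
    try:
        # static_order() lists dependencies before the shortcuts that use them
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None                         # cyclic references: no hierarchy


signature = {"A1", "A2", "B", "r", "s"}
hierarchy = shortcut_order({"X2": {"r", "s", "X1"}, "X1": {"r", "A1"}}, signature)
cyclic = shortcut_order({"X1": {"r", "X2"}, "X2": {"s", "X1"}}, signature)
```

Here `hierarchy` lists `X1` before `X2`, matching the requirement that `X2` may only refer to shortcuts with a smaller index, while the mutually recursive pair is rejected.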

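It can be shown that the equivalence relation between $\mathcal{T}$ and its equivalent given in Definition 4 is stronger than deductive inseparability. It is called emulation and is defined as follows:

Definition 6. Let $\mathcal{T}_1$ and $\mathcal{T}_2$ be two $\mathcal{EL}$ TBoxes. $\mathcal{T}_2$ emulates $\mathcal{T}_1$, in symbols $\mathcal{T}_2 \models_{em} \mathcal{T}_1$, iff $\mathcal{T}_2 \models \mathcal{T}_1$ and every model of $\mathcal{T}_1$ can be extended into a model of $\mathcal{T}_2$.

Clearly, $\mathcal{T}_2 \models_{em} \mathcal{T}_1$ implies $\mathcal{T}_2 \equiv^{\mathcal{EL}}_{\Sigma} \mathcal{T}_1$ with $\Sigma = \mathrm{sig}(\mathcal{T}_1)$. The following lemma establishes the role of $\mathcal{EL}$-shortcuts within TBoxes:

Lemma 1. Let $\mathcal{T}, \mathcal{T}'$ be two $\mathcal{EL}$ TBoxes such that $\mathcal{T}' \in [\mathcal{T}]_{\mathcal{EL}}$ and $\{A_1, \dots, A_n\} = \mathrm{sig}_C(\mathcal{T}') \setminus \mathrm{sig}_C(\mathcal{T})$. Further, let $C_i$ be the corresponding definition of $A_i$. Then for the TBox $\mathcal{T}_{ext} = \mathcal{T} \cup \{A_i \equiv C_i \mid i \in \{1, \dots, n\}\}$ it holds that $\mathcal{T}_{ext} \models_{em} \mathcal{T}$.

Proof Sketch. Clearly, the interpretation of each $A_i \notin \Sigma$ is completely determined by the interpretations of symbols in $\Sigma$ (due to the acyclicity condition on the syntactic references within the definitions of the shortcuts). Thus, we can extend each model $\mathcal{I}$ of $\mathcal{T}$ by assigning $A_i^{\mathcal{I}} = C_i^{\mathcal{I}}$ and obtain a model of $\mathcal{T}_{ext}$. Additionally, $\mathcal{T}_{ext} \models \mathcal{T}$, since $\mathcal{T} \subseteq \mathcal{T}_{ext}$. □

$\sqcup$-Shortcuts

The second important contribution of additional vocabulary elements to the succinctness of $\mathcal{EL}$ TBoxes is their ability to act as a replacement for disjunction on the left-hand side of axioms. We can obtain a corresponding $\mathcal{EL}$ TBox $\mathcal{T}'$ from an $\mathcal{ELD}$ TBox $\mathcal{T}$ by replacing each disjunction $C_1 \sqcup \dots \sqcup C_n$ occurring in $\mathcal{T}$ by a fresh concept symbol $A$ and extending $\mathcal{T}$ with axioms $C_1 \sqsubseteq A, \dots, C_n \sqsubseteq A$, called definitions of $A$. We denote such an $\mathcal{EL}$ representation of $\mathcal{T}$ by $\mathcal{T}_{EL}(\mathcal{T})$.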

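Definition 7 ($\sqcup$-Shortcuts). Let $\mathcal{T}$ be an $\mathcal{EL}$ TBox. Then an $\mathcal{EL}$ TBox $\mathcal{T}'$ is an equivalent with $\sqcup$-shortcuts, in symbols $\mathcal{T}' \in [\mathcal{T}]_{\sqcup}$, iff there is an $\mathcal{ELD}$ TBox $\mathcal{T}''$ such that $\mathcal{T}'' \equiv \mathcal{T}$, $\mathrm{sig}(\mathcal{T}'') \subseteq \mathrm{sig}(\mathcal{T})$ and $\mathcal{T}_{EL}(\mathcal{T}'') = \mathcal{T}'$.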
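The translation $\mathcal{T}_{EL}$ used in Definition 7 can be sketched as a recursive rewrite of the left-hand sides. The code below is an illustration under stated assumptions: concepts are nested tuples with `("or", C, D)` for disjunction, the fresh names `Aux1`, `Aux2`, ... are assumed not to occur in the input signature, and identical disjunctions share one fresh symbol (a design choice that mirrors the single symbol $B_0$ in Example 1; the definition in the text does not prescribe it).

```python
def disjuncts(concept):
    """Flatten C1 ⊔ ... ⊔ Cn into the list [C1, ..., Cn]."""
    if isinstance(concept, tuple) and concept[0] == "or":
        return disjuncts(concept[1]) + disjuncts(concept[2])
    return [concept]


def to_el(tbox, prefix="Aux"):
    """Replace every disjunction by a fresh symbol A and add Ci ⊑ A."""
    names = {}          # disjunction -> fresh concept symbol
    definitions = []    # axioms Ci ⊑ A introduced along the way

    def replace(concept):
        if isinstance(concept, str):
            return concept
        op = concept[0]
        if op == "some":
            return ("some", concept[1], replace(concept[2]))
        if op == "and":
            return ("and", replace(concept[1]), replace(concept[2]))
        if op == "or":
            if concept not in names:
                names[concept] = f"{prefix}{len(names) + 1}"
                for d in disjuncts(concept):
                    definitions.append((replace(d), names[concept]))
            return names[concept]
        raise ValueError(f"not an ELD concept: {concept!r}")

    # Disjunction may only occur on the left-hand side of an inclusion.
    axioms = [(replace(lhs), rhs) for lhs, rhs in tbox]
    return definitions + axioms


leaf = ("or", "A1", "A2")
eld_tbox = [(("and", ("some", "r", leaf), ("some", "s", leaf)), "B")]
el_tbox = to_el(eld_tbox)
```

Applied to the tree with disjunctive leaves, the result has the shape of $\mathcal{T}_n'$ for $n = 2$: two definition axioms for the fresh symbol and one axiom whose leaves refer to it.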

The introduction of $\sqcup$-shortcuts corresponds to the first transformation in Example 1. The corresponding decision problem is as follows:

Definition 8 (P3). Given an $\mathcal{EL}$ TBox $\mathcal{T}$ and a natural number $k$, is there an $\mathcal{EL}$ TBox $\mathcal{T}'$ with $s(\mathcal{T}') \le k$ such that $\mathcal{T}' \in [\mathcal{T}]_{\sqcup}$?

ELD-Shortcuts

If we simultaneously allow for both types of shortcuts (note that these roles can never be played by a single concept at the same time!), we obtain the following definition of equivalents:

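Definition 9 ($\mathcal{ELD}$-Shortcuts). Let $\mathcal{T}$ be an $\mathcal{EL}$ TBox. Then an $\mathcal{EL}$ TBox $\mathcal{T}'$ is an equivalent with $\mathcal{ELD}$-shortcuts, in symbols $\mathcal{T}' \in [\mathcal{T}]_{\mathcal{ELD}}$, iff there is an $\mathcal{ELD}$ TBox $\mathcal{T}''$ such that Conditions 1-4 of Definition 4 are true for $\mathcal{T}''$ and $\mathcal{T}' = \mathcal{T}_{EL}(\mathcal{T}'')$.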

The corresponding decision problem is stated as follows:

Definition 10 (P4). Given an $\mathcal{EL}$ TBox $\mathcal{T}$ and a natural number $k$, is there an $\mathcal{EL}$ TBox $\mathcal{T}'$ with $s(\mathcal{T}') \le k$ such that $\mathcal{T}' \in [\mathcal{T}]_{\mathcal{ELD}}$?

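The following inclusion relations between the above introduced notions hold: $[\mathcal{T}] \subseteq [\mathcal{T}]_{\mathcal{EL}} \subseteq [\mathcal{T}]_{\mathcal{ELD}}$ and $[\mathcal{T}] \subseteq [\mathcal{T}]_{\sqcup} \subseteq [\mathcal{T}]_{\mathcal{ELD}}$.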

In the following, we show that problems P1 and P2 are NP-complete, while the two problems involving $\sqcup$- and $\mathcal{ELD}$-shortcuts (P3 and P4) are between NP and $\Sigma_2^P$.

4 Inclusion in NP resp. $\Sigma_2^P$

In this section, we investigate the upper complexity bound for the problems P1-P4 and show that P1 and P2 are in NP and that P3 and P4 are in $\Sigma_2^P$. In the case of P1, showing the upper bound is simple:

Theorem 1. P1 is in NP.

Proof. We ask the non-deterministic algorithm to guess such an equivalent TBox $\mathcal{T}' \equiv \mathcal{T}$ of size at most $k$. Then, we check $\mathcal{T}' \equiv \mathcal{T}$ in PTIME [5]. □

The inclusion of P2 in NP (and of P3 and P4 in $\Sigma_2^P$) is less straightforward, since deciding inseparability of $\mathcal{EL}$ TBoxes is known to be EXPTIME-complete and emulation is even undecidable [14]. For the inclusion of P3 and P4 in $\Sigma_2^P$, we make use of the following simple lemma:

Lemma 2. Let $\mathcal{T}$ be an $\mathcal{ELD}$ TBox. Then $\mathcal{T}_{EL}(\mathcal{T}) \models_{em} \mathcal{T}$.

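Proof Sketch. We can transform each model $\mathcal{I}$ of $\mathcal{T}$ into a model of $\mathcal{T}' = \mathcal{T}_{EL}(\mathcal{T})$ by successively adding $A^{\mathcal{I}} = \bigcup_{i=1}^{n} C_i^{\mathcal{I}}$ for each concept $A$ that is introduced to replace the disjunction $\bigsqcup_{i=1}^{n} C_i$. Additionally, it can be shown that $\mathcal{T}_{EL}(\mathcal{T}) \models \mathcal{T}$ holds, since $\{\bigsqcup_{i=1}^{n} C_i \sqsubseteq A\} \equiv \{C_i \sqsubseteq A \mid i \in \{1, \dots, n\}\}$ and "$\sqsubseteq$" is transitive. □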

Theorem 2. P2 is in NP and P3, P4 are in $\Sigma_2^P$.

Proof. Let $\mathcal{T}'$ be the candidate equivalent of size at most $k$ of an $\mathcal{EL}$ TBox $\mathcal{T}$ returned by the non-deterministic algorithm. We now consider how to verify that $\mathcal{T}'$ indeed fulfills the requirements stated in Definitions 4, 7, and 9.

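For P2, we have to verify Conditions 2-4 of Definition 4, which clearly can be done in polynomial time. In order to verify Condition 1 ($\mathcal{T}' \equiv^{\mathcal{EL}}_{\Sigma} \mathcal{T}$), it is sufficient to insert the shortcut definitions into $\mathcal{T}$ and then test the equivalence of this extended TBox $\mathcal{T}_{ext}$ and $\mathcal{T}'$, for the following reasons. By Lemma 1, $\mathcal{T}_{ext} \models_{em} \mathcal{T}$. Due to transitivity of $\equiv^{\mathcal{EL}}_{\Sigma}$, $\mathcal{T}_{ext} \equiv \mathcal{T}'$ implies $\mathcal{T}' \equiv^{\mathcal{EL}}_{\Sigma} \mathcal{T}$. It remains to show the converse, namely that $\mathcal{T}' \equiv^{\mathcal{EL}}_{\Sigma} \mathcal{T}$ implies $\mathcal{T}_{ext} \equiv \mathcal{T}'$. Let us assume for contradiction that there exists an inclusion axiom $C \sqsubseteq D \in \mathcal{T}'$ such that $\mathcal{T}_{ext} \not\models C \sqsubseteq D$. Then, by recursively replacing shortcuts by their definitions, we can obtain concepts $C', D'$ with $\mathrm{sig}(C') \cup \mathrm{sig}(D') \subseteq \Sigma$ such that $\mathcal{T}_{ext} \not\models C' \sqsubseteq D'$ and $\mathcal{T}' \models C' \sqsubseteq D'$. With $\mathcal{T} \equiv^{\mathcal{EL}}_{\Sigma} \mathcal{T}_{ext}$, we can conclude $\mathcal{T} \not\equiv^{\mathcal{EL}}_{\Sigma} \mathcal{T}'$.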

For P3, we need to show that there exists an $\mathcal{ELD}$ TBox $\mathcal{T}''$ such that $\mathrm{sig}(\mathcal{T}'') \subseteq \mathrm{sig}(\mathcal{T})$, $\mathcal{T}' = \mathcal{T}_{EL}(\mathcal{T}'')$ and $\mathcal{T}'' \equiv \mathcal{T}$. If such a $\mathcal{T}''$ exists, we can obtain it from $\mathcal{T}'$ by replacing the introduced concept symbols by the corresponding disjunctions of definitions. As $\mathcal{T}'$ and $\mathcal{T}''$ are $\Sigma$-inseparable with $\Sigma = \mathrm{sig}(\mathcal{T}'')$ by Lemma 2, it suffices to show $\mathcal{T}' \models \mathcal{T}$ and $\mathcal{T} \models \mathcal{T}''$. The first is standard reasoning in $\mathcal{EL}$ and can clearly be performed in polynomial time. The refutation of the latter, i.e., showing $\mathcal{T} \not\models \mathcal{T}''$, can be done in NP: if $\mathcal{T} \not\models \mathcal{T}''$, then $\mathcal{T} \not\models C \sqsubseteq D$ holds for some concretization $C \sqsubseteq D$ of some axiom of $\mathcal{T}''$ (where a concretization is simply the replacement of each disjunction by one of its disjuncts). A non-deterministic machine can simply guess the axiom and its concretization. Consequently, testing $\mathcal{T} \models \mathcal{T}''$ is in coNP and P3 is thus in $\Sigma_2^P$. (Note that it suffices to call the oracle once at the end.)
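The notion of concretization used in this argument can be made concrete by enumerating all choices of disjuncts. The sketch below is an illustration only: the tuple encoding and the function name are assumptions, and it enumerates deterministically what the non-deterministic machine would guess in a single step.

```python
from itertools import product


def concretizations(concept):
    """Yield every EL concept obtained by replacing each disjunction
    in an ELD concept by one of its disjuncts."""
    if isinstance(concept, str):
        yield concept
    elif concept[0] == "or":
        yield from concretizations(concept[1])
        yield from concretizations(concept[2])
    elif concept[0] == "some":
        for filler in concretizations(concept[2]):
            yield ("some", concept[1], filler)
    elif concept[0] == "and":
        lefts = list(concretizations(concept[1]))
        rights = list(concretizations(concept[2]))
        for left, right in product(lefts, rights):
            yield ("and", left, right)
    else:
        raise ValueError(f"not an ELD concept: {concept!r}")


leaf = ("or", "A1", "A2")
tree = ("and", ("some", "r", leaf), ("some", "s", leaf))
choices = list(concretizations(tree))
```

For the tree with disjunctive leaves from Example 1, the four concretizations are exactly the four concepts of $\mathcal{C}_1$. The number of concretizations can be exponential in the size of the concept, which is why guessing one of them, rather than enumerating all, matters for the complexity bound.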

For P4, we can simply combine these tests. □

Clearly, the results of this section also apply to tractable extensions of $\mathcal{EL}$.

5 NP-Hardness of P1-P4

In this section, we show the NP-hardness of problems P1 through P4 by a reduction from the set cover problem, which is one of the standard NP-complete problems. For a given set $\mathcal{S} = \{S_1, S_2, \dots, S_n\}$ with carrier set $S = \bigcup_{i=1}^{n} S_i$, a cover $\mathcal{C} \subseteq \mathcal{S}$ is a subset of $\mathcal{S}$ such that the union of the sets in $\mathcal{C}$ covers $S$, i.e., $S = \bigcup_{C \in \mathcal{C}} C$.

The set cover problem is the problem to determine, for a given set $\mathcal{S} = \{S_1, S_2, \dots, S_n\}$ and a given integer $k$, whether there is a cover $\mathcal{C}$ of $\mathcal{S}$ with at most $k$ elements, i.e., with $|\mathcal{C}| \le k$.

We will use a restricted version of the set cover problem, which we call the dense set cover problem (DSCP). In the dense set cover problem, we require that

– neither the carrier set $S$ nor the empty set is in $\mathcal{S}$,
– all singleton subsets (sets with exactly one element) of $S$ are in $\mathcal{S}$, and
– if a non-singleton set $X$ is in $\mathcal{S}$, so is some subset $X' \subset X$ that contains exactly one element less than $X$ ($|X \setminus X'| = 1$).

Lemma 3. The dense set cover problem is NP-complete.

Proof. Inclusion in NP is inherited from the set cover problem, of which it is a special instance.

We now reduce solving the set cover problem to solving the dense set cover problem. We start with a set cover problem for a given $\mathcal{S}$ and $k$, and first check whether the carrier set $S$ is contained in $\mathcal{S}$ (if so, the problem is solved). If this is not the case, we identify the size $l$ of the largest set in $\mathcal{S}$, initialise $\mathcal{S}'$ to $\mathcal{S}$ and extend $\mathcal{S}'$ using the following algorithm:

– while $l > 1$ do
    for all $X \in \mathcal{S}'$, choose an $s \in X$ and join $\mathcal{S}'$ with $X \setminus \{s\}$;
    decrement $l$ by one.

After this, we join $\mathcal{S}'$ with $\{\{s\} \mid s \in S\}$, and remove the empty set from $\mathcal{S}'$ if applicable. Note that $\mathcal{S}'$ can easily be constructed in polynomial time. Now we show that there is a cover $\mathcal{C}$ of size $k$ of $\mathcal{S}$ exactly if there is a cover $\mathcal{C}'$ of size $k$ of $\mathcal{S}'$. W.l.o.g., we can assume that $\emptyset \notin \mathcal{C}$, since we always obtain a cover from any cover $\mathcal{C}$ by removing $\emptyset$ from it. Since $\mathcal{S} \subseteq \mathcal{S}' \cup \{\emptyset\}$, any cover of $\mathcal{S}$ is a cover of $\mathcal{S}'$. Conversely, let $\mathcal{C}'$ be a cover of size $k$ of $\mathcal{S}'$. We can construct a cover $\mathcal{C}$ of $\mathcal{S}$ by replacing each $X' \in \mathcal{C}'$ by a corresponding superset $X \in \mathcal{S}$. □
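The densification step of this proof can be sketched as follows. The code is an illustration under assumptions of our own: in each round only the sets of the current size $l$ are shrunk (one reading of the loop that keeps the family polynomial), the removed element is simply the smallest one, and the brute-force `min_cover_size` is included only to compare small instances.

```python
from itertools import combinations


def densify(sets):
    """Extend a set family so that it meets the dense set cover conditions."""
    family = {frozenset(s) for s in sets}
    carrier = frozenset().union(*family)
    if carrier in family:
        raise ValueError("carrier set is in the family; the instance is trivial")
    l = max(len(s) for s in family)
    while l > 1:
        # Assumption: shrink only the sets of the current size l.
        for s in [t for t in family if len(t) == l]:
            family.add(s - {min(s)})        # drop one (arbitrarily chosen) element
        l -= 1
    family |= {frozenset({e}) for e in carrier}   # add all singletons
    family.discard(frozenset())                   # remove the empty set if present
    return family


def is_dense(family):
    """Check the three conditions of the dense set cover problem."""
    carrier = frozenset().union(*family)
    if carrier in family or frozenset() in family:
        return False
    if any(frozenset({e}) not in family for e in carrier):
        return False
    return all(any(s - {e} in family for e in s)
               for s in family if len(s) > 1)


def min_cover_size(family):
    """Brute-force size of a minimal cover (exponential; small inputs only)."""
    carrier = frozenset().union(*family)
    members = list(family)
    for k in range(1, len(members) + 1):
        for choice in combinations(members, k):
            if frozenset().union(*choice) == carrier:
                return k


original = [{1, 2, 3}, {3, 4}, {4, 5}]
dense = densify(original)
```

On this small instance the densified family has nine sets, satisfies the density conditions, and has the same minimal cover size as the original family, as the proof predicts.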

Given the above NP-completeness result, we show that the size of minimal equivalents specified in P1 through P4 is a linear function of the size of the minimal cover. To this end, we use the lemma below to obtain a lower bound on the size of equivalents. Intuitively, it states that for each entailed non-trivial equivalence $C \equiv A$, the TBox must contain at least one axiom that is at least as large as $C' \equiv A$ for some $C'$ with $\mathcal{T} \models C \equiv C'$:

Lemma 4. Let $\mathcal{T}$ be an $\mathcal{EL}$ TBox, $A \in \mathrm{sig}(\mathcal{T})$ and $C, D$ $\mathcal{EL}$ concepts such that $\mathcal{T} \models C \equiv A$, $\mathcal{T} \models A \sqsubseteq D$ (the latter is required for induction). Then, one of the following is true:

1. $A$ is a conjunct of $C$ (including the case $C = A$);
2. there exists an $\mathcal{EL}$ concept $C'$ such that $\mathcal{T} \models C \equiv C'$ and $C' \bowtie A \in \mathcal{T}$ or $C' \bowtie A \sqcap D' \in \mathcal{T}$ for some $\bowtie \in \{\equiv, \sqsubseteq\}$ and some concept $D'$.

Proof Sketch. For the full version of the proof, see the extended version of the paper [9]. We use the sound and complete proof system for general subsumption in $\mathcal{EL}$ terminologies introduced in [8] and prove the lemma by induction on the depth of the derivation of $C \sqsubseteq A \sqcap D$. We assume that the proof has minimal depth and consider the possible rules that could have been applied last to derive $C \sqsubseteq A \sqcap D$. In each case the lemma holds. □

The encoding of the dense set cover problem as P1-P4 is as follows.

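Consider an instance of the dense set cover problem with the carrier set $\mathcal{A} = \{B_1, \dots, B_n\}$ and the set $\mathcal{S} = \{A_1, \dots, A_m, \{B_1\}, \dots, \{B_n\}\}$ of subsets that can be used to form a cover. By interpreting the set and element names as atomic concepts, we can construct $\mathcal{T}_{\mathcal{S}}^{base}$ as follows:

$$\mathcal{T}_{\mathcal{S}}^{base} = \{A'' \equiv A' \sqcap B \mid A'', A' \in \mathcal{S},\ B \in \mathcal{A},\ A'' = A' \cup \{B\},\ A'' \neq A'\}.$$

Observe that the size of $\mathcal{T}_{\mathcal{S}}^{base}$ is at least $3m$. Clearly, $\mathcal{T}_{\mathcal{S}}^{base} \models A_i \equiv \bigsqcap_{B \in A_i} B$. Let $\mathcal{T}_{\mathcal{S}} = \mathcal{T}_{\mathcal{S}}^{base} \cup \{A \equiv \bigsqcap_{B \in \mathcal{A}} B\}$. We establish the connection between the size of $\mathcal{T}_{\mathcal{S}}$ equivalents and the size of the cover of $\mathcal{S}$ as follows: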
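The construction of the base TBox can be traced on a tiny dense instance. The sketch below is illustrative and makes several assumptions: sets are given as a dict from concept name to `frozenset`, a singleton $\{B_i\}$ is identified with the atomic concept $B_i$, and the candidate TBox with a cover axiom is only measured, not checked for equivalence.

```python
from functools import reduce


def size(concept):
    """Size as in Definition 1 (references to signature elements)."""
    if isinstance(concept, str):
        return 1
    if concept[0] == "some":
        return 1 + size(concept[2])
    return size(concept[1]) + size(concept[2])    # conjunction


def conj(names):
    """Nested conjunction of a non-empty list of atomic concepts."""
    return reduce(lambda left, right: ("and", left, right), names)


def base_tbox(named_sets):
    """Axioms A'' ≡ A' ⊓ B for all A'', A' in the family with A'' = A' ∪ {B}, A'' ≠ A'."""
    axioms = []
    for big_name, big in named_sets.items():
        for small_name, small in named_sets.items():
            extra = big - small
            if small < big and len(extra) == 1:   # proper subset, one element less
                (b,) = extra
                axioms.append((big_name, ("and", small_name, b)))
    return axioms


# Dense instance over the carrier {B1, B2, B3}; singletons carry the element's name.
named_sets = {
    "A1": frozenset({"B1", "B2"}),
    "A2": frozenset({"B2", "B3"}),
    "B1": frozenset({"B1"}),
    "B2": frozenset({"B2"}),
    "B3": frozenset({"B3"}),
}

base = base_tbox(named_sets)
base_size = sum(size(lhs) + size(rhs) for lhs, rhs in base)

cover = ["A1", "A2"]                              # a cover of size k = 2
cover_axiom = ("A", conj(cover))                  # A ≡ A1 ⊓ A2
total_size = base_size + size(cover_axiom[0]) + size(cover_axiom[1])
```

With $m = 2$ non-singleton sets, the base TBox has four axioms of size 3 each, and adding the cover axiom yields a total of $s(\mathcal{T}_{\mathcal{S}}^{base}) + k + 1$, the quantity that appears in the lemma relating equivalents to covers.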

Lemma 5. $\mathcal{T}_{\mathcal{S}}$ has an equivalent (as specified in P1-P4) of size $s(\mathcal{T}_{\mathcal{S}}^{base}) + k + 1$ if, and only if, $\mathcal{S}$ has a cover of size $k$.

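Proof. For the if-direction, assume that $\mathcal{S}$ has a cover $\mathcal{C}$ of size $k$. We construct $\mathcal{T}_{\mathcal{S}}'$ of size $s(\mathcal{T}_{\mathcal{S}}^{base}) + k + 1$ as follows: $\mathcal{T}_{\mathcal{S}}' = \mathcal{T}_{\mathcal{S}}^{base} \cup \{A \equiv \bigsqcap_{A' \in \mathcal{C}} A'\}$. Clearly, $\mathcal{T}_{\mathcal{S}}' \equiv \mathcal{T}_{\mathcal{S}}$. Note that $\mathcal{T}_{\mathcal{S}}' \in [\mathcal{T}_{\mathcal{S}}]$ and, therefore, also $\mathcal{T}_{\mathcal{S}}' \in [\mathcal{T}_{\mathcal{S}}]_{\sqcup}, [\mathcal{T}_{\mathcal{S}}]_{\mathcal{EL}}, [\mathcal{T}_{\mathcal{S}}]_{\mathcal{ELD}}$.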

For the only-if-direction, we assume that $k$ is minimal and argue that no equivalent $\mathcal{T}' \in [\mathcal{T}_{\mathcal{S}}]_{\mathcal{ELD}}$ of size $s(\mathcal{T}_{\mathcal{S}}^{base}) + k$ can exist. Assume that $\mathcal{T}$ is a minimal TBox with $\mathcal{T} \in [\mathcal{T}_{\mathcal{S}}]_{\mathcal{ELD}}$. With the observation that the $m + n$ atomic concepts that represent elements of $\mathcal{S}$ are pairwise not equivalent with each other or with the concept $A$ that represents the carrier set, we can conclude that no two atomic concepts are equivalent. From Lemma 4 it follows that, for each $A_i$ with $i \in \{1, \dots, m\}$, there is an axiom $C_i \equiv C_i' \in \mathcal{T}$ or $C_i \sqsubseteq C_i' \in \mathcal{T}$ such that $\mathcal{T} \models C_i \equiv A_i$ and $A_i$ is a conjunct of $C_i'$ or $A_i = C_i'$. Since there are no equivalent atomic concepts and $C_i \neq A_i$ due to the minimality of $\mathcal{T}$, the size of each such axiom is at least 3 and none of these axioms coincide. We will later make use of two obvious properties (*) of these axioms:

1. since $\mathcal{T}_{\mathcal{S}} \not\models A_i \sqsubseteq A$, $A$ cannot occur as a conjunct of $C_i$ or as a conjunct of $C_i'$;
2. these axioms cannot be (parts of) the definitions of atomic concepts representing disjunctions (as $A_i$ is a conjunct of $C_i'$) or shortcuts ($\mathcal{T} \models C_i \equiv A_i$).

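Finally, we estimate the size of the remaining axioms and show that their cumulative size is $> k$. It also follows from Lemma 4 that there exists an axiom $C \equiv C' \in \mathcal{T}$ or $C \sqsubseteq C' \in \mathcal{T}$ such that $\mathcal{T} \models C \equiv A$ and $A$ is a conjunct of $C'$ or $A = C'$. It holds that $\mathcal{T} \models C \equiv \bigsqcap_{B \in \mathcal{A}} B$. We also know that for no proper subset $S' \subsetneq \mathcal{A}$ it holds that $\mathcal{T} \models \bigsqcap_{B \in S'} B \sqsubseteq C$.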

If $C$ does not contain any shortcuts or disjunction replacements, then we have found a cover of $\mathcal{S}$ and the size of the axiom must be $k + 1$. Assume that it contains auxiliary shortcut and disjunction concepts and let $C'$ be the concept obtained by replacing all these concepts recursively in $C$ until $\mathrm{sig}(C') \subseteq \mathrm{sig}(\mathcal{T}_{\mathcal{S}})$. It is clear that the cumulative size of the corresponding definitions for these auxiliary concept symbols cannot be smaller than the size of $C'$, which does not contain any concept symbols twice. Since $\mathcal{T} \models C' \equiv C$, we have once more found a cover of $\mathcal{S}$ and the size of this axiom plus the size of the definition axioms must be $k + 1$. From the two properties (*) of the axioms defining $A_i$ we can conclude that none of these axioms can coincide. Thus, the overall size of $\mathcal{T}$ must be $s(\mathcal{T}_{\mathcal{S}}^{base}) + k + 1$. □

Theorem 3. P1 through P4 are NP-hard.

Proof. The theorem is an immediate consequence of Lemma 5. It establishes that all four problems can be used to solve the dense set cover problem, which is NP-complete according to Lemma 3. □

Thus, we establish completeness of the first two problems:

Theorem 4. P1 and P2 are NP-complete.

6 Summary and Outlook

In this paper, we have considered the problem of finding minimal equivalent representations for ontologies expressed in the lightweight description logic $\mathcal{EL}$, which forms the basis of some large ontologies used in practice. We have shown that the task of finding such a representation (or rather: its related decision problem) is NP-complete.

In addition to studying the problem of computing minimal equivalent TBoxes, we investigated the task of finding minimal representations for ontologies under signature extension. We considered scenarios where auxiliary concepts are allowed to be used as shortcuts for complex $\mathcal{EL}$ concepts. We showed that this task is also NP-complete. For the corresponding decision problem with auxiliary concepts acting as shortcuts for a disjunction of $\mathcal{EL}$ concepts, we have established NP-hardness and inclusion in $\Sigma_2^P$. The same bounds hold for the combination of the two ways of extending the signature.

There are various natural extensions of this work. The results obtained in this paper can easily be transferred to the context of ontology reuse, where a sub-signature becomes obsolete in a new context and a compact representation of the facts about the remaining terms is sought. Recent results on ontology reuse show that neither uniform interpolation nor standard module extraction guarantees the optimality of the extracted ontology [16].

Further, a question that naturally arises is that of tight complexity bounds when shortcuts for disjunctions are allowed. Another target would be the complexity of identifying minimal TBoxes by means of an arbitrary inseparable TBox, where we waive the requirement of explicitly defining the meaning of new concepts. An EXPTIME upper bound for this problem follows from the fact that the set of candidate TBoxes is exponential, and so is the general test for inseparability in $\mathcal{EL}$.

Minimizing representations is, of course, an interesting problem for all logics, and similar questions can (and should) be asked for more expressive ontology languages.

While the concern of this paper is the complexity of the above problems, a natural follow-up task would be to develop efficient algorithms and tools that support ontology engineers in the development of succinct representations of their ontologies. Natural targets would be good heuristics and efficient approximations. For the latter, our proofs contain the bad news that there is no linear approximation scheme, as the set cover problem has no logarithmic approximations unless P equals NP.

Finally, from a practical point of view, it would be very interesting to investigate the potential improvement of succinctness in existing medical ontologies. Such a case study can be carried out after the corresponding tool support becomes available.

Acknowledgments

This work is supported by the EPSRC grant EP/H046623/1 'Synthesis and Verification in Markov Game Structures' and the University of Oxford.

1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.: The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press (2003)

2. Baader, F., Küsters, R., Molitor, R.: Rewriting concepts using terminologies. In: Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR 2000). (2000) 297-308

3. Grimm, S., Wissmann, J.: Elimination of redundancy in ontologies. In: Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011). (2011) 260-274

4. Bienvenu, M.: Prime implicates and prime implicants: From propositional to modal logic. Journal of Artificial Intelligence Research (JAIR) 36 (2009) 71-128

5. Baader, F., Brandt, S., Lutz, C.: Pushing the $\mathcal{EL}$ envelope. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005). (2005) 364-369

6. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C., eds.: OWL 2 Web Ontology Language: Profiles. W3C Recommendation (27 October 2009) Available at http://www.w3.org/TR/owl2-profiles/.

7. OWL Working Group, W.: OWL 2 Web Ontology Language: Document Overview. W3C Recommendation (27 October 2009) Available at http://www.w3.org/TR/owl2-overview/.

8. Nikitina, N., Rudolph, S.: ExpExpExplosion: Uniform interpolation in general EL terminologies. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012). (2012) 618-623

9. Nikitina, N., Schewe, S.: More is Sometimes Less: Succinctness in $\mathcal{EL}$. Technical report, Department of Computer Science, University of Oxford, Oxford (May 2013)

10. Ghilardi, S., Lutz, C., Wolter, F.: Did I Damage my Ontology? A Case for Conservative Extensions in Description Logics. In: Proceedings of the 10th International Conference on the Principles of Knowledge Representation and Reasoning (KR 2006). (2006) 187-197

11. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive description logics. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007). (2007) 453-458

12. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in description logics. In: Proceedings of the 18th European Conference on Artificial Intelligence (ECAI 2008). (2008) 55-59

13. Konev, B., Lutz, C., Walther, D., Wolter, F.: Formal properties of modularisation. In Stuckenschmidt, H., Parent, C., Spaccapietra, S., eds.: Modular Ontologies. Springer-Verlag (2009) 25-66

14. Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the description logic $\mathcal{EL}$. Journal of Symbolic Computation 45(2) (2010) 194-228

15. Kontchakov, R., Wolter, F., Zakharyaschev, M.: Can you tell the difference between DL-Lite ontologies? In: Proceedings of the 11th International Conference on Principles of Knowledge Representation and Reasoning (KR 2008). (2008) 285-295

16. Nikitina, N., Glimm, B.: Hitting the sweetspot: Economic rewriting of knowledge bases. In: Proceedings of the 11th International Semantic Web Conference (ISWC 2012). (2012) 394-409