=Paper= {{Paper |id=None |storemode=property |title=More is Sometimes Less: Succinctness in EL |pdfUrl=https://ceur-ws.org/Vol-1014/paper_49.pdf |volume=Vol-1014 |dblpUrl=https://dblp.org/rec/conf/dlog/NikitinaS13 }} ==More is Sometimes Less: Succinctness in EL== https://ceur-ws.org/Vol-1014/paper_49.pdf
         More is Sometimes Less: Succinctness in EL

                         Nadeschda Nikitina1 and Sven Schewe2
                                    1
                                         University of Oxford, UK
                                2
                                        University of Liverpool, UK



       Abstract. In logics, there are many ways to represent same facts. With respect to
       both reasoning and cognitive complexity, some representations are significantly
       less efficient than others. In this paper, we investigate different means of improv-
       ing the succinctness of TBoxes expressed in the lightweight description logic EL
       that forms a basis of some large ontologies used in practice. As a measure of size,
       we consider the number of references to signature elements. We investigate the
       problem of finding minimal equivalent representations and show that this task is
       NP-complete.
       A significant (up to triple-exponential) further improvement can be achieved by
       the introduction of auxiliary concept symbols. Thus, we additionally investigate
       the task of finding minimal representations for an ontology by extending its sig-
       nature. Since arbitrary extension of the ontology with concept symbols can make
       the ontology unreadable, we only allow for auxiliary concepts acting as shortcuts
       for other concepts (EL concepts and disjunctions thereof) expressed by means
       of terms of the original ontology. We show that this task is also NP-complete if
       shortcuts represent only EL concepts, and between NP and Σ2P , otherwise.


1   Introduction

It is well-known that same facts can be represented in many ways, and that the size
of these representations can vary significantly. Determining and increasing the degree
of succinctness of a particular syntactic representation is an important, but also a very
difficult task: for the average ontology, it is almost impossible to obtain the minimal
representation without tool support. Thus, automated methods that help to assess the
current succinctness of an ontology and generate suggestions on how to increase it
would be highly valued by ontology engineers.
     In description logics [1], only few results in this direction were obtained so far.
Baader, Küster, and Molitor [2] investigate rewriting concepts using terminologies in
the narrow sense (sets of equivalence axioms where each defined atomic concept has
exactly one definition). The investigated problem is a special case of minimizing a
knowledge base by computing a minimal equivalent knowledge base. Grimm et al. [3]
propose an algorithm for eliminating semantically redundant axioms from ontologies.
In the above approach, axioms are considered as atoms that cannot be split into parts
or changed in any other way. Bienvenu [4] proposes a normal form called prime im-
plicates normal form for ALC ontologies, which enables fast reasoning. However, as
a side-effect of this transformation, a doubly-exponential blowup in concept size can
occur.
    In this paper, we investigate the succinctness for the lightweight description logic
EL [5], which is the logical underpinning of one of the tractable sub-languages (the
so-called profiles [6]) of the W3C-specified OWL Web Ontology Language [7].

    First, we consider the problem of finding a minimal equivalent EL representation
for a given ontology. We show that the related decision problem (is there an equivalent
ontology of size ≤ k?) is NP-complete.

    Inspired by recent results on uniform interpolation in EL [8], we additionally con-
sider an extended version of the problem. The above results imply that, even for the
minimal equivalent representation of an ontology, an up to triple-exponentially more
succinct representation can be obtained by extending its signature. Auxiliary concept
symbols are therefore important contributors towards the succinctness of ontologies. It
is easy to envision scenarios that demonstrate the usefulness of auxiliary concept sym-
bols for improving succinctness. For instance, when a complex concept C is frequently
used in the axioms of an ontology, the ontology will diminish in size when all occur-
rences of C are replaced by a fresh atomic concept AC , and an axiom AC ≡ C is added
to the ontology. However, an arbitrary extension of the ontology with concept symbols
whose meaning is not obvious can certainly make the ontology unreadable. In order to
preserve comprehensiveness, we only allow for auxiliary concepts acting as shortcuts
– concepts that are defined using only terms of the original ontology. Presented with
such a shortcut concept, an ontology engineer could find an appropriate comprehensive
name for it. Otherwise, the ontology engineer has to guess the meaning of an auxiliary
concept and the chance that he approves the extension suggested by the tool would be
low.

    We demonstrate that auxiliary concept symbols acting as shortcuts for EL concepts
expressed only by means of original ontology terms can lead to an exponential im-
provement of succinctness and that the corresponding decision problem (is there such a
representation of size ≤ k?) is NP-complete.

    Further, we show that, if we additionally allow for auxiliary concept symbols that
act as shortcuts for disjunctions of EL concepts on the left-hand side of axioms (en-
codable in EL using several axioms), we can reduce the size of the representation by
a further exponent, thereby obtaining doubly-exponentially more succinct representa-
tions.We show that the corresponding decision problem (is there such a representation
of size ≤ k?) is NP-hard and included in Σ2P .

    The paper is organized as follows: In Section 2, we recall the necessary prelimi-
naries on description logics. Section 3 demonstrates the potential of auxiliary concept
symbols acting as shortcuts for achieving a higher succinctness. In the same section, we
also introduce the basic definitions of the size of ontologies as well as the investigated
notions of equivalents with and without signature extension. In Sections 4,5, we derive
the complexity bounds for the corresponding decision problems. Finally, we conclude
and outline future work in Section 6. Further details and proofs can be found in the
extended version [9] of this paper.
2   Preliminaries
We recall the basic notions in description logics [1] required in this paper. Let NC
and NR be countably infinite and mutually disjoint sets of concept symbols and role
symbols. An EL concept C is defined as

                                C ::= A|>|C u C|∃r.C,

where A and r range over NC and NR , respectively. In the following, we use symbols
A, B to denote atomic concepts and C, D, E to denote arbitrary concepts. A terminol-
ogy or TBox consists of concept inclusion axioms C v D and concept equivalence
axioms C ≡ D used as a shorthand for C v D and D v C. The signature of an EL
concept C or an axiom α, denoted by sig(C) or sig(α), respectively, is the set of con-
cept and role symbols occurring in it. To distinguish between the set of concept symbols
and the set of role symbols, we use sigC (C) and sigR (C), respectively. The signature of
a TBox T , in symbols sig(T ) (correspondingly, sigC (T ) and sigR (T )), is defined anal-
ogously. Next, we recall the semantics of the above introduced DL constructs, which
is defined by means of interpretations. An interpretation I is given by the domain ∆I
and a function ·I assigning each concept A ∈ NC a subset AI of ∆I and each role
r ∈ NR a subset rI of ∆I × ∆I . The interpretation of > is fixed to ∆I . The interpre-
tation of an arbitrary EL concept is defined inductively, i.e., (C u D)I = C I ∩ DI and
(∃r.C)I = {x | (x, y) ∈ rI , y ∈ C I }. An interpretation I satisfies an axiom C v D
if C I ⊆ DI . I is a model of a TBox, if it satisfies all of its axioms. We say that a TBox
T entails an axiom α (in symbols, T |= α), if α is satisfied by all models of T . A TBox
T entails another TBox T 0 , in symbols T |= T 0 , if T |= α for all α ∈ T 0 . T ≡ T 0 is a
shortcut for T |= T 0 and T 0 |= T .
    In addition to EL, we will use disjunction on the left-hand side of axioms to obtain
more succinct representations of EL TBoxes. Note that this extension is of a notational
nature, i.e., does not give us the expressive power to represent more TBoxes than stan-
dard EL. We define an ELD concept C as

                            C ::= A|>|C u C|C t C|∃r.C,

where A and r range over NC and NR , respectively. The interpretation of an arbitrary
ELD concept is defined analogously to the interpretation of EL concepts with the ex-
tension (C t D)I = C I ∪ DI . An ELD TBox consists of axioms that are either EL
axioms or have the form C v D, where C is an ELD concept and D is an EL concept.
Note that equivalence axioms (C ≡ D) do not contain ELD concepts, since they are a
shortcut for C v D and D v C.


3   Achieving Succinctness in EL
The size of a TBox is often measured by the number of axioms contained in it. This is,
however, a very simplified view of the size in terms of both, cognitive complexity and
reasoning. In this paper, we measure the size of a concept, an axiom, or a TBox by the
number of references to signature elements.
Definition 1. The size of an EL concept D is defined as follows:
 – for D ∈ sig(T ), ∫ (D) = 1;
 – for D = ∃r.C, ∫ (D) = ∫ (C) + 1 where r ∈ sigR (T ) and C is an arbitrary
   concept;
 – for D = C1 u C2 , ∫ (D) = ∫ (C1 ) + ∫ (C2 ) where C1 , C2 are arbitrary concepts;
The size of an EL axiom or a TBox is accordingly defined as follows:
 – ∫ (C1 v C2 ) = ∫ (C1 ) + ∫ (C2 ) for concepts C1 , C2 ;
 – ∫ (C1 ≡ C
           P2 ) = ∫ (C1 ) + ∫ (C2 ) for concepts C1 , C2 .
 – ∫ (T ) = α∈T ∫ (α) for a TBox T .

In practice, the suitable means that can be used to obtain a compact representation
can differ depending on the scenario. To address cases, in which a signature extension
is not feasible, we first consider the problem of finding the minimal equivalent EL
representation for a given TBox among representations that use the same signature.
Popular examples for avoidable non-succinctness are axioms that follow from other
axioms and sub-concepts that can be removed from axioms without losing any logical
consequences. While non-succinctness is easy to detect in these simple cases, non-
succinctness can occur in many other forms. The ontology T = {C v ∃r.C, ∃r.C v
∃r.D, ∃r.D v D}, for instance, does neither contain any axioms that are entailed by
the remainder of the ontology, nor are there any sub-expressions that can be removed.
However, there exists a smaller representation {C v ∃r.C, C v D, ∃r.D v D} of
T . The general version of the corresponding decision problem can be formulated as
follows:
Definition 2 (P1). Given an EL TBox T and a natural number k, is there an EL TBox
T 0 with ∫ (T 0 ) ≤ k such that T 0 ≡ T .
We denote the set {T 0 | T 0 ≡ T } by [T ]. We will show that this decision problem,
which does not involve any signature extensions, is already NP-complete.

Extending the Signature
From the user’s point of view as well as with respect to reasoning, it sometimes makes
sense to introduce fresh concept symbols, for instance, used as shortcuts for complex
concepts that occur frequently in the ontology. It can be a tedious task for an ontol-
ogy engineer to do it in an advantageous way, since, as we will show later on, the
corresponding decision problem is NP-hard. To account for scenarios, in which an in-
troduction of auxiliary concept symbols is desirable, in addition to the decision problem
introduced above we consider the problem of finding succinct representations contain-
ing shortcuts. We demonstrate by means of the following example the theoretical po-
tential of such an extension of the signature with shortcuts: we show that it can lead to
a doubly-exponentially more succinct representation of TBoxes.
Example 1. Let the sets Ci of concept descriptions be inductively defined by C0 =
{A1 , A2 }, Ci+1 = {∃r.C1 u ∃s.C2 | C1 , C2 ∈ Ci }. For a natural number n, consider
the TBox Tn = {C v B | C ∈ Cn−1 }.
Intuitively, the sets Ci of concepts have the shape of binary trees with exponentially
many leaves, each of which can be A1 or A2 . Clearly, the concepts grow exponentially
                                                                             i
with i. Further, it holds that |Ci+1 | = |Ci |2 and consequently |Ci | = 2(2 ) . Thus, Tn
contains doubly exponentially many axioms, each of which has exponential size. While
there is no smaller equivalent representation of Tn , this TBox can easily be represented
in a more compact way using auxiliary concept symbols as shortcuts for complex EL
or ELD concept expressions.
    First, combining several axioms into a single axiom with a disjunction on the left-
hand side would allow us to reduce the size of Tn from double-exponential to single-
exponential: we can define C0 = {A1 tA2 } and thus express all elements of the set Cn−1
by means of a single concept Cn−1 that has the shape of a binary tree with the concept
A1 t A2 as leaves. The corresponding EL TBox Tn0 can be obtained by introducing the
concept B0 that represents the disjunction A1 t A2 by means of the axioms A1 v B0
and A2 v B0 .
    Second, by using fresh concept symbols as shortcuts for complex EL concepts, Tn0
can be reduced by a further exponential as follows: we introduce concept symbols Bi
with i ∈ {1, ..., n − 1} to represent each Ci and obtain the following TBox Tn00 :

                                       A1 v B 0                                       (1)
                                       A2 v B 0                                       (2)
                        Bi+1 ≡ ∃r.Bi u ∃s.Bi         i 1 do
    • for all S ∈ S 0 , choose an s ∈ S and join S 0 with S r {s}
    • decrement l by one.
After this, we join S with {{s} | s ∈ S}, and remove the empty set from S if applicable.
Note that S 0 can easily be constructed in polynomial time. Now we show that there is
a cover C of size ≤ k of S exactly if there is a cover C 0 of size ≤ k of S 0 . W.l.o.g., we
can assume that ∅ 6∈ C, since we always obtain a cover from any cover C by removing
∅ from it. Since S ⊆ S 0 ∪ {∅}, any cover of S is a cover of S 0 . Let C 0 be a cover of
size ≤ k of S 0 . We can construct a cover C of S by replacing each S 0 ∈ C 0 by the
corresponding superset S ∈ S.                                                              t
                                                                                           u
    Given the above NP-completeness result, we show that the size of minimal equiva-
lents specified in P1 through P4 is a linear function of the size of the minimal cover. To
this end, we use the lemma below to obtain a lower bound on the size of equivalents.
Intuitively, it states that for each entailed non-trivial equivalence C ≡ A, the TBox
must contain at least one axiom that is at least as large as C 0 ≡ A for some C 0 with
T |= C ≡ C 0 :

Lemma 4. Let T be an EL TBox, A ∈ sig(T ) and C, D EL concepts such that T |=
C ≡ A, T |= A v D (the latter is required for induction). Then, one of the following
is true:
 1. A is a conjunct of C (including the case C = A);
 2. there exists an EL concept C 0 such that T |= C ≡ C 0 and C 0 ./ A ∈ T or
    C 0 ./ A u D0 ∈ T for some ./∈ {≡, v} and some concept D0 .

Proof Sketch. For the full version of the proof, see extended version of the paper. We
use the sound and complete proof system for general subsumption in EL terminologies
introduced in [8] and prove the lemma by induction on the depth of the derivation of
C v A u D. We assume that the proof has minimal depth and consider the possible
rules that could have been applied last to derive C v A u D. In each case the lemma
holds.                                                                               t
                                                                                     u
    The encoding of the dense set cover problem as P1-P4 is as follows.
   Consider an instance of the dense set cover problem with the carrier set A =
{B1 , . . . , Bn }, the set S = {A1 , . . . , Am , {B1 }, . . . , {Bn }} of subsets that can be
used to form a cover. By interpreting the set and element names as atomic concepts, we
can construct TSbase as follows:

     TSbase = {A00 ≡ A0 u B | A00 , A0 ∈ S, B ∈ A, A00 = A0 ∪ {B}, A00 6= A0 }.
                                                                           d
    Observe that the size ofdTSbase is at least 3m. Clearly, TSbase |= Ai ≡ B∈Ai B.
Let TS = TSbase ∪ {A ≡ B∈A B}. We establish the connection between the size of
TS equivalents and the size of the cover of S as follows:

Lemma 5. TS has an equivalent (as specified in P1-P4) of size ∫ (TSbase ) + k + 1 if,
and only if, S has a cover of size k.

Proof. For the if-direction, assume that S has a cover d      of size k. We construct TS0 of
size ∫ (TSbase ) + k + 1 as follows: TS = TSbase ∪ {A ≡ A0 ∈C A0 }. Clearly, TS0 ≡ TS .
                                       0

Note that TS0 ∈ [TS ] and, therefore, also TS0 ∈ [TS ]t , [TS ]EL , [TS ]ELD .
    For the only-if-direction, we assume that k is minimal and argue that no equivalent
T 0 ∈ [TS ]ELD of size ≤ ∫ (TSbase ) + k can exist. Assume that T is a minimal TBox
with T ∈ [TS ]ELD . With the observation, that the m + n atomic concepts that repre-
sent elements of S are pairwise not equivalent with each other or the concept A that
represents the carrier set, we can conclude that no two atomic concepts are equivalent.
From Lemma 4 it follows that, for each Ai with i ∈ {1, . . . , m}, there is an axiom
Ci ≡ Ci0 ∈ T or Ci v Ci0 ∈ T such that T |= Ci ≡ Ai and Ai is a conjunct of Ci0
or Ai = Ci0 . Since there are no equivalent atomic concepts and Ci 6= Ai due to the
minimality of T , the size of each such axiom is at least 3 and none of these axioms
coincide. We will later make use of two obvious properties (*) of these axioms:

 1. since TS 6|= Ai v A, A cannot occur as a conjunct of Ci or as a conjunct of Ci0 ;
 2. these axioms cannot be (parts of) the definitions of atomic concepts representing
    disjunctions (as Ai is a conjunct of Ci0 ) or shortcuts (T |= Ci ≡ Ai ).

    Finally, we estimate the size of the remaining axioms and show that their cumulative
size is > k. It also follows from Lemma 4 that there exists an axiom C ≡ C 0 ∈ T or
C v C 0 ∈ T such                                                     0
                 d that T |= C ≡ A and A is a conjunct of C or0 A = C . It holds
                                                                                  0

that Td|= C ≡ B∈A B. We also know that for no proper subset S ( A it holds that
T |= B∈S 0 B v C.
    If C does not contain any shortcuts or disjunction replacements, then we have found
a cover of S and the size of the axiom must be ≥ k+1. Assume that it contains auxiliary
shortcut and disjunction concepts and let C 0 be the concept obtained by replacing all
these concepts recursively in C until sig(C 0 ) ⊆ sig(TS ). It is clear that the cumulative
size of the corresponding definitions for these auxiliary concept symbols cannot be
smaller than the size of C 0 , which does not contain any concept symbols twice. Since
T |= C 0 ≡ C, we have once more found a cover of S and the size of this axiom plus the
size of definition axioms must be ≥ k + 1. From the two properties (*) of the axioms
definition Ai we can conclude that none of these axioms can coincide. Thus, the overall
size of T must be ≥ ∫ (TSbase ) + k + 1.                                                  t
                                                                                          u
Theorem 3. P1 through P4 are NP-hard.
Proof. The theorem is an immediate consequence of Lemma 5. It establishes that all
four problems can be used to solve the dense set cover problem, which is NP-complete
according to Lemma 3.                                                             t
                                                                                  u
    Thus, we establish completeness of the first two problems:
Theorem 4. P1 and P2 are NP-complete.                                                   t
                                                                                        u

6   Summary and Outlook
In this paper, we have considered the problem of finding minimal equivalent representa-
tions for ontologies expressed in the lightweight description logic EL that forms a basis
of some large ontologies used in practice. We have shown that the task of finding such
a representation (or rather: its related decision problem) is NP-complete.
    In addition to studying the problem of computing minimal equivalent TBoxes, we
investigated the task of finding minimal representations for ontologies under signature
extension. We considered scenarios, where auxiliary concepts are allowed to be used
as shortcuts for complex EL concepts. We showed that this task is also NP-complete.
For the corresponding decision problem with auxiliary concepts acting as shortcuts for
a disjunction of EL concepts, we have established NP-hardness and inclusion in Σ2P .
The same bounds hold for the combination of the two ways of extending the signature.
    There are various natural extensions of this work. The results obtained within this
paper can easily be transferred to the context of ontology reuse, where a sub-signature
becomes obsolete in a new context and a compact representation of the facts about
the remaining terms is sought-after. Recent results on ontology reuse show that neither
uniform interpolation nor standard module extraction guarantee the optimality of the
extracted ontology [16].
    Further, a question that naturally arises is that of tight complexity bounds when
shortcuts for disjunctions are allowed for. Another target would be the complexity of
identifying minimal TBoxes by the means of an arbitrary inseparable TBox, where we
waive the requirement of explicitly defining the meaning of new concepts. An E XP -
T IME upper bound for this problem is implied from the fact that the set of candidate
TBoxes is exponential, and so is the general test for inseparability in EL.
    Minimizing representations is, of course, an interesting problem for all logics, and
similar questions can (and should) be asked for more expressive ontology languages.
    While the concern of this paper is the complexity of the above problems, a natural
follow-up task would be to develop efficient algorithms and tools that support ontology
engineers in the development of succinct representations of their ontologies. Natural
targets would be good heuristics and efficient approximations. For the latter, our proofs
contain the bad news that there is no linear approximation scheme, as the set cover
problem has no logarithmic approximations unless P equals NP.
    Finally, from practical point of view, it would be very interesting to investigate the
potential improvement of succinctness in existing medical ontologies. Such a case study
can be carried out after the corresponding tool support becomes available.
Acknowledgments This work is supported by the EPSRC grant EP/H046623/1 ’Synthesis and
Verification in Markov Game Structures’ and the University of Oxford.
References
 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.: The Description
    Logic Handbook: Theory, Implementation and Applications. Cambridge University Press
    (2003)
 2. Baader, F., Küsters, R., Molitor, R.: Rewriting concepts using terminologies. In: Proceedings
    of the Seventh International Conference on Principles of Knowledge Representation and
    Reasoning (KR 2000). (2000) 297–308
 3. Grimm, S., Wissmann, J.: Elimination of redundancy in ontologies. In: Proceedings of the
    8th Extended Semantic Web Conference (ESWC 2011). (2011) 260–274
 4. Bienvenu, M.: Prime implicates and prime implicants: From propositional to modal logic.
    Journal of Artificial Intelligence Research (JAIR) 36 (2009) 71–128
 5. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proceedings of the 19th
    International Joint Conference on Artificial Intelligence (IJCAI 2005). (2005) 364–369
 6. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C., eds.: OWL 2 Web
    Ontology Language: Profiles. W3C Recommendation (27 October 2009) Available at http:
    //www.w3.org/TR/owl2-profiles/.
 7. OWL Working Group, W.: OWL 2 Web Ontology Language: Document Overview. W3C
    Recommendation (27 October 2009) Available at http://www.w3.org/TR/owl2-overview/.
 8. Nikitina, N., Rudolph, S.: ExpExpExplosion: Uniform interpolation in general EL termi-
    nologies. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI
    2012). (2012) 618–623
 9. Nikitina, N., Schewe, S.: More is Sometimes Less: Succinctness in EL. Techreport, Depart-
    ment of Computer Science, University of Oxford, Oxford (Mai 2013)
10. Ghilardi, S., Lutz, C., Wolter, F.: Did I Damage my Ontology? A Case for Conservative
    Extensions in Description Logics. In: Proceedings of the 10th International Conference on
    the Principles of Knowledge Representation and Reasoning (KR 2006). (2006) 187–197
11. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive description logics.
    In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI
    2007). (2007) 453–458
12. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in
    description logics. In: Proceedings of the 18th European Conference on Artificial Intelli-
    gence (ECAI 2008). (2008) 55–59
13. Konev, B., Lutz, C., Walther, D., Wolter, F.: Formal properties of modularisation. In Stucken-
    schmidt, H., Parent, C., Spaccapietra, S., eds.: Modular Ontologies. Springer-Verlag (2009)
    25–66
14. Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the description
    logic EL. Journal of Symbolic Computation 45(2) (2010) pp.194–228
15. Kontchakov, R., Wolter, F., Zakharyaschev, M.: Can you tell the difference between dl-lite
    ontologies? In: Proceedings of the 11th International Conference on Principles of Knowledge
    Representation and Reasoning (KR 2008). (2008) 285–295
16. Nikitina, N., Glimm, B.: Hitting the sweetspot: Economic rewriting of knowledge bases.
    In: Proceedings of the 11th International Semantic Web Conference (ISWC 2012). (2012)
    394–409