<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Forgetting and uniform interpolation in extensions of the description logic EL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Boris Konev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dirk Walther</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Wolter?</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Liverpool</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Terminologies used in life sciences, health-care, and other knowledge intensive areas are often very large and comprehensive. Examples are the medical terminology SNOMED CT (Systematised Nomenclature of Medicine Clinical Terms) [17] containing about 380 000 concept definitions and the National Cancer Institute ontology (NCI) [16] containing more than 60 000 axioms. For terminologies T of this size, it is often of interest to forget a subvocabulary of the vocabulary of T ; i.e., to transform T into a new terminology T (called a -interpolant of T ) that contains no symbols from and that is indistinguishable from T regarding its consequences that do not use . In AI, this problem has been studied under a variety of names such as forgetting and variable elimination [14, 4, 10]. In mathematical logic, this problem has been investigated as the uniform interpolation problem [18]. Computing -interpolants of terminologies has a number of potential applications, e.g., Re-use of ontologies: when using terminologies such as SNOMED CT in an application, often only a very small fraction of its vocabulary is of interest. In this case, one could use a -interpolant instead of the whole terminology, where is the vocabulary not of interest for the application. Predicate hiding: a terminology developer might not want to publish a terminology completely because a certain part of its vocabulary is not intended for public use. Again, publishing -interpolants, where is the vocabulary to be hidden, appears to be a solution to this problem. Exhibiting hidden relations between terms: large terminologies are difficult to maintain as small changes to its axioms can have drastic and damaging effects. To analyze possibly unwanted consequences over a certain part of the vocabulary, an ontology developer can automatically generate a complete axiomatization of the relations between terms over by computing a -interpolant, where is the complement of . Ontology versioning: to check whether two versions of a terminology have the same consequences over their common vocabulary (or a subset thereof), one can first compute their interpolants by forgetting the vocabulary not shared by the two versions and then check whether the two interpolants are logically equivalent (i.e., have the same models). In the description of -interpolants given above, we have neither specified a language in which they are axiomatized nor did we specify the language wrt. which interpolants should be indistinguishable from the original termininology. The choice of</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>? This research was supported by the EPSRC grant EP/E065279/1.
the latter language depends on the application: for example, if one is interested in
inclusions between concepts, then a -interpolant should imply the same concept inclusions
using no symbols from as the original terminology. On the other hand, if the
terminology is used to query instance data using conjunctive queries, then a -interpolant
together with any instance data using no symbols from should imply the same certain
answers to conjunctive queries using no symbols from as the original terminology.</p>
      <p>Regarding the language L in which -interpolants should be axiomatized, one has
to find a compromise between the following three conflicting goals:
(R) Standard reasoning problems (e.g., logical equivalence) in L should not be more
complex than reasoning in the language underlying the terminology.
(I) -interpolants in L should be uniquely determined up to logical equivalence: if T10
and T20 are -interpolants in L of terminologies T1 and T2 that have the same
consequences not using , then T10 and T20 should be logically equivalent.
(E) The language L should be powerful enough to admit finite and succinct (ideally,
polynomial size) axiomatizations of -interpolants, and it should be possible to
compute -interpolants efficiently (ideally, in polynomial time).</p>
      <p>For terminologies in standard description logics (DLs) such as E L and any language
between ALC and S HIQO, there do not exist languages L achieving all these goals
simultaneously.1 To illustrate this point, let L be second-order logic. Then L trivially
satisfies (E) but fails to satisfy (R) and (I), for ontologies in any standard DL.</p>
      <p>
        In this paper, we consider forgetting in the lightweight description logic E L
underlying the designated OWL2-EL profile of the upcoming OWL Version 2 extended with
role inclusions and domain and range restrictions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This choice it motivated by the
fact that forgetting appears to be of particular interest for large-scale and
comprehensive terminologies and that many such terminologies are given in this language. We
introduce three DLs for axiomatizing -interpolants satisfying criteria (R) and (I) and
preserving, respectively, inclusions between concepts, concept instances, and answers
to conjunctive queries. These DLs do not satisfy (E), as -interpolants sometimes do
not exist or are of exponential size. We demonstrate that, nevertheless, -interpolants
typically exist and can be computed in practice for large-scale terminologies such as
SNOMED CT and appropriate versions of NCI. This work has been presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ];
detailed proofs and additional experiments are available in a technical report [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>Let NC, NR, and NI be countably infinite and mutually disjoint sets of concept names,
role names, and individual names. E L-concepts C are built according to the rule
C :=</p>
      <p>A
j
&gt;
j</p>
      <p>
        C u D
j
9r:C;
where A 2 NC, r 2 NR, and C; D range over E L-concepts. The set of E LHr-inclusions
consists of concept inclusions C v D and concept equations C D, domain
restrictions dom(r) v C, range restrictions ran(r) v C and role inclusions r v s, where
1 This follows from the fact that deciding whether TBoxes in these DLs imply the same concept
inclusions over a signature is at least one exponential harder than deciding logical
equivalence [
        <xref ref-type="bibr" rid="ref11 ref12">12, 11</xref>
        ].
      </p>
      <p>Concept C</p>
      <sec id="sec-2-1">
        <title>Translation C]</title>
        <p>Concept C</p>
      </sec>
      <sec id="sec-2-2">
        <title>Translation C]</title>
        <p>&gt;</p>
        <p>A
C u D
9r:C
x = x</p>
        <p>A(x)</p>
        <p>C](x) ^ D](x)
9y (r(x; y) ^ C](y)) 9r1 u
dom(r) 9y (r(x; y))
ran(r) 9y (r(y; x))
9u:C (x = x) ^ 9y C](y)</p>
        <p>u rn:C 9y(r1(x; y) ^ ^ rn(x; y) ^ C](y))
C; D are E L-concepts and r; s 2 NR. An E LHr-TBox T is a finite set of E LH
r
inclusions. An E LHr-TBox is called E LHr-terminology if all its concept inclusions
and equations are of the form A v C and A C and no concept name occurs more
than once on the left hand side. In what follows, we use A ./ C to denote expressions
of the form A v C and A C.</p>
        <p>Assertions of the form A(a) and r(a; b), where a; b 2 NI, A 2 NC, and r 2 NR,
are called ABox-assertions. An ABox is a finite set of ABox-assertions. By obj(A)
we denote the set of individual names in A. A knowledge base (KB) is a pair (T ; A)
consisting of a TBox T and an ABox A. Assertions of the form C(a) and r(a; b), where
a; b 2 NI, C a E L-concept, and r 2 NR, are called instance assertions. To define the
semantics of DLs considered in this paper we make use of the fact that DL-expressions
can be regarded as formulas in FO, where FO denotes the set of first-order predicate
logic formulas with equality using unary predicates in NC, binary predicates in NR, and
constants from NI; see Figure 1 (in which the DL-constructors not considered so far
are defined later). In what follows, we will not distinguish between DL-expressions and
their translation into FO and regard TBoxes, ABoxes and KBs as finite subsets of FO.
Thus, we use T j= ' to denote that ' follows from T in first-order logic even if ' is an
E LHr-inclusion and T a subset of FO and similar conventions apply to DLs introduced
later in this paper. FO (and, therefore, E LHr) is interpreted in models I = ( I ; I ),
where the domain I is a non-empty set, and I is a function mapping each concept
name A to a subset AI of I , each role name r to a binary relation rI I I ,
and each individual name a to an element aI 2 I .</p>
        <p>The most important ways of querying E LHr-TBoxes and KBs are subsumption
(check whether T j= for an E LHr-inclusion ), instance checking (check whether
(T ; A) j= for an instance assertion ), and conjunctive query answering. To
define the latter, call a first-order formula q(x) a conjunctive query if it is of the form
9y (x; y), where is a conjunction of expressions A(t) and r(t1; t2) with t; t1; t2
drawn from NI and sequences of variables x and y. If x has length k, then a sequence
a of elements of obj(A) of length k is called a certain answer to q(x) of a KB (T ; A)
if (T ; A) j= q(a).
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Forgetting</title>
      <p>NR) n
A signature is a subset of NC [ NR2. Given a signature , we set = (NC [
. Given a concept, role, concept inclusion, TBox, ABox, FO-sentence, set of
2 We investigate forgetting for TBoxes for DLs without nominals; thus we do not include
individual names into the signature.</p>
      <p>FO-sentences E, we denote by sig(E) the signature of E, that is, the set of concept and
role names occurring in it. We use the term E LHr -inclusion ( -ABox, -query, L
sentence, etc.) to denote E LHr-inclusions (ABoxes, queries, L-sentences, etc.) whose
signature is contained in .</p>
      <p>To define forgetting, we first formalize the notion of inseparability between TBoxes
wrt. a signature. Intuitively, two TBoxes T1 and T2 are inseparable wrt. a signature if
they have the same -consequences, where the set of -consequences considered can
either reflect subsumption queries, instance queries, or conjunctive queries, depending
on the application. We give the definitions for sets of FO-sentences because we later
require these notions for a variety of DLs.</p>
      <p>Definition 1. Let T1 and T2 be sets of FO-sentences and a signature.
–T1 and T2 are concept -inseparable, in symbols T1 C T2, if for all E LHr -inclusions
: T1 j= , T2 j= .
–T1 and T2 are instance -inseparable, in symbols T1 i T2, if for all -ABoxes
A and -instance assertions using individual names from obj(A): (T1; A) j= ,
(T2; A) j= .
–T1 and T2 are query -inseparable, in symbols T1 q T2, if for all -ABoxes A,
conjunctive -queries q(x), and vectors a of the same length as x of individual names
in obj(A): (T1; A) j= q(a) , (T2; A) j= q(a).</p>
      <p>The definition of forgetting ( -interpolants) is now straightforward.</p>
      <p>Definition 2 ( -interpolant). Let T be an E LHr-TBox, a finite signature, and L
a set of FO-sentences. If T is a finite set of L -sentences such that T j= ' for all
' 2 T , then T is
– a concept -interpolant of T in L if T
– an instance -interpolant of T in L if T
– a query -interpolant of T in L if T q T .</p>
      <p>C T ;
i T ;
One can show that every query -interpolant is an instance -interpolant and every
instance -interpolant is a concept -interpolant. The converse implications do not
hold, even for E LHr-terminologies:
Example 1. Let T = fran(r) v A1; ran(s) v A2; B A1 u A2g and = fA1; A2g.
One can show that the empty TBox is a concept -interpolant of T . However, the
empty TBox is not an instance -interpolant of T . To show this, consider the -ABox
A = fr(a0; b); s(a1; b)g. Then (T ; A) j= B(b) but (;; A) 6j= B(b). Observe that no
E LHr-TBox (and even no SHQ-TBox) is an instance -interpolant of T because it is
impossible to capture the ABox A in a DL in which one cannot refer to the range of
distinct roles in one concept. On the other hand, the TBox T 0 = fran(r) u ran(s) v Bg
given in an extension of E LHr is an instance -interpolant of T .</p>
      <p>Example 2. Let T = fA v 9s:&gt;; s v r1; s v r2g and = fsg. Then T 0 = fA v
9r1:&gt; u 9r2:&gt;g is an instance -interpolant of T , but T 0 is not a query -interpolant
of T . To show the latter, let A = fA(a)g and let q = 9x(r1(a; x) ^ r2(a; x)). Then
(T ; A) j= q but (T 0; A) 6j= q. Again, no E LHr-TBox (and even no TBox in SHIQ) is
a query -interpolant of T . On the other hand, the TBox T 00 = fA v 9r1ur2:&gt;g given
in an extension of E LHr with conjunctions of roles names is a query -interpolant of
T .</p>
      <p>Besides of exhibiting examples where concept, instance, and query -interpolants are
distinct, Example 1 and 2 also show that even in extremely simple cases E LHr and a
variety of more expressive DLs are not sufficiently powerful to express instance and query
-interpolants of E LHr-terminologies. Rather surprisingly, there also exist simple
examples in which E LHr-TBoxes are not sufficiently expressive to axiomatize concept
-interpolants of E LHr-terminologies.</p>
      <p>Example 3. Let
= fResearch Inst; Education Instg and T be</p>
      <p>University Research Inst u Education Inst</p>
      <p>School v Education Inst
ran(PhD from) v Research Inst
Then there does not exist an E LHr-TBox that is a concept -interpolant of T .
Intuitively, the reason is that there is no E LHr -TBox which follows from T and has the
following infinite set of -consequences (which are consequences of T ):
9PhD from:(School u A) v 9PhD from:(University u A);
where A 2 . On the other hand, the TBox T 0 = fran(PhD from) u School v
Universityg given in an extension of E LHr is a concept -interpolant of T .
We now introduce three extensions of E LHr which we propose to axiomatize concept,
instance, and query -interpolants.</p>
      <p>Definition 3 (E Lran;0, E Lran, E Lran;u;u). Cran;0-concepts are constructed using the
following syntax rule</p>
      <p>C :=</p>
      <p>D</p>
      <p>j ran(r) j ran(r) u D;
where D ranges over E L-concepts and r 2 NR. The set of E Lran;0-inclusions consists
of concept inclusions C v D and role inclusions r v s, where C is a Cran;0-concept, D
an E L-concept, and r; s 2 NR.</p>
      <p>Cran-concepts are constructed using the following syntax rule</p>
      <p>C := A j ran(r) j C u D j 9r:C;
where A 2 NC, C; D range over Cran-concepts and r 2 NR. The set of E Lran-inclusions
consists of all concept inclusions C v D and role inclusions r v s, where C is a
Cran-concept, D an E L-concept, and r; s 2 NR.</p>
      <p>Let u (the universal role) be a fresh logical symbol. Cu;u-concepts are constructed
using the following syntax rule</p>
      <p>C := A j C u D j 9R:C j 9u:C;
where A 2 NC, C; D range over Cu;u-concepts and R = r1 u u rn with r1; : : : ; rn 2
NR. The set of E Lran;u;u-inclusions consists of concept inclusions C v D and role
inclusions r v s, where C is a Cran-concept, D a Cu;u-concept, and r; s 2 rNanR,. E L</p>
      <p>An X-TBox is a finite set of X-inclusions, where X ranges over E L ran;0,
and E Lran;u;u.</p>
      <p>We have the following inclusions:</p>
      <p>E LH
r C E L
ran;0</p>
      <p>C E L
ran</p>
      <p>C E L
ran;u;u
where L1 C L2 means that every L1-TBox is logically equivalent to some L2-TBox.
The semantics of the additional constructors is straightforward and given in Figure 1.
We regard the universal role u as a logical symbol (i.e., u 62 NR). This interpretation
reflects the fact that the signature of the first-order translation of 9u:C coincides with the
signature of C. Observe that the TBox given as a concept -interpolant in Example 3
is an E Lran;0-TBox; the instance -interpolant given in Example 1 is an E Lran-TBox,
and the query -interpolant in Example 2 is an E Lran;u;u-TBox. The need for the
universal role is illustrated by considering T = fA v 9r:Bg and = frg. The empty
TBox is a concept and an instance -interpolant of T . A query -interpolant is given
by T 0 = fA v 9u:Bg and reflects the fact that (T ; A) j= 9xB(x) for A = fA(a)g.</p>
      <p>We show that the languages introduced in Definition 3 satisfy criteria (R) and (I)
from the introduction. (R) is a consequence of the following result.</p>
      <p>Theorem 1. The following problems are PTIME-complete for E Lran;u;u-TBoxes T and
ABoxes A: decide whether
– T j= C v D, for C v D an E Lran;u;u-inclusion;
– (T ; A) j= C(a), where C is an E L-concept.</p>
      <p>
        Deciding whether (T ; A) j= q(a), where q is a conjunctive query, is NP-complete, and
deciding this problem for fixed q(a) (knowledge base complexity) is PTIME-complete.
It follows, in particular, that logical equivalence of E Lran;u;u-TBoxes is decidable in
PTime. These complexity results are exactly the same as for E LH-TBoxes [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. For (I)
and the computation of -interpolants below, we first investigate the relationship
between the distinct inseparability notions introduced in Definition 1 and inseparability
wrt. the new languages. (For E L this relationship is characterized in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].) Let X range
over the superscripts ‘ran; 0’, ‘ran’, and ‘ran; u; u’. Say that two finite sets of
FOsentences T1 and T2 are X-inseparable wrt. , in symbols T1 X T2, if T1 j= ,
T2 j= , for all E LX -inclusions .
      </p>
      <p>Theorem 2. Let T1 and T2 be E Lran;u;u-TBoxes and
following holds:
an infinite signature. Then the
– T1
– T1
– T1</p>
      <p>C T2 iff T1
i
q T2 iff T1</p>
      <p>T2 iff T1
ran;0</p>
      <p>T2;
ran T2;
ran;u;u</p>
      <p>T2.</p>
      <p>The condition that is infinite is required only for the implication from right to left in
Point 1. As we forget finite signatures, their complement is always infinite.</p>
      <p>From Theorem 2 we immediately obtain that (I) is met for the three notions of
interpolants. For example, assume that T1 and T2 are E LHr-TBoxes such that T1 q
T2 and let T10 and T20 rbane;uq;uuery -interpolants in E Lran;u;u of T1 and T2, respectively.
By Theorem 2, T10 T20. But then T10 and T20 are logically equivalent: we have
T10 j= for all 2 T20 because all such are E Lran;u;u-inclusions and T20 j= . The
converse direction holds for the same reason.</p>
    </sec>
    <sec id="sec-4">
      <title>Computing -interpolants</title>
      <p>
        We give a recursive algorithm computing instance -interpolants for E
LHr-terminologies satisfying certain acyclicity conditions (similar algorithms computing concept
and query -interpolants are presented in the technical report [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). In this section
we assume wlog. that terminologies are normalized E LHr terminologies; i.e., E
LHrterminologies T consisting of role inclusions and axioms of the form (here, and in what
follows, we write r vT s if T j= r v s)
– A ./ 9r:B, where B 2 NC [ f&gt;g;
– A ./ B1 u Bn, where B1; : : : ; Bn 2 NC;
– dom(s) v A and ran(s) v A, where A 2 NC
such that dom(s) v A 2 T and r vT s imply dom(r) v A 2 T ; ran(s) v A 2 T and
r vT s imply ran(r) v A 2 T ; and r vT s and s vT r implies r = s. We give the
acyclicity conditions required for the algorithms to terminate. The -cover CT (r) of a
role r wrt. a terminology T consists of all s 2 such that r vT s and there does not
exist r0 2 with r0 6= s and r vT r0 vT s.
      </p>
      <p>Definition 4 ( -loop). Let T be a normalized E LHr-terminology and a signature.
Define a relation (NC \ ) (NC \ ) as follows: A B if A; B 2 and
(a) A ./ C 2 T for some C such that B occurs in C, or
(b) A ./ 9r:A0 2 T for some A0 2 NC [ f&gt;g and r 2 such that dom(r) v B 2 T ,
or
(c) A ./ 9r:A0 2 T for some A0 2 NC [ f&gt;g and r such that there exists s 2 CT (r)
with ran(r) v B 2 T , ran(s) v B 62 T .</p>
      <p>We say that T contains a -loop if contains a cycle.</p>
      <p>The following example illustrates this definition and shows that the existence of
loops typically entails the non-existence of -interpolants, even in FO.</p>
      <sec id="sec-4-1">
        <title>Example 4. Consider the terminology</title>
        <p>T = fElephant v Mammal; Mammal v 9has mother:Mammalg
and let = fMammalg. Even in FO, there exists no concept/instance/query
-interpolant of T . To see this observe that an infinite axiomatization of such a -interpolant
is given by the inclusions</p>
        <p>n
fElephant v z9has mother: }| 9has mothe{r :&gt; j n
1g:
This theory cannot be finitely axiomatized in FO without additional predicates. Observe
that T contains the -loop as Mammal Mammal.</p>
        <p>
          Examples illustrating the points (b) and (c) of Definition 4 of -loops can be found in
the technical report [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Call a concept name A primitive (pseudo-primitive) in a terminology T if A does
not occur on the left hand side of any axiom in T (does not occur in the form A C in
T ).</p>
        <p>For A 2 sig(T ), let Pre (A) consist of all D = ran(r), D = 9r:&gt;, and D 2 NC such that
T j= D v A and sig(D) sig(T ) \ ; construct P (A) as follows:
1. for A pseudo-primitive in T , P (A) = Pre (A);
2. if A B1 u u Bn 2 T , then</p>
        <p>P (A) = fCB1 u
u CBn j (Bi 2 and CBi = Bi)
or (Bi 2 and CBi 2 P (Bi))g;
3. if A 9r:A0 2 T , then P (A) is the union of Pre (A) and
– if A0 2 : f9s:A0 j s vT r; s 2 g;
– if A0 62 : f9s:D j s vT r; s 2 ; D 2 P (A0)g.</p>
        <p>
          The intuition behind the following algorithm for -interpolants is as follows: first,
one can show using Theorem 2 and a sequent-style proof system for E LHr that under
the conditions of Theorem 3 there exists an instance -interpolant consisting of (in
addition to role inclusions and domain and range restrictions) concept inclusions of the
form C v A and A v C. In Figure 2, we compute the set P (A) of Cs such that C v A
is in the interpolant by making a case distinction: in Point 1 A is pseudo-primitive; in
Point 2 it is defined by a conjunction; in Point 3 it is defined as 9r:A0. Points 2 and 3 are
recursive as they require the sets P (B) when B is used in the definition of A. -loops
describe exactly the situation in which the recursion does not terminate. In Figure 3 we
compute, in a similar way, the set Q (A) of Cs such that A v C is in the interpolant.
A sample run of the algorithm is provided in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Theorem 3. Let be a finite signature and T a normalized E LHr-terminology
without -loops. Then the algorithms computing P (A) and Q (A) in Figures 2 and 3
terminate for all A 2 sig(T ).</p>
        <p>Let T consist of the following inclusions, where A, r, and s range over sig(T )\ :
– r v s, for r vT s;
– D v A, for all D 2 P (A);
– A v D, for all D 2 Q (A);
– ran(r) v D, for all D 2 Q (B) such that ran(r) v B 2 T and B 2
– dom(r) v D, for all D 2 Q (B) such that dom(r) v B 2 T and B 2
;
.</p>
        <p>Then T</p>
        <p>is an instance -interpolant of T .</p>
        <p>P (A) and Q (A) are both of exponential size, in the worst case. For P (A), this is
clear from Point 2 of the construction: let T consist of A B1 u u Bn and Aij v Bi
(1 i; j n) and let = fBi j 1 i ng. Then P (A) is of size nn, and one
can show that there does not exist a shorter -interpolant in E Lran;u;u. For Q (A)
this follows from the fact that one might have to construct a complete unfolding of the
terminology.</p>
        <p>If we admit disjunctions in C in axioms C v D of -interpolants, then we can
replace, in Point 2, P (A) for A B1 u u Bn 2 T by the singleton set consisting
of</p>
        <p>1 i n;Bi2 1 i n;Bi2
We will see below that in practice this construction leads to much smaller -interpolants.
However, this improvement does not come for free. Consider the language E Lran;t,
where the only difference to E Lran is that Cran-concepts now admit ‘t’ as a binary
ran;t-TBox is logically equivalent to an (exponentially
concept constructor. Every E L ran;t inherits many desirable properties from
ErLan;tlarger) E Lran-TBox, and so E L ran.
However, one can show that, in contrast to E Lran, logical equivalence between E L
TBoxes is coNP-hard.</p>
        <p>l</p>
        <p>Bi u
l</p>
        <p>G</p>
        <p>CBi :
CBi 2P (Bi)
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>We have implemented a prototype called NUI that computes instance -interpolants
as presented in Theorem 3. We have applied NUI to a version of SNOMED CT dated
09 February 2005 (without two left-identities) and the E LHr-fragment of the release
08.08d of NCI. The first terminology has approx. 380K axioms, almost the same
number of concept names, and 56 role names. The E LHr-fragment of NCI has approx. 63K
axioms, approx. 65K concept names, and 123 role names. We note that the algorithms
given above compute (for ease of exposition) a large number of redundant axioms and
NUI implements a variety of straightforward optimizations.</p>
      <p>j \ sig( )j SNOMED CT j \ sig( )j
First observe that neither SNOMED CT nor NCI contain any -loops, for any
signature . Thus, -interpolants always exist and can, in principle, be computed using
our algorithm.</p>
      <p>In our experiments, we focus on the case of forgetting a large signature (and
keeping a “small” signature \ sig( )), as this corresponds to many application
scenarios. The experiments have been performed on a standard PC with 2:13 GHz and 3 GB
of RAM.</p>
      <p>Success rate: Table 1 shows the rate at which NUI succeeds to compute instance
interpolants of SNOMED CT and NCI wrt. various signatures. All failed cases are due
to memory overflow after several hours. For each table entry, 100 samples have been
used. The signatures contain concept and role names randomly selected from the full
signature of SNOMED CT (we never forget the role ‘roleGroup’ as this would make
forgetting trivial) and NCI, respectively. \ sig( ) always contains 20 role names. For
NCI and signatures of size 4 000, NUI had a 100% success rate.</p>
      <p>
        Size: We compare the size of instance -interpolants of SNOMED CT and NCI
computed by NUI with the size of extracted \ sig( )-modules; i.e., minimal subsets of the
respective terminologies which preserve, e.g., inclusions between \ sig( )-concepts.
We use MEX-modules [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] of SNOMED CT and, respectively, &gt;-local modules [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] of
NCI. The size of -interpolants, terminologies, and modules is measured as number of
symbols rather than number of axioms as -interpolants can contain large axioms.
      </p>
      <p>For SNOMED CT, we generated 100 random signatures with 3 000 concept names
and 20 role names in \ sig( ). For 17 of those signatures, NUI failed to compute an
instance -interpolant. For the remaining 83 signatures, Figure 4 shows the number of
interpolants and \ sig( )-modules (vertical axis) of a given size (horizontal axis). The
size of the \sig( )-modules lies between 125K and 150K symbols, which is about 3%
of SNOMED CT. 48:19% of instance -interpolants are smaller than the corresponding
modules. However, 10 instance -interpolants contain more than one million symbols
and the largest instance -interpolant is more than 11 times larger than SNOMED CT.</p>
      <p>For NCI, we computed instance -interpolants and \ sig( )-modules wrt. random
signatures with 7 000 concept names and 20 role names in \ sig( ). NUI succeeded
to compute interpolants for 97 of 100 signatures, each within 25 min. Figure 5 shows
the size distribution of the successfully computed interpolants and the corresponding
modules. The size of the \ sig( )-modules lies between 140K and 160K symbols,
which is about 22:5% of NCI. 74:47% of the instance -interpolants are smaller than
the corresponding modules. On the other hand, 18 instance -interpolants consist of
more than 400K symbols and the largest instance -interpolant is more than 12 times
larger than NCI.</p>
      <p>Forgetting with disjunction: All failures in Table 1 are due to the fact that P (A) is
too large. Indeed, if we admit disjunction and consider E Lran;t, then NUI succeeds to
compute all -interpolants from Table 1, each within 15 min. Moreover, for NCI, no
signature for which NUI fails has been detected. For SNOMED CT, however, NUI still
typically fails for j \ sig( )j 30 000.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>
        The notion of forgetting in DL ontologies has recently been investigated in a number of
research papers. [
        <xref ref-type="bibr" rid="ref19 ref9">9, 19</xref>
        ] consider forgetting in DL-Lite and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] investigate in how far
forgetting in DLs can be reduced to forgetting in logic programs. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], on which this paper
is based, proposes forgetting for acyclic E L-terminologies and concept inclusions.
      </p>
      <p>The main novel contributions of this paper are (i) the first algorithms with
experimental results indicating the practical feasibility of forgetting in DL-terminologies
and (ii) the first systematic analysis of the distinct languages required to axiomatize
-interpolants for distinct query languages. Many open problems remain; e.g., we
conjuncture that -interpolants of E LHr-terminologies (and possibly even TBoxes) exist
in the languages introduced whenever they exist in FO. Such a result would provide
further justification for those languages. Secondly, it would be of interest to prove
decidability (and complexity) of the decision problem whether there exists a -interpolant
for a given E LHr-terminology (TBox). Note that our acyclicity conditions are sufficient
but not necessary for the existence of -interpolants.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          .
          <article-title>Pushing the EL-envelope further</article-title>
          .
          <source>In Proceedings of OWLED DC</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B.</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          , I. Horrocks,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazakov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Just the right amount: Extracting modules from ontologies</article-title>
          .
          <source>In Proceedings of WWW'07</source>
          , pages
          <fpage>717</fpage>
          -
          <lpage>727</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , G. Ianni,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schindlauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tompits</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Forgetting in managing rules and ontologies</article-title>
          .
          <source>In Web Intelligence</source>
          , pages
          <fpage>411</fpage>
          -
          <lpage>419</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Semantic forgetting in answer set programming</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>172</volume>
          (
          <issue>14</issue>
          ):
          <fpage>1644</fpage>
          -
          <lpage>1672</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Semantic modularity and module extraction in description logic</article-title>
          .
          <source>In Proceedings of ECAI'08</source>
          , pages
          <fpage>55</fpage>
          -
          <lpage>59</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>The logical difference problem for description logic terminologies</article-title>
          .
          <source>In Proceedings of IJCAR'08</source>
          , pages
          <fpage>259</fpage>
          -
          <lpage>274</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Forgetting and uniform interpolation in large-scale description logic terminologies</article-title>
          .
          <source>In Proceedings of IJCAI'09</source>
          ,
          <year>2009</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Forgetting and uniform interpolation in large-scale description logic terminologies</article-title>
          .
          <source>Technical Report ULCS-09-006</source>
          , University of Liverpool,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>Can you tell the difference between DLlite ontologies</article-title>
          .
          <source>In Proceedings of KR'08</source>
          , pages
          <fpage>285</fpage>
          -
          <lpage>295</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>J. Lang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Liberatore</surname>
            , and
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Marquis</surname>
          </string-name>
          .
          <article-title>Propositional independence: formula-variable independence and forgetting</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>18</volume>
          :
          <fpage>391</fpage>
          -
          <lpage>443</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>C. Lutz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Walther</surname>
            , and
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Conservative extensions in expressive description logics</article-title>
          .
          <source>In Proceedings of IJCAI'07</source>
          , pages
          <fpage>453</fpage>
          -
          <lpage>458</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Conservative extensions in the lightweight description logic EL</article-title>
          .
          <source>In Proceedings of CADE'07</source>
          , pages
          <fpage>84</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Deciding inseparability and conservative extensions in the description logic EL</article-title>
          .
          <source>Journal of Symbolic Computation</source>
          ,
          <year>2009</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>R.</given-names>
            <surname>Reiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Lin</surname>
          </string-name>
          . Forget it!
          <source>In Proceedings of AAAI Fall Symposium on Relevance</source>
          , pages
          <fpage>154</fpage>
          -
          <lpage>159</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>On conjunctive query answering in EL</article-title>
          .
          <source>In Proceedings of DL'07</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>N.</given-names>
            <surname>Sioutos</surname>
          </string-name>
          , S. de Coronado,
          <string-name>
            <given-names>M.</given-names>
            <surname>Haber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hartel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shaiu</surname>
          </string-name>
          , and
          <string-name>
            <surname>L. Wright.</surname>
          </string-name>
          <article-title>NCI thesaurus: a semantic model integrating cancer-related clinical and molecular information</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <volume>40</volume>
          (
          <issue>1</issue>
          ):
          <fpage>30</fpage>
          -
          <lpage>43</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>K.</given-names>
            <surname>Spackman</surname>
          </string-name>
          .
          <article-title>Managing clinical terminology hierarchies using algorithmic calculation of subsumption: Experience with SNOMED-RT</article-title>
          .
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>A.</given-names>
            <surname>Visser</surname>
          </string-name>
          .
          <article-title>Uniform interpolation and layered bisimulation</article-title>
          .
          <source>In Go¨del '96 (Brno</source>
          ,
          <year>1996</year>
          ), volume
          <volume>6</volume>
          of Lecture Notes Logic. Springer Verlag,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Topor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          .
          <article-title>Forgetting in DL-Lite</article-title>
          .
          <source>In Proceedings of ESWC'08</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>