=Paper=
{{Paper
|id=None
|storemode=property
|title=Towards Practical Uniform Interpolation and Forgetting for ALC TBoxes
|pdfUrl=https://ceur-ws.org/Vol-1014/paper_61.pdf
|volume=Vol-1014
|dblpUrl=https://dblp.org/rec/conf/dlog/LudwigK13
}}
==Towards Practical Uniform Interpolation and Forgetting for ALC TBoxes==
Towards Practical Uniform Interpolation and Forgetting for ALC TBoxes Michel Ludwig1,2 and Boris Konev1 1 Department of Computer Science, University of Liverpool, United Kingdom Konev@liverpool.ac.uk 2 Institute for Theoretical Computer Science, TU Dresden, Germany Center for Advancing Electronics Dresden michel@tcs.inf.tu-dresden.de Abstract We develop a clausal resolution-based approach for computing uni- form interpolants of TBoxes formulated in the description logic ALC when such uniform interpolants exist. We also present an experimental evaluation of our approach and its applications to concept forgetting, ontology obfuscation and lo- gical difference on real-life ALC ontologies. Our results indicate that in many practical cases a uniform interpolant exists and can be computed with the presen- ted algorithm. 1 Introduction Ontologies or TBoxes expressed in Description Logics (DL) provide a common vocabu- lary for a domain of interest together with a description of the meaning of the terms built from the vocabulary and of the relationships between them. Modern applications of on- tologies, especially in the biological, medical, or healthcare domain, often demand large and complex ontologies; for example, the National Cancer Institute ontology (NCI) con- sists of more than 60 000 term definitions. For developing, maintaining, and deploying such large-scale ontologies it can be advantageous for ontology engineers to concen- trate on specific parts of an ontology and ignore or forget the rest, for example, in the following scenarios. Exhibiting hidden relations: in addition to the explicitly stated re- lations between terms, additional relations can also be derived from ontologies with the help of reasoners. Such inferred relations are often harder to understand or debug. By forgetting everything but a handful of terms of interest, it then becomes possible to ex- hibit inferred relations that are hidden initially, which can simplify the understanding of the ontology structure. Ontology obfuscation: in software engineering, obfuscation [4] transforms a given program into a functionally equivalent one that is more difficult to read and understand for humans for the purpose of preventing reverse engineering. For- getting can provide a similar function in the context of ontology engineering. Terms are often defined with the help of auxiliary terms which help to give structure to TBox inclusions but such a structure might be considered proprietary knowledge that should not be exposed, or it could simply be of little interest for ontology users. By forgetting This research was supported by EPSRC grant EP/E065279/1. these intermediate auxiliary terms, we obtain an ontology that is functionally equival- ent, yet harder to read, understand, and modify by humans. Logical difference: when modifying an ontology, an ontology engineer usually wants to ensure that her changes do not interfere with the meaning of the terms outside the fragment under considera- tion. Such a correctness guarantee can be achieved by checking the equivalence of the ontologies resulting from forgetting the terms under consideration before and after the changes occurred. Further applications of forgetting can be found in [11, 12]. Ignoring parts of an ontology can be formalised with the help of predicate forgetting and its dual uniform interpolation, which have both been extensively studied in the AI and DL literature [3,6,8,11,13,16–18]. However, to the best of our knowledge, previous research in this area in the setting of DL mainly concentrates on theoretical foundations of forgetting and, except for practical work on lightweight description logics [11], we are not aware of any attempts to compute uniform interpolants for real-life ontologies in practice. This could be partly explained by the high computational complexity of this task and by the fact that uniform interpolants do not always exist. In this paper we develop an algorithm based on clausal resolution for computing uniform interpolants of TBoxes formulated in the description logic ALC which can pre- serve all the consequences that do not make use of some given concept names. We also present an experimental evaluation of our approach which demonstrates that in many practical cases uniform interpolants exist and that they can be computed with our al- gorithm. All missing proofs can be found in the full version of this paper, which is avail- able from http://lat.inf.tu-dresden.de/˜michel/publications/. 2 Preliminaries We start with introducing the description logic ALC. Let NC and NR be countably infinite and mutually disjoint sets of concept names and role names. ALC-concepts are built according to the following syntax rule C ::= A | > | ¬C | ∃r.C | C u D, where A ∈ NC and r ∈ NR . As usual, other ALC concept constructors are introduced as abbreviations: ⊥ stands for ¬>, C t D stands for ¬(¬C u ¬D) and ∀r.C stands for ¬∃r.¬C. An ALC-TBox T is a finite set of ALC-inclusions of the form C v D, where C and D are ALC-concepts. A concept equation C ≡ D is an abbreviation for the two inclusions C v D and D v C. An ALC-TBox is acyclic if all its inclusions are of the form A v C and A ≡ C, where A ∈ NC and C is an ALC-concept, such that no concept name occurs more than once on the left-hand side and T contains no cycle in its definitions, that is, it does not contain inclusions A1 ./ C1 ,. . . , Ak ./ Ck , where ./ ∈ {v, ≡}, such that Ai+1 occurs in Ci , for i = 1, . . . , k − 1 and Ak = A1 . A signature Σ is a finite subset of NC ∪NR . The signature of a concept C, denoted by sig(C), is the set of concept and role names that occur in C. If sig(C) ⊆ Σ, we call C a Σ-concept. We assume that the two previous definitions also apply to concept inclu- sions/equations C ./ D with ./ ∈ {v, ≡} and to TBoxes T . The size of a concept C is the length of the string that represents it, where concept names and role names are con- sidered to be of length one. The size of an inclusion/equation C ./ D with ./ ∈ {v, ≡} is the sum of the sizes of C and D plus one. The size of a TBox T is the sum of the sizes of its inclusions. The semantics of ALC is given by interpretations I = (∆I , ·I ), where the domain ∆ is a non-empty set, and ·I is a function mapping each concept name A to a subset I AI of ∆I , each role name r to a binary relation rI ⊆ ∆I × ∆I . The extension C I of a concept C is defined by induction as follows: >I := ∆I (∃r.C)I := { d ∈ ∆I | ∃e ∈ C I : (d, e) ∈ rI } (¬C) := ∆I \ C I I (C u D)I := C I ∩ DI . Then I satisfies a concept inclusion C v D, in symbols I |= C v D, if C I ⊆ DI . We say that an interpretation I is a model of a TBox T if I |= C v D for all C v D ∈ T . An ALC-inclusion C v D follows from (or is entailed by) a TBox T if every model of T is a model of C v D, in symbols T |= C v D. We use |= C v D to denote that C v D follows from the empty TBox. Finally, a TBox T 0 follows from (or is entailed by) a TBox T if every model of T is a model of T 0 , in symbols T |= T 0 . We now introduce the main notion that we study in this paper. Definition 1. Let T be an ALC-TBox and let Σ ⊆ sig(T ) be a signature. We say that an ALC-TBox TΣ is a Σ-uniform interpolant of the TBox T iff sig(TΣ ) ⊆ Σ, T |= TΣ , and for every ALC Σ-concept inclusion C v D with T |= C v D it holds that TΣ |= C v D. Uniform interpolation can be seen as the dual notion of forgetting: a TBox TΥ is the result of forgetting about a signature Υ in a TBox T iff TΥ is a uniform interpolant of T w.r.t. Σ = sig(T ) \ Υ . As the following example shows, uniform interpolants of ALC-TBoxes do not always exist. Example 2. Let T = {A v B, B v C u ∃r.B} and Σ = {A, C, r}. Then there does not exist a Σ-uniform interpolant of T as (in particular) the infinite number of consequences of the form A v ∃r.C, A v ∃r.∃r.C, . . . cannot be captured by an ALC-TBox T 0 with sig(T 0 ) ⊆ Σ. On the other hand, for T 0 = {A v B, B v C u ∃r.B, D ≡ B} and Σ 0 = {A, C, D, r}, a Σ 0 -uniform interpolant of T 0 is {A v D, D v C u ∃r.D}. A 2-E XP T IME-complete bound on the complexity for deciding the existence of a Σ-uniform interpolant in ALC and a worst-case triple-exponential procedure for com- puting a Σ-uniform interpolant if it exists, have been given in [13]. The proof of the upper bound proceeds by ‘internalisation’ of the TBox: if a Σ-uniform interpolant ex- ists, then there exists a concept CT of size double exponential in the size of T having the property that for any Σ-inclusion C v D it holds that T |= C v D iff |= C u CT v D. Then the authors compute C 0 , a Σ-concept uniform interpolant [3] of CT , and show that > v C 0 is a Σ-uniform interpolant of T . 3 Computing Uniform Interpolants by ALC-Resolution The aim of our work is to investigate a practical approach for computing uniform inter- polants when they exist. Note that the procedure given in [13] is inherently inefficient as it requires one to explicitly construct the double-exponential size internalisation CT of a given TBox T . Our approach is to introduce a resolution-like calculus for ALC that derives con- sequences of a TBox T such that a concept inclusion C v D is entailed by T iff a con- tradiction can be derived from T and C u ¬D. Similarly to [8], we then show that any derivation can be restructured in such a way that inferences on selected concept names always precede inferences on other concept names. Then, if the signature Σ is such that sig(T ) \ Σ only contains concept names, we generate a set of Σ-consequences T 0 of T by applying the inference rules in a forward chaining manner such that for an arbitrary Σ-inclusion C v D a contradiction can be derived from T and C u ¬D iff a con- tradiction can be derived from T 0 and C u ¬D. Thus, if the forward-chaining process terminates, T 0 is a Σ-uniform-interpolant for T . ALC-Resolution. ALC-resolution operates on ALC formulae in conjunctive normal form defined according to the following grammar (this is similar to [8]): Literal ::= A | ¬A | ∀r.Clause | ∃r.CNF Clause ::= Literal | Clause t Clause | ⊥ CNF ::= > | Clause | Clause u CNF To simplify the presentation, we assume that clauses are sets of literals and that CNF expressions are sets of clauses. Then ⊥ corresponds to the empty clause and > to the empty set of clauses. In the following, the calligraphic letters C, D, E symbolise clauses and F, G represent sets of clauses. Similarly to first-order formulae, every ALC concept can be transformed into an equivalent set of ALC clauses. The depth of a clause C, Depth(C), is defined to be the maximal nesting depth of the quantifiers contained in C. We additionally assume that every clause is assigned a type. Clauses obtained from the clausification of TBox inclusions are of the type universal, and clauses resulting from the clausification of inclusions to be tested for entailment by the TBox are of the type initial. The type of a derived clause is determined by the types of the clauses from which it is derived and by the derivation rule that is used. Example 3. The clausification of T from Example 2 produces three universal clauses: ¬A t B, ¬B t C, ¬B t ∃r.B. We now introduce the two resolution calculi T and Tu . The former calculus assumes the TBox to be empty, whereas the latter takes TBox inclusions into account. Thus, T derives the empty clause from the set of initial clauses stemming from the clausification of an inclusion C v D iff |= C v D; and Tu derives the empty clause from the universal clauses stemming from the clausification of a TBox T and the initial clauses stemming from the clausification of an inclusion C v D iff T |= C v D. The calculus T is defined with the help of the relation ⇒α given in Fig. 1. For every α ∈ NC ∪{⊥}, the relation ⇒α associates with a set of clauses N a new clause C which can be ‘derived’ from the set N by ‘resolving’ on α. T now consists of the following two inference rules. C (if C ⇒ E) C D (if C, D ⇒ E), α α E E where C, D, and E are initial clauses. (rule ⊥) C10 t ∀r.⊥, C20 t ∃r.F =⇒⊥ C10 t C20 (rule A) C10 t A, C20 t ¬A =⇒A C10 t C20 (rule ∀∃) C10 t ∀r.C1 , C20 t ∃r.(C2 , F) =⇒α C10 t C20 t ∃r.(C2 , F, C3 ) if C1 , C2 =⇒α C3 (rule ∀∀) C10 t ∀r.C1 , C20 t ∀r.C2 =⇒α C10 t C20 t ∀r.C3 if C1 , C2 =⇒α C3 (rule ∃1 ) C 0 t ∃r.(C1 , F) =⇒α C 0 t ∃r.(C1 , F, C2 ) if C1 =⇒α C2 (rule ∃2 ) C 0 t ∃r.(C1 , C2 , F) =⇒α C 0 t ∃r.(C1 , C2 , F, C3 ) if C1 , C2 =⇒α C3 (rule ∀) C 0 t ∀r.C1 =⇒α C 0 t ∀r.C2 if C1 =⇒α C2 Figure 1. Rules of =⇒α . The calculus Tu operates initial and universal clauses and also consists of two rules: C (if C ⇒ E) C 0 D (if C 0 , D ⇒u E 0 ), α α E E0 where C, C 0 , D are initial or universal clauses, and C 0 , D ⇒uα E 0 holds iff either C 0 , D ⇒α E 0 , or D is a universal clause and there exist role names r1 , . . . , rn ∈ NR (n ≥ 1) such that C 0 , ∀r1 . . . . ∀rn .D ⇒α E 0 . (Intuitively, the calculus Tu allows for inferences with universal clauses at arbitrary nesting levels of quantifiers, which the calculus T does not.) Then E is a universal clause if C is a universal clause, and an initial clause other- wise. Similarly, E 0 is a universal clause if both C 0 and D are universal clauses, and an initial clause otherwise. Example 4. For the universal clauses from Example 3, we have for instance, ¬A t B, ¬B t ∃r.B ⇒B ¬A t ∃r.B by (rule A). So, the universal clause ¬A t ∃r.B is derivable by Tu from ¬A t B and ¬B t ∃r.B. As ¬B t C is a universal clause and ¬B t ∃r.B, ∀r.¬B t C ⇒B ¬B t ∃r.(B, C) by (rule ∀∃), the universal clause ¬B t ∃r.(B, C) is derivable by Tu from ¬B t ∃r.B and ¬B t C. By applying the inference rules to old and newly generated clauses, one can conclude that the universal clauses ¬A t ∃r.(B, C) and ¬A t ∃r.(B, ∃r.B) are also derivable by Tu from N = {¬A t B, ¬B t C, ¬B t ∃r.B}. For x ∈ {T, Tu }, a x-derivation (tree) ∆ built from a set of clauses N is a finite binary tree where each leaf is labelled with a clause from N and each non-leaf node n is labelled with a clause C such that C results from an x-inference on the parent(s) of n in ∆. We say that ∆ is a derivation of a clause C if the root of ∆ is labelled with C. A derivation of the empty clause is called a refutation. Every path n1 , . . . , nm of nodes in ∆ where n1 is a leaf node and nm is the root node induces an inference path α2 , . . . , αm where αi ∈ NC ∪ {⊥} (2 ≤ i ≤ m) denotes the concept name, or ⊥, which has been resolved upon to obtain the clause that is the label of the node ni . For a signature Υ ⊆ NC and a strict total order ⊆ Υ × Υ , a derivation ∆ is a (x, Υ, )- derivation if for every inference path α1 , . . . , αn of ∆ (with αi ∈ NC ∪ {⊥} for every 1 ≤ i ≤ n) there exists 0 ≤ k ≤ n such that {α1 , . . . , αk } ⊆ Υ , αj αj+1 or αj = αj+1 for every 1 ≤ j < k, and αj 6∈ Υ for every k < j ≤ n. The calculus T is a direct adaption of the calculus for the modal logic K introduced in [7] to ALC, and so |= > v C iff the empty clause can be derived from the clausifica- tion of ¬C. However, as we show in the full version of the paper, the proof given in [7] of the fact that any derivation in T can be reordered so that inferences on concept names from Υ always precede inferences on other concept names, or ⊥, appears to have some gaps. We prove the following theorem which extends the results stated in [8]. Theorem 5 (T-Completeness). Let Υ ⊆ NC , let ⊆ Υ × Υ be a strict total order on Υ and let C be an ALC concept. Then it holds that |= C v D iff there exists a (T, Υ, )-derivation of the empty clause from the initial clauses Cls(C u ¬D). To prove completeness for Tu , we observe the following link between derivations in T and Tu . Let N be a set of clauses and let [ Univ0 (N ) = N ; Univi+1 (N ) = Univi (N ) ∪ { ∀r.D | D ∈ Univi (N ) } S r∈NR ∩sig(N ) and Univ(N ) = i≥0 Univi (N ). Theorem 6. Let M be a set of initial clauses and let N be a set of universal clauses. Additionally, let ∆ be a (T, Υ, )-refutation from M ∪ Univ(N ) such that there exists n ∈ N with Depth(C) ≤ n for every C ∈ Clauses(∆). Then there exists a (Tu , Υ, )- derivation ∆u of the empty clause from M ∪ N such that Depth(C) ≤ n for every C ∈ Clauses(∆u ). We then use Theorem 5 and the fact that every ALC-TBox can be internalised. Notice that the actual TBox internalisation CT does not have to be computed as it is only used for the proof of completeness. Corollary 7 (Tu -Completeness). Let T be an ALC-TBox, let Υ ⊆ NC , let ⊆ Υ × Υ be a strict total order on Υ and let C be an ALC concept. Then it holds that T |= C v D iff there exists a (Tu , Υ, )-derivation of the empty clause from the universal clauses Cls(T ) and the initial clauses Cls(C u ¬D). Computing Uniform Interpolants. The procedure U NIFORM I NTERPOLANT depicted in Algorithm 1 takes as input an ALC-TBox T , a signature Σ ⊆ sig(T ) such that Σ ∩ NR = sig(T ) ∩ NR and a strict total order ⊆ Υ × Υ over Υ = sig(T ) \ Σ. Following the outline of [8], after the clausification of T , the procedure iterates over the concept names contained in Υ in descending order according to the relation . In each iteration all possible Tu -inferences on the current concept name A ∈ Υ from the clause set N obtained in the previous iteration are computed. Finally, after iterating over all the concept names from Υ = sig(T ) \ Σ, the operator ‘Supp’ is applied on the resulting clauses, which replaces all occurrences of Υ concept names in clauses with > and then simplifies the resulting CNF. Algorithm 1 1: procedure U NIFORM I NTERPOLANT(T , Σ, ) 2: Υ := sig(T ) \ Σ 3: N := Cls(T ) 4: while Υ 6= ∅ do 5: A := max (Υ ) 6: N := Res∞ Tu ,{A} (N ) 7: Υ := Υ \ {A} 8: end while 9: return FΣ (T ) = Supp(sig(T ) \ Σ, N ) 10: end procedure Example 8. For the clauses derived in Example 4, Supp({B}, ¬A t C) = A t C, Supp({B}, ¬A t ∃r.B) = ¬A t ∃r.>, Supp({B}, ¬A t ∃r.(B, C)) = ¬A t ∃r.C. One can show that if Algorithm 1 terminates, for all ALC Σ-concepts C, D such that there exists a (Tu , Υ, )-refutation ∆u from the universal clauses Cls(T ) and the initial clauses Cls(C u ¬D) it holds that FΣ (T ) |= C v D. Thus, it follows from Corollary 7 that if Algorithm 1 terminates, it computes a Σ-uniform interpolant of T . However, Algorithm 1 does not terminate if a uniform interpolant does not exist. For example, when applied to T from Example 2, Algorithm 1 can generate, among others, the infinite sequence of universal clauses ¬A t ∃r.C, ¬A t ∃r.(C, ∃r.C), . . . and so it does not terminate. Moreover, as T from Example 2 is a subset of T 0 , and so Cls(T ) ⊆ Cls(T 0 ), Algorithm 1 will derive, among others, the same clauses when it is applied on T 0 . Thus, in some cases Algorithm 1 does not terminate even though a uniform interpolant exists. To guarantee termination on all inputs, we focus on the notion of depth-bounded uniform interpolation (related to the notion of ‘bounded forgetting’ [19]). Let T be an ALC-TBox and let Σ ⊆ sig(T ) be a signature. We say that an ALC-TBox TΣ is a depth n-bounded uniform interpolant of the TBox T w.r.t. Σ iff sig(TΣ ) ⊆ Σ, T |= TΣ , and for every ALC Σ-concept inclusion C v D with T |= C v D and max{Depth(C), Depth(D)} ≤ n it holds that TΣ |= C v D. Let FΣ,m (T ) be the outcome of Algorithm 1 where in Step 6 only clauses up to depth m are generated. The following example shows that it might be necessary to consider intermediate clauses of a depth m > n in order to preserve all the Σ-consequences of depth n entailed by T . Example 9. Let T = {¬At∃r.C, ¬C t∃s.>, B t∀s.⊥}, Σ = {A, B, r, s}, Υ = {C} and = ∅. Then every (Tu , Υ, )-refutation from the universal clauses Cls(T ) and the initial clauses {A, ∀r.¬B} derives the clause ¬A t ∃r.(C, ∃s.>). We establish, however, that by choosing the maximal depth of derived clauses appro- priately, the procedure depicted in Algorithm 1 computes uniform interpolants that pre- serve consequences up to a specified depth n. Theorem 10. Let T be an ALC-TBox, Σ ⊆ sig(T ) a signature such that Σ ∩ NR = sig(T ) ∩ NR , and let n ≥ 0. Set m = n + 2|sub(Cls(T ))|+1 + max{ Depth(C) | C ∈ Cls(T ) }. Then it holds that FΣ,m (T ) is a depth n-bounded uniform interpolant of the TBox T w.r.t. Σ. We can combine this result with the bound on the size of uniform interpolants in [13]: for any ALC-TBox T and signature Σ (with Σ ∩ NR = sig(T ) ∩ NR ), if a Σ-uniform interpolant of T exists, there exists m, which can be computed based on the bound in Theorem 10 and the results of [13], such that FΣ,m (T ) is a Σ-uniform interpolant of T . The bound in Theorem 10 can be significantly improved if the TBox is acyclic. For an acyclic ALC-TBox T we define ExpansionDepth(T ) = max{ Depth(A[T ]) | A ∈ sig(T ) }, where A[T ] denotes the concept obtained by exhaustively replacing every concept B with CB if B v CB ∈ T or B ≡ CB ∈ T . Theorem 11. Let T be an acyclic ALC TBox, Σ ⊆ sig(T ) a signature such that Σ ∩ NR = sig(T ) ∩ NR , and let n ≥ 0. Set m = ExpansionDepth(T ) + n. Then it holds that FΣ,m (T ) is a uniform interpolant limited to consequence depth n of the TBox T w.r.t. Σ. Note that in the description logic EL (i.e. the fragment of ALC that does not allow ⊥, negation, disjunction, or universal quantification) the acyclicity of a TBox guarantees the existence of uniform interpolants [11] for any signature Σ. Interestingly, this is not so in the case of ALC. Moreover, as the following example shows, there exists an acyclic EL-TBox T and a signature Σ for which no ALC Σ-uniform interpolant exists. Example 12. Consider Σ = {A, A0 , A1 , A2 , E, r} and T = {A v ∃r.B, A0 v ∃r.(A1 u B), E ≡ A1 u B u ∃r.(A2 u B)}. Then for every n ≥ 0 ln T |= A0 u ∀r. . . ∀r.}(A u ¬E u (A1 t A2 )) v |∃r. .{z | .{z . . ∃r.} A1 . i=1 i n This infinite sequence of ALC consequences of T cannot be captured by any ALC Σ- TBox T 0 , which can be proved formally using the criterion (∗m ) of Theorem 9 in [13]. 4 Case Study We have implemented a prototype of an inference computation architecture using the calculus Tu and the inference relation ⇒α in Java. It has turned out that our initial implementation of Algorithm 1 did not perform well in practice. This is in particular due to the fact that clauses can contain sets F of other clauses in existential literals ∃r.F, which renders all the possible inferences on clauses from F ‘explicit’. For example, if we resolve the universal clause which just consists of the existential literal ∃r.(A) with the universal clauses ¬A t B1 , . . . , ¬A t Bn on the concept name A, then not only the clauses ∃r.(A, B1 ), ∃r.(A, B2 ), ∃r.(A, B3 ),. . . could be derived but all clauses of the form ∃r.(A, G), where G is a subset of {B1 , . . . , Bn }. A common technique to reduce the number of inferences that have to be made is to use forward- and backward deletion of subsumed clauses [2]. However, it is known [1] that the subsumption lemma (stating that if a clause E results from an inference in- volving two clauses C and D, and if there exist clauses C 0 , D0 such that C 0 subsumes C and D0 subsumes D, then either E is subsumed by one of C 0 , D0 , or a clause E 0 can be de- rived from C 0 and D0 such that E 0 subsumes E) does not hold even in the modal logic K for the standard minimal subsumption relation ≤s and ⇒α . To be able to prove that one can safely discard subsumed clauses, we have modified the inference relation ⇒α by introducing the following additional rule (rule ∃f ) C1 t ∀r.D, C2 t ∃r.F =⇒∃f C1 t C2 t ∃r.(F, D). We will denote the resulting inference relation by ⇒fα with α ∈ NC ∪ {⊥, ∃f }. One can then prove that a variant of the subsumption lemma holds for the relations ≤s and ⇒fα , which allows us to employ forward- and backward deletion of subsumed clauses in our implementation. In order to further speed up computations, we first extract the locality-based >⊥∗ Σ-module [5, 14] for a given TBox T . The locality-based module entails the same Σ- inclusions as the TBox T but it is often considerably smaller in size. We also rely on ontologies to have structure: if a concept name occurs in several inclusions, it is likely that it occurs in the same syntactic pattern. Thus, 1. If the clause set contains some clauses C1 t DΥ , . . . , Cm t DΥ such that for every 1 ≤ i ≤ m we have sig(Ci ) ∩ Υ = ∅, we rewrite them into X t DΥ , where X ≡ C1 u . . . u Cm , perform forgetting on Υ symbols and then replace X with its definition. 2. If the clause set contains a clause C t ∃r.(FΥ , G1 ) t . . . t ∃r.(FΥ , Gm ) where sig(Gi ) ∩ Υ = ∅ for every 1 ≤ i ≤ m, we rewrite it into C t ∃r.(FΥ , Y ), where Y ≡ G1 t . . . t Gm , perform forgetting on Υ and then replace Y with its definition. Experimental setting. All experiments were conducted on PCs equipped with an Intel Core i5-2500K CPU running at 3.30GHz. 15 GiB of RAM were allocated to the Java VM and an execution timeout of 60 CPU minutes was imposed on each problem. We only considered ALC-fragments of ontologies, i.e. for a given ALC-ontology T we re- moved all the axioms which contain non-ALC concept (or role) constructors to obtain its ALC-fragment. We used Algorithm 1 to forget concept names one by one (i.e. for sig(T ) \ Σ = {A1 , . . . , An }, Algorithm 1 was applied iteratively on A1 , . . . , An ), and we did not impose a bound on the depth of clauses. After each run of Algorithm 1 we further simplified the resulting clause set N by removing clauses D for which there exists C ∈ N with C ≤s D. Thus, in all the experiments reported on in this section we computed true Σ-uniform interpolants (i.e. not a depth-bounded variant). The correct- ness of our extensions to Algorithm 1 can be shown by model-theoretic arguments. Exhibiting Hidden Relations. We applied our uniform interpolation tool to compute uniform interpolants w.r.t. small signatures Σ ⊆ sig(T ) with sig(T ) ∩ NR = Σ ∩ NR for a fragment of version 1.4 of the Drug Interaction Knowledge Base3 (DIKB) and for a fragment of version 08.10e of the National Cancer Institute Thesaurus4 (NCI) that are of expansion depth 3 (that is, we removed all the axioms from both ontologies that led to an expansion depth greater than 3). The resulting DIKB fragment is a small acyclic terminology that contains 120 concept names, 27 roles names, and 127 axioms. The NCI fragment is also an acyclic terminology with 53571 concept names, 78 role names and 62494 axioms (of which 2362 are of the form A ≡ C). For each considered sample size x and terminology T we generated 100 signatures Σ by randomly choosing x concept names from sig(T ) and by adding all the role names from sig(T ) to Σ. 3 Available from http://bioportal.bioontology.org/ontologies/1672 4 Available from http://evs.nci.nih.gov/ftp1/NCI_Thesaurus Successful Average Nr Average Maximal Ontology |Σ ∩ NC | Computations of Axioms Axiom Size 5 85 7.482 19.176 DIKB v1.4 10 60 14.033 24.933 15 44 25.114 26.568 50 65 21.369 30.538 NCI v08.10e 100 56 41.089 62.839 150 41 63.146 104.756 Table 1. Computing Uniform Interpolants of DIKB and of NCI Limited to Expansion Depth 3 The results that we obtained for the two ontologies are shown in Table 1. The third column gives the number of signatures for which a uniform interpolant could be com- puted within the given time limit, and the average number of axioms and the average of the maximal size of the axioms contained in each uniform interpolant are shown in the subsequent columns. Most uniform interpolants could be computed within 60 seconds. In a few cases, however, the computations took up to 3487 seconds. One can see that the number of successful computations decreased with increas- ing size of Σ ∩ NC , which seems to be due to the fact that the >⊥∗ Σ-modules then contain more symbols that lead to a large number of inferences. Most uniform inter- polants that we have obtained are relatively small and contain a lot of expressions of the form ∃r1 . . . ∃rn .>. In some cases the process of forgetting certain intermediate concept names generated a few hundred clauses that were simplified or deleted in the remaining computation steps. Ontology Obfuscation. As a proof of concept, we applied our uniform interpolation tool on (a fragment of) the Lipid Ontology5 (LiPrO) to forget 45 concept names which are intermediate concept names in the ontology’s induced concept hierarchy, i.e. those concept names group certain subconcepts together to give structure to the ontology. LiPrO is an acyclic terminology with 593 axioms, 574 concept names and one role name. The maximal size of an axiom is 50. It then took 192 CPU seconds to compute the uniform interpolant, which contains 3415 axioms with a maximal size of 283. The uniform interpolant that we computed thus approximately contains 6 times more axioms than the original ontology and the maximal axiom size has increased by a factor of 6 as well. Notice that most of the ori- ginal structure of the ontology has been destroyed while preserving all the consequences entailed by the retained concept names. Logical Difference. We also used our implementation for the computation of the logical difference [9] between two versions of the NCI thesaurus on various signatures. The Σ-logical difference between ALC-TBoxes T1 and T2 is the set Diff Σ (T1 , T2 ) of all ALC-concept inclusions C v D such that sig(C v D) ⊆ Σ, T1 |= C v D, and T2 6|= C v D. Notice that if Diff Σ (T1 , T2 ) is nonempty, it is infinite already. (Σ) (Σ) It is easy to see that Diff Σ (T1 , T2 ) = ∅ if, and only if, T2 |= T1 where T1 is a (Σ) (Σ) Σ-uniform interpolant of T1 . Moreover, if T2 6|= T1 , every inclusion C v D ∈ T1 5 Available from http://bioportal.bioontology.org/ontologies/1183 |(sig(T ) \ Σ) Successful Average Nr Average Maximal ∩NC | Computations of Witnesses Witness Size 50 100 1976.600 69.380 100 99 1975.939 74.969 150 95 1974.874 72.284 Table 2. Computing Uniform Interpolants of NCI-08.10e Limited to Expansion Depth 3 with T2 6|= C v D is said to be a witness of Diff Σ (T1 , T2 ). In our experiments we used the reasoner FaCT++ v1.6.2 [15] to determine whether any axiom C v D ∈ (Σ) T1 is a witness of Diff Σ (T1 , T2 ). To make the experiments more challenging for the reasoner, we focused on comparing (a fragment of) NCI v08.10e (as T1 ) with (a fragment of) NCI v.08.09d (as T2 ), both limited to an expansion depth of 3, on large signatures Σ with Σ∩NR = sig(T )∩NR . In that way the computed uniform interpolants remain rather large as well. For each sample size x ∈ {50, 100, 150} we again generated 100 signatures by randomly choosing |sig(T )∩NC |−x concept names from sig(T ) and by including all the role names from sig(T ). The results that we obtained are now shown in Table 2. Note that these results are not directly comparable with the logical difference for the description logics of the EL family [9, 10] as illustrated by Example 12. One can observe that as size of sig(T ) \ Σ increases, i.e. more symbols have to be forgotten from the >⊥∗ Σ-modules, the success rate dropped slightly. Overall, the average number of witnesses and the average maximal size of the witnesses remains comparable throughout the different sample sizes. Also, the axioms generated by the computation of the uniform interpolant did not pose a problem for FaCT++ as comput- ing the logical difference for a given signature never took more than 20 seconds in our experiments. 5 Conclusion In this paper we presented an approach based on clausal resolution for computing uni- form interpolants of ALC-TBoxes T w.r.t. signatures Σ ⊆ sig(T ) that contain all the role names present in T . We proved that whenever the saturation process under ALC- resolution terminates, the algorithm computes a uniform interpolant. To guarantee ter- mination on all inputs, we introduced a depth-bounded version of our algorithm. We showed that by choosing an appropriate bound on the depth of clauses, one can axio- matise all Σ-inclusions implied by the given TBox up to a specified depth. Combined with a known bound on the size of uniform interpolants, our depth-bounded procedure always computes a uniform interpolant if it exists. In the second part of this paper we investigated how often our unrestricted resolution- based algorithm terminates with a uniform interpolant by applying our prototype im- plementation on a number of case studies. Our findings suggest that despite a high com- putational complexity uniform interpolants can be computed in many practical cases. The computation procedure could further benefit from better redundancy elimination techniques, which, together with extending our approach to forgetting role names, con- stitutes future work. It would also be interesting to explore proof strategies for our resolution calculi that guarantee termination when uniform interpolants exist. References 1. Auffray, Y., Enjalbert, P., Hébrard, J.J.: Strategies for modal resolution: Results and prob- lems. Journal of Automated Reasoning 6(1), 1–38 (1990) 2. Bachmair, L., Ganzinger, H.: Resolution theorem proving. In: Handbook of automated reas- oning, vol. 1, chap. 2, pp. 19–99. Elsevier (2001) 3. ten Cate, B., Conradie, W., Marx, M., Venema, Y.: Definitorially complete description logics. In: Proceedings of the Tenth International Conference on Principles of Knowledge Repres- entation and Reasoning (KR 2006). pp. 79–89. AAAI Press (2006) 4. Collberg, C.S., Thomborson, C.D., Low, D.: Manufacturing cheap, resilient, and stealthy opaque constructs. In: Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’98). pp. 184–196. ACM (1998) 5. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: theory and practice. Journal of Artificial Intelligence Research (JAIR) 31, 273–318 (2008) 6. Eiter, T., Ianni, G., Schindlauer, R., Tompits, H., Wang, K.: Forgetting in managing rules and ontologies. In: Proceedings of the 2006 IEEE / WIC / ACM International Conference on Web Intelligence (WI 2006). pp. 411–419. IEEE Computer Society (2006) 7. Enjalbert, P., del Cerro, L.F.: Modal resolution in clausal form. Theoretical Computer Sci- ence 65(1), 1–33 (1989) 8. Herzig, A., Mengin, J.: Uniform interpolation by resolution in modal logic. In: Proceedings of the 11th European Conference on Logics in Artificial Intelligence (JELIA 2008). Lecture Notes in Computer Science, vol. 5293, pp. 219–231. Springer (2008) 9. Konev, B., Walther, D., Wolter, F.: The logical difference problem for description logic ter- minologies. In: Proceedings of the 4th International Joint Conference on Automated Reas- oning (IJCAR 2008). Lecture Notes in Computer Science, vol. 5195, pp. 259–274. Springer (2008) 10. Konev, B., Ludwig, M., Walther, D., Wolter, F.: The logical difference for the lightweight description logic EL. Journal of Artificial Intelligence Research (JAIR) 44, 633–708 (2012) 11. Konev, B., Walther, D., Wolter, F.: Forgetting and uniform interpolation in large-scale de- scription logic terminologies. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009). pp. 830–835 (2009) 12. Lutz, C., Seylan, I., Wolter, F.: An automata-theoretic approach to uniform interpolation and approximation in the description logic EL. In: Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning (KR 2012). AAAI Press (2012) 13. Lutz, C., Wolter, F.: Foundations for uniform interpolation and forgetting in expressive de- scription logics. In: Walsh, T. (ed.) Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011). pp. 989–995 (2011) 14. Sattler, U., Schneider, T., Zakharyaschev, M.: Which kind of module should I extract? In: Proceedings of the 22nd International Workshop on Description Logics (DL 2009). CEUR Workshop Proceedings, vol. 477. CEUR-WS.org (2009) 15. Tsarkov, D., Horrocks, I.: FaCT++ Description Logic reasoner: System description. In: Proceedings of the Third International Joint Conference on Automated Reasoning (IJCAR 2006). Lecture Notes in Computer Science, vol. 4130, pp. 292–297. Springer (2006) 16. Wang, K., Wang, Z., Topor, R., Pan, J.Z., Antoniou, G.: Eliminating concepts and roles from ontologies in expressive descriptive logics. Computational Intelligence (Accepted for publication), DOI: 10.1111/j.1467-8640.2012.00442.x 17. Wang, Z., Wang, K., Topor, R., Pan, J.Z.: Forgetting Concepts in DL-Lite. In: Proceedings of the 5th European Semantic Web Conference (ESWC2008). Lecture Notes in Computer Science, vol. 5021, pp. 245–257. Springer (2008) 18. Wang, Z., Wang, K., Topor, R.W., Zhang, X.: Tableau-based forgetting in ALC ontologies. In: Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010). Frontiers in Artificial Intelligence and Applications, vol. 215, pp. 47–52 (2010) 19. Zhou, Y., Zhang, Y.: Bounded forgetting. In: Proceedings of the Twenty-Fifth AAAI Confer- ence on Artificial Intelligence (AAAI 2011). AAAI Press (2011)