=Paper=
{{Paper
|id=Vol-2211/paper-13
|storemode=property
|title=Computing Best Ontology Excerpts via Weighted Partial Max-SAT Solving
|pdfUrl=https://ceur-ws.org/Vol-2211/paper-13.pdf
|volume=Vol-2211
|authors=Jieying Chen,Yue Ma,Dirk Walther
|dblpUrl=https://dblp.org/rec/conf/dlog/ChenM018
}}
==Computing Best Ontology Excerpts via Weighted Partial Max-SAT Solving==
Computing Best Ontology Excerpts via Weighted Partial Max-SAT Solving? Jieying Chen1 , Yue Ma1 , Dirk Walther2 1 Laboratoire de Recherche en Informatique, Université Paris-Sud, France 2 Fraunhofer IVI, Dresden, Germany {jieying.chen,yue.ma}@lri.fr, dirk.walther@ivi.fraunhofer.de Abstract. We consider the problem of computing best excerpts of on- tologies, which are selections of a certain, small number k of axioms that best capture the knowledge regarding a given set of weighted terms. Such excerpts can be useful for ontology selection, for instance. The weights of terms are to reflect different importances of considered terms, for which we propose an incompleteness measure of excerpts based on the term weights and the notion of logical difference. We present a method to compute best k-excerpts by finding all subsumption justifications and solving a weighted partial Max-SAT problem. We demonstrate the via- bility of our approach with an evaluation on biomedical ontologies. 1 Introduction In the presence of an ever growing amount of information, organizations and human users need to be able to focus on certain key pieces of information and to intentionally ignore all other possibly relevant parts. Knowledge about complex systems that is represented in ontologies yields collections of axioms that are too large for human users to browse, let alone to comprehend or reason about it. To address the need for ontology summary, the notion of an ontology excerpt was introduced [3]. An ontology excerpt is a fixed-size subset of an ontology, consisting of the most relevant axioms for a given set of terms. These axioms preserve as much as possible the knowledge about the considered terms described in the ontology. To evaluate the quality of ontology excerpts, we define a semantics-based measure, using Logical Difference [6], to quantify how much semantic meaning is preserved in an excerpt compared to the original ontology. We take the logical difference to be the set of queries that are relevant to an application domain and that produce different answers when evaluated over the considered ontologies. We consider concept subsumption queries over the terms of interest only. Using an exhaustive search to find the excerpts of an ontology that best preserve the semantic information w.r.t. the ontology is futile as it involves com- puting all (i.e. exponentially many) subsets of the ontology. In [3] efficient excerpt extraction techniques are investigated that are based on methods stemming from ? This work is partially funded by the ANR project GoAsQ (ANR-15-CE23-0022). the area of information retrieval [10], a research area which is generally concerned with developing techniques to extract the “most relevant” documents for a query from large data sources. However, these techniques do not yield the best excerpts possible. In this paper, we present an algorithm that computes a best excerpt of an ontology, i.e., there is no subset of the ontology of the same size that better preserves the knowledge about the selected terms. This paper extends the ap- proach from [4] using an encoding to a weighted partial Max-SAT problem to handle weighted signatures. We proceed as follows. We start in Section 2 with reviewing the notions of logical difference together with difference witness and best excerpt as well as the weighted partial Max-SAT problem. A best k-excerpt of a TBox is a sub-ontology that containing at most k axioms that best capture the knowledge about user’s interests in terms of weighted signature. Moreover, we introduce the notion of subsumption modules of a TBox for a given set of terms. Taking the union of some subsumption modules can yield best excerpts for a weighted signature, as shown in Section 3. We encode the problem of selecting the right subsumption modules on weighted signature as a weighted partial Max-SAT problem. Overall we obtain an algorithm for computing best k-excerpts on weighted signature. In Section 4, we propose a method to rank axioms in excerpts when presenting user a best excerpt. We evaluate our algorithm by using it with the prominent bio-medical ontologies Snomed CT in Section 5. We demonstrate computing best k-excerpts is a viable approach for the task of summarizing large ontologies with a controllable small number of axioms that are most relevant for a given set of terms. We conclude the paper in Section 6. 2 Preliminaries We briefly recall basic notions related to the ontology excerpts, and the logical difference between terminologies [6, 8]. We assume familiarity with Description Logic; see [1] for a detailed introduction. 2.1 Ontology Excerpt In this section, we review the notions of a k-excerpt, an incompleteness measure and a best k-excerpt of a TBox [3]. We recall that a signature is a finite set of concept and role names, and we use sig(ξ) to denote the signature of an syntactic object ξ. A k-excerpt of an ontology is a subset of the ontology containing no more than k axioms. An incompleteness measure quantifies how much of the knowledge of a TBox about the terms in a signature is captured (or how much is missing) by an excerpt. Definition 1 (Incompleteness Measure). Let T be a TBox, and let Σ be a signature. An incompleteness measure µΣ T for T and Σ is a function that maps every subset E ⊆ T to a non-negative real number µ(E). A best k-excerpt best captures the knowledge about Σ in T among all k- excerpts. How well a k-excerpt captures knowledge is measured by a suitable incompleteness measure. Note that best k-excerpts are generally not unique. Definition 2 (Best Excerpt). Let T be a TBox, let Σ be a signature, and let k > 0 be a natural number. Additionally, let µΣ T be an incompleteness measure for T and Σ. A best k-excerpt of T w.r.t. Σ under µΣ T is a k-excerpt E of T such that 0 0 µΣ Σ T (E) = min{ µT (E ) | E is a k-excerpt of T }. In [3,4], the incompleteness measure µ was set to be the number |cWtnΣ (T , E)| of concept inclusion difference witnesses in Σ w.r.t. T , which will be introduced in the next section. 2.2 Logical Difference We now recall basic notions related to the logical difference between two ELHr - terminologies for EL-inclusions over a given signature as query language [6, 8]. Definition 3 (Logical Difference). Let T1 and T2 be two EL-terminologies, and let Σ be a signature. The concept inclusion difference between T1 and T2 w.r.t. Σ is the set cDiff Σ (T1 , T2 ) of all EL-inclusions α of the form C v D for EL-concepts C and D such that sig(α) ⊆ Σ, T1 |= α, and T2 6|= α. In case two terminologies T1 and T2 are logically different, the set cDiff Σ (T1 , T2 ) consists of infinitely many concept inclusions. The primitive witnesses theorems from [6] allow us to consider only certain inclusions of a simpler syntactic form. Theorem 1. Let T1 and T2 be two EL-terminologies, and let Σ be a signature. If α ∈ cDiff Σ (T1 , T2 ), then either A v D or C v A is a member of cDiff Σ (T1 , T2 ), where A ∈ sig(α) is a concept name, and C, D are EL-concepts occurring in α. We use the primitive witnesses theorem to obtain a succinct representation of the set cDiff Σ (T1 , T2 ). Definition 4 (Primitive Witnesses). Let T1 and T2 be two EL-terminologies, and let Σ be a signature. We say that concept inclusion difference witnesses in Σ w.r.t. T1 and T2 are concept names contained in Σ that occur on the left-hand side of inclusions of the form A v C in cDiff Σ (T1 , T2 ) or on the right-hand side of inclusions of the form D v A in cDiff Σ (T1 , T2 ). The set of all such witnesses will be denoted by cWtnΣ (T1 , T2 ). The set cWtnΣ (T1 , T2 ) is finite as Σ is finite. In particular, it holds that cDiff Σ (T1 , T2 ) = ∅ iff cWtnΣ (T1 , T2 ) = ∅. For (possibly cyclic) EL-terminologies T1 and T2 , the set cWtnΣ (T1 , T2 ) can be computed using the current version of the CEX tool [7] implementing a proof- theoretic approach to the logical difference problem using hypergraphs as intro- duced in [5] and further extended in [9]. 2.3 Weighted Partial Max-SAT problem Let Π be a countably infinite set of propositional variables which we denote with p, q, etc. The set Φ of propositional logic formulas ϕ is built according to the following grammar rule: ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ϕ ∨ ϕ | ϕ → ϕ | ϕ ↔ ϕ with p ∈ Π. A propositional valuation v : Π → {0, 1} is function mapping a propositional variable p ∈ Π to a truth value v(p) ∈ {0, 1}. A propositional formula ϕ ∈ Φ is satisfied in v iff v(ϕ) = 1. We say that ϕ is satisfiable iff there exists a valuation v such that v(ϕ) = 1. A literal ` is a propositional variable p or a negated propositional variable ¬p, and a clause is a disjunction `1 ∨ . . . ∨ `n of literals `i (0 ≤ i ≤ n). For a propositional formula ϕ ∈ Φ, we denote with Cls(ϕ) the transformation of ϕ into an equi-satisfiable clause set, i.e. Cls(ϕ) V is a finite set of propositional clauses such that ϕ is satisfiable iff the formula ψ∈Ψ ψ is satisfiable, where Ψ = Cls(ϕ). It is known that every propositional formula ϕ ∈ Φ can be transformed in polynomial time (in the size of ϕ) into an equi-satisfiable set of clauses. Note that the clauses in Cls(ϕ) may contain propositional variables that are not present in ϕ. The Boolean Satisfiability Problem (SAT), also known as Propositional Sat- isfiability Problem, is the problem of deciding whether there exists a valuation that satisfies a given propositional formula ϕ, or alternatively, all the clauses contained in a set of clauses Ψ = Cls(ϕ). As an extension, the set Ψ is split into two sets H and S, called hard and soft clause sets. The problem of finding a valuation that satisfies all the clauses in H and a maximal number clauses in S is called the partial Max-SAT problem. A weighted clause is a pair (ψ, wψ ), where ψ is a clause and wψ is a natural number or infinity meaning the cost for falsifying the clause ψ. If a clause ψ ∈ H, then wψ is infinite. Given an assignment I and a multiset of weighted clauses Φ, the cost of assignment I on Φ is the sum of the weights of the clauses falsified by I. The Weighted Partial Max-SAT problem for weighted clauses is the problem of finding an optimal assignment to the variables of Φ that minimizes the cost of the assignment on Φ. 2.4 Subsumption Justifications Subsumption module and justification are defined to capture the set of ax- ioms required to preserve the logical consequences related to a particular con- cept name [4]. An algorithm for computing subsumption justifications for ELH- terminologies is presented in [4]. Definition 5 (Subsumption Module & Justification). Let T be a termi- nology, let Σ be a signature, and let X ∈ NC . Additionally, let M ⊆ T . We say that M is an hX, Σi-subsumption module of T iff for every C, D ∈ ELΣ : T |= X v C implies M |= X v C. Plus T |= D v X implies M |= D v X. An hX, Σi- subsumption justification M of T is an hX, Σi-subsumption mod- ule of T that is minimal w.r.t. (. Property 1. Let T be a terminology, and let Σ be a signature. Additionally, let M be a hX, Σi-subsumption justification of T . Then X 6∈ cDiff Σ (T , M). 3 Computing Best k-Excerpts for Weighted Signatures Best k-excerpts provide users with a way to extract the most relevant axioms from a big ontology with respect to a given signature. However, in certain cases, concepts or roles in the signature can have different importances. To this end, we propose a new incompleteness measure considering a weighted signature and study the corresponding best k-excerpts. First, we introduce the definition of weighted signature. Definition 6 (Weighted Signature). A weight function is a function w : NC ∪ NR → R mapping a symbol σ ∈ NC ∪ NR to a value w(σ) ∈ R ∪ {∞}. Given a signature Σ and a weight function w, a weighted signature Σw is a set Σw = { (σ, w(σ)) | σ ∈ Σ }. A signature Σ is weighted uniformly under a weight function w iff w assigns the same value to every symbol in Σ, i.e. w(σ1 ) = w(σ2 ) for every σ1 , σ2 ∈ Σ. For a weight function w whose range equals {1.0} (i.e., w maps every symbol to 1.0), we also write Σ ◦ instead of Σw . For instance, Σw = {(A, 2.0), (B, 0.9), (r, 1.3)} is the weighted signature for the signature Σ = {A, B, r} and a weight function w satisfying w(A) = 2.0, w(B) = 0.9 and w(r) = 1.3. On the other hand, a weight function w0 satisfying w0 (A) = w0 (B) = w0 (r) = 1.4 weighs the symbols in Σ uniformly resulting in Σw0 = {(A, 1.4), (B, 1.4), (r, 1.4)}, whereas Σ ◦ = {(A, 1.0), (B, 1.0), (r, 1.0)}. For best k-excerpts under a weighted signature Σw , we use, as an incom- pleteness measure µΣ T w for Σw and T , the sum of the weights of the EL-concept inclusion difference witnesses in Σ w.r.t. T and E, formally: X µΣ T (E) = w w(X) X∈cWtnΣ (T ,E) For computing the best k-excerpts for a weighted signature, we present an encoding of the problem to a weighted partial Max-SAT problem, with the aim of delegating the task of finding the best excerpt to a Max-SAT solver. In that way we can leverage the decades of research efforts dedicated to developing efficient SAT solvers for our problem setting. Note that our encoding requires that a cover of the hA, Σi-subsumption jus- tifications has already been computed for every concept name A ∈ Σ. Moreover our partial Max-SAT encoding is language agnostic, i.e., it works for any TBox as long as its subsumption justifications can be computed. As we are interested in finding a subset M ⊆ T of a TBox T that fulfills certain properties, we encode the presence of an axiom α ∈ T in M using a propositional variable pα . Intuitively, in a solution v to our partial Max-SAT problem we will have that v(pα ) = 1 iff the axiom α is contained in the best k-excerpt. Next we introduce the formulas FM and GM with M ⊆ T and M ⊆ 2T that we will use to encode the covers M of hA, Σi-subsumption justifications M. Definition 7 (Encoding of TBox Subsets). Let T be a terminology. For every axiom α ∈ T , let pα ∈ Π be a fresh propositional variable. V M ⊆ T , we define the propositional formula FM that encodes M For a subset as FM := α∈M pα . Moreover, for a set M ⊆ 2T , we define the propositional formula GM associated with M as _ _ ^ GM := FM = ( pα ). M∈M M∈M α∈M Example 1. Let T = {α1 , α2 , α3 , α4 , α5 }, M = {α1 , α2 , α3 }, and M = {{α2 , α3 }, {α1 , α4 }}. Then FM = pα1 ∧ pα2 ∧ pα3 and GM = (pα2 ∧ pα3 ) ∨ (pα1 ∧ pα4 ). We can now define our encoding of the best k-excerpt problem for weighted signature into weighted partial Max-SAT. Definition 8 (Encoding of the Best Excerpt Problem for Weighted Signature). Let T be an EL-TBox, let Σw be a weighted signature, and let 0 ≤ k ≤ |T |. Additionally, for every A ∈ Σw , let MA be a cover of the hA, Σw i- subsumption justifications of T , and let qA be a fresh propositional variable. The weighted partial Max-SAT problem for finding best k-excerpts of T w.r.t. Σw , denoted with Pk (T , Σw ), is defined as follows. We set Pk (T , Σw ) := (Hk (T ), Sk (T , Σw )), where [ Hk (T ) := Card(T , k) ∪ Cls(qA ↔ GMA ), A∈Σ∩NC Sk (T , Σw ) := { (qA , w(A)) | A ∈ Σw ∩ NC }, and Card(T , k) is a set of clauses specifying that at most k clauses from the set { pα | α ∈ T } must be satisfied. In the hard part of our weighted partial Max-SAT problem, the clauses in Card(T , k) specify that the cardinality of the resulting excerpt E ⊆ T must be equal to k. We do not fix a certain encoding that should be used to obtain Card(T , k), but we note that there exist several techniques that require a poly- nomial number of clauses in k and in the size of T 1 (see e.g. [11]). Moreover, for every concept name A ∈ Σw , the variable qA is set to be equivalent to the formula GMA , i.e. qA will be satisfied in a valuation iff the resulting excerpt will not have A as a difference witness (A 6∈ cWtnΣ (T , E)). In the soft part of the problem, the set Sk (T , Σw ) specifies that the sum of the weights of the concept names in the signature must be maximal by a solution of the problem, enforcing that the resulting excerpt E will yield the smallest 1 In our implementation we used the encodings for cardinality constraints that are provided by the Max-SAT solver Sat4j. possible sum of weights of difference witnesses (whilst obeying the constraint that |E| = k). We can now show the correctness of our encoding, i.e. a best k-excerpt for a weighted signature can be exactly obtained from any solution to the weighted partial Max-SAT problem Pk (T , Σw ). Theorem 2. Let T be an EL-TBox, and let Σw be a weighted signature. Then v is a solution of the weighted partial Max-Sat problem Pk (T , Σw ) if and only if { α ∈ T | v(pα ) = 1 } is a best k-excerpt of T w.r.t. Σw . Our algorithm for finding a best k-excerpt given a TBox T and a weighted signature Σw (and k with 0 ≤ k ≤ |T |) as input is shown in Algorithm 1. First, the algorithm checks whether k = 0 or k = |T |, in which case the empty set or T itself is the best k-excerpt, respectively. Otherwise, the algorithm continues by computing subsumption justifications for each concept name A in Σw and then transfer them to propositional formula GJA . After the iteration over all the concept names A in Σw is complete, the partial Max-SAT problem Pk (T , Σw ) is constructed with the help of the formulas GMA that are stored in S. Subsequently, a solution v of Pk (T , Σw ) is computed using a partial Max-SAT solver and the best k-excerpt is returned by analyzing which variables pα have been set to 1 in the valuation v. Algorithm 1: Computing Best k-Excerpts of T for Weighted Signature Σ 1 function ComputeBestExcerpt(T , Σw , k) 2 if k = 0 then 3 return ∅ 4 if k = |T | then 5 return T 6 S := ∅ 7 for every A ∈ Σw ∩ NC do 8 Compute hA, Σw i-subsumption justifications of T : JT (A, Σw ) 9 Transfer JT (A, Σw ) to its propositional formula GJA 10 S := S ∪ {GJA } 11 Compute Pk (T , Σw ) using S 12 Find the set of solutions V of Pk (T , Σw ) using weighted partial Max-SAT solver 13 return { α ∈ T | v(pα ) = 1, v ∈ V } Our algorithm first computes subsumption justifications, which requires ex- ponential time in the size of T and Σw . Then the weighted partial Max-SAT encoding runs in polynomial time. Deciding whether there is a solution of an in- stance of a Max-SAT problem is NP-complete. Overall we have that Algorithm 1 runs in exponential time in the size of T and Σw in the worst case. Example 2. Let T = {α1 := A v X1 u X2 u X3 u X4 , α2 := X1 v B1 , α3 := X2 v B1 , α4 := X3 v B2 , α5 := X4 v B2 } be a TBox. Let Σ = {A, B1 , B2 } be a signature. Then, for X ∈ Σ, we obtain the following hX, Σi- subsumption justifications MX : MA = {{α1 , α2 }, {α1 , α3 }, {α1 , α4 }, {α1 , α5 }}, MB1 = {{α1 , α2 }, {α1 , α3 }} and MB2 = {{α1 , α4 }, {α1 , α5 }}. Each subsump- tion justification MX is encoded as a propositional logic formula GX as follows: GA = ((p1 ∧ p2 ) ∨ (p1 ∧ p3 ) ∨ (p1 ∧ p4 ) ∨ (p1 ∧ p5 )), GB1 = ((p1 ∧ p2 ) ∨ (p1 ∧ p3 )) and GB2 = ((p1 ∧ p4 ) ∨ (p1 ∧ p5 )). Now let w1 , w2 , w3 be weight functions such that w1 (A) = w1 (B1 ) = w1 (B2 ) = 1.0, w2 (A) = w2 (B1 ) = 1.0 and w2 (B2 ) = 2.0, and w3 (A) = w3 (B2 ) = 1.0, w3 (B1 ) = 2.0. Feeding the resulting weighted signatures Σw1 , Σw2 and Σw3 to- gether with k = 2 to a SAT solver yields the following solutions. For Σw1 , we obtain four assignments of truth values for the propositional variables in the tu- ple hp1 , p2 , p3 , p4 , p5 i (in this order): h1, 1, 0, 0, 0i, h1, 0, 1, 0, 0i, h1, 0, 0, 1, 0i and h1, 0, 0, 0, 1i. Then we decode the assignments to sets of axioms. We obtain the following best 2-excerpts for Σw1 : {α1 , α2 }, {α1 , α3 }, {α1 , α4 } and {α1 , α5 }. In the case of Σw2 , however, the solutions returned by the SAT-solver are the assignments h1, 0, 0, 1, 0i and h1, 0, 0, 0, 1i. Consequently, the best 2-excerpts for Σw2 are {α1 , α4 } and {α1 , α5 }. Finally, for Σw3 , the SAT-solver yields h1, 1, 0, 0, 0i and h1, 0, 0, 1, 0i resulting in {α1 , α2 } and {α1 , α4 } as the best 2-excerpts for Σw3 . Example 2 shows that, different weights of Σ-symbols can influent the final results of best excerpts. By giving higher weights on preferring Σ-concept names (like Σw2 or Σw3 ), users can reduce the number of best k-excerpts. However, in some extreme cases, as shown in the following example where a signature has a much higher weight then the others, the best excerpts for the wighted signature can be totally irrelevant to those w.r.t. Σ ◦ . Example 3. Let T = {α1 := A1 v X, α2 := X v Y, α3 := Y v Z, α4 := A2 v B1 , α5 := A3 v B2 } be a TBox. Let Σ = {A1 , A2 , A3 , B1 , B2 , Y, Z} be a signa- ture. Now let w1 , w2 be weight functions such that w1 (A1 ) = w1 (A2 ) = w1 (A3 ) = w1 (B1 ) = w1 (B2 ) = w1 (Y ) = w1 (Z) = 1.0 (i.e., Σw1 is Σ ◦ ) and w2 (A1 ) = w2 (A2 ) = w2 (A3 ) = w2 (Y ) = 1.0 and w2 (Z) = 9.0. We obtain that the best 3-excerpt EΣ ◦ for Σ ◦ is {α4 , α5 } 2 . The best 3-excerpt EΣw2 for Σw2 is {α1 , α2 }. However, | cWtnΣ (T , EΣ ◦ ) |=| {A1 , X, Y } |= 3 and | cWtnΣ (T , EΣw2 ) |=| {A2 , A3 , B1 , B2 } |= 4. In Example 3, as the w2 (Z) is much higher than the rest concept names in signature, EΣ ◦ is not the same as EΣw . The following theorem provides a guide of how to restrict weight function in order to keep monotonicity of best excerpts even though w changes. Proposition 1. Let T be an EL-TBox, Σw be a weighted signature and let Σ ◦ be a uniformed signature under the weighted function w. 1 (i) If 1 ≤ w(σ) 1 + |Σ∩N C| for every σ ∈ Σ ∩ NC , then every best k-excerpt of PT w.r.t. Σw will be a best k-excerpt of T w.r.t. Σ ◦ , but not vice versa. (ii) If σi ∈Σ1 wσi > σj ∈Σ2 wσj for any Σ1 , Σ2 ∈ 2Σ∩NC and | Σ1 |>| Σ2 |, P iff every best k-excerpt of T w.r.t. Σw is a best k-excerpt of T w.r.t. Σ ◦ . 2 The best 3-excerpt EΣ ◦ for Σw1 only contains two axioms α4 and α5 , as for any best ◦ Σ◦ 3-excerpt E 0 , µΣ 0 T (E ) = µT ({α4 , α5 }) 4 Ranking Axioms in Best Excerpts Best excerpts are useful to work with when a whole ontology exceeds the capacity of a system or a human user. However, multiple axioms often exist in the best excerpts. To better assist users, it would be desirable to give a ranking of these axioms according to their importances. In this section we present an approach to weight axioms according to weighted signatures. To achieve a ranking, we introduce the definition of weighted axioms. Definition 9 (Weighted Axiom). An axiom weight function φ is a function φ : T → R+ mapping an axiom α ∈ T to a value φ(α) ∈ R+ . Given an axiom weight function, it is then natural to introduce an axiom ranking. Intuitively, an axiom with a larger weight should be ranked more highly. Definition 10 (Axiom Ranking). Given an axiom weight function φ and two axioms α and β, we say α is preferred than β, denoted α β, if φ(α) ≥ φ(β). To achieve an axiom ranking, in the following, we show that a signature weight function can induce an axiom weight function. Intuitively, an axiom that appears in subsumption justifications of more Σ-symbols with higher weights should be more highly weighted. Definition 11. Let T be a TBox, Σw a signature with the weight function w. Additionally, Mσ be the set of all subsumption justifications of σ. Then the axiom weight function φ(·) induced by w is: X φΣw (α) = w(σ) {σ∈Σw | ∃M ∈Mσ s.t. α∈M } We denote Σw the axiom ranking induced by φΣw . Example 4 (Example 2 contd.). Note that α1 (resp. α2 , α3 , α4 , α5 ) is contained in at least one of the subsumption justifications of Σ-symbols {A, B1 , B2 } (resp. {A, B1 }, {A, B1 }, {A, B2 }, {A, B2 }). So for w3 , φ(α1 ) = w(A)+w(B1 )+w(B2 ) = 4.0, φ(α2 ) = φ(α3 ) = w(A)+w(B1 ) = 2.0, and φ(α4 ) = φ(α5 ) = w(A)+w(B2 ) = 3.0. Hence, for the 2-best excerpts w.r.t. w3 : {α1 , α2 } and {α1 , α4 }, we can order the axioms by α1 Σw α2 and α1 Σw α4 . It is easy to see that the axiom function φ(·) in Definition 11 gives a total ordering over the axioms in T w.r.t. to a given weighted signature Σw . Moreover, it is independent of excerpts as stated in the following. Proposition 2. Let T be a TBox, Σw a signature with the weight function w, for any best excerpts E1 and E2 and axioms α1 , α2 ∈ E1 ∩ E2 , α1 α2 in E1 iff. α1 α2 in E2 . That is, is a stable ordering of the axioms given a weighted signature. However, note that the axioms in best k-excerpts are not monotonic with respect to the size of k (see a counter-example given in [3]). Snomed CT NCI Time (s) Minimal Maximal Median STDEV Minimal Maximal Median STDEV MaxSat 0.01 1.27 0.15 0.35 0.01 0.16 0.01 0.02 Table 1. Times for computing best excerpts encoding the input for and running SAT4J 5 Evaluation To demonstrate that our algorithm works in practise, we implemented a JAVA prototype using SAT4J as a Max-SAT solver [2]. All the tests were performed on a server running Ubuntu 15.10 with Intel Xeon 2.50GHz Core 4 Duo CPU with 64G RAM. We used the prominent medical ontologies Snomed CT (ver- sion Jan 2016)3 consisting of 317 891 axioms, and the NCI Thesaurus (NCIt, version 16.03d)4 containing 165 341 axioms. The first experiment is to test the efficiency of our encoding method. Table 1 shows the time needed to compute best excerpts of NCI and Snomed CT for randomly generated signatures of different sizes (10/30/50 concept names and 10/30/50 roles names). In total over 3 000 signatures have been considered. After precomputing the subsumption justifications for every concept name in every signature, the time was measured as that needed to prepare the input and run the partial Max-SAT solver to compute best k-excerpts for every size k. The minimal, maximal, median, and standard deviation of the execution times show that computing best excerpts of any size using Max-SAT is very efficient given that the subsumption justifications have been computed. In the second experiment, we compare the size of locality based modules with the number of axioms in best excerpts needed to preserve a certain amount of knowledge. We denote with #(PreservedΣ (best) = n), for n ∈ {1, 2}, the minimal number of axioms needed by a best excerpt to preserve the knowledge of n concept names w.r.t. the signature Σ (i.e., the number k of axioms of a best excerpt E of T such that n = |Σ ∩ NC \ cWtnΣ (T , E)|). Instead of using random signatures, however, we consider a scenario where a user searches for sub-ontologies of Snomed CT related to a particular concept name. We com- puted 2 500 different signatures each consisting of a concept name related to diseases, the TOP-concept (called ‘SNOMED CT Concept’) and all role names of Snomed CT. All weights of Σ-symbols are equal in this experiment. In Figure 1, the 2 500 signatures are presented along the x-axis in ascending order w.r.t. the size of their respective ⊥>∗ -local modules (black line). The y-axis represents the number of axioms in the module and excerpts for a signature. The red (resp. green) line presents the sizes of best excerpts that preserve the knowl- edge for one (resp. two) concept name(s), i.e., #P reservedΣ (best) = 1 (resp. #P reservedΣ (best) = 2). Best excerpts provide a conciser way for zooming in on an ontology, in particular, when sacrificing completeness is acceptable. For 3 http://www.ihtsdo.org/snomed-ct 4 http://evs.nci.nih.gov/ftp1/NCI_Thesaurus 140 Module #Preserved (best) = 1 120 #Preserved (best) = 2 Number of Axioms 100 80 60 40 20 0 0 500 0 0 0 0 100 150 200 250 Signature ID Fig. 1. Comparison of the size of best excerpts preserving one or two concept names with the size of ⊥>∗ -local modules for 2 500 signatures this experiment, after computing the subsumption justifications of all concept names in a signature, it only took 0.15s on average to compute the best excerpts. 6 Conclusion We have presented an algorithm for computing a best k-excerpt of a TBox for given weighted signature. Best k-excerpts are k-element subsets of an ontology that best preserve entailments over the target signature compared to the on- tology [3]. The measure of incompleteness of the excerpt w.r.t. the ontology is based on the incompleteness measure taking into account weights of concept names in the signature. We have implemented our algorithm and we have pre- sented an experimental evaluation to demonstrate the viability of computing best excerpts of large biomedical ontologies in practice. We leave investigating alternative incompleteness measures for future work. References 1. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The description logic handbook: theory, implementation, and applications. Cambridge University Press, 2 edn. (June 2010) 2. Berre, D.L., Parrain, A.: The sat4j library, release 2.2. JSAT 7(2-3), 59–6 (2010) 3. Chen, J., Ludwig, M., Ma, Y., Walther, D.: Towards extracting ontology excerpts. In: Proceedings of KSEM’15. pp. 78–89 (2015) 4. Chen, J., Ludwig, M., Ma, Y., Walther, D.: Zooming in on ontologies: Minimal modules and best excerpts. In: Proceeding of ISWC’17. pp. 173–189 (2017) 5. Ecke, A., Ludwig, M., Walther, D.: The concept difference for EL-terminologies using hypergraphs. In: Proceedings of DChanges’13 (2013) 6. Konev, B., Ludwig, M., Walther, D., Wolter, F.: The logical difference for the lightweight description logic EL. Journal of Artificial Intelligence Research 44, 633–708 (2012) 7. Konev, B., Ludwig, M., Wolter, F.: Logical difference computation with CEX2.5. In: Proceedings of IJCAR’12. pp. 371–377 (2012) 8. Konev, B., Walther, D., Wolter, F.: The logical difference problem for description logic terminologies. In: Proceedings of IJCAR’08. pp. 259–274 (2008) 9. Ludwig, M., Walther, D.: The logical difference for ELHr-terminologies using hy- pergraphs. In: Proceedings of ECAI’14. pp. 555–560 (2014) 10. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press (2008) 11. Sinz, C.: Towards an optimal cnf encoding of boolean cardinality constraints. In: Proceedings of CP’05. pp. 827–831 (2005)