<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Computing Best Ontology Excerpts via Weighted Partial Max-SAT Solving?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jieying Chen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yue Ma</string-name>
          <email>yue.mag@lri.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dirk Walther</string-name>
          <email>dirk.walther@ivi.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer IVI</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Laboratoire de Recherche en Informatique, Universite Paris-Sud</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We consider the problem of computing best excerpts of ontologies, which are selections of a certain, small number k of axioms that best capture the knowledge regarding a given set of weighted terms. Such excerpts can be useful for ontology selection, for instance. The weights of terms are to re ect di erent importances of considered terms, for which we propose an incompleteness measure of excerpts based on the term weights and the notion of logical di erence. We present a method to compute best k-excerpts by nding all subsumption justi cations and solving a weighted partial Max-SAT problem. We demonstrate the viability of our approach with an evaluation on biomedical ontologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the presence of an ever growing amount of information, organizations and
human users need to be able to focus on certain key pieces of information and to
intentionally ignore all other possibly relevant parts. Knowledge about complex
systems that is represented in ontologies yields collections of axioms that are
too large for human users to browse, let alone to comprehend or reason about
it. To address the need for ontology summary, the notion of an ontology excerpt
was introduced [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. An ontology excerpt is a xed-size subset of an ontology,
consisting of the most relevant axioms for a given set of terms. These axioms
preserve as much as possible the knowledge about the considered terms described
in the ontology.
      </p>
      <p>To evaluate the quality of ontology excerpts, we de ne a semantics-based
measure, using Logical Di erence [6], to quantify how much semantic meaning
is preserved in an excerpt compared to the original ontology. We take the logical
di erence to be the set of queries that are relevant to an application domain and
that produce di erent answers when evaluated over the considered ontologies.
We consider concept subsumption queries over the terms of interest only.</p>
      <p>
        Using an exhaustive search to nd the excerpts of an ontology that best
preserve the semantic information w.r.t. the ontology is futile as it involves
computing all (i.e. exponentially many) subsets of the ontology. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] e cient excerpt
extraction techniques are investigated that are based on methods stemming from
? This work is partially funded by the ANR project GoAsQ (ANR-15-CE23-0022).
the area of information retrieval [10], a research area which is generally concerned
with developing techniques to extract the \most relevant" documents for a query
from large data sources. However, these techniques do not yield the best excerpts
possible. In this paper, we present an algorithm that computes a best excerpt of
an ontology, i.e., there is no subset of the ontology of the same size that better
preserves the knowledge about the selected terms. This paper extends the
approach from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] using an encoding to a weighted partial Max-SAT problem to
handle weighted signatures.
      </p>
      <p>We proceed as follows. We start in Section 2 with reviewing the notions of
logical di erence together with di erence witness and best excerpt as well as the
weighted partial Max-SAT problem. A best k-excerpt of a TBox is a sub-ontology
that containing at most k axioms that best capture the knowledge about user's
interests in terms of weighted signature. Moreover, we introduce the notion of
subsumption modules of a TBox for a given set of terms. Taking the union of
some subsumption modules can yield best excerpts for a weighted signature, as
shown in Section 3. We encode the problem of selecting the right subsumption
modules on weighted signature as a weighted partial Max-SAT problem. Overall
we obtain an algorithm for computing best k-excerpts on weighted signature.
In Section 4, we propose a method to rank axioms in excerpts when presenting
user a best excerpt. We evaluate our algorithm by using it with the prominent
bio-medical ontologies Snomed CT in Section 5. We demonstrate computing best
k-excerpts is a viable approach for the task of summarizing large ontologies with
a controllable small number of axioms that are most relevant for a given set of
terms. We conclude the paper in Section 6.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        We brie y recall basic notions related to the ontology excerpts, and the logical
di erence between terminologies [6, 8]. We assume familiarity with Description
Logic; see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for a detailed introduction.
2.1
      </p>
      <sec id="sec-2-1">
        <title>Ontology Excerpt</title>
        <p>
          In this section, we review the notions of a k-excerpt, an incompleteness measure
and a best k-excerpt of a TBox [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. We recall that a signature is a nite set of
concept and role names, and we use sig( ) to denote the signature of an syntactic
object .
        </p>
        <p>A k-excerpt of an ontology is a subset of the ontology containing no more than
k axioms. An incompleteness measure quanti es how much of the knowledge of
a TBox about the terms in a signature is captured (or how much is missing) by
an excerpt.</p>
        <sec id="sec-2-1-1">
          <title>De nition 1 (Incompleteness Measure). Let T be a TBox, and let be a</title>
          <p>signature. An incompleteness measure T for T and is a function that maps
every subset E T to a non-negative real number (E ).</p>
          <p>A best k-excerpt best captures the knowledge about in T among all
kexcerpts. How well a k-excerpt captures knowledge is measured by a suitable
incompleteness measure. Note that best k-excerpts are generally not unique.
De nition 2 (Best Excerpt). Let T be a TBox, let be a signature, and let
k &gt; 0 be a natural number. Additionally, let T be an incompleteness measure
for T and . A best k-excerpt of T w.r.t. under T is a k-excerpt E of T
such that</p>
          <p>T (E ) = minf T (E 0) j E 0 is a k-excerpt of T g:</p>
          <p>
            In [
            <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
            ], the incompleteness measure was set to be the number jcWtn (T ; E )j
of concept inclusion di erence witnesses in w.r.t. T , which will be introduced
in the next section.
2.2
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Logical Di erence</title>
        <p>r
We now recall basic notions related to the logical di erence between two E LH
terminologies for E L-inclusions over a given signature as query language [6, 8].</p>
        <sec id="sec-2-2-1">
          <title>De nition 3 (Logical Di erence). Let T1 and T2 be two E L-terminologies,</title>
          <p>and let be a signature. The concept inclusion di erence between T1 and T2
w.r.t. is the set cDi (T1; T2) of all E L-inclusions of the form C v D for
E L-concepts C and D such that sig( ) , T1 j= , and T2 6j= .</p>
          <p>In case two terminologies T1 and T2 are logically di erent, the set cDi (T1; T2)
consists of in nitely many concept inclusions. The primitive witnesses theorems
from [6] allow us to consider only certain inclusions of a simpler syntactic form.
Theorem 1. Let T1 and T2 be two E L-terminologies, and let be a signature.
If 2 cDi (T1; T2), then either A v D or C v A is a member of cDi (T1; T2),
where A 2 sig( ) is a concept name, and C; D are E L-concepts occurring in .</p>
          <p>(T1; T2).</p>
          <p>We use the primitive witnesses theorem to obtain a succinct representation
of the set cDi</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>De nition 4 (Primitive Witnesses). Let T1 and T2 be two E L-terminologies,</title>
          <p>and let be a signature. We say that concept inclusion di erence witnesses in
w.r.t. T1 and T2 are concept names contained in that occur on the left-hand
side of inclusions of the form A v C in cDi (T1; T2) or on the right-hand side
of inclusions of the form D v A in cDi (T1; T2). The set of all such witnesses
will be denoted by cWtn (T1; T2).</p>
          <p>The set cWtn (T1; T2) is nite as is nite. In particular, it holds that
cDi (T1; T2) = ; i cWtn (T1; T2) = ;.</p>
          <p>
            For (possibly cyclic) E L-terminologies T1 and T2, the set cWtn (T1; T2) can
be computed using the current version of the CEX tool [7] implementing a
prooftheoretic approach to the logical di erence problem using hypergraphs as
introduced in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] and further extended in [9].
2.3
          </p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Weighted Partial Max-SAT problem</title>
        <p>Let be a countably in nite set of propositional variables which we denote with
p; q, etc. The set of propositional logic formulas ' is built according to the
following grammar rule: ' ::= p j :' j ' ^ ' j ' _ ' j ' ! ' j ' $ ' with p 2 .</p>
        <p>A propositional valuation v : ! f0; 1g is function mapping a propositional
variable p 2 to a truth value v(p) 2 f0; 1g. A propositional formula ' 2 is
satis ed in v i v(') = 1. We say that ' is satis able i there exists a valuation v
such that v(') = 1.</p>
        <p>A literal ` is a propositional variable p or a negated propositional variable
:p, and a clause is a disjunction `1 _ : : : _ `n of literals `i (0 i n). For a
propositional formula ' 2 , we denote with Cls(') the transformation of ' into
an equi-satis able clause set, i.e. Cls(') is a nite set of propositional clauses such
tkhnaotw'n itshasatteisvearyblperioptohsietifoonrmalufloarmVul2a ' 2is scaatins baebtlrea,nwshfoerrmeed =inCplosl(y'n)o. mItiaisl
time (in the size of ') into an equi-satis able set of clauses. Note that the clauses
in Cls(') may contain propositional variables that are not present in '.</p>
        <p>The Boolean Satis ability Problem (SAT), also known as Propositional
Satis ability Problem, is the problem of deciding whether there exists a valuation
that satis es a given propositional formula ', or alternatively, all the clauses
contained in a set of clauses = Cls('). As an extension, the set is split into
two sets H and S, called hard and soft clause sets. The problem of nding a
valuation that satis es all the clauses in H and a maximal number clauses in S
is called the partial Max-SAT problem.</p>
        <p>A weighted clause is a pair ( ; w ), where is a clause and w is a natural
number or in nity meaning the cost for falsifying the clause . If a clause 2 H,
then w is in nite. Given an assignment I and a multiset of weighted clauses ,
the cost of assignment I on is the sum of the weights of the clauses falsi ed by
I. The Weighted Partial Max-SAT problem for weighted clauses is the problem
of nding an optimal assignment to the variables of that minimizes the cost
of the assignment on .
2.4</p>
      </sec>
      <sec id="sec-2-4">
        <title>Subsumption Justi cations</title>
        <p>
          Subsumption module and justi cation are de ned to capture the set of
axioms required to preserve the logical consequences related to a particular
concept name [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. An algorithm for computing subsumption justi cations for E
LHterminologies is presented in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <sec id="sec-2-4-1">
          <title>De nition 5 (Subsumption Module &amp; Justi cation). Let T be a termi</title>
          <p>nology, let be a signature, and let X 2 NC. Additionally, let M T .</p>
          <p>We say that M is an hX; i-subsumption module of T i for every C; D 2
E L : T j= X v C implies M j= X v C. Plus T j= D v X implies M j= D v
X.</p>
          <p>An hX; i- subsumption justi cation M of T is an hX; i-subsumption
module of T that is minimal w.r.t. (.</p>
          <p>Property 1. Let T be a terminology, and let be a signature. Additionally, let
M be a hX; i-subsumption justi cation of T . Then X 62 cDi (T ; M).
3</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Computing Best k-Excerpts for Weighted Signatures</title>
      <p>Best k-excerpts provide users with a way to extract the most relevant axioms
from a big ontology with respect to a given signature. However, in certain cases,
concepts or roles in the signature can have di erent importances. To this end,
we propose a new incompleteness measure considering a weighted signature and
study the corresponding best k-excerpts.</p>
      <p>First, we introduce the de nition of weighted signature.</p>
      <sec id="sec-3-1">
        <title>De nition 6 (Weighted Signature). A weight function is a function w :</title>
        <p>NC [ NR ! R mapping a symbol 2 NC [ NR to a value w( ) 2 R [ f1g.
Given a signature and a weight function w, a weighted signature w is a set
w = f ( ; w( )) j 2 g.</p>
        <p>A signature is weighted uniformly under a weight function w i w assigns
the same value to every symbol in , i.e. w( 1) = w( 2) for every 1; 2 2 . For
a weight function w whose range equals f1:0g (i.e., w maps every symbol to 1:0),
we also write instead of w. For instance, w = f(A; 2:0); (B; 0:9); (r; 1:3)g is
the weighted signature for the signature = fA; B; rg and a weight function w
satisfying w(A) = 2:0, w(B) = 0:9 and w(r) = 1:3. On the other hand, a
weight function w0 satisfying w0(A) = w0(B) = w0(r) = 1:4 weighs the symbols
in uniformly resulting in w0 = f(A; 1:4); (B; 1:4); (r; 1:4)g, whereas =
f(A; 1:0); (B; 1:0); (r; 1:0)g.</p>
        <p>For best k-excerpts under a weighted signature w, we use, as an
incompleteness measure T w for w and T , the sum of the weights of the E L-concept
inclusion di erence witnesses in w.r.t. T and E , formally:</p>
        <p>T
w (E ) =</p>
        <p>X w(X)</p>
        <p>X2cWtn (T ;E)</p>
        <p>For computing the best k-excerpts for a weighted signature, we present an
encoding of the problem to a weighted partial Max-SAT problem, with the aim of
delegating the task of nding the best excerpt to a Max-SAT solver. In that way
we can leverage the decades of research e orts dedicated to developing e cient
SAT solvers for our problem setting.</p>
        <p>Note that our encoding requires that a cover of the hA; i-subsumption
justi cations has already been computed for every concept name A 2 . Moreover
our partial Max-SAT encoding is language agnostic, i.e., it works for any TBox
as long as its subsumption justi cations can be computed.</p>
        <p>As we are interested in nding a subset M T of a TBox T that ful lls
certain properties, we encode the presence of an axiom 2 T in M using a
propositional variable p . Intuitively, in a solution v to our partial Max-SAT
problem we will have that v(p ) = 1 i the axiom is contained in the best
k-excerpt.</p>
        <p>Next we introduce the formulas FM and GM with M T and M 2T that
we will use to encode the covers M of hA; i-subsumption justi cations M.</p>
        <sec id="sec-3-1-1">
          <title>De nition 7 (Encoding of TBox Subsets). Let T be a terminology. For</title>
          <p>every axiom 2 T , let p 2 be a fresh propositional variable.</p>
          <p>For a subset M T , we de ne the propositional formula FM that encodes M
as FM := V 2M p . Moreover, for a set M 2T , we de ne the propositional
formula GM associated with M as</p>
          <p>GM :=</p>
          <p>_
M2M</p>
          <p>FM =</p>
          <p>_ ( ^
M2M
2M
p ):
Example 1. Let T = f 1; 2; 3; 4; 5g, M = f 1; 2; 3g, and M = ff 2; 3g;
f 1; 4gg. Then FM = p 1 ^ p 2 ^ p 3 and GM = (p 2 ^ p 3 ) _ (p 1 ^ p 4 ).</p>
          <p>We can now de ne our encoding of the best k-excerpt problem for weighted
signature into weighted partial Max-SAT.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>De nition 8 (Encoding of the Best Excerpt Problem for Weighted</title>
        <p>Signature). Let T be an E L-TBox, let w be a weighted signature, and let
0 k jT j. Additionally, for every A 2 w, let MA be a cover of the hA;
wisubsumption justi cations of T , and let qA be a fresh propositional variable.</p>
        <p>The weighted partial Max-SAT problem for nding best k-excerpts of T w.r.t.
denoted with Pk(T ; w), is de ned as follows.</p>
        <p>We set Pk(T ; w) := (Hk(T ); Sk(T ; w)), where
w,
Hk(T ) := Card(T ; k) [</p>
        <p>Cls(qA $ GMA );
[</p>
        <p>A2 \NC
Sk(T ; w) := f (qA; w(A)) j A 2
w \ NC g;
and Card(T ; k) is a set of clauses specifying that at most k clauses from the set
f p j 2 T g must be satis ed.</p>
        <p>In the hard part of our weighted partial Max-SAT problem, the clauses in
Card(T ; k) specify that the cardinality of the resulting excerpt E T must
be equal to k. We do not x a certain encoding that should be used to obtain
Card(T ; k), but we note that there exist several techniques that require a
polynomial number of clauses in k and in the size of T 1 (see e.g. [11]). Moreover,
for every concept name A 2 w, the variable qA is set to be equivalent to the
formula GMA , i.e. qA will be satis ed in a valuation i the resulting excerpt will
not have A as a di erence witness (A 62 cWtn (T ; E )).</p>
        <p>In the soft part of the problem, the set Sk(T ; w) speci es that the sum of the
weights of the concept names in the signature must be maximal by a solution
of the problem, enforcing that the resulting excerpt E will yield the smallest
1 In our implementation we used the encodings for cardinality constraints that are
provided by the Max-SAT solver Sat4j.
possible sum of weights of di erence witnesses (whilst obeying the constraint
that jE j = k).</p>
        <p>We can now show the correctness of our encoding, i.e. a best k-excerpt for
a weighted signature can be exactly obtained from any solution to the weighted
partial Max-SAT problem Pk(T ; w).</p>
        <p>Theorem 2. Let T be an E L-TBox, and let w be a weighted signature. Then
v is a solution of the weighted partial Max-Sat problem Pk(T ; w) if and only if
f 2 T j v(p ) = 1 g is a best k-excerpt of T w.r.t. w.</p>
        <p>Our algorithm for nding a best k-excerpt given a TBox T and a weighted
signature w (and k with 0 k jT j) as input is shown in Algorithm 1. First,
the algorithm checks whether k = 0 or k = jT j, in which case the empty set or T
itself is the best k-excerpt, respectively. Otherwise, the algorithm continues by
computing subsumption justi cations for each concept name A in w and then
transfer them to propositional formula GJA . After the iteration over all the
concept names A in w is complete, the partial Max-SAT problem Pk(T ; w) is
constructed with the help of the formulas GMA that are stored in S. Subsequently,
a solution v of Pk(T ; w) is computed using a partial Max-SAT solver and the
best k-excerpt is returned by analyzing which variables p have been set to 1 in
the valuation v.</p>
        <p>Algorithm 1: Computing Best k-Excerpts of T for Weighted Signature</p>
        <p>Our algorithm rst computes subsumption justi cations, which requires
exponential time in the size of T and w. Then the weighted partial Max-SAT
encoding runs in polynomial time. Deciding whether there is a solution of an
instance of a Max-SAT problem is NP-complete. Overall we have that Algorithm 1
runs in exponential time in the size of T and w in the worst case.
Example 2. Let T = f 1 := A v X1 u X2 u X3 u X4; 2 := X1 v B1;
3 := X2 v B1; 4 := X3 v B2; 5 := X4 v B2g be a TBox. Let =
fA; B1; B2g be a signature. Then, for X 2 , we obtain the following hX;
isubsumption justi cations MX : MA = ff 1; 2g; f 1; 3g; f 1; 4g; f 1; 5gg,
MB1 = ff 1; 2g; f 1; 3gg and MB2 = ff 1; 4g; f 1; 5gg. Each
subsumption justi cation MX is encoded as a propositional logic formula GX as follows:
GA = ((p1 ^ p2) _ (p1 ^ p3) _ (p1 ^ p4) _ (p1 ^ p5)), GB1 = ((p1 ^ p2) _ (p1 ^ p3))
and GB2 = ((p1 ^ p4) _ (p1 ^ p5)).</p>
        <p>Now let w1; w2; w3 be weight functions such that w1(A) = w1(B1) = w1(B2) =
1:0, w2(A) = w2(B1) = 1:0 and w2(B2) = 2:0, and w3(A) = w3(B2) = 1:0,
w3(B1) = 2:0. Feeding the resulting weighted signatures w1 , w2 and w3
together with k = 2 to a SAT solver yields the following solutions. For w1 , we
obtain four assignments of truth values for the propositional variables in the
tuple hp1; p2; p3; p4; p5i (in this order): h1; 1; 0; 0; 0i, h1; 0; 1; 0; 0i, h1; 0; 0; 1; 0i and
h1; 0; 0; 0; 1i. Then we decode the assignments to sets of axioms. We obtain the
following best 2-excerpts for w1 : f 1; 2g, f 1; 3g, f 1; 4g and f 1; 5g.</p>
        <p>In the case of w2 , however, the solutions returned by the SAT-solver are
the assignments h1; 0; 0; 1; 0i and h1; 0; 0; 0; 1i. Consequently, the best 2-excerpts
for w2 are f 1; 4g and f 1; 5g.</p>
        <p>Finally, for w3 , the SAT-solver yields h1; 1; 0; 0; 0i and h1; 0; 0; 1; 0i resulting
in f 1; 2g and f 1; 4g as the best 2-excerpts for w3 .</p>
        <p>Example 2 shows that, di erent weights of -symbols can in uent the nal
results of best excerpts. By giving higher weights on preferring -concept names
(like w2 or w3 ), users can reduce the number of best k-excerpts. However, in
some extreme cases, as shown in the following example where a signature has a
much higher weight then the others, the best excerpts for the wighted signature
can be totally irrelevant to those w.r.t. .</p>
        <p>Example 3. Let T = f 1 := A1 v X; 2 := X v Y; 3 := Y v Z; 4 := A2 v
B1; 5 := A3 v B2g be a TBox. Let = fA1; A2; A3; B1; B2; Y; Zg be a
signature. Now let w1; w2 be weight functions such that w1(A1) = w1(A2) = w1(A3) =
w1(B1) = w1(B2) = w1(Y ) = w1(Z) = 1:0 (i:e:; w1 is ) and w2(A1) =
w2(A2) = w2(A3) = w2(Y ) = 1:0 and w2(Z) = 9:0. We obtain that the best
3-excerpt E for is f 4; 5g 2. The best 3-excerpt E w2 for w2 is f 1; 2g.
However, j cWtn (T ; E ) j=j fA1; X; Y g j= 3 and j cWtn (T ; E w2 ) j=j
fA2; A3; B1; B2g j= 4.</p>
        <p>In Example 3, as the w2(Z) is much higher than the rest concept names in
signature, E is not the same as E w . The following theorem provides a guide
of how to restrict weight function in order to keep monotonicity of best excerpts
even though w changes.</p>
        <p>Proposition 1. Let T be an E L-TBox, w be a weighted signature and let
be a uniformed signature under the weighted function w.</p>
        <p>(i) If 1 w( ) 1 + j \1NCj for every 2 \ NC, then every best k-excerpt
of T w.r.t. w will be a best k-excerpt of T w.r.t. , but not vice versa.
(ii) If P i2 1 w i &gt; P j2 2 w j for any 1; 2 2 2 \NC and j 1 j&gt;j 2 j,
i every best k-excerpt of T w.r.t. w is a best k-excerpt of T w.r.t. .
2 The best 3-excerpt E
3-excerpt E0, T (E0) =
for w1 only contains two axioms 4 and 5, as for any best</p>
        <p>T (f 4; 5g)</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Ranking Axioms in Best Excerpts</title>
      <p>Best excerpts are useful to work with when a whole ontology exceeds the capacity
of a system or a human user. However, multiple axioms often exist in the best
excerpts. To better assist users, it would be desirable to give a ranking of these
axioms according to their importances. In this section we present an approach
to weight axioms according to weighted signatures.</p>
      <p>To achieve a ranking, we introduce the de nition of weighted axioms.</p>
      <sec id="sec-4-1">
        <title>De nition 9 (Weighted Axiom). An axiom weight function</title>
        <p>: T ! R+ mapping an axiom 2 T to a value ( ) 2 R+.
is a function</p>
        <p>Given an axiom weight function, it is then natural to introduce an axiom
ranking. Intuitively, an axiom with a larger weight should be ranked more highly.
De nition 10 (Axiom Ranking). Given an axiom weight function
axioms and , we say is preferred than , denoted , if ( )
and two
( ).</p>
        <p>To achieve an axiom ranking, in the following, we show that a signature
weight function can induce an axiom weight function. Intuitively, an axiom that
appears in subsumption justi cations of more -symbols with higher weights
should be more highly weighted.</p>
        <p>De nition 11. Let T be a TBox, w a signature with the weight function w.
Additionally, M be the set of all subsumption justi cations of . Then the axiom
weight function ( ) induced by w is:
w ( ) =</p>
        <p>X
f 2 w j 9M2M s.t. 2Mg
w( )
We denote
w the axiom ranking induced by
w .</p>
        <p>Example 4 (Example 2 contd.). Note that 1 (resp. 2; 3; 4; 5) is contained
in at least one of the subsumption justi cations of -symbols fA; B1; B2g (resp.
fA; B1g, fA; B1g, fA; B2g, fA; B2g). So for w3, ( 1) = w(A)+w(B1)+w(B2) =
4:0, ( 2) = ( 3) = w(A)+w(B1) = 2:0, and ( 4) = ( 5) = w(A)+w(B2) =
3:0. Hence, for the 2-best excerpts w.r.t. w3: f 1; 2g and f 1; 4g, we can order
the axioms by 1 w 2 and 1 w 4.</p>
        <p>It is easy to see that the axiom function ( ) in De nition 11 gives a total
ordering over the axioms in T w.r.t. to a given weighted signature w. Moreover,
it is independent of excerpts as stated in the following.</p>
        <p>Proposition 2. Let T be a TBox, w a signature with the weight function w,
for any best excerpts E1 and E2 and axioms 1; 2 2 E1 \ E2, 1 2 in E1 i .</p>
        <p>1 2 in E2.</p>
        <p>
          That is, is a stable ordering of the axioms given a weighted signature. However,
note that the axioms in best k-excerpts are not monotonic with respect to the
size of k (see a counter-example given in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]).
        </p>
        <p>
          Snomed CT NCI
Time (s) Minimal Maximal Median STDEV Minimal Maximal Median STDEV
MaxSat
To demonstrate that our algorithm works in practise, we implemented a JAVA
prototype using SAT4J as a Max-SAT solver [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. All the tests were performed
on a server running Ubuntu 15.10 with Intel Xeon 2.50GHz Core 4 Duo CPU
with 64G RAM. We used the prominent medical ontologies Snomed CT
(version Jan 2016)3 consisting of 317 891 axioms, and the NCI Thesaurus (NCIt,
version 16.03d)4 containing 165 341 axioms.
        </p>
        <p>The rst experiment is to test the e ciency of our encoding method. Table 1
shows the time needed to compute best excerpts of NCI and Snomed CT for
randomly generated signatures of di erent sizes (10/30/50 concept names and
10/30/50 roles names). In total over 3 000 signatures have been considered. After
precomputing the subsumption justi cations for every concept name in every
signature, the time was measured as that needed to prepare the input and run
the partial Max-SAT solver to compute best k-excerpts for every size k. The
minimal, maximal, median, and standard deviation of the execution times show
that computing best excerpts of any size using Max-SAT is very e cient given
that the subsumption justi cations have been computed.</p>
        <p>In the second experiment, we compare the size of locality based modules
with the number of axioms in best excerpts needed to preserve a certain amount
of knowledge. We denote with #(Preserved (best) = n), for n 2 f1; 2g, the
minimal number of axioms needed by a best excerpt to preserve the knowledge
of n concept names w.r.t. the signature (i.e., the number k of axioms of a
best excerpt E of T such that n = j \ NC n cWtn (T ; E)j). Instead of using
random signatures, however, we consider a scenario where a user searches for
sub-ontologies of Snomed CT related to a particular concept name. We
computed 2 500 di erent signatures each consisting of a concept name related to
diseases, the TOP-concept (called `SNOMED CT Concept') and all role names of
Snomed CT. All weights of -symbols are equal in this experiment.</p>
        <p>In Figure 1, the 2 500 signatures are presented along the x-axis in ascending
order w.r.t. the size of their respective ?&gt; -local modules (black line). The y-axis
represents the number of axioms in the module and excerpts for a signature. The
red (resp. green) line presents the sizes of best excerpts that preserve the
knowledge for one (resp. two) concept name(s), i.e., #P reserved (best) = 1 (resp.
#P reserved (best) = 2). Best excerpts provide a conciser way for zooming in
on an ontology, in particular, when sacri cing completeness is acceptable. For
3 http://www.ihtsdo.org/snomed-ct
4 http://evs.nci.nih.gov/ftp1/NCI_Thesaurus</p>
        <p>Module
#Preserved (best) = 1
#Preserved (best) = 2
0
500
1000 1500
Signature ID
2000
2500
this experiment, after computing the subsumption justi cations of all concept
names in a signature, it only took 0:15s on average to compute the best excerpts.
6</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        We have presented an algorithm for computing a best k-excerpt of a TBox for
given weighted signature. Best k-excerpts are k-element subsets of an ontology
that best preserve entailments over the target signature compared to the
ontology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The measure of incompleteness of the excerpt w.r.t. the ontology is
based on the incompleteness measure taking into account weights of concept
names in the signature. We have implemented our algorithm and we have
presented an experimental evaluation to demonstrate the viability of computing
best excerpts of large biomedical ontologies in practice. We leave investigating
alternative incompleteness measures for future work.
6. Konev, B., Ludwig, M., Walther, D., Wolter, F.: The logical di erence for the
lightweight description logic EL. Journal of Arti cial Intelligence Research 44,
633{708 (2012)
7. Konev, B., Ludwig, M., Wolter, F.: Logical di erence computation with CEX2.5.
      </p>
      <p>In: Proceedings of IJCAR'12. pp. 371{377 (2012)
8. Konev, B., Walther, D., Wolter, F.: The logical di erence problem for description
logic terminologies. In: Proceedings of IJCAR'08. pp. 259{274 (2008)
9. Ludwig, M., Walther, D.: The logical di erence for ELHr-terminologies using
hypergraphs. In: Proceedings of ECAI'14. pp. 555{560 (2014)
10. Manning, C.D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval.</p>
      <p>Cambridge University Press (2008)
11. Sinz, C.: Towards an optimal cnf encoding of boolean cardinality constraints. In:
Proceedings of CP'05. pp. 827{831 (2005)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F</given-names>
          </string-name>
          . (eds.):
          <article-title>The description logic handbook: theory, implementation, and applications</article-title>
          . Cambridge University Press,
          <volume>2</volume>
          <fpage>edn</fpage>
          . (
          <year>June 2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berre</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parrain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>The sat4j library, release 2.2. JSAT 7</source>
          (
          <issue>2-3</issue>
          ),
          <volume>59</volume>
          {6 (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Towards extracting ontology excerpts</article-title>
          .
          <source>In: Proceedings of KSEM'15</source>
          . pp.
          <volume>78</volume>
          {
          <issue>89</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Zooming in on ontologies: Minimal modules and best excerpts</article-title>
          .
          <source>In: Proceeding of ISWC'17</source>
          . pp.
          <volume>173</volume>
          {
          <issue>189</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ecke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The concept di erence for EL-terminologies using hypergraphs</article-title>
          .
          <source>In: Proceedings of DChanges'13</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>