<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of Extraction Techniques for Ontology Excerpts?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jieying Chen</string-name>
          <email>chenjy12@mails.jlu.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michel Ludwig</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yue Ma</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dirk Walther</string-name>
          <email>dirkg@tcs.inf.tu-dresden.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Advancing Electronics Dresden</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Jilin University, College of Computer Science and Technology</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TU Dresden</institution>
          ,
          <addr-line>Theoretical Computer Science</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We introduce the notion of an ontology excerpt as being a xed-size subset of an ontology that preserves as much knowledge as possible about the terms in a given vocabulary as described in the ontology. We consider di erent extraction techniques for ontology excerpts based on methods from Information Retrieval. To evaluate these techniques, we measure the degree of incompleteness of the resulting excerpts using the notion of logical di erence. We provide an experimental evaluation of the extraction techniques by applying them on the biomedical ontology SNOMED CT.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Introduction
Ontologies based on Description Logics (DL) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] have become a well-established
paradigm for representing knowledge. An increasing number of ontologies is
being developed, e.g., the CPO, FMA, GALEN, and SNOMED CT ontologies in
the biomedical domain, which are made available in repositories such as NCBO
Bioportal.1 Ensuring e cient access to the knowledge contained in such
repositories is thus becoming an important concern.
      </p>
      <p>We consider the problem of ontology selection. Suppose that we are in the
presence of a repository containing a number of ontologies that represent
knowledge in a certain domain of expertise. The task that we consider is for a human
user to e ciently select an ontology from the repository that is most \relevant"
for a given vocabulary of interest, called signature. Such a decision task could
be supported, for example, by providing summaries of the respective ontologies
w.r.t. a signature . As ontologies can become quite large, it is important,
however, to limit the size of summaries to e ectively support the decision process,
in particular, when human users are involved.</p>
      <p>The notion of modules provides a way to extract a subset of an ontology that
exactly captures the knowledge of the ontology regarding a signature. However,
a module notion typically allows for little control over the number of axioms that
are included in a module. Depending on the signature of interest, even minimal
modules can be as large as the entire ontology, essentially defeating the purpose
of module extraction. To in uence the size of a module, our only option is to
adapt the signature for which the module is extracted. Generally, we have that
the smaller the signature, the smaller are the modules of an ontology. But no
strict upper bound on the module size can be guaranteed in this way. Given a
relatively large signature of interest, obtaining small modules requires selecting
smaller subsets of the signature. However, nding a suitably small subset of the
initial signature may not be feasible. On the one hand, it may not be clear which
symbols can be removed, and on the other hand, an exhaustive search for the
\right" signature involves a combinatorial blowup.</p>
      <p>In this paper, we introduce the notion of an ontology excerpt as a xed-size
subset of an ontology. For the purpose of selecting the most relevant ontology in
a repository, one can rst extract excerpts of all the ontologies in the repository.
Based on such excerpts the user should be able to make an informed decision for
selecting the most relevant ontology. Obviously, when the user is interested in
terms contained in a signature , the excerpts should capture as much knowledge
of the terms from as possible. We propose to use excerpt extraction techniques
w.r.t. a signature from the area of Information Retrieval, i.e. a research area
which is generally concerned with developing techniques to extract the \most
relevant" documents to a query from large data sources.</p>
      <p>
        To evaluate the quality of ontology excerpts, we employ a semantics-based
measure, called Gain, based on Logical Di erence [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to quantify how much
semantic meaning is preserved in an excerpt w.r.t. the original ontology.
      </p>
      <p>The size of the excerpts can be made dependent, for instance, on the number
of considered ontologies and the similarity of their content to comply with the
time constraints associated with the decision process. If the user can invest more
time, excerpts may be larger. Similarly, if the content of the ontologies should
be similar, larger excerpts could help distinguish them more easily. Nevertheless
the size should be chosen such that the resulting excerpts can still be easily
understood by the user.</p>
      <p>The paper is organized as follows. After some preliminaries, we introduce
the notion of ontology excerpts in Section 3. Extraction techniques for excerpts
are proposed in Section 4, and we conclude with a practical evaluation of the
techniques in Section 5.
2</p>
      <p>
        Preliminaries
We brie y recall basic notions related to the description logic E L [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], modularity
of ontologies [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ] and the logical di erence between terminologies [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ].
2.1
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Description Logic EL</title>
      <p>Let NC and NR be mutually disjoint (countably in nite) sets of concept names
and role names. In the following we use A, B, X, Y , Z to denote concept names,
and r, s stand for role names. The set of E L-concepts C and the set of E
Linclusions are built according to the following grammar rules:
C ::= &gt; j A j C u C j 9r:C
::= C v C j C</p>
      <p>C j r v s
where A 2 NC and r; s 2 NR. An E L-ontology O is a nite set of E L-inclusions,
which are also referred to as axioms.</p>
      <p>The semantics is de ned using interpretations I = ( I ; I ), where the
domain I is a non-empty set, and I is a function mapping each concept name A
to a subset AI of I and every role name r to a binary relation rI over I . The
extension CI of a possibly complex concept C is de ned inductively as: (&gt;)I :=
I , (C u D)I := CI \ DI , and (9r:C)I := fx 2 I j 9y 2 CI : (x; y) 2 rI g.</p>
      <p>An interpretation I satis es a concept C, an axiom C v D, C D, or r v s
if CI 6= ;, CI DI , CI = DI , or rI sI , respectively. We write I j= if
I satis es the axiom . Note that every E L-concept and axiom is satis able,
but a particular interpretation does not necessarily satisfy a concept/axiom. An
interpretation I is a model of O if I satis es all axioms in O. An axiom follows
from an ontology O, written O j= ', if for all models I of O, we have that I j= .</p>
      <p>An E L-terminology O is an E L-ontology consisting of axioms of the form
A v C, A C, or r v s, where A is a concept name, C an E L-concept and
no concept name A occurs more than once on the left-hand side of an axiom.
A terminology is said to be acyclic if it can be unfolded (i.e., the process of
substituting concept names by the right-hand sides of their de ning axioms
terminates). We denote with jOj the number of axioms in the ontology O. A
signature is a nite subset of NC [ NR. For a syntactic object X, the signature
sig(X) is the set of concept and role names occurring in X.
2.2</p>
      <sec id="sec-2-1">
        <title>Modularity</title>
        <p>
          A module M of an ontology O w.r.t. a signature is a subset of O that preserves
all entailments formulated in : for all inclusions with sig( ) , M j= if
and only if O j= . The interesting direction of the equation in this de nition is
from right to left; the other direction holds by monotonicity of description logics.
We can equivalently de ne M with the condition that O is a -conservative
extension of M. For practical purposes, we are generally interested in modules
to be as small as possible. Note that minimal modules are not necessarily unique.
Computing minimal modules (which is equivalent to deciding whether or not
an ontology is a conservative extension of the module) is usually harder than
standard reasoning in the underlying DL, and often it is even undecidable [
          <xref ref-type="bibr" rid="ref11 ref5">5,
11</xref>
          ]. For E L-terminologies, however, minimal modules are unique and they can
be computed in polynomial time [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Example 1. Let O consist of the following four axioms:</p>
        <p>1 = fA; Bg. Then M = f 1; 2; 3g is the min. module of O w.r.t. .</p>
        <p>
          In this paper, due to the availability of tool support, we employ locality-based
modules [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. These are approximations of minimal modules and can be e ciently
computed in polynomial time. We use the ?&gt; -locality notion. We denote with
ModO( ) the ?&gt; -local module of the ontology O w.r.t. the signature . Note
that locality-based modules are unique, and they contain any minimal module
for a signature and possibly more axioms. However, it was shown that
localitybased modules are not much larger than minimal modules on real ontologies [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
2.3
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Logical Concept Di erence</title>
        <p>
          We now recall basic notions related to the logical di erence [
          <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
          ] between two
E L-terminologies for E L-inclusions as query language.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>De nition 1 (Logical Di erence). Let O1 and O2 be E L-terminologies, and</title>
      <p>let be a signature. The E L-concept inclusion di erence between O1 and O2
w.r.t. is the set Di (O1; O2) of all E L-inclusions of the form C v D for
E L-concepts C and D such that sig( ) , O1 j= , and O2 6j= .</p>
      <p>
        In case two terminologies are logically di erent, the set Di (O1; O2) consists
of in nitely many concept inclusions. The primitive witnesses theorems from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
allow us to consider only certain inclusions of a simpler syntactic form.
Theorem 1. Let O1 and O2 be E L-terminologies and let be a signature. If
2 Di (O1; O2), then either A v C or D v A is a member of Di (O1; O2),
where A 2 sig( ) is a concept name, and C, D are E L-concepts occurring in .
      </p>
    </sec>
    <sec id="sec-4">
      <title>De nition 2 (Primitive Witnesses). Let O1 and O2 be E L-terminologies</title>
      <p>and let be a signature. We say that E L-concept inclusion di erence witnesses
in w.r.t. O1 and O2 are concept names contained in that occur on the
left-hand side of inclusions of the form A v C in Di (O1; O2) or on the
righthand side of inclusions of the form D v A in Di (O1; O2). The set of all such
witnesses will be denoted by Wtn (O1; O2).</p>
      <p>
        Observe that Wtn (O1; O2) . Consequently, this set is nite and it can
be seen as a succinct representation of the set Di (O1; O2) in the sense that:
Di (O1; O2) = ; i Wtn (O1; O2) = ; [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Example 2. Let O be de ned as in Ex. 1. For = fA; Bg, Wtn (O; f 1; 2g) =
Di (O; f 1; 2g) = ; and Wtn (O; ;) = as A v B; B v A 2 Di (O; ;). If
= fA; rg, we have that Wtn (O; O n f 1g) = fAg as A v 9r:&gt; 2 Di (O; O n
f 1g).</p>
      <p>Algorithms for computing the witness sets and, thus, for deciding whether a
logical di erence w.r.t. a signature exists, have been implemented in the CEX2.5
tool.2 Given two acyclic E L-terminologies and a signature as input, CEX2.5
computes and outputs the set Wtn(O1; O2) in a fully automatic way.
2 CEX2.5 is available under an open-source license from http://lat.inf.tu-dresden.
de/~michel/software/cex2/.
2.5
le 2
u
d
o
fM1.5
o
e
izS 1
0.5
00
0.5
1 1.5 2
Number of Concept Names in Signature
2.5
vs. size of ?&gt; -local module w.r.t.</p>
      <p>
        We note that a new approach for computing logical di erences that can also
handle large cyclic terminologies has recently been introduced [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
3
      </p>
      <p>Ontology Excerpts
Ontologies appear to exhibit a strong dependency between the size of a
signature and the size of a module for the symbols in . This dependency is a
natural consequence of the structure of the ontology which, depending on the
application at hand, we may not be able to a ord due to resource restrictions.
We are interested in gaining more control over the size of a module to be able
to reuse the knowledge in an ontology in a scenario with limited resources.</p>
      <p>Figure 1 illustrates the dependency between signature size and module size in
the case of SNOMED CT consisting of 297 090 axioms, 297 079 concept names,
and 62 role names. The coordinates of a point in Figure 1 are a pair (n; m) of
numbers, where n corresponds to the number of terms in a signature and m
to the number of axioms in a subset S of the ontology O. The curve connects
over 30 data points, each of which represents the median value of the sizes of 500
?&gt; -local modules of SNOMED CT. Each module is extracted w.r.t. a signature
consisting of n concept names, for n with 200 n 290 000, and 30 role names
that are randomly selected from the signature of SNOMED CT. The special role
name `RoleGroup' is always selected. Note that we xed the number of role names
arbitrarily; similar results can be expected for di erent numbers of role names.
The time needed to extract 500 modules ranges from about 30 min for small
signatures (containing 200 concept names) to about 90 min for large signatures
(containing 250 000 concept names).3</p>
      <p>We can see from Figure 1 that the module sizes increase with the size of
the input signature. For small signatures, the slope is steep, showing that the
modules are relatively large compared to the signature size. However, with
increasing signature sizes, the slope attens. For di erent signatures of the same
size, there are still variations in the sizes of the modules for these signatures. In
3 The experiments were conducted on a PC equipped with an Intel Xeon E5-2640 CPU
running at 2.50GHz and with 100GB of RAM. We used Debian GNU/Linux 7.3 as
operating system, Java version 1.7.0 51 and OWLAPI version 3.4.8.
this experiment, the module sizes vary from 2 633 to 4 086 for signatures up to
100 000 concept names and 30 role names. For larger signatures the variation in
module size reduces to 224 for signatures with 290 000 concept names and 30 role
names, and converges to 0 as signature being expanded to the whole signature
of SNOMED CT.</p>
      <p>Let us consider the coordinates of a point in the chart in Figure 1 as a pair
(n; m) of numbers, where n corresponds to the size of a signature and m to the
size of a subset S of the ontology O. Note that, for a signature 0 and a subset
S0 O that correspond to a point in the area above the curve for the ontology O
in Figure 1, we may have that Mod 0 (O) ( S0. In this case, S0 likely contains
axioms that do not contribute to the meaning of the symbols in 0. Therefore,
we are mainly interested in the area below the curve for an ontology O. Let
(n; m) be a point in that area, and let S O and be such that j j = n and
jSj = m. We have that S contains fewer axioms than the module Mod (O) (not
considering the variation of module sizes), i.e. jSj jMod (O)j. Therefore, S is
likely incomplete in capturing the meaning of the symbols in . The trade-o
for obtaining full control over the size of S is a certain degree of incompleteness
of S. This inspires the notion of ontology excerpts as introduced below.</p>
    </sec>
    <sec id="sec-5">
      <title>De nition 3 (Ontology Excerpt). Let O be an ontology and let k &gt; 0 be a</title>
      <p>natural number. A k-excerpt of O is a subset E O consisting of k axioms, i.e.
jE j = k.</p>
      <p>An ontology excerpt is a subset of the ontology of a certain size. However,
we are interested in those excerpts that preserve the meaning of the symbols in
a signature of interest. To quantify the meaning of an excerpt, we need some
metric . We assume that the lower the value of for an excerpt is, the more
meaning is preserved by the excerpt. This is made precise as follows.</p>
    </sec>
    <sec id="sec-6">
      <title>De nition 4 (Incompleteness Measure). Let O be an ontology. An incom</title>
      <p>pleteness measure is a function that maps every triple (O; ; E ) consisting of
an ontology O, a signature , and an excerpt E O to a non-negative natural
number.</p>
      <p>In this paper we use as incompleteness measure the number ldi (O; ; E ) of
E L-concept inclusion di erence witnesses in w.r.t. O and E , which is de ned
formally as ldi (O; ; E ) = jWtn (O; E )j.</p>
      <p>De nition 5 (Best Excerpt). Let O be an ontology, let be a signature, and
let k &gt; 0 be a natural number. Additionally, let be an incompleteness measure.
A best k-excerpt of O w.r.t. under is a k-excerpt E of O such that
(O; ; E ) = minf (O; ; E 0) j E 0 is a k-excerpt of O g:</p>
      <p>To preserve the largest possible amount of semantic information in a
kexcerpt, it would be preferable to extract k-excerpts that have the lowest ldi
value among all the subsets of size k. However, it is di cult in general to compute
all such excerpts in an exhaustive way as all the jOk j many subsets of size k
would have to be enumerated.</p>
      <p>Extraction Techniques
In this section, we introduce two di erent k-excerpt extraction approaches. One
is based on the simple intuition that axioms comprising more elements from
should be preferred to be included in an excerpt for . The other approach
is inspired by ideas from the area of Information Retrieval (IR): we view each
axiom in O as a document, and the input signature as the set of keywords from
a query. The top-k retrieved documents for the given keywords then correspond
to a k-excerpt. These two approaches share a common methodology in the sense
that they de ne a \similarity" between each axiom w.r.t. a given signature such
that selecting the k axioms closest to the given signature results in a k-excerpt.
We make this idea more precise in the following de nition.</p>
      <p>De nition 6. Let O be an ontology and let sig(O). Additionally, let s be a
function that maps every pair ( ; ) consisting of an E L-axiom together with
a signature to a real number. We de ne the ranking B of axioms w.r.t.
induced by s as follows: B if, and only if, s( ; ) &gt; s( ; ). Given an
integer k, we de ne a k-excerpt of an ontology O for a signature under s as
the set f 2 O j jf 2 O j s( ; ) &gt; s( ; ) gj k g:
A k-excerpt hence consists of those axioms in O for which there are at most
k 1 axioms in O that precede w.r.t. B. Note that such a de nition leaves the
possibility that k-excerpts can contain more than k axioms due to an equivalent
distance of several axioms w.r.t. . In real-world applications there would exist
di erent remedies to such a situation. Since we aim to compare di erent excerpt
extraction techniques in this paper, we choose to apply a random cut whenever
there are more than k axioms contained in a k-excerpt.
4.1</p>
      <sec id="sec-6-1">
        <title>Common Signature based k-Excerpts</title>
        <p>A nave extraction method for k-excerpts w.r.t. a signature simply consists in
a random selection of k axioms from the considered ontology. As a rst
improvement of the random selection, it is possible to guide the selection of the axioms
by considering the number of concept and role names shared by an axiom and ,
de ned formally as follows:
De nition 7. Given an axiom and a signature , the COM-similarity
between and is de ned as scom( ; ) = jsig( ) \ sig( )j.</p>
        <p>Example 3. Let 1, 2, 3, 4 be four axioms de ned as in Ex.1 and let =
fA; B; rg. Then we have scom( 1; ) = 3, scom( 2; ) = 2, scom( 3; ) = 2, and
scom( 4; ) = 3. Therefore, the ranking of the axioms will be: 1; 4 B 2; 3.
The rst and the last axiom are ranked higher than the other two, but no
preference between 1 and 4, or between 2 and 3, exists.
In IR vector representations of documents and queries are a fundamental tool
to model problems, based on which di erent retrieval strategies can be applied.
We rst de ne the vector representation for axioms and signatures.</p>
        <p>In the remainder, we assume that every ontology O is associated with a strict
total order on the elements of sig(O). Whenever we want to access the i-th
signature element of O we refer to the i-element w.r.t. the assumed order ,
starting from the smallest element. For a signature sig(O) or axiom 2 O,
we can de ne the signature vector of and the axiom vector of as follows:</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>De nition 8 (Signature and Axiom Vector). For a signature sig(O),</title>
      <p>the signature vector of , written ! = [v1; v2; ], is a vector of length jsig(O)j
such that vi = 1 if the i-th element of sig(O) appears in , otherwise vi = 0.
Similarly, for an axiom</p>
      <p>2 O we de ne ! = sig(!).</p>
      <p>Example 4. Let O be the ontology de ned as in Ex. 1, and let = fA; B; rg.
We assume the strict total order sig(O) sig(O) given by A B X
Y r s. We obtain the following signature vector for and axiom vectors
for each axiom of O:
! = [1; 1; 0; 0; 1; 0]
!1 = [1; 1; 1; 0; 1; 0]
!3 = [1; 1; 1; 0; 0; 0]
!2 = [1; 1; 0; 0; 0; 0]
!4 = [1; 1; 1; 1; 1; 1]</p>
      <p>We can then de ne the distance between an axiom and a signature by a
distance measure between the axiom and signature vectors. A rst measure is
the cosine value, resulting in the COS-k-module.</p>
      <sec id="sec-7-1">
        <title>De nition 9 (COS-distance between Axiom and Signature). Given an</title>
        <p>axiom and a signature set , the COS-distance between and is de ned as
dcos( ; ) = cos(!; !) =</p>
        <p>Pn</p>
        <p>i=1 xiyi
pPn i=1 yi2
i=1 xi2pPn
;
where ! = [x1; x2; :::; xn] and ! = [y1; y2; :::; yn].</p>
        <p>Example 5. Let O be the ontology as de ned in Ex. 1, let be the total order
on sig(O) as de ned in Ex. 4, and let = fA; B; rg. Then we have that:
dcos( 1; ) = 3=(p4p3)
dcos( 3; ) = 2=(p3p3)
dcos( 2; ) = 2=(p2p3)
dcos( 4; ) = 3=(p6p3)
Therefore, the ranking of the axioms will be 1 B
In this section, we present a detailed evaluation of the proposed excerpt
extraction techniques. To this end, we implemented the excerpt extraction methods,
and we compare them on SNOMED CT using a normalized evaluation metric
based on ldi . To speed up the experiments, we use a module of SNOMED CT for
a randomly generated signature. The considered fragment consists of 50 034
axioms and it contains 50 587 concept and role names. We employ the CEX2.5
system as evaluation tool for the logical di erence, which can currently process
acyclic terminologies only. However, the proposed extraction techniques work for
ontologies formulated in any DL.</p>
        <p>As baseline, we use a random choice strategy which randomly selects k
axioms from the input ontology to extract a k-excerpt. To estimate the quality of
excerpts E , we make use of the following metric, named Gain (G):
G ( ; ) = 1</p>
        <p>O E
j
ldi (O; ; E )
\ sig(O) \ NCj
;
Note that Gain is inverse to ldi normalized by the total number of possible
witness concept names. Intuitively, the higher the Gain value of an excerpt E for
a signature , the more semantic information is preserved by E .</p>
        <p>Fig. 2 reports on the results for the di erent excerpt extraction techniques
on the considered ontology. The excerpts were generated w.r.t. ten randomly
generated signatures, containing 100 concept names and 30 role names. The
values along the x-axis represent the parameter k, i.e. the excerpt size, whereas
the average Gain value of the corresponding k-excerpts over the 10 signatures
is shown along the y-axis. From the chart, one can see that the Gain values
for IR-based excerpts (yellow) are higher than or equal to the values for other
excerpt extraction strategies, and that the Gain values for common
signaturebased excerpts (green) lie above the values for the random choice strategy (blue).
The gain value reaches 1 in Fig. 2 only for values of k approaching the size of
the input ontology O. However, extracting excerpts of size larger than the size
of the module ModO( ) of O for a signature is of limited use as selecting the
axioms in the module is su cient to obtain a Gain value of 1.</p>
        <p>We note that the IR-based excerpt extraction technique does not guarantee
to select axioms that belong to the module ModO( ). The average Gain values
in Fig. 2 can be improved upon by ensuring that only axioms from ModO( ) are
0.8
included in the excerpts as it is shown in Fig. 3. The curves in Fig. 3 show the
di erence between the Gain values for the IR-excerpts extracted from the entire
SNOMED CT fragment (yellow) and those from the module of this fragment for
the input signature (pink). We can see that the quality of IR-excerpts in Fig. 3
is improved signi cantly when using the module as an input, reaching 1 when
k is close to the size of the module (i.e. 4 673) already. As baseline reference for
selecting axioms from the module, the Gain values of the random choice strategy
are also given (blue). Note that selecting about 500 axioms at random from the
module already yields an excerpt of better quality than the IR-based excerpt
of the same size of the entire ontology. One can see that using a precomputed
module can improve the quality of the extracted excerpts. Excerpt extraction
can thus bene t from state-of-the-art module computation tools as well.</p>
        <p>However, for obtaining best excerpts (cf. De nition 4), we cannot solely base
excerpt extraction techniques on algorithms for computing modules. We call
M a module-based k-excerpt for a signature of an ontology O if there is a
subsignature 0 such that M = ModO( 0) and jMj = k. Note that
each such 0 determines an excerpt size k with k = jModO( 0)j. The results
depicted in Fig. 4 show that module-based excerpts (orange) are not of better
quality than IR-based excerpts (pink). The values along the x-axis denote the
size of subsignature 0. The values along the y-axis denote the average Gain
values over the excerpts for ten randomly selected subsignatures 0 of each size.
The Gain values for the module-based excerpts are the values GO(M; ) with
M = ModO( 0) for every 0, whereas GO(E ; ) are the Gain values for the
IR-based k-excerpts E for k = jModO( 0)j.</p>
        <p>As a comparison, we include in Fig. 4 the average Gain values for the
IRbased excerpt w.r.t. the entire SNOMED CT fragment (yellow). For small sizes
of subsignatures 0 up to 32 concept names and 30 role names, the IR-based
excerpt of size jModO( 0)j performs better than the corresponding module-based
excerpt, even though the IR-based excerpt is not restricted to ModO( ).</p>
        <p>In our experiments so far, IR-based excerpts performed better than other
extraction techniques. However, the question arises whether the IR-based
ex0.8</p>
        <p>Example 6. Let O consist of the three axioms 1 = A1 v B1 u 9r:X, 2 = A3 v
A2 u B3, 3 = A2 v B2, and let = sig(O). Then the ldi -values for all 1- and
2-excerpts of O are respectively shown in the left- and right-hand side of the
table below:</p>
        <p>f 1g f 2g f 3g f 1; 2g f 1; 3g f 2; 3g
ldi
4
5
6
3
4
2</p>
        <p>The COS-distance between each of the three axioms i and is as
follows (using an implicit order on the signature elements): dcos( 1; ) 0:707,
dcos( 2; ) 0:612, dcos( 3; ) = 0:5. Thus, we obtain the following IR-ranking
for the axioms: 1 B 2 B 3. Although the best 1-excerpt is f 1g, the best
2-excerpt is given by f 2; 3g without having the highest ranked axiom 1.</p>
        <p>We conjecture that the size parameter k has to be an input parameter to any
algorithm that aims at extracting best excerpts for a given signature.
6</p>
        <p>Conclusion
We introduced the notion of ontology excerpts that can serve as a summary of
an input ontology w.r.t. a signature of interest. We presented several strategies
for excerpt extraction and we evaluated them based on how well the resulting
excerpts capture the knowledge about the input signature. The extraction strategy
based on IR-techniques clearly outperformed the others in our experiments.</p>
        <p>
          We leave nding an algorithm for computing best excerpts as future work,
for which we want to investigate the use of simulation-based techniques that are
capable of identifying logical di erences [
          <xref ref-type="bibr" rid="ref10 ref4">4, 10</xref>
          ].
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          .
          <article-title>Pushing the EL envelope</article-title>
          .
          <source>In Proceedings of IJCAI-05</source>
          , pages
          <fpage>364</fpage>
          {
          <fpage>369</fpage>
          . Morgan-Kaufmann Publishers,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-</surname>
          </string-name>
          Schneider, editors.
          <article-title>The description logic handbook: theory, implementation, and applications</article-title>
          . Cambridge University Press,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Del Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Klinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , U. Sattler,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          .
          <article-title>Syntactic vs. semantic locality: How good is a cheap approximation</article-title>
          ?
          <source>In Proceedings of WoMO'12</source>
          , volume
          <volume>875</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ludwig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          .
          <article-title>The concept di erence for EL-terminologies using hypergraphs</article-title>
          .
          <source>In Proceedings of DChanges'13</source>
          , volume
          <volume>1008</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghilardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Did i damage my ontology? a case for conservative extensions in description logics</article-title>
          .
          <source>In Proceedings of KR2006</source>
          , pages
          <fpage>187</fpage>
          {
          <fpage>197</fpage>
          . AAAI Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Grau</surname>
          </string-name>
          , I. Horrocks,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazakov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Modular reuse of ontologies: theory and practice</article-title>
          .
          <source>JAIR</source>
          ,
          <volume>31</volume>
          :
          <fpage>273</fpage>
          {
          <fpage>318</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ludwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>The logical di erence for the lightweight description logic EL</article-title>
          . JAIR,
          <volume>44</volume>
          :
          <fpage>633</fpage>
          {
          <fpage>708</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Semantic modularity and module extraction in description logics</article-title>
          .
          <source>In Proceedings of ECAI'08</source>
          , volume
          <volume>178</volume>
          of Frontiers in
          <source>Arti cial Intelligence and Applications</source>
          , pages
          <volume>55</volume>
          {
          <fpage>59</fpage>
          . IOS Press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>The logical di erence problem for description logic terminologies</article-title>
          .
          <source>In Proceedings of IJCAR'08</source>
          , volume
          <volume>5195</volume>
          <source>of LNCS</source>
          , pages
          <volume>259</volume>
          {
          <fpage>274</fpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M.</given-names>
            <surname>Ludwig</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          .
          <article-title>The logical di erence for ELHr-terminologies using hypergraphs</article-title>
          .
          <source>In Proceedings of ECAI</source>
          <year>2014</year>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Deciding inseparability and conservative extensions in the description logic EL</article-title>
          .
          <source>Journal of Symbolic Computation</source>
          ,
          <volume>45</volume>
          (
          <issue>2</issue>
          ):
          <volume>194</volume>
          {
          <fpage>228</fpage>
          ,
          <string-name>
            <surname>Feb</surname>
          </string-name>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>