<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Elastiq: Answering Similarity-threshold Instance Queries in EL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas Ecke</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Pensel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anni-Yasmin Turhan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Theoretical Computer Science, Technische Universitat Dresden</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recently an approach has been devised how to employ concept similarity measures (CSMs) for relaxing instance queries over EL ontologies in a controlled way. The approach relies on similarity measures between pointed interpretations to yield CSMs with certain properties. We report in this paper on Elastiq, which is a rst implementation of this approach and propose initial optimizations for this novel inference. We also provide a rst evaluation of Elastiq on the GeneOntology.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>yields, for a pair of concepts, a value from the interval [0; 1]|indicating how
similar the concepts are. To answer a relaxed instance query is to compute for a
given concept C, a CSM , and a threshold t between 0 and 1, a set of concepts
such that each of these concepts are similar to C by a threshold of at least t, if
measured by the CSM , and then nding all their instances.</p>
      <p>
        Concept similarity measures are widely used in ontology-based applications.
For ontologies from the bio-medical eld, such as the GeneOntology ontology
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], they are employed to discover functional similarities of genes. Furthermore,
CSMs are used in ontology alignment algorithms. For DLs there exists a whole
range of CSMs, which could be employed for the task of answering relaxed
instance queries [
        <xref ref-type="bibr" rid="ref1 ref12 ref15 ref5">1, 5, 12, 15</xref>
        ]. In particular the CSMs generated by the framework
described in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] allow users to specify which part of the ontology's signature is
to be regarded more important when it comes to the assessment of similarity of
concepts. Thus, these measures naturally allow users to select important features
of the query concept and which aspect of the query concept to relax.
      </p>
      <p>
        We investigated algorithms for computing answers to relaxed instance queries
for EL. This DL has good computational properties and large, well-known
biomedical ontologies such as the GeneOntology [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] are written in (polynomial
extensions of) EL. Our algorithm for computing relaxed instances w.r.t. EL-KBs
with general TBoxes relies on canonical models of the query concept and of the
queried KB. The employed CSM is derived from a similarity measure for pointed
interpretations, which essentially implements relaxed forms of equisimulation
between interpretations. A similar idea in spirit is pursued in [
        <xref ref-type="bibr" rid="ref16">16, 17</xref>
        ] for
ELconcepts, where similarity is measured in terms of `how much is missing' to
establish a homomorphism between graph representations of EL-concepts.
      </p>
      <p>Now, for computing answers to relaxed instance queries, the similarity values
for all pairs of elements between the two canonical models need to be computed
in the worst case. Thus a naive implementation would hardly be e cient. We
report in this paper in rst optimizations for this novel inference, which we have
implemented in the system Elastiq (EL answering of similarity-threshold
instance queries). A rst evaluation on the GeneOntology shows that the proposed
optimizations are vital for this kind of inference, but that response times for a
single query over large ontologies are still about a second.</p>
      <p>The remainder of the paper in structured as follows. Next, we give an
introduction to the technical terms used and the relaxed instance inference. In
Section 3 we discuss the algorithm for computing relaxed instances and the
similarity measures employed for it. The Elastiq reasoner is introduced in Section 4
together with some optimizations and the evaluation of its performance on the
GeneOntology. The paper ends with conclusions and future work.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>EL-concepts are built from two mutually disjoint sets NC of concept names, and
NR of role names using the syntactic rule: C; D ::= &gt; j A j C u D j 9r:C,
where A 2 NC and r 2 NR. The semantics of EL-concepts are de ned by means
of interpretations, in which concept names are interpreted as subsets of the
interpretation domain and roles as binary relations. The semantics are extended
to complex concepts as usual. An EL-TBox consists of a nite set of general
concept inclusion axioms (GCIs) of the form C v D. An interpretation is a
model for a TBox if it satis es all its GCIs. An EL-ABox describes individuals
from a set NI of individual names using concept assertions of the form C(a)
and role assertions of the form r(a; b). Again, an interpretation is a model for
an ABox if it satis es all its assertions. An EL-knowledge base (KB) is a pair
K = (T ; A) of a TBox T and an ABox A.</p>
      <p>The following commonly used reasoning tasks are implemented in most DL
reasoning systems. Concept subsumption C vT D asks, for a TBox T and two
concepts C and D, if CI DI for all models I of T . Given an individual a, a
concept C, and a KB K, a is called an instance of C w.r.t. K, denoted K j= C(a),
i aI 2 CI for all models I of K. Given a KB K = (T ; A) and a concept C,
instance retrieval returns all individuals from A that are instances of C.</p>
      <p>For the DL EL, the polynomial-time complexity of most reasoning procedures
rely on the fact that canonical models can be built, from which it is possible to
read o entailments directly. These canonical models represent the most general
model for a concept or the individuals of an ABox w.r.t. to a TBox. Before we
can formally de ne these canonical models, we need to introduce some notation.
If X is a concept, TBox, ABox, or KB, then:
{ Sig(X) denotes the signature of X; that is, the set of concept, role, and
individual names appearing in X, and
{ sub(X) is the set of all sub-concepts of concepts occurring in X.
De nition 1. (canonical models) Let C be an EL-concept and K = (T ; A) an
EL-KB. The canonical model IC;T = ( IC;T ; IC;T ) of C w.r.t. the TBox T is:
{ IC;T = fdC g [ fdD j 9r:D 2 sub(C) [ sub(T )g
{ AIC;T = fdD j D vT Ag, for all concept names A, and
{ rIC;T = f(dD; dE ) j D vT 9r:Eg for all role names r.</p>
      <p>The canonical model IK = ( IK ; IK ) of the KB K = (T ; A) is de ned as follows:
{ IK = fda j a 2 Sig(A) \ NI g [ fdC j 9r:C 2 sub(A) [ sub(T )g,
{ AIK = fdD j D vT Ag [ fda j K j= A(a)g,
{ rIK = f(dD; dE ) j D vT 9r:Eg [ f(da; dD) j K j= 9r:D(a)g [
f(da; db) j r(a; b) 2 Ag.</p>
      <p>
        Note that canonical models for EL are always nite. Canonical models can be
used to decide instance checking problems since a is an instance of C w.r.t.
a KB K if and only if aIK is an element of CIK in the canonical model IK
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].These canonical models and their use for instance checking, are important
for the algorithm for answering relaxed instance queries in Section 3.
      </p>
      <p>
        Instance checking can be relaxed by using a concept similarity measure. Such
a measure is a function that assigns to each pair of concepts (w.r.t. a TBox T )
a similarity value from the interval [0; 1] with C C = 1 for all concepts C. A
value C D = 0 means that the concepts C and D are totally dissimilar, while a
value of 1 indicates total similarity. A set of properties for CSMs was presented in
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In particular, a CSM is called symmetric, i C D = D C; equivalence
invariant, i for all C T D and all concepts E it holds that C E = D E;
and equivalence closed, i C T D () C D = 1. Using those similarity
measures, we de ne relaxed instances as follows:
De nition 2 (relaxed instance). The individual a is a relaxed instance of
the query concept Q w.r.t. the KB K = (T ; A), the CSM T and the threshold
t 2 [0; 1) i there exists a concept X such that Q T X &gt; t and K j= X(a).
To compute the relaxed instances of an EL-concept (w.r.t. an EL-KB) it is not
feasible to compute all su ciently similar concepts and then perform instance
checking for those, since (1) the number of such concepts can be in nite leading
to an in nite number of queries and (2) a CSM does not necessarily provide a
method how to obtain a `su ciently similar' concept.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>The Algorithm for Computing Relaxed Instances</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], we proposed the CSM c to be used to answer relaxed instance queries.
This measure compares two concepts C and D w.r.t. a TBox T by computing
their canonical models IC;T and ID;T and comparing the structure of the
models starting from the elements dC and dD, respectively. For this, we de ne a
similarity measure i on pointed interpretations. Given a pointed interpretation
p = (I; d), we denote with
{ CN(p) = fA 2 NC j d 2 AI g the set of concept names that d is an instance
of in I, and
{ SC(p) = f(r; (I; e)) j (d; e) 2 rI g the set of direct successors of d in I.
      </p>
      <sec id="sec-3-1">
        <title>The interpretation similarity</title>
        <p>i to be de ned depends on three parameters:
1. A primitive measure p : NC NC [ NR NR ! [0; 1] assigns a similarity
value to each pair of concept names and each pair of role names. Any
primitive measure has to satisfy that x p x = 1 for any concept or role name x.
Additionally, for the similarity measure i to be symmetric, p needs to be
symmetric as well. We give a default primitive measure, that simply assigns
similarity 0 to pairs of di erent concept or role names x and y:
x
default y =
(1 if x = y</p>
        <p>0 otherwise
However, other primitive measures are imaginable and useful. For example,
one might want to express that two amounts Medium and High are more
similar than Low and High, which can be achieved by using a primitive
measure with Medium p High = 0:5 and Low p High = 0.
2. A weighting function g : NC [NR ! R&gt;0 to prioritize di erent features in the
similarity measure. We give a default weighting function gdefault, that assigns
1 to every concept and role name, but again, other weighting functions can
be useful for certain cases.
3. A discounting factor w is a constant that allows for discounting of successors,
and should have a value 0 &lt; w &lt; 1.</p>
        <p>De nition 3. Given a primitive measure p, a weighting function g and the
discounting factor w, the interpretation similarity measure i is de ned as:
p
i q =
simCN(p; q) + simCN(q; p) + simSC(p; q) + simSC(q; p)</p>
        <p>If all of the sets CN(p), CN(q), SC(p), and SC(q) are empty for pointed
interpretations p; q, we de ne p i q = 1.</p>
        <p>Note that i does not necessarily yield an equivalence closed or equivalence
invariant CSM. To regain these properties, one can rst normalize the
interpretations before applying the i. An interpretation I = ( I ; I ) is in
interpretation normal form if there are no elements a; b; c 2 I , b 6= c, such that
f(a; b); (a; c)g rI and for all concepts C with b 2 CI also c 2 CI holds; i.e.,
no node has two successors for the same role name whose instantiators are in a
subset relation3. Any interpretation I can be transformed into normal form in
polynomial time by removing all redundant successors.</p>
        <p>Let IC0;T and ID0;T denote canonical models in interpretation normal form,
then the CSM c is de ned as follows:</p>
        <p>C</p>
        <p>c D = (IC0;T ; dC ) i (ID0;T ; dD):
c is indeed symmetric, equivalence invariant, and equivalence</p>
      </sec>
      <sec id="sec-3-2">
        <title>Example 1. Consider the concepts:</title>
        <p>
          Cex = Server u 9hasLoad:Medium u 9provides:(VideoStreamService u Service) and
Dex = Server u 9hasLoad:Low u 9provides:(DBService u Service u 9queryLang:SQL).
We use the primitive measure p, which is almost the default primitive
measure, except for Low p Medium = Medium p High = 0:5 instead of 0. We also
3 Formally, one describes this using the notion of a simulation between b and c in I,
see [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] for more details.
use the default weighting function gdefault and the discounting factor w = 0:8.
To compute the similarity between Cex and Dex, we have to compute their
normalized canonical models (w.r.t. to the empty TBox). Based on these, we
obtain: The hasLoad-successors of Cex and Dex have a similarity of 0:5, since
Medium p Low = 0:5. Both provides-successors of Cex and Dex, are instances
of Service, while the concept names VideoStreamService and DBService have no
correspondence. Similarly, only the provides-successor of Dex has an existential
restriction, resulting in a value of 0 for simSC in both directions. Overall, this
yields a similarity of (1+0)+(1+0)+0+0 = 0:4 for the two services. Using this, we
2+2+0+1
can nally compute the similarity between Cex and Dex:
simCN(Cex; Dex) = simCN(Dex; Cex) = 1
simSC(Cex; Dex) = simSC(Dex; Cex) = (0:2 + 0:8 0:5) + (0:2 + 0:8 0:4) = 1:12
1 + 1 + 1:12 + 1:12
        </p>
        <p>Cex c Dex = 1 + 1 + 2 + 2 = 0:707:</p>
        <p>
          The procedure to compute relaxed instances of a query concept Q w.r.t. a
KB K = (T ; A), a threshold t, and the CSM c has the following steps [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]:
1. Compute the canonical models IQ;T and IK of the query concept Q and the
        </p>
        <p>ABox A w.r.t. the TBox T .
2. Transform these models into IQ0;T and IK0.
3. De ne a maximal interpretation similarity imax between the normalized
canonical models. The measure imax is behaves like i, but chooses a subset
of the concept names and successors for elements in the canonical model IK
in a way to maximize the similarity value.
4. For each individual a occurring in K, check if the maximal interpretation
similarity between the element dQ in IQ0;T and the element da in IK0 is larger
than the given threshold t. If so, a is a relaxed instance and is returned.
Formally, imax is de ned as the unique solution of the following equation system:
simCN CN(p); Cq + simCN Cq; CN(p)</p>
        <p>+ simSC SC(p); Sq + simSC Sq; SC(p)
where
simCN(C1; C2) =</p>
        <p>
          X max g(A)(A
We showed that using the maximal interpretation similarity indeed solves the
problem of answering relaxed instance queries correctly, see [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          Theorem 1 ([
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). Individual a is a relaxed instance of Q w.r.t. K =(T , A), t,
and c i (IQ0;T ; dQ) imax (IK0; da) &gt; t.
        </p>
        <p>We showed an upper complexity bound of NP by (non-deterministically)
translating the equation system that de nes (IQ0;T ; dQ) imax (IK0; da) into a linear
programming problem of polynomial size and solving it. However, this approach
is not practical. Instead, Elastiq implements an iterative approach, that re nes
the similarity values and converges to imax in the limit. This approach which we
present next is sound, as it converges from below, but not necessarily complete.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The Elastiq Reasoner</title>
      <p>Elastiq is the rst implementation for answering instance queries relaxed by
a similarity measure. Given an EL-ontology, an EL-query concept, an
instantiation of c and a value for a threshold, Elastiq computes a result set of ABox
individuals, where each of these individuals is relaxed instance. The CSM
employed here is xed to c as de ned in Section 3, but it and can be adjusted by
a custom weighting function, primitive similarity measure and the discounting
factor. Computing an answer to a relaxed instance query by Elastiq consists
of four main steps.</p>
      <p>
        Step 1: Global preprocessing. The canonical model IK of the ABox and the TBox
is generated by the use of a standard DL reasoner, currently Elastiq uses the
Elk system [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Step 2: Local preprocessing. The canonical model of the query concept w.r.t. the
TBox IQ;T is generated|as in Step 1 by the use of Elk.</p>
      <p>
        Elastiq distinguishes the two preprocessing steps for the sake of computing
several relaxed instance queries against the same KB faster. Obviously, IK does not
depend on the query concept and can therefore be reused for every subsequent
queries. The model IQ;T , however, needs to be recreated for every di erent query
concept Q. In both steps we use the Elk reasoner [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to compute classi cation
and realization of the ontology, and then retrieve subsumption and instance
relationships from the results. Elastiq only needs to consider those domain
elements that are reachable from elements representing ABox individuals and thus
can be used by the main algorithm. Similarly, for IQ;T Elastiq creates only
those domain elements that are reachable through successor relations from dQ.
The normal forms of the canonical models are computed on demand. Finally,
Step 2 also initializes the data structure for the main computation in Step 3.
Step 3: Computing the maximal interpretation similarity imax. Recall, that
Elastiq implements an iterative approach, that re nes the similarity values and
converges to imax in the limit. Thus the main computation yields a sequence of
matrices, each representing an iteration of the computation. The rows of such
a matrix Mj represent domain elements from IQ;T and the columns domain
elements from IK. The values inside each cell of Mj , are identi ed by two domain
elements d 2 IQ;T and e 2 IK , and converge towards (IQ;T ; d) imax (IK; e)
for j ! 1 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Instead of computing the similarity values for all pairs of elements
from the canonical models in each iteration, Elastiq restricts the entries in
Mj to those elements that are reachable from (elements representing) ABox
individuals by paths in IK. To this end, M0 is initialized with one row (for dQ)
and as many columns as there are individuals in the ABox. The set of columns
is extended with new elements (d0; e0) if there exists an element (d; e) in M0
such that d and d0 are connected in IQ;T via some role r, and e and e0 are
connected in IK via some role s. Since the canonical models IQT and IK are
nite, the size of M0 is bounded by j IQ;T j j IK j. Once all reachable pairs have
been added to M0, it contains values exactly for those pairs that are necessary
for computing similarities between the domain elements that we are interested
in|namely similarities of the query concept and each ABox individual (dQ; dai ).
      </p>
      <p>Each iteration j + 1 creates a new matrix Mj+1, and computes the values
by applying Equation (1) to the values in Mj . Elastiq needs only to keep the
current matrix Mj+1 and the last one Mj (j 0) in memory. The iterations for
the re nement of similarity values proceeds until one of the following termination
criteria is met:
{ the maximal amount of iterations imax speci ed by the user is reached; or
{ no interesting similarity value (dQ; da) has changed during the last iteration
by more than a relative factor speci ed by the user.</p>
      <p>Step 4: Comparison with t. After the iteration stopped, the interesting similarity
values Mj (dQ; da) are compared to the input threshold t and the answer set of
individuals is compiled. This set is then listed in descending order of similarity.
4.1</p>
      <sec id="sec-4-1">
        <title>Optimizations for Computing Relaxed Instances</title>
        <p>A naive implementation of the algorithm can hardly compute relaxed instances
for reasonably large ontologies in acceptable time. As mentioned before, a highly
e ective optimization is the reuse of IK for multiple queries. Since ABoxes are
usually much larger than query concepts, the model IK is also be much more
costly to create than the models IQ;T . Additionally, the normalization of
canonical models can be done more e ciently than by computing simulations to
determine unnecessary role-successors. Before adding a domain element dC as an
r-successor to some element dD Elastiq checks whether there already exists an
r-successor dE for dD such that E v C. In this case normalization would
eliminate dC , thus avoiding the introduction of dC (and its role successors) improves
the runtime of canonical model generation further. Similarly, when adding dC
as an r-successor to dD, Elastiq eliminates all r-successors dE of dD if C v E.</p>
        <p>During the generation of the canonical models Elastiq performs many
subsumption checks. Although Elk is currently one of the fastest reasoners for EL,
caching of sub- and superclass relations yielded a great performance boost, since
Elastiq needs to access sub- and superclass relationships for the same class
several times.</p>
        <p>The algorithm from Section 3 suggests to iterate over all subsets of CN and
SC successors in order to nd the maximal similarity. This exponential procedure
can be improved by looking at the primitive similarities between elements. Let
d 2 IQ;T and e 2 IK. By de nition of imax we are looking for those subsets of
the concept names and successors of e that maximize the similarity. Instead of
iterating over all subsets of CN(e) to nd the best pairing, we showed that if
B 2 CN(e) such that 9A 2 CN(d) with A p B = 1, we can always keep B in
the subset of CN(e), because it will always increase the similarity. Conversely, if
B0 2 CN(e) such that 8A 2 CN(d), then A p B0 = 0, and B0 can be left out of
the subset of CN(e), since it cannot increase the similarity. Analogously, we can
remove (s; q) from SC(e) if for all (r; p) 2 SC(d) we have r p s = 0. This can
dramatically reduce the number of subsets to be checked. In fact, for the default
primitive measure, this means that the best subset of concept names can always
be computed in linear time by checking each concept name in CN(e) separately.
4.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation</title>
        <p>Our preliminary performance evaluation of Elastiq used di erent versions of
the GeneOntology that describe schizosaccharomyces pombe|some species of
yeast. These ontologies ranged from 9,157 concept names and 34,875 individuals
in the rst version to 51,949 concept names and 289,206 individuals for the 15th
version. The sizes of canonical models ranged from 77,941 to 602,548 elements.</p>
        <p>
          We obtained our test ontologies by custom dataset generation provided by
the Manchester OWL Corpus [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. These GeneOntology versions are anonymised
and therefore any contentual interpretation of our results is virtually impossible.
We restricted our investigations solely to the performance of Elastiq and leave
the intricate task of a quality assessment for future work. We discovered that for
each individual e there exists a very fragmented concept assertion in the ABox of
the form 9is a:Ce(e), where the quali cation Ce is rarely larger than 3 conjuncts
with a role-depth of at most 2.
        </p>
        <p>Our test suite contains 10 randomly generated query concepts with
increasingly complex structures. These queries were built over the common signature
of versions 1{15 of the GeneOntology (approximately 1,000 concept names and
4 roles). The smallest query (Query 1) only contained 6 concept and role names
and had a role-depth of 2, while the Query 10 had a size of 670 and a
roledepth of 5. Due to the plain structure of concept assertions we wrapped each
query concept Qi with 9is a:Qi in order to provoke a more complex
computation. For these queries, the sizes of the canonical models IQ;T ranged from 2 to
236 elements. We evaluated the queries for the default primitive measure and
weighting function, and counted the number of relaxed instances for a threshold
of t = 0:333. The test system had a 1800 MHz dual core processor AMD Turion
II Neo and 6 GB of RAM. Figure 1 shows the runtime of Elastiq for answering
all 10 relaxed instance queries w.r.t. each ontology version. The high runtimes
for ontology versions 11, 12, and especially 13{15 is mainly due to the increase
of the size of the canonical model IK. Most queries returned a lot more relaxed
instances for the ontologies 11{15 than for ontologies 1{10. Queries 8 and 9
returned the largest number of relaxed instances, up to over 200,000 for Query 8
evaluated on nnotations15.owl.</p>
        <p>When breaking down the times for preprocessing and the query answering
further, it shows that the preprocessing time is dominated by the attening of
the ontology, the reasoning done by Elk, and the construction of the canonical
model IK, while the time to construct the canonical models IQ;T is negligible.
However, the overall query answering time is largely spend on Step 3, i.e., the
iterations to compute the maximal similarity.</p>
        <p>Elastiq performs ABox realization to obtain IK and in addition a kind of
relaxed ABox realization for the query concepts in the test suite. Now, while
it is clear that Elastiq is slower than Elk for ABox realization, it showed,
suprisingly, that this does not need to be the case for other optimized DL
reasoners. We compared Elastiq's overall reasoning times with the ABox
realization times of the commonly used FaCT++ reasoner [18]. Figure 2 shows that
Elastiq mostly performed better than FaCT++, although solving a more
complex task.4 However, computation times of more than a minute for 10 relaxed
queries over ABoxes with 1,000 individuals still calls for further improvement.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>In this paper we investigated the novel inference of answering relaxed instance
queries. These queries can be gradually relaxed by varying the threshold, while
4 Note that FaCT++ classi cation resulted in an error for ontologies 13{15.
the similarity measure allows to specify which parts of the query can be relaxed
and which should be kept. We devised a concept similarity measure, c, that
works for general EL-TBoxes, and showed how to relax instance queries using this
measure. We presented Elastiq, a prototype sytem for relaxed instance query
answering, some straight-forward, but highly e ective optimization techniques,
and gave a rst performance evaluation using di erent queries on increasingly
complex versions of the GeneOntology.</p>
      <p>It turned out that for ontologies with large ABoxes, Elastiq is still not fast
enough. We want to explore further optimizations for the computation of imax.
Currently, the matrices converge to imax from below. With upper bounds on
the maximal similarity, it would be possible to prune computation early on
individuals that are certainly not relaxed instances. While currently the algorithm
decides which individuals are certainly relaxed answers, an upper bound could
also be used to determine which individuals are certainly not relaxed answers,
therefore making the approach not just sound, but complete. We also need to
perform further evaluations for the performance on ontologies and query concepts
from other domains, but also to evaluate the quality of the answers returned by
Elastiq in regard of the di erent CSMs in use.</p>
      <p>Currently, one needs to specify a threshold, above which individuals are
considered as relaxed answers. This threshold approach guarantees a minimal
similarity and hence quality of the result, but it is hard to predict how many relaxed
answers a query would return for a certain threshold. In cases where just the
rst few most similar relaxed answers are of interest, top-k answering would be
more useful. We are investigating to e ciently implement this type of answering
mechanism. Finally, it would be useful to not just consider instance queries, but
more expressive query types as well. We are currently working on the theoretical
basis for relaxing conjunctive queries.
17. S. Tongphu and B. Suntisrivaraporn. On desirable properties of the structural
subsumption-based similarity measure. In T. Supnithi, T. Yamaguchi, J. Z. Pan,
V. Wuwongse, and M. Buranarach, editors, Proceedings of the 4th Joint
International Conference on Semantic Technology, (JIST 2014), volume 8943 of LNCS,
pages 19{32. Springer, 2014.
18. D. Tsarkov and I. Horrocks. Fact++ description logic reasoner: System
description. In Proceedings of the Third International Joint Conference on Automated
Reasoning, IJCAR'06, pages 292{297, Berlin, Heidelberg, 2006. Springer-Verlag.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Borgida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Walsh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Hirsh</surname>
          </string-name>
          .
          <article-title>Towards measuring similarity in description logics</article-title>
          .
          <source>In Proc. of the 2005 Description Logic Workshop (DL</source>
          <year>2005</year>
          ), volume
          <volume>147</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgwardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Distel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Pen</surname>
          </string-name>
          <article-title>~aloza. How fuzzy is my fuzzy description logic? In B</article-title>
          .
          <string-name>
            <surname>Gramlich</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Miller</surname>
          </string-name>
          , and U. Sattler, editors,
          <source>Proceedings of the 6th International Joint Conference on Automated Reasoning (IJCAR'12)</source>
          , volume
          <volume>7364</volume>
          <source>of Lecture Notes In Arti cial Intelligence</source>
          , pages
          <fpage>82</fpage>
          {
          <fpage>96</fpage>
          . Springer-Verlag,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgwardt</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Pen</surname>
          </string-name>
          <article-title>~aloza. Undecidability of fuzzy description logics</article-title>
          . In G. Brewka,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , and
          <string-name>
            <surname>S. A</surname>
          </string-name>
          . McIlraith, editors,
          <source>Proc. of the 12th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR-12)</source>
          , pages
          <fpage>232</fpage>
          {
          <fpage>242</fpage>
          . AAAI Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Cerami</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Straccia</surname>
          </string-name>
          .
          <article-title>On the (un)decidability of fuzzy description logics under lukasiewicz t-norm</article-title>
          .
          <source>Inf. Sci.</source>
          ,
          <volume>227</volume>
          :1{
          <fpage>21</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. C.
          <string-name>
            <surname>d'Amato</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Fanizzi</surname>
            , and
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Esposito</surname>
          </string-name>
          .
          <article-title>A semantic similarity measure for expressive description logics</article-title>
          .
          <source>In Proc. of Convegno Italiano di Logica Computazionale, CILC05</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ecke</surname>
          </string-name>
          .
          <article-title>Similarity-based relaxed instance queries in EL++</article-title>
          .
          <source>In Proceedings of the First Workshop on Logics for Reasoning about Preferences</source>
          , Uncertainty, and Vagueness,
          <source>CEUR-WS. CEUR</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pen</surname>
          </string-name>
          <article-title>~aloza, and</article-title>
          <string-name>
            <given-names>A.-Y.</given-names>
            <surname>Turhan</surname>
          </string-name>
          .
          <article-title>Towards instance query answering for concepts relaxed by similarity measures</article-title>
          .
          <source>In Workshop on Weighted Logics for AI (in conjunction with IJCAI'13)</source>
          , Beijing, China,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pen</surname>
          </string-name>
          <article-title>~aloza, and</article-title>
          <string-name>
            <given-names>A.-Y.</given-names>
            <surname>Turhan</surname>
          </string-name>
          .
          <article-title>Answering instance queries relaxed by concept similarity</article-title>
          .
          <source>In Proceedings of the Fourteenth International Conference on Principles of Knowledge Representation and Reasoning (KR'14)</source>
          , pages
          <fpage>248</fpage>
          {
          <fpage>257</fpage>
          . AAAI Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pen</surname>
          </string-name>
          <article-title>~aloza, and</article-title>
          <string-name>
            <given-names>A.-Y.</given-names>
            <surname>Turhan</surname>
          </string-name>
          .
          <article-title>Similarity-based relaxed instance queries</article-title>
          .
          <source>Journal of Applied Logic</source>
          ,
          <year>2015</year>
          . In press.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. T. Gene Ontology Consortium.
          <article-title>Gene Ontology: Tool for the uni cation of biology</article-title>
          .
          <source>Nature Genetics</source>
          ,
          <volume>25</volume>
          :
          <fpage>25</fpage>
          {
          <fpage>29</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Krotzsch, and</article-title>
          <string-name>
            <surname>F.</surname>
          </string-name>
          <article-title>Simanc k. The incredible ELK: from polynomial procedures to e cient reasoning with EL ontologies</article-title>
          .
          <source>J. Autom. Reasoning</source>
          ,
          <volume>53</volume>
          (
          <issue>1</issue>
          ):1{
          <fpage>61</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>K.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.-Y.</given-names>
            <surname>Turhan</surname>
          </string-name>
          .
          <article-title>A framework for semantic-based similarity measures for ELH-concepts</article-title>
          .
          <source>In L. F. del Cerro</source>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herzig</surname>
          </string-name>
          , and J. Mengin, editors,
          <source>Proc. of the 13th European Conf. on Logics in A.I. (JELIA 2012), Lecture Notes In Arti cial Intelligence</source>
          , pages
          <fpage>307</fpage>
          {
          <fpage>319</fpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Deciding inseparability and conservative extensions in the description logic EL</article-title>
          .
          <source>Journal of Symbolic Computation</source>
          ,
          <volume>45</volume>
          (
          <issue>2</issue>
          ):
          <volume>194</volume>
          {
          <fpage>228</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>N.</given-names>
            <surname>Matentzoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bail</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          .
          <article-title>A snapshot of the OWL web</article-title>
          .
          <source>In The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference</source>
          , Sydney,
          <string-name>
            <surname>NSW</surname>
          </string-name>
          , Australia,
          <source>October 21-25</source>
          ,
          <year>2013</year>
          , Proceedings,
          <string-name>
            <surname>Part</surname>
            <given-names>I</given-names>
          </string-name>
          , pages
          <volume>331</volume>
          {
          <fpage>346</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>B.</given-names>
            <surname>Suntisrivaraporn</surname>
          </string-name>
          .
          <article-title>A similarity measure for the description logic EL with unfoldable terminologies</article-title>
          .
          <source>In 5th International Conference on Intelligent Networking and Collaborative Systems (INCoS)</source>
          , pages
          <fpage>408</fpage>
          {
          <fpage>413</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>S.</given-names>
            <surname>Tongphu</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Suntisrivaraporn</surname>
          </string-name>
          .
          <article-title>A non-standard instance checking for the description logic ELH</article-title>
          .
          <source>In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD</source>
          <year>2014</year>
          ), Rome, Italy,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>