<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Similarity-based Relaxed Instance Queries in ++</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas Ecke?</string-name>
          <email>ecke@tcs.inf.tu-dresden.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Theoretical Computer Science</institution>
          ,
          <addr-line>TU Dresden</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Description Logic (DL) knowledge bases (KBs) allow to express knowledge about concepts and individuals in a formal way. This knowledge is typically crisp, i.e., an individual either is an instance of a given concept or it is not. However, in practice this is often too restrictive: when querying for instances, one may often also want to nd suitable alternatives, i.e., individuals that are not instances of query concept, but could still be considered `good enough'. Relaxed instance queries have been introduced to gradually relax this inference in a controlled way via the use of concept similarity measures (CSMs). So far, those algorithms only work for the DL EL, which has limited expressive power. In this paper, we introduce a suitable CSM for EL++-concepts. EL++ adds nominals, role inclusion axioms, and concrete domains to EL. We extend the algorithm to compute relaxed instance queries w.r.t. this new CSM, and thus to work for general EL++ KBs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Description Logics (DLs) are a family of knowledge representation formalisms
widely used in AI to describe and reason about categories and objects
(individuals) of an application domain [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Each DL has a set of concept constructors, that
allow to build complex concepts to formalize those categories, and are used in
axioms and assertions to de ne the relations between di erent concepts and
individuals. The set of axioms and assertions that describe the terminological and
the assertional knowledge of the application domain, respectively, are collected
in the TBox and the ABox. Together, TBox and ABox form a DL knowledge
base (KB).
      </p>
      <p>
        The formal semantics of DLs allows for the de nition of reasoning services,
i.e., inferences that allow to compute implicit knowledge from that explicitly
described in the KB. Standard reasoning services include consistency of a KB,
subsumption tests between di erent concepts, and instance checking, which
derives whether an individual is an instance of a concept or not. Those reasoning
services have been implemented in many highly optimized DL systems. One DL
? Supported by the German Research Foundation (DFG) Graduiertenkolleg 1763
(QuantLA).
that is especially interesting in terms of reasoning is EL; while quite restricted in
the constructors it o ers, all the standard inferences can be implemented using
polynomial-time algorithms. EL has been extended to a maximal superset EL++
that still retains the favorable computational properties in [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>Since DLs are the underlying logics of the OWL ontology language and its
pro les (including OWL 2 EL, which is based on EL++) standardized by the
W3C, their usage has increased rapidly in many elds like the Semantic Web,
biomedical ontologies and more. By now, there is a large collection of di erent
KBs available written in those languages. However, for many applications,
retrieving strict instances from these KBs is often too restrictive, as often one may
want to nd suitable alternatives, even in cases where no individual completely
matches the query concept. Those alternatives may be individuals that are not
strict instances, but ful ll most of the requirement and are thus quite similar to
the query concept.</p>
      <p>
        The reasoning services of relaxed instance queries has been introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
This inference relaxes the instance retrieval problem to return more individuals
by the use of concept similarity measure (CSM). Given a CSM and a threshold
t, this inference will return all instances of concepts that have a similarity of at
least t to the query concept. Algorithms to compute relaxed instances have been
introduced for unfoldable and general EL-TBoxes in [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>However, the limited expressiveness of EL is often a problem; especially for
query answering, it is useful to be able to use concrete domains to describe
quantitative aspects of individuals and use these for querying. For example, one can
use this to describe the geographic location of objects, the bandwidth of servers
or time points in measurement series and incorporate the similarity between
these values to nd relaxed instances. Similarly, other features of EL++, like
nominals, that allow to refer to speci c individuals in concepts, role inclusions,
and domain and range restrictions can be very useful in practice.</p>
      <p>In this paper, we will extend the problem of relaxed query answering to EL++.
To do so, after formally de ning EL++ and CSMs in Section 2, we introduce
pseudo-interpretations in Section 3, which can than be used to de ne simulations
and canonical models that correspond to the semantics of EL++. In Section 4 we
de ne a parameterizable similarity measure on pointed pseudo-interpretations,
which can be lifted to EL++-concepts using the canonical models. This CSM can
then be used to query for relaxed instances in general EL++-KBs as shown in
Section 5. We conclude the paper in Section 6.
2</p>
      <p>Preliminaries
This section will give a brief introduction to the DL EL++ and de ne concept
similarity measures and some of their properties.
2.1</p>
      <p>The DL EL++
EL++ concepts are built from four countable, pairwise disjoint sets: The set NC
of concept names; the set NR of role names; the set NI of individual names; and
concept name
top concept
bottom concept
nominal
conjunction
existential
restriction
syntax
A
&gt;
?
fog
C u D
9r:C
concrete domain
the set NF of feature names. Using the concept constructors given in the upper
part of Table 1, these names are used to construct complex concept descriptions.
The set of all EL++-concept descriptions is denoted with C(EL++).</p>
      <p>When formulating a knowledge domain in terms of DLs, one expresses all
classes of interest as (possibly complex) EL++-concepts, and possible relations
between those classes as roles. The general knowledge about the classes can
then be formalized using the axioms given in the middle part of Table 1, while
the knowledge about speci c objects can be expressed using concept and role
assertions of the form C(a) and r(a; b). The axioms and assertions are collected in
the TBox and ABox, respectively, which together form a knowledge base (KB).</p>
      <p>EL++ allows the use of p-admissible concrete domains. Such a concrete
domain D = ( D; P D) consist of a set of concrete values D and a set of
predicates p 2 P , each associated with an arity n &gt; 0 and an extension pD ( D)n.
Features connect objects described by the DL to elements of the concrete
domain. For example, using the concrete domain Q with Q = Q the set of
rational numbers and predicates P = f=; p; =pg for p 2 Q with the obvious
meanings, one can express that adults are persons that are at least 18 using
Adult v Person u 18(age) or that Anna is 171 cm tall and her age is the same
as her shoe size: (=171(height) u =(age; shoeSize))(anna).</p>
      <p>
        P-admissible concrete domains in EL++ only allow for limited
expressiveness, in order to retain tractability. Besides the obvious requirement that
satisability and implication in these concrete domains must be decidable in
polynomial time, there are two other changes when compared to general concrete
domains: First, as there are no abstract features, predicates can only compare
features of a single element. This means that EL++ does not allow to express
Person v &lt;(age; mother age) u &lt;(age; father age), i.e., that every person is
younger than her parents. Second, the concrete domains need to be convex, i.e.,
if a set of predicates implies the disjunction of some predicates, then it must
also imply one of the disjuncts. This is a rather big restriction, but there still
exist useful p-admissible concrete domains, like those given in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which allow
to refer to rational numbers and strings. Indeed, we argue that for our purpose,
relaxed instance queries, for any set of concrete values D, even a single unary
predicate =d for d 2 D to attach concrete values to individuals is useful.
Example 1. Consider the concrete domain G to represent geographic coordinates
as a pair of latitude and longitude, with G = [ 90; 90] [ 180; 180] R R
and the unary predicates =p for p 2 G. This allows a service provider to
describe the location of all its branch o ces in the ABox using assertions like
(=(51:026;13:723)(location))(o ce1). If we construct the similarity measure used for
relaxing the queries in such a way that it assigns larger similarities to locations
closer together, an instance query which includes the predicate =l(location) for
the location of the user will try to nd the closest branch o ces that also match
the rest of the query. Indeed, one could also construct a similarity measure that
returns similarity 0 for locations more than a set distance away, allowing the
user to specify the maximum distance. Thus, while the concrete domain itself
is extremely inexpressive, it allows the relaxed instance queries to include the
distance between locations in its similarity evaluation.
      </p>
      <p>
        Note that in this paper we restrict to a single concrete domain D, but it is
easy to generalize the similarity measure and the algorithms to compute relaxed
instance queries to handle multiple concrete domains at once. Also note that it is
possible to remove domain and range restrictions from the KB without changing
its semantics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To do so, we can replace every domain restrictions dom(r) v C
with 9r:&gt; v C and for any range restrictions ran(r) v C, we replace all 9r:D
occurring in the KB with 9r:(C u D) and for any role assertion r(a; b) we add
D(b). Thus, in the remainder of this paper, we assume that KBs do not contain
any domain or range restrictions.
      </p>
      <p>The semantics of EL++-concepts is given by means of interpretations. An
interpretation I = ( I ; I ) is a tuple consisting of a domain I and an
interpretation function I that assigns to each concept name C 2 NC a subset CI I
of the the domain, to each role name r 2 NR a binary relation rI I I ,
to each individual name a 2 NI an element aI 2 I , and to each feature name
f 2 NF a partial function f I : I 9 D. The interpretations can be extended
to complex EL-concepts as shown in the last column of Table 1.</p>
      <p>Instead of viewing an interpretation I as a tuple of functions that assign
subsets, binary relations and elements of I to the elements of NC , NR and NI ,
and partial functions to the elements of NF , one can also view it as a tuple of
functions</p>
      <p>I : ( I ! P(NC ); I ! P(NR</p>
      <p>I ); I ! P(NI ); I ! NF 9</p>
      <p>D)
from the domain I to a subset of NC (the concept names that this element is
an instance of), to a binary relation between NR and I (the successors of the
element), to a subset of NI (the individual names that map to this element), and
to a partial function mapping feature names to concrete values, respectively. If we
require that each individual name occurs only once, this de nition is equivalent
to the usual one.
2.2</p>
    </sec>
    <sec id="sec-2">
      <title>Concept similarity measures</title>
      <p>Given a EL++-KB K, a concept similarity measure (CSM) is a function K :
C(EL++) C(EL++) ! [0; 1] such that C K C = 1 for all concepts C.
Intuitively, a value C K D = 0 means that the concepts C and D are totally
dissimilar w.r.t. K, while a value of 1 indicates total similarity. We often simply
write instead of K if the KB K is clear from the context.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] a set of properties for CSMs is de ned. We extend the de nition of
these properties to the case of general TBoxes.
      </p>
      <p>De nition 1. A CSM K : C(EL++) C(EL++) ! [0; 1] is:
symmetric i C K D = D K C;
equivalence invariant i for all C K D it holds that C K E = D K E;
equivalence closed i C K D () C K D = 1;
bounded i the existence of E 6 K &gt; with C vK E and D vK E implies C K</p>
      <p>D &gt; 0; and
dissimilar closed i C; D 6 K &gt; and there is no E 6 K &gt; with C vK E and</p>
      <p>D vK E implies C K D = 0.</p>
      <p>
        These formally de ned properties make CSMs more predictable for users. The
measures in [5{7] ful ll most of these properties. The measures from [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] are
additionally parameterizable, which allows users to calibrate the measure to t
their expectations. In our setting these parameterizable CSMs enable users to
specify which features of query concepts should be relaxed.
3
      </p>
      <p>Pseudo-interpretations
Unlike in EL without concrete domains, the de nition of interpretations for EL++
given in the last section does not admit canonical models. For example, in the
concrete domain of the rational numbers Q = (Q; P Q) introduced before, a
concept like &gt;0(f ) will have in nitely many models (one for each positive rational
number) without any of them being preferable and therefore canonical. One way
to avoid this problem and ensure the existence of canonical models is to only
consider pseudo-interpretations.</p>
      <p>These pseudo-interpretations di er from the usual interpretations in just the
fourth component: Instead of assigning each domain element a partial function
from the feature names to concrete elements, we simply assign to each element
directly a subset of the set of all predicates of D over the feature names, denoted
with PredD(NF ). In that way, each pseudo-interpretation corresponds to a set
of usual interpretations, namely all those whose concrete elements assigned to
the feature names of a domain element satisfy all the predicates mapped to the
domain element by the pseudo-interpretation.</p>
      <p>De nition 2. A pseudo-interpretation J = ( J ; fCJ ; fRJ ; fIJ ; fFJ ) consists of
an interpretation domain J and the interpretation functions fCJ : J !
P(NC ), fRJ : J ! P(NR J ), fIJ : J ! P(NI ), and fFJ : J !
P(PredD(NF )), such that for each a 2 NI there exists exactly one d 2 J with
a 2 fIJ (d), and the conjunction
conj((J ; d)) =</p>
      <p>^
p(f1;:::;fn)2fFJ (d)
is satis able in D for any d 2</p>
      <p>Pseudo-interpretations can be used exactly as usual interpretations, with
the exception that it does not interpret feature names itself; however, it does
interpret predicates of the concrete domain:</p>
      <p>AJ = fd 2</p>
      <p>J j A 2 fCJ (d)g
rJ = f(d; e) 2</p>
      <p>J</p>
      <p>J j (r; e) 2 fRJ (d)g
aJ = d
()</p>
      <p>a 2 fIJ (d)
p(f1; : : : ; fn)J = fd 2</p>
      <p>J j D j= conj((J ; d)) ) p(f1; : : : ; fn)g
Any other concept constructors, axioms, and assertions can then be interpreted
as given in Table 1. We say that a pseudo-interpretation J is a model a KB K,
if it satis es all axioms and assertions in K. This is the case if and only if all
corresponding usual interpretations are models of K.</p>
      <p>We call a pair (J ; d) consisting of a pseudo-interpretation J and an element
d 2 J a pointed pseudo-interpretation and denote the set of all pointed
pseudointerpretations as P. We sometimes use fC (p) (and similarly for fR, fI and fF )
instead of fCJ (d) for p = (J ; d).
3.1</p>
    </sec>
    <sec id="sec-3">
      <title>Simulations</title>
      <p>
        Simulations allow the characterization of elements of interpretations w.r.t. the
concepts they are instance of. To extend the simulation relation between
interpretations w.r.t. EL given in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to pseudo-interpretations w.r.t. EL++, we
observe the following:
{ role inclusions, range and domain restrictions are not concept constructors,
and thus do not matter for the set of concepts that an element of a
pseudointerpretation is instance of;
{ the bottom concept ? can not occur in pseudo interpretations;
{ nominals allow to use individual names in concepts, and thus simulations
need to preserve individuals; and
{ for concrete domains, simulations need to preserve the valuations that
satisfy the elements, which can be formalized using implications between the
predicate sets of pointed pseudo-interpretations.
      </p>
      <p>Thus, we can de ne a simulation relation for EL++ as follows:
De nition 3. Let J1 and J2 be pseudo-interpretations. A relation S J1
J2 is a simulation between J1 and J2, if the following conditions hold:
1. For all (d; e) 2 S and A 2 NC , if d 2 AJ1 then e 2 AJ2 .
2. For all (d; e) 2 S, r 2 NR and (d; d0) 2 rJ1 , there is an (e; e0) 2 rJ2 with
(d0; e0) 2 S.
3. For all (d; e) 2 S and a 2 NI , if d = aJ1 then e = aJ2 .
4. For all (d; e) 2 S, we have that D j= conj((J2; e)) ) conj((J1; d)).</p>
      <p>Given two pointed pseudo-interpretations p = (J1; d) and q = (J2; e), we say
that p simulates q (denoted p . q), if there exists a simulation S J1 J2
between J1 and J2 with (d; e) 2 S. p and q are equisimilar (denoted p ' q), if
p . q and q . p.</p>
      <p>
        This de nition of simulations is reasonable, as it corresponds with the set
of concepts that the elements in the simulation are instances of. Indeed, we can
extent the following result from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to simulations of pseudo-interpretations in
EL++:
Theorem 1. Let p and q be pointed pseudo-interpretations, then:
1. p . q i C(p) C(q), and
2. p ' q i C(p) = C(q).
3.2
      </p>
    </sec>
    <sec id="sec-4">
      <title>Canonical models</title>
      <p>Next, we need to de ne canonical models for EL++. For these, the additional
axioms like role inclusions are important. However, if the concept C contains
the bottom concept ?, it must be equivalent to ?, and thus can not be instance
of any element in an interpretation { in particular, it does not have a canonical
model. Thus, by requiring that C is satis able w.r.t. K, we do not have to worry
about ? at all.</p>
      <p>Since individuals can be part of concepts via nominals, we need to take care
of the case that 2 individuals are equivalent, e.g. by the GCI fag v fbg. In this
case, we cannot create two elements in the canonical interpretation for the two
concepts fag and fbg, since this would not yield a model of the TBox anymore.
Instead, we need to take one representative for all equivalence classes of concepts
that are subsumed by the same individual:</p>
      <p>[C] = fD 2 C(EL++) j 9a 2 NI : K j= C v fag ^ K j= D v fagg
Finally, we need the notion Sig(X) of the signature of X, i.e., the set of all
concept, role, individual and feature names occurring in X, and the notion sub(C)
and sub(K) of the set of all sub-concepts of C and all sub-concepts of concepts
occurring in K, respectively. Then, we can de ne canonical models as follows:
De nition 4. Let K be a satis able EL++-KB and C 2 C(EL++) be an
EL++concept with C 6 K ?. The canonical model JC;K = ( JC;K ; fC ; fR; fI ; fF ) of C
w.r.t. K is a pseudo-interpretations de ned as follows:
{ JC;K = fd[C]g [ fd[fag] j a 2 (Sig(K) [ Sig(C)) \ NI g [ fd[D] j 9r:D 2
sub(C) [ sub(K)g,
and for all d[D] in JC;K :
{ fC (d[D]) = fA 2 NC j K j= D v Ag,
{ fR(d[D]) = f(r; d[E]) 2 NR JC;K j K j= D v 9r:Eg,
{ fI (d[D]) = fa 2 NI j K j= D v fagg, and
{ fF (d[D]) = fp(f1; : : : ; fn) 2 PredD(NF ) j K j= D v p(f1; : : : ; fn)g.
d[D] 2</p>
      <p>It can be shown that the canonical model JC;K is indeed a model of the KB
K, and its elements d[D] are instances of the corresponding concept D, for all</p>
      <p>JC;K .</p>
      <p>Lemma 1. Let K be a satis able EL++-KB and C; D be EL++-concepts with
C 6 K ?. Then:
1. if d[D] 2 JC;K , then d[D] 2 DJC;K , and
2. JC;K j= K.</p>
      <p>Finally, it can be shown that the canonical model is indeed `canonical', i.e.,
it can simulate all other models (and is thus least w.r.t. .):
Theorem 2. Let K be a satis able EL++-KB and C; D be EL++-concepts with
C 6 K ?. Then:
1. for all pseudo-models J of K and all elements d 2 J it holds d 2 CJ i
(JC;K; d[C]) . (J ; d),
2. for all pseudo-models J of K, all individuals a occurring in K, and all
elements d 2 J it holds d = aJ i (JK ; d[fag]) . (J ; d), and
3. C vK D i d[C] 2 DJC;K i (JD;K; d[D]) . (JC;K; d[C]).</p>
      <p>Those results, besides being needed to prove formal properties of the
similarity measure, show that canonical models are reasonably de ned.
4</p>
      <p>
        A Concept Similarity Measure for EL++
Similarly to [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we will de ne the CSM via a similarity measure on pointed
pseudo-interpretations, by translating the concepts into interpretations by
taking their canonical model. To de ne the similarity measure on pointed
pseudointerpretations, we need a few basic ingredients:
{ a primitive measure prim: NC NC [ NR NR [ NI NI ! [0; 1] that
assigns a similarity value to each pair of concept names, role names, and
individual names,
{ a weighting function g : NC [ NR [ NI [ NF ! R&gt;0, which allows more
important features of interpretations to contribute more to the nal similarity
values than others,
{ a similarity measure D: D D ! [0; 1] on the concrete domain,
{ a discounting factor w 2 (0; 1), and a concrete domain factor c &gt; 0.
      </p>
      <p>We will extend the concrete similarity measure D to handle unde ned
values, i.e., D: ( D [f?g) ( D [f?g) ! [0; 1] by setting ? D d = d D ? = 0
for d 2 D and ? D ? = 1. This can be further extended to similarity on
valuations, i.e., partial functions u; v : NF 9 D, by computing the weighted
average of the similarity values for all features:
u</p>
      <p>D v =</p>
      <p>Pf2dom(u)[dom(v) g(f ) simD(u(f ); v(f ))</p>
      <p>Pf2dom(u)[dom(v) g(f )
:
Finally, we can de ne the similarity of conjunctions of predicates on the concrete
domain using a similar construction to the Hausdor metric, where the
valuations u, v are restricted to those feature names occurring in fF (p) or fF (q):
simD(p; q) = min</p>
      <p>inf sup
uj=conj(p) vj=conj(q)
u</p>
      <p>D v;</p>
      <p>inf sup
uj=conj(q) vj=conj(p)
u</p>
      <p>D v
!
{ CN(p) =
{ SC(p) =
(fR(p)
(</p>
      <p>f(r&gt;; d)g</p>
      <p>All other things, i.e., concept names, successors, and individual names, can
be compared directly. For this, we introduce a new role r&gt; and a new individual
a&gt;, in case that a pointed pseudo-interpretation does not have any successors or
individuals, similarly to how &gt; is used for concept names. Then we can de ne
for a pointed pseudo-interpretation p:
(
fC (p) if fC (p) 6= ; , the set of concept names of p,
f&gt;g otherwise
if fR(p) 6= ; , the set of successors of p,
otherwise
{ IN(p) = fI (p) if fI (p) 6= ; , the set of individuals of p.</p>
      <p>fa&gt;g otherwise</p>
      <p>To compare how similar two pointed pseudo-interpretations are for these
aspects, we use pairings. A pairing P X Y between sets X and Y is a total
binary relation, where totality means that all elements x 2 X and y 2 Y occur
in some some tuple of P . For two pointed pseudo-interpretations p = (J1; d) and
q = (J2; e), we are mainly interested in the following pairings:
{ PC (p; q) P((CN(p) CN(q)) n f(&gt;; &gt;)g) is the set of all concept name
pairings on the concepts that p and q are instance of.
{ PS (p; q) P((SC(p) SC(q)) n f((r&gt;; d); (r&gt;; e))g) is the set of all successor
pairings of p and q.
{ PI (p; q) P((IN(p) IN(q)) n f(a&gt;; a&gt;)g) is the set of all individual pairings
of p and q.</p>
      <p>Using these pairings, we can nally de ne the similarity measure i for
pointed pseudo-interpretations. It works by averaging over the weighted
similarity of the pairs in the best concept name, successor, and individual pairings.
The similarity between pairs of successors is computed recursively. If at least
one of the pointed interpretations contain any predicates, the similarity between
these predicates as de ned before is added, weighted with the concrete domain
factor c.</p>
      <p>max
pC2PC(p;q)
pS2PS(p;q)
pI 2PI (p;q) (A;B)2pC
simC (pC ) + simS (pS ) + simI (pI ) + simF (p; q)
X g(A; B) +</p>
      <p>X g(r; s) +</p>
      <p>X g(a; b) + gF (p; q)
((r;d);(s;e))2pS
(a;b)2pI
with:
simC (pC ) =
simS (pS ) =
simI (pI ) =</p>
      <p>X
(A;B)2pC</p>
      <p>X
((r;d);(s;e))2pS</p>
      <p>X
(a;b)2pI
g(A; B)(A</p>
      <p>prim B);
g(a; b)(a</p>
      <p>prim b);
g(r; s)(r
prim s)(w + (1
w)(J1; d) i (J2; e));
simF (p; q) = gF (p; q) simD(p; q);
gF (p; q) = (c if fF (p) 6= ; _ fF (q) 6= ; :</p>
      <p>0 otherwise</p>
      <p>Since i can be seen as a contraction mapping on the similarity values
between all elements of J1 and J2, the Banach xed-point theorem will yield the
following result:</p>
    </sec>
    <sec id="sec-5">
      <title>Theorem 3.</title>
      <p>i is well-de ned, i.e., p</p>
      <p>i q has a unique solution.</p>
      <p>This de nition of i is not equivalence invariant and equisimulation closed.
In order to regain these properties, we need to normalize the pointed
pseudointerpretations before evaluating i. We say that an pseudo-interpretation J is
in normal form if there are no elements a; b; c 2 J with f(a; b); (a; c)g 2 rJ
and (J ; b) . (J ; c), i.e., no node has two successor nodes for the same role name
that are in a simulation relation.</p>
      <p>Any pseudo-interpretation J can be transformed into normal form as follows:
1. remove all edges (a; b) 2 rJ from J , for which there exists an edge (a; c) 2 rJ
such that (J ; b) . (J ; c) but not (J ; b) ' (J ; c);
2. for all edges (a; b0) 2 rJ , check if there are other edges (a; bi) 2 rJ , i &gt; 0,
with (J ; b0) ' (J ; bi) and choose one representative bj ; then remove all
other edges (a; bi), i 6= j, from rJ .</p>
      <p>Finally, we can de ne the CSM c for concept descriptions w.r.t. an EL++
KB K as follows:</p>
      <p>C</p>
      <p>c D = (JC0 ;K; d[C]) i (J D0;K; d[D]);
where JC0 ;K and J D0;K are the normalized canonical models of C and D w.r.t. K.
If C or D are equivalent to ?, they do not have a canonical model. In this case,
we set C c ? = ? c D = 0 for C; D 6 K ?. c has all of the properties given
in De nition 1:
Theorem 4. The CSM c is symmetric, bounded, dissimilar closed, equivalence
invariant, and equivalence closed.</p>
      <p>
        Relaxed instance queries w.r.t.
c
We established in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that in order to compute the maximal similarity between a
query concept C and all concepts that an individual a is instance of, it is enough
to check all generalized concepts of the msc of a, or in terms of i: It is enough to
compute the maximal similarity (JC0 ;K; d[C]) i q for all pointed interpretations
(J&gt;;K; d[fag]) . q. This can also be achieved by using (J&gt;;K; d[fag]) directly
in the computation of the i and allowing to generalize this pointed
pseudointerpretations, i.e. nding the best subsets of fC , fR, and fI , and taking the
best set of predicates that follow from fF .
      </p>
      <p>Since fC , fR and fI are nite, nding the best subsets is always possible by
checking all of them. However, there can be in nitely many predicate sets
following from fF . Note that in order to maximize simconc(p; q), generalizing q can
always increase the left part of simconc(p; q), infuj=conj(p) supvj=conj(q) simD(u; v),
to a value of 1, by simply taking the empty set of predicates (which has all
valuations as model), but it can never increase the right part. Thus, the
maximal value for simconc(p; q) that can be achieved by generalizing q is simply
infuj=conj(q) supvj=conj(p) simD(u; v).</p>
      <p>Procedure: maxsim (J1; J2; prim; D; g; w; c)
Input: J1; J2: nite pseudo-interpretations; prim: primitive measure; D: similarity
measure on D; g: weighting function; w 2 (0; 1): discount factor; c &gt; 0: concrete
domain factor
Output: maximal similarities between p = (J1; a) and all generalizations of q = (J2; b)
J2
1: msim0(d; e) 0 for all d 2 J1 and e 2
2: for i 1; 2; 3; : : : do
3: for all d 2 J1 and e 2 J2 do
4: msimi(d; e) max max similarity(pC ; pS; pI ; d; e; i)
SSCSNC CSCN((ee)) ppCS CSCN((dd)) SSSCCN</p>
      <p>SIN IN(e) pI IN(d) SIN
5: end for
6: end for
1: compute canonical models IQ;T and IK
2: maxsim(d; e) maxsim(JQ;K; J&gt;;K; prim; D; g; w; c)
3: return fa 2 NI \ Sig(K) j maxsim(d[Q]; d[fag]) &gt; tg
Fig. 2. Algorithm to compute all relaxed instances of a query concept Q w.r.t. a
knowledge base K and threshold t.</p>
      <p>The algorithm to compute the maximal similarities between elements of
a pseudo-interpretation J1 and all generalizations of elements of a
pseudointerpretation J2 is shown in Figure 1. Using this, the algorithm to actually
compute all relaxed instances of a query concept Q w.r.t. c and an EL++-KB
K is conceptually quite easy, as it only needs to compute the maximal
similarities between Q and the individuals a 2 K and check whether they are larger
than t. The algorithm is depicted in Figure 2.</p>
      <p>The maxsimi values computed in the algorithm monotonically converge from
below to the maximal similarities between generalized concepts of the
individuals and the query concept. Thus, the algorithm is sound and complete in the
following sense:
Theorem 5. Let c be the CSM derived from i with the primitive measure
prim, concrete measure D and factor c, weighting function g and discounting
factor w. Then the algorithm relaxed-instances is sound and complete:
1. Soundness: If a 2 relaxed-instances(Q; K; t; prim; D; g; w; c) for a number
n of iterations, then a 2 Relaxt c (Q).
2. Completeness: If a 2 Relaxt c (Q), then there exists an i 2 N such that for
n i iterations it holds that a 2 relaxed-instances(Q; K; t; prim; D; g; w; c).</p>
      <p>Note the the number of of iterations i needed in the completeness part of
Theorem 5 is not bounded. However, since the algorithm converges quite fast,
this should not be a problem in most practical applications.
6</p>
      <p>
        Conclusions
In this paper we extended the concepts similarity measure for general TBoxes
introduced in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to the DL EL++. Since concrete domains do not allow do
de ne canonical models for standard interpretations in EL++, we de ned
pseudointerpretations, which correspond to a set of standard interpretations. This is
used to de ne a similarity measure on pointed pseudo-interpretations, which
is extended to concept descriptions w.r.t. a KB via the canonical models. We
use the proposed CSM for relaxed instance querying of EL++ KBs and give an
algorithm that computes all relaxed instances.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
          </string-name>
          , P.F., eds.: The Description Logic Handbook: Theory, Implementation, and
          <string-name>
            <surname>Applications</surname>
          </string-name>
          . Cambridge University Press, New York, NY, USA (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Pushing the EL envelope</article-title>
          .
          <source>In: Proceedings of the Nineteenth International Joint Conference on Arti cial Intelligence IJCAI-05</source>
          , Edinburgh, UK, Morgan-Kaufmann Publishers (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Pushing the el envelope further</article-title>
          . In Clark,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Patel-Schneider</surname>
          </string-name>
          , P.F., eds.:
          <source>In Proceedings of the OWLED 2008 DC Workshop on OWL: Experiences and Directions</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ecke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Pen~aloza, R.,
          <string-name>
            <surname>Turhan</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Towards instance query answering for concepts relaxed by similarity measures</article-title>
          . In Godo, L.,
          <string-name>
            <surname>Prade</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
          </string-name>
          , G., eds.: Workshop on Weighted Logics for
          <article-title>AI (in conjunction with</article-title>
          <source>IJCAI'13)</source>
          , Beijing, China (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ecke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Pen~aloza, R.,
          <string-name>
            <surname>Turhan</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Answering instance queries relaxed by concept similarity</article-title>
          .
          <source>In: Proceedings of the Fourteenth International Conference on Principles of Knowledge Representation and Reasoning (KR'14)</source>
          , Vienna, Austria, AAAI Press (
          <year>2014</year>
          ) To appear.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turhan</surname>
          </string-name>
          , A.Y.:
          <article-title>A framework for semantic-based similarity measures for ELH-concepts</article-title>
          .
          <source>In del Cerro</source>
          ,
          <string-name>
            <given-names>L.F.</given-names>
            ,
            <surname>Herzig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Mengin</surname>
          </string-name>
          , J., eds.
          <source>: Proc. of the 13th European Conf. on Logics in A.I. (JELIA 2012). Lecture Notes In Arti cial Intelligence</source>
          , Springer (
          <year>2012</year>
          )
          <volume>307</volume>
          {
          <fpage>319</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Suntisrivaraporn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A similarity measure for the description logic EL with unfoldable terminologies</article-title>
          .
          <source>In: 5th International Conference on Intelligent Networking and Collaborative Systems (INCoS)</source>
          . (
          <year>2013</year>
          )
          <volume>408</volume>
          {
          <fpage>413</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Deciding inseparability and conservative extensions in the description logic EL</article-title>
          .
          <source>Journal of Symbolic Computation</source>
          <volume>45</volume>
          (
          <issue>2</issue>
          ) (
          <year>2010</year>
          )
          <volume>194</volume>
          {
          <fpage>228</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>