Anytime Classification by Ontology Approximation

S. Schlobach, E. Blaauw, M. El Kebir, A. ten Teije, F. van Harmelen, S. Bortoli, M. Hobbelman, K. Millian, Y. Ren, S. Stam, P. Thomassen, R. van het Schip, W. van Willigem

Vrije Universiteit Amsterdam, contact: schlobac@few.vu.nl

Abstract. Reasoning with large or complex ontologies is one of the bottlenecks of the Semantic Web. In this paper we present an anytime algorithm for classification based on approximate subsumption. We give the formal definitions for approximate subsumption, and show its monotonicity and soundness; we show how it can be computed in terms of classical subsumption; and we study the computational behaviour of the algorithm on a set of realistic benchmarks. The most interesting finding is that anytime classification works best on ontologies where classical subsumption is hardest to compute.

1 Introduction

Reasoning with a large or complex terminology is computationally difficult, and is one of the known bottlenecks in the application of ontologies on the Semantic Web. It is known that for expressive ontology languages such as OWL most reasoning services are intractable, i.e. already for very small ontologies sound and complete reasoning can be practically infeasible. Moreover, for some applications it is more important to have some answers quickly than to have all answers after a long period of waiting. In both cases approximate reasoning can be useful, in particular when the algorithms are monotonic, i.e. when the quality of the output increases with increasing time.

Cadoli and Schaerf proposed a formal method for approximation of propositional entailment in their seminal paper [1], which also includes some ideas on approximating Description Logic reasoning. Stuckenschmidt [2] recently extended these ideas to ALCQ concept subsumption, and provided a simple algorithm so that approximate subsumption can be calculated using a standard DL reasoner. The idea is, basically, to ignore a subset of the atomic concepts in the query (i.e. to interpret them as trivially fulfilled, namely as always true), or alternatively to interpret the literals for these atoms as always false.

In this paper we extend this idea to approximate terminological subsumption, i.e. the question whether two concepts C and D subsume each other in the presence of a terminology T. Instead of the approaches proposed in [2, 3], which answer an approximated query Q′ based on an ontology O (i.e. O ⊢ Q′ ?), we approximate the ontology O rather than the query Q (i.e. O′ ⊢ Q ?). As reasoning with ontologies is computationally much harder than purely conceptual reasoning, the need for approximation seems even more urgent in this case.

The proposed method (more details later) is sound, but incomplete. We also show that the sequence of approximations is monotonic, i.e. that with a shrinking set of ignored concepts none of the logical consequences is lost. This result is the basis for an anytime algorithm for ontology reasoning. The variable factor in the anytime algorithm is then the choice of the set of concepts that remain intact, called the set S in Cadoli and Schaerf's work. For an anytime algorithm we need to produce a sequence of subsets ∅ ⊆ S0 ⊆ ... ⊆ Sn ⊆ S, where S is the set of all atoms in the TBox and query, such that most of the consequences are found as early as possible. Possible choices include purely random orders, frequency and connectedness, but the use of other ontologies as background knowledge is also conceivable.
In this paper we evaluate the quality of such an anytime algorithm for the problem of classification of atomic concept names, comparing both the classification hierarchy calculated for an approximated terminology and the runtime needed for classification. The main contribution of this paper is twofold: first, we extend known ideas about approximate reasoning in Description Logics to terminological reasoning, by approximating the ontology rather than the query; secondly, we show in a number of empirical experiments on a set of real-world ontologies that these approximations can be useful in anytime classification.

In Section 2 we introduce the theory of anytime classification based on ontology approximation. Section 3 describes our experiments, before we conclude in Section 4.

2 Approximating terminological reasoning

In their seminal paper [1], Cadoli and Schaerf introduced approximate reasoning for propositional as well as Description Logics, and our approach is a rather simple combination of their two results. The main idea is to treat only a particular subset S of the vocabulary (in this paper we only consider concept names, but the extension to relations is conceptually straightforward) in the classical way, and to either ignore the rest, or interpret the remaining concepts as necessarily empty. The hope is that such S-approximated ontology reasoning becomes computationally easier, while some of the logical consequences remain valid. In existing work on approximation of DL reasoning, usually the query is approximated. In our approach, we approximate the terminology, i.e. we consider a different set of models for a TBox, and calculate logical consequences for these models in the standard way. We first recall the basic notions of DL reasoning before outlining our proposed method in detail.

2.1 Classical subsumption

For Description Logics, the handbook [4] is an excellent reference. In the classical definition of a model of a terminology (TBox) in Description Logics, U is a set of objects, called the universe, and (·)^I a mapping which interprets DL concepts as subsets of U. An interpretation I = (U, (·)^I) is then called a model of a terminological axiom C ⊑ D if, and only if, C^I ⊆ D^I. A model for a TBox T is an interpretation which is a model of all axioms in T. Based on these semantics, the notion of subsumption with respect to T is defined formally: a concept C is subsumed by a concept D in a terminology T (we write T |= C ⊑ D) if, and only if, C^I ⊆ D^I for all models I of T.

This definition is based on the classical interpretation of a concept, for example in ALC, a simple yet relatively expressive DL with conjunction (C ⊓ D), disjunction (C ⊔ D), negation (¬C), and universal (∀r.C) and existential (∃r.C) quantification. (Most definitions can be extended to more expressive languages, but throughout the paper we work with ALC for the sake of simplicity.) The interpretation function interprets an atomic concept A as a subset A^I of the domain, and is extended to the different language constructs as follows:

• (C ⊓ D)^I = C^I ∩ D^I
• (C ⊔ D)^I = C^I ∪ D^I
• (¬C)^I = U \ C^I
• (∃R.C)^I = {d ∈ U | ∃e ∈ U : (d, e) ∈ R^I and e ∈ C^I}
• (∀R.C)^I = {d ∈ U | ∀e ∈ U : (d, e) ∈ R^I implies e ∈ C^I}
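To make these semantics concrete, the following is a minimal sketch (ours, not from the paper) of how concept extensions can be evaluated over a small finite interpretation. The tuple encoding of ALC concepts and all names in it are our own choices:

```python
# A sketch of the ALC semantics of Section 2.1: concepts are atomic names
# (strings) or nested tuples such as ("and", C, D), ("or", C, D),
# ("not", C), ("exists", R, C), ("forall", R, C).

def extension(concept, universe, conc_ext, role_ext):
    """Compute C^I over a finite interpretation I = (universe, .^I).

    conc_ext maps atomic concept names to subsets of the universe;
    role_ext maps role names to sets of pairs (d, e).
    """
    if isinstance(concept, str):                       # atomic concept A
        return frozenset(conc_ext.get(concept, set()))
    op, *args = concept
    if op == "and":                                    # (C ⊓ D)^I = C^I ∩ D^I
        return extension(args[0], universe, conc_ext, role_ext) & \
               extension(args[1], universe, conc_ext, role_ext)
    if op == "or":                                     # (C ⊔ D)^I = C^I ∪ D^I
        return extension(args[0], universe, conc_ext, role_ext) | \
               extension(args[1], universe, conc_ext, role_ext)
    if op == "not":                                    # (¬C)^I = U \ C^I
        return frozenset(universe) - extension(args[0], universe, conc_ext, role_ext)
    if op in ("exists", "forall"):                     # ∃R.C and ∀R.C
        role = role_ext.get(args[0], set())
        c_ext = extension(args[1], universe, conc_ext, role_ext)
        if op == "exists":
            return frozenset(d for d in universe
                             if any((d, e) in role and e in c_ext for e in universe))
        return frozenset(d for d in universe
                         if all((d, e) not in role or e in c_ext for e in universe))
    raise ValueError(f"unknown constructor {op!r}")

# Tiny usage example: U = {1, 2}, and the extension of ∃hasChild.Person.
U = {1, 2}
print(extension(("exists", "hasChild", "Person"), U,
                {"Person": {2}}, {"hasChild": {(1, 2)}}))   # frozenset({1})
```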
2.2 Approximate terminological subsumption

The idea of approximate terminological reasoning is based on interpreting an ontology in a non-standard way. In particular, the idea is to look at particular subsets and supersets of interpretations, called lower and upper approximations of an interpretation.

An approximate model is then an interpretation which interprets every axiom in an approximate way. Following the ideas of Cadoli & Schaerf [1], we restrict the vocabulary that is taken into account in the definition of an interpretation to a set usually denoted by the letter S. We will call S the approximation set. In the lower approximation, the atoms not in S are interpreted as empty sets; in the upper one, as the entire domain. The definition of lower and upper S-approximation is a simple reformulation for ALC of the respective definition of Cadoli and Schaerf in [1]:

Definition 1 (Lower and Upper Approximation). The lower approximation (·)^{I_S^-} and the upper approximation (·)^{I_S^+} are defined as follows:

– for atomic concepts A:
  A^{I_S^-} = A^{I_S^+} = A^I                  if A ∈ S
  A^{I_S^-} = ∅ and A^{I_S^+} = U              if A ∉ S

– for the Boolean operators:
  (¬C)^{I_S^-} = U \ C^{I_S^+}                    (¬C)^{I_S^+} = U \ C^{I_S^-}
  (C ⊓ D)^{I_S^-} = C^{I_S^-} ∩ D^{I_S^-}         (C ⊓ D)^{I_S^+} = C^{I_S^+} ∩ D^{I_S^+}
  (C ⊔ D)^{I_S^-} = C^{I_S^-} ∪ D^{I_S^-}         (C ⊔ D)^{I_S^+} = C^{I_S^+} ∪ D^{I_S^+}

– for the quantifiers:
  (∃R.C)^{I_S^-} = {d ∈ U | ∃e ∈ U : (d, e) ∈ R^I and e ∈ C^{I_S^-}}
  (∀R.C)^{I_S^-} = {d ∈ U | ∀e ∈ U : (d, e) ∈ R^I implies e ∈ C^{I_S^-}}
  (∃R.C)^{I_S^+} = {d ∈ U | ∃e ∈ U : (d, e) ∈ R^I and e ∈ C^{I_S^+}}
  (∀R.C)^{I_S^+} = {d ∈ U | ∀e ∈ U : (d, e) ∈ R^I implies e ∈ C^{I_S^+}}

Stuckenschmidt [2] shows the property of monotonicity, C^{I_S^-} ⊆ C^I ⊆ C^{I_S^+}, for any subset S of the concept names. Even stronger, the property of generalised monotonicity holds for sub-vocabularies.

Lemma 1 (Generalised Monotonicity). Given a lower approximate interpretation I_S^- and an upper approximate interpretation I_S^+, and two sub-vocabularies S1 ⊆ S2, the following inclusions hold for all concept expressions C:

  1) C^{I_{S1}^-} ⊆ C^{I_{S2}^-}        2) C^{I_{S2}^+} ⊆ C^{I_{S1}^+}

Proof. Our version of the property of generalised monotonicity differs from the one published in [2]. We therefore give the general idea of the proof, which is by induction over the structure of C.

1. Let C be an atom A. Then there are three cases, in all of which condition 1) holds:
   (a) A ∈ S1. Then A^{I_{S1}^-} = A^I = A^{I_{S2}^-}, as A must also be in S2 because S1 ⊆ S2.
   (b) A ∈ S2, but A ∉ S1. Then A^{I_{S1}^-} = ∅ ⊆ A^I = A^{I_{S2}^-}.
   (c) A ∉ S2 and A ∉ S1. Then A^{I_{S1}^-} = ∅ = A^{I_{S2}^-}.
   Similarly for condition 2):
   (a) A ∈ S1. Then A^{I_{S1}^+} = A^I = A^{I_{S2}^+}, as A must also be in S2 because S1 ⊆ S2.
   (b) A ∈ S2, but A ∉ S1. Then A^{I_{S1}^+} = U ⊇ A^I = A^{I_{S2}^+}.
   (c) A ∉ S2 and A ∉ S1. Then A^{I_{S1}^+} = U = A^{I_{S2}^+}.
2. Let C = ¬C1 be a negation. Then, by the induction hypothesis, C1^{I_{S1}^-} ⊆ C1^{I_{S2}^-} and C1^{I_{S2}^+} ⊆ C1^{I_{S1}^+}. It follows by definition that C^{I_{S1}^-} = U \ C1^{I_{S1}^+} ⊆ U \ C1^{I_{S2}^+} = C^{I_{S2}^-}. The argument for condition 2) is dual.

The remaining cases are simple and follow the same principle.

Intuitively, Lemma 1 states that with increasing size of the approximation set S, the interpretation of concepts grows under I_S^- and shrinks under I_S^+.
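The approximations can be bolted onto the same finite-model evaluator sketched earlier. The snippet below (again ours, reusing our tuple encoding) follows Definition 1 line by line, including the polarity flip under negation that drives the proof above:

```python
def approx_extension(concept, universe, conc_ext, role_ext, S, lower=True):
    """C^{I_S^-} (lower=True) or C^{I_S^+} (lower=False), per Definition 1."""
    rec = lambda c, low=lower: approx_extension(c, universe, conc_ext,
                                                role_ext, S, low)
    if isinstance(concept, str):              # atoms: classical inside S,
        if concept in S:                      # ∅ (lower) / U (upper) outside
            return frozenset(conc_ext.get(concept, set()))
        return frozenset() if lower else frozenset(universe)
    op, *args = concept
    if op == "and":
        return rec(args[0]) & rec(args[1])
    if op == "or":
        return rec(args[0]) | rec(args[1])
    if op == "not":                           # (¬C)^{I_S^-} = U \ C^{I_S^+}
        return frozenset(universe) - rec(args[0], not lower)
    if op in ("exists", "forall"):            # R^I stays classical
        role, c_ext = role_ext.get(args[0], set()), rec(args[1])
        if op == "exists":
            return frozenset(d for d in universe
                             if any((d, e) in role and e in c_ext for e in universe))
        return frozenset(d for d in universe
                         if all((d, e) not in role or e in c_ext for e in universe))
    raise ValueError(f"unknown constructor {op!r}")
```

On any finite interpretation one can check Lemma 1 directly with this function: growing S can only enlarge the lower extension and shrink the upper one.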
Based on these approximations of an interpretation we can now define the notion of an approximate model for a terminology. The basic intuition is that we slightly relax the constraints on the terminology by forcing the left-hand sides (lhs) of axioms to be more specific, and the right-hand sides (rhs) to be more general, than in the classical case. This can be achieved by also considering interpretations to be models of axioms in which the lower approximation of the lhs is a subset of the upper approximation of the rhs.

Definition 2 (S-approximate models). Let C and D be Description Logic concepts, and T a TBox. An interpretation I is an S-approximate model of an axiom C ⊑ D if, and only if, C^{I_S^-} ⊆ D^{I_S^+}. An interpretation I is an S-approximate model of the TBox T if, and only if, it is an S-approximate model of all axioms ax ∈ T. In this case we write I |=_S T.

Approximations preserve models, i.e. if an interpretation is a model of an axiom or a TBox, it is also an S-approximate model for any subset S of the vocabulary. This is formalised in the following lemma.

Lemma 2. If I is a (classical) model of an axiom ax, it is an S-approximate model of ax for any S. Similarly, if I is a classical model of a TBox T, it is an S-approximate model of T.

Finally, we need monotonicity for increasing size of the set S in order to apply an anytime algorithm to approximate classification.

Lemma 3. Let S1 ⊆ S2, and let T be a TBox. Then I |=_{S2} T implies I |=_{S1} T.

Note that this lemma shows inverse monotonicity for models, i.e. with increasing size of S there are fewer models for any TBox.

Approximate subsumption is now defined in the classical way, as a subset relation between interpreted concepts, but we only take into account the approximate models of a TBox as previously defined (rather than the classical models).

Definition 3 (Approximate subsumption). Let T be a TBox. We say that a concept D S-approximately subsumes a concept C w.r.t. T (abbreviated T |=_S C ⊑ D) if, and only if, C^I ⊆ D^I for all S-approximate models I of T.

Note that this definition is crucially different from previous approaches to approximated subsumption: e.g., in [2] the subset relation is interpreted non-classically, whereas in our case we consider a different set of models. (Of course, combinations of both approaches are conceivable. As we only consider the taxonomy between atomic concepts in this paper, approximation of the query is necessarily trivial.)

From this definition and the monotonicity of terminological reasoning, monotonicity of approximated subsumption follows.

Theorem 1 (Monotonicity). Let S1 ⊆ S2, let T be a TBox, and let C and D be DL concepts. Then T |=_{S1} C ⊑ D implies T |=_{S2} C ⊑ D.

The theorem follows immediately from Lemma 3. It also immediately implies the soundness of approximate subsumption (simply take for S2 in the theorem the set of all atoms in T):

Corollary 1 (Soundness). For any S_i, T |=_{S_i} C ⊑ D implies T |= C ⊑ D.

A note of caution seems in order: in Theorem 1 we only consider approximations of the terminology, and the query is interpreted classically. We could also approximate the queries, but in this paper we restrict our attention to non-approximated queries, as we are interested in approximate classification of concept names only.
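To illustrate Definitions 2 and 3, Theorem 1 and Corollary 1 on a concrete case, consider the following small worked example (ours, not from the paper):

```latex
% TBox and approximation set (example ours):
T = \{\, A \sqsubseteq B \sqcap E,\;\; B \sqsubseteq C \,\},
\qquad S = \{A, B, C\} \quad (E \notin S).

% In every S-approximate model I, the first axiom requires
A^{I_S^-} = A^I \;\subseteq\; (B \sqcap E)^{I_S^+} = B^I \cap U = B^I,
% and the second requires B^I \subseteq C^I, hence
T \models_S A \sqsubseteq B \qquad \text{and} \qquad T \models_S A \sqsubseteq C.

% But T \not\models_S A \sqsubseteq E: the interpretation with
% A^I = B^I = C^I = \{d\} and E^I = \emptyset is an S-approximate model
% of T. Once E is added to S, A \sqsubseteq E is recovered (Theorem 1),
% and all three consequences also hold classically (Corollary 1).
```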
2.3 Calculating approximate subsumption

Now, what does this all mean? Based on Theorem 1 we can calculate an approximation of the classical subsumption hierarchy (classification) in an incremental, i.e. anytime, way. The idea is simple: starting with a small S, we calculate the classification step by step by approximating the classical interpretation of T. Theorem 1 guarantees that any subsumption relation we find is also valid in the un-approximated TBox. As we will show, S-approximate subsumption can be effectively reduced to classical reasoning (so that a standard reasoner can be used), and since S-approximations are easier, and thus faster, to calculate, this yields a new way of iteratively constructing the classical subsumption hierarchy which is guaranteed to find the correct solution at the end of the iteration.

What remains to show is that there is an efficient way to reduce S-approximated subsumption to classical reasoning. The basic idea again follows the proposal of Stuckenschmidt in [2], and is a simple recursive rewriting procedure in which the atoms outside S are replaced by either ⊥ or ⊤. Stuckenschmidt defines two procedures (·)^S_+ and (·)^S_- in a recursive way, where the left-hand side of an equation is rewritten into the right-hand side. A corrected version is given in the (otherwise unchanged from [2]) definition:

Definition 4 (Rewrite Procedure). The rewriting procedures (·)^S_- and (·)^S_+ are defined as follows:

  A^S_- = A                    A^S_+ = A                    if A ∈ S
  A^S_- = ⊥                    A^S_+ = ⊤                    if A ∉ S
  (¬C)^S_- = ¬(C^S_+)          (¬C)^S_+ = ¬(C^S_-)
  (C ⊓ D)^S_- = C^S_- ⊓ D^S_-  (C ⊓ D)^S_+ = C^S_+ ⊓ D^S_+
  (C ⊔ D)^S_- = C^S_- ⊔ D^S_-  (C ⊔ D)^S_+ = C^S_+ ⊔ D^S_+
  (∃R.C)^S_- = ∃R.(C^S_-)      (∃R.C)^S_+ = ∃R.(C^S_+)
  (∀R.C)^S_- = ∀R.(C^S_-)      (∀R.C)^S_+ = ∀R.(C^S_+)

It is easy to see that both procedures terminate on any ALC concept C, as the complexity of the formula decreases whenever a rule is applied. If in a TBox T every axiom C ⊑ D has been rewritten as C^S_- ⊑ D^S_+, we denote the resulting TBox by T^S. This new TBox T^S can be used to calculate S-approximate subsumption in terms of classical entailment, as the following theorem shows.

Theorem 2. Let T be a TBox. Then T |=_S C ⊑ D if, and only if, T^S |= C ⊑ D.

Theorem 2 provides the basis for an anytime algorithm for calculating the concept hierarchy of a DL terminology: in each step of this algorithm we add a subset of the vocabulary to obtain a new approximation set S, and calculate the hierarchy according to the partial order T |=_S C ⊑ D. By Theorems 1 and 2, for increasing S this hierarchy is always a sub-hierarchy of the original classification, of increasing quality.
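The reduction and the anytime loop fit in a few lines. The sketch below is ours, not the paper's implementation: it reuses our tuple encoding of concepts, with BOT/TOP as assumed marker names for ⊥ and ⊤, grows S most-frequent-first (the MORE strategy of Section 3.2), and treats `classify` as a black box standing in for any sound and complete classical classifier, e.g. a call to an external DL reasoner.

```python
from collections import Counter

BOT, TOP = "__bot__", "__top__"   # assumed markers for ⊥ and ⊤

def rewrite(concept, S, lower=True):
    """C^S_- (lower=True) or C^S_+ (lower=False), per Definition 4."""
    if isinstance(concept, str):
        return concept if concept in S else (BOT if lower else TOP)
    op, *args = concept
    if op == "not":                           # polarity flips under negation
        return ("not", rewrite(args[0], S, not lower))
    if op in ("and", "or"):
        return (op, rewrite(args[0], S, lower), rewrite(args[1], S, lower))
    if op in ("exists", "forall"):            # roles are left untouched
        return (op, args[0], rewrite(args[1], S, lower))
    raise ValueError(f"unknown constructor {op!r}")

def approximate_tbox(tbox, S):
    """T^S: every axiom C ⊑ D becomes C^S_- ⊑ D^S_+ (Theorem 2)."""
    return [(rewrite(c, S, True), rewrite(d, S, False)) for c, d in tbox]

def atoms_of(concept):
    """All atomic concept names occurring in a concept expression."""
    if isinstance(concept, str):
        return [concept]
    op, *args = concept
    if op in ("exists", "forall"):
        return atoms_of(args[1])              # args[0] is the role name
    return [a for part in args for a in atoms_of(part)]

def anytime_classify(tbox, classify, step=0.1):
    """Yield ever better sub-hierarchies: grow S by `step` of the vocabulary
    per round, most frequent atoms first. `classify` maps a TBox to a set of
    subsumption pairs. By Theorem 1 the yielded sets only grow; by
    Corollary 1 every yielded pair already holds in the original TBox."""
    freq = Counter(a for c, d in tbox for a in atoms_of(c) + atoms_of(d))
    order = [a for a, _ in freq.most_common()]
    chunk = max(1, int(len(order) * step))
    S = set()
    for i in range(0, len(order), chunk):
        S.update(order[i:i + chunk])
        yield set(S), classify(approximate_tbox(tbox, S))
```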
3 Experiments

In the previous section we introduced the theoretical foundation of an anytime algorithm for concept classification based on approximate terminological subsumption. However, we have not yet discussed how to determine the approximation set, and we have yet to show (or disprove) that our method indeed helps to reason with large and expressive ontologies. To answer these questions, we ran experiments to study different choices of approximation sets and their effect on the accuracy of classification and the respective runtime.

3.1 Data

For our experiments we used the following well-known ontologies: DICE, a medical terminology used for registering diagnoses in Intensive Care units (http://kik.amc.uva.nl/dice/home.jsp); MGED, an ontology for microarray experiments (http://mged.sourceforge.net/ontologies/index.php); UNSPSC, an ontology version of a coding system to classify both products and services (http://www.unspsc.org/); FMA, the Foundational Model of Anatomy (http://sig.biostr.washington.edu/projects/fm/AboutFM.html); and the pizza ontology (http://www.co-ode.org/ontologies/pizza).

All ontologies were simplified from their original OWL versions into corresponding ALC versions. This of course causes some loss of information, since ALC does not allow, among other things, transitivity or symmetry of roles, qualified cardinality constraints, or data-valued roles. Notwithstanding this loss of information, we believe that our experiments still give a good insight into the properties of approximation. Extending the approach to full OWL is planned for future research.

Table 1 summarises some properties of these ontologies, e.g. their respective classification time with RACER version 1.7.24, as well as the number of occurrences of operators. It also contains information about three ontologies, Kpoly5, Kt4p13 and Kphp5, taken from the DL benchmark (http://dl.kr.org/dl98/comparison/data.html). They have been chosen to investigate particular properties of our method, as will be discussed later.

           classification time (sec)  #axioms    #∀     #∃    #⊓    #⊔    #¬
  DICE            60                    4859    3734   5606  1951   784     0
  MGED             0.2                   792       0    171    12     5    12
  UNSPSC           7                   19590       0      0     0     0     0
  FMA             50                    3824    9348  17280  2654  2527     0
  PIZZA            0.2                  1052      23    148   796    26   796
  Kpoly5           4                     317       0    202   114     0   163
  Kt4p13           5                     410       0    289   120     0   224
  Kphp5            8                     242       0     62   179     0   213

  Table 1. Some properties of the ontologies used in our experiments

3.2 Experimental setup

What remains to be decided is the choice of an iterative method for building an increasing sequence of approximation sets, and a way of evaluating the results.

Choosing approximation sets. There are several options for choosing concept names to be added to the approximation set in each step of the anytime algorithm. We implemented three strategies: RANDOM (choose concepts randomly), MORE (add the most frequently occurring concepts first) and LESS (add the least frequently occurring concepts first). More sophisticated strategies are of course conceivable; we will evaluate those in future research. In a relatively arbitrary ad-hoc decision, we increase the subset size after each round by 10%.

Measuring performance. To study the effect of our approximation method, and of the choice of the approximation set, we need appropriate performance measures. When evaluating an approximation algorithm, there are usually two interesting measures: the speed of the algorithm, and the recall of the approximation. (Note that Theorem 1 ensures that every subsumption relation found for the approximation must also hold in the gold standard. We thus always have 100% precision, and the only interesting measure is the recall.)

In our experiments we only consider the runtime of the approximate classification, i.e. the determination of the taxonomy using the reduced TBox. If we take reasonably sized ontologies for evaluation, we can calculate the "correct" classification, and can calculate the portion of found and not-yet-found consequences for each set S_i (0 ≤ i ≤ n). Based on this measure of partial completeness we can evaluate our choice of sequence of approximation sets. When the correct taxonomy (a.k.a. the gold standard) is known a priori, we can rate the approximated one against it. We do this by determining all subsumption relations in the approximated taxonomy and in the gold standard, and then expressing the number of correct subsumption relations in the approximated taxonomy as a percentage of the number of subsumption relations in the gold standard.

In Figure 1 we consider as an example three taxonomies: one is the gold standard, and the other two are different approximations.

        A               A                 A       C
       / \             / \                |      / \
      B   C           B   C     E         B     D   E
         / \              |
        D   E             D

   Gold Standard   Approximation 1   Approximation 2

  Fig. 1. Example of the recall measure

In the gold standard, the following 6 relations hold: B ⊑ A, C ⊑ A, D ⊑ A, E ⊑ A, D ⊑ C, E ⊑ C. All of these are also captured in Approximation 1, with the exception of E ⊑ C and E ⊑ A. Approximation 2, however, only captures 3 relations (B ⊑ A, D ⊑ C, E ⊑ C). This gives Approximation 1 a recall of 4/6 ≈ 66%, and Approximation 2 a recall of only 3/6 = 50%.

This performance measure counts the number of subsumption queries that can be answered correctly using the approximated ontology. The intuition behind this metric is that subsumptions higher in the hierarchy are considered more important, since they are involved in establishing more pairwise subsumptions. It puts a small penalty on mistakes in the details lower down in the ontology, and a high penalty on mistakes in the important top-level categorisations.
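The recall measure is easy to operationalise. In the sketch below (ours, not from the paper) taxonomies are given by their direct child-parent links and closed transitively before comparison, which reproduces the 4/6 and 3/6 scores of Figure 1:

```python
def closure(edges):
    """Transitive closure of a set of child -> parent edges."""
    closed, changed = set(edges), True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

def recall(approx_edges, gold_edges):
    """Fraction of gold-standard subsumption pairs found by the approximation."""
    gold = closure(gold_edges)
    return len(closure(approx_edges) & gold) / len(gold)

# The taxonomies of Figure 1 as direct links:
gold     = {("B", "A"), ("C", "A"), ("D", "C"), ("E", "C")}
approx_1 = {("B", "A"), ("C", "A"), ("D", "C")}
approx_2 = {("B", "A"), ("D", "C"), ("E", "C")}
print(recall(approx_1, gold))   # 4/6 ≈ 0.67
print(recall(approx_2, gold))   # 3/6 = 0.5
```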
3.3 Results

In this section we answer the following questions:

1. Under which circumstances is anytime classification based on approximate terminological subsumption effective for dealing with large and complex ontologies?
2. Does any of the strategies outperform the others, and if so, under which circumstances?

As a measure of effectiveness, we consider the relation between the gain in runtime and the loss of completeness. For each of the ontologies described in Section 3.1 we ran our experiments 10 times to adjust for statistical outliers in runtime. (For consistency reasons, we ran the experiments for the RANDOM strategy 10 times on the same random sample.) We then calculated, for both runtime and recall, the ratio between the value for the approximation and the value for the gold standard. The desired outcome is a high recall and a low runtime in the early stages of the algorithm, in other words a convex performance profile. We will see that this is often, but not always, achieved.

We ran all our experiments on the ontologies described in Section 3.1 and calculated recall and runtime depending on the size of the approximation sets. All graphs show runtime and recall as a percentage of the gold standard on the y-axis, depending on the size of the approximation set (as a percentage of the full vocabulary) on the x-axis. Furthermore, we calculated the cumulative runtime for a set S, which is the sum of the runtimes for classifying w.r.t. S and all smaller approximation sets previously considered. This measure describes the real gain one makes with an anytime algorithm: as long as the cumulative runtime remains below 100%, we gain w.r.t. classical classification. The desired outcome is thus a curve that crosses the 100% mark as late as possible.
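Under our reading, the quantities plotted in the figures can be written as follows (notation ours): for approximation set S_i, with t(S_i) the time to classify w.r.t. S_i and t_cl the classical classification time,

```latex
\mathrm{recall}(S_i) = \frac{\lvert\{(C,D) : T \models_{S_i} C \sqsubseteq D\}\rvert}
                            {\lvert\{(C,D) : T \models C \sqsubseteq D\}\rvert},
\qquad
\mathrm{runtime}(S_i) = \frac{t(S_i)}{t_{\mathrm{cl}}},
\qquad
\mathrm{cumtime}(S_i) = \frac{\sum_{j \le i} t(S_j)}{t_{\mathrm{cl}}}.
```

The difference plots of Figure 4 below then show recall(S_i) − runtime(S_i), with the y=0 line marking the break-even point.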
[Fig. 2. Results for DICE — runtime, cumulative runtime and recall plotted against the size of S, one panel each for the LESS, MORE and RANDOM strategies]

Anytime classification benefits some cases. Figure 2 shows the results of the three strategies on the DICE terminology, which confirm our hope that anytime classification based on approximate terminological subsumption is a useful reasoning method for large and complex ontologies. In two of the three strategies (MORE and RANDOM), the (relative) recall is significantly higher than the (relative) runtime. The clear winner is the MORE strategy, for which the difference between recall and runtime is greatest.

Anytime classification does not benefit all cases. Unfortunately, not all results show the same positive properties: in Figure 3, even the best strategy, MORE, never exhibits higher recall than runtime. Basically, this means that relatively more time is lost than correct answers found.

[Fig. 3. Results for UNSPSC — the same plots as in Figure 2]

When does anytime classification benefit? Given the difference in performance on DICE and UNSPSC, it is interesting to study in which cases approximate subsumption provides gains over classical subsumption, and in which it does not. For this purpose we compared the differences between relative runtimes and recalls for the remaining ontologies. The differences between the strategies are shown in the graphs of Figure 4, which plot the difference between the percentages of recall and runtime for the three strategies on the different test cases. At the y=0 line, the percentage gain in runtime is offset by an equally sized percentage loss in recall, so no benefit is gained by approximate subsumption. Above this line, the gain in runtime is greater than the loss in recall, hence approximate subsumption creates a benefit, and vice versa below the y=0 line. (Note that, for visual clarity, the 0% baseline, at which the gain in runtime equals the loss in recall, is shifted along the y-axis in the different plots.) For example, in the first plot in Figure 4, for the successful DICE case, it is clearly visible that the MORE strategy beats classical subsumption, because it lies above the y=0 line almost everywhere.

[Fig. 4. Comparison of strategies — one difference plot (recall minus runtime, depending on the size of S, with curves for the MORE, LESS and RANDOM strategies) per test case: DICE, UNSPSC, MGED, FMA, PIZZA, Kphp5, Kt4p13 and Kpoly5]

The plots in Figure 4 show that anytime subsumption beats classical subsumption for the test cases DICE, FMA, and the three DL benchmark cases (KPHP5, KT4P13, KPOLY5). For these 5 cases (out of 8), there is at least one strategy that lies above the y=0 trade-off line for nearly all iterations of the algorithm. For the PIZZA case, one strategy (LESS) more or less hovers around the trade-off line without ever convincingly beating classical subsumption. For the remaining cases (MGED, UNSPSC), the costs of approximate subsumption are consistently higher than the gains.
The above findings can be explained by a look at the expressiveness of the ontologies in Table 1: both DICE and FMA are expressive and have comparably high classification times, whereas MGED and UNSPSC have almost no expressiveness. The pizza ontology is a bit more expressive, but still easily classifiable. This seems to indicate that our method is most suitable for expressive ontologies that are difficult to classify.

This is of course very good news: approximate subsumption works best in exactly those cases where it is needed most, namely for ontologies that are expensive to classify with classical algorithms.

Furthermore, it seems that absolute runtime costs are not the right indicator: the classification time of UNSPSC in Table 1 is comparable to those of the three DL benchmark ontologies. Still, UNSPSC should be considered "easy" in classical terms, because it contains two orders of magnitude more axioms than the DL benchmark ontologies.

Which strategy works best? What remains to be investigated is which strategy performs best. When studying the plots in Figure 4 for the 5 out of 8 cases where approximate subsumption is beneficial (DICE, FMA, KPOLY5, KT4P13, KPHP5), we see that the MORE strategy achieves the highest gains during almost all iterations of the anytime algorithm for DICE, FMA and KT4P13, while this is not the case for KPHP5 (where LESS wins) and KPOLY5 (RANDOM). Some clarification can be gained by taking more information about the ontologies into account, namely the distribution of atomic concepts over the axioms.

  Ontology   freq    #   freq    #   freq    #
  Kpoly5        7    7      6    8      5    1
  Kt4p13      104    1     11    1      5    1
  Kphp5         6   30      2  240      1    1
  DICE        511    1    461    1    272    1
  MGED         35    1     34    1     27    1
  UNSPSC       53    1     49    1     45    1
  FMA         756    1    400    1    394    1
  PIZZA        72    1     54    1     53    5

  Table 2. Distribution of atomic concepts over the axioms

Table 2 summarises the frequencies of the most frequently occurring atomic concepts. Only the three highest frequencies, and the number of concepts with those frequencies, are shown. For example, for DICE the single most frequent concept name occurs 511 times, whereas for Kphp5 there are 30 atoms occurring 6 times each. A clear pattern can now be recognised: for those ontologies in which some atoms occur significantly more often than others, the MORE strategy has the best performance (DICE, FMA, KT4P13, UNSPSC). For the other ontologies the results are less conclusive, although one would guess that the equal distribution for KPOLY5 lends itself to a RANDOM approach. That the LESS strategy is best for PIZZA and Kphp5 is surprising; apparently, there must be a rare concept with a high influence on the classification. The conclusion seems to be that the MORE strategy works best when the frequency table for concepts is skewed, i.e. when the frequency of occurrence of concepts in the axioms is unevenly distributed.

4 Conclusion

In this paper we have presented a method for approximate subsumption based on ignoring a selected part of the set of concepts in an ontology. By incrementally increasing this set, we can construct an anytime algorithm for approximate subsumption which yields sound but possibly incomplete answers w.r.t. classical subsumption queries. The behaviour of this approximate algorithm depends on the strategy for selecting the subset of concepts taken into account by the approximation.

Based on the formal definitions, we have shown how approximate subsumption on an ontology can be computed in terms of a classical subsumption check on a rewritten version of the ontology.
In the second half of the paper, we have applied this approximate subsumption method to a set of 8 realistic benchmark ontologies, comparing its performance against classical subsumption in terms of runtime and recall (since the approximation is sound, precision is always guaranteed). For each of these 8 benchmark ontologies we applied three different strategies for selecting the approximated vocabulary, based on the occurrence frequency of the concepts in the axioms (most frequent first, least frequent first, random).

Our experiments show that the gain in runtime outweighs the loss in recall in 5 out of 8 cases. Furthermore, the gain is larger for ontologies where classical subsumption queries are expensive. Hence, our approximation algorithm works best when it is most needed, namely on complex ontologies. Our experiments also show that of the three strategies we considered, the most-frequent-first strategy performed best, in particular on those ontologies with a skewed occurrence frequency of the concepts in the axioms.

Important issues for future work are to extend our approximation to deal with OWL ontologies (instead of being restricted to ALC), and to study other heuristics for influencing the behaviour of the approximation, in particular heuristics that take into account the structure and complexity of the ontology.

References

1. Schaerf, M., Cadoli, M.: Tractable reasoning via approximation. Artificial Intelligence 74 (1995) 249–310
2. Stuckenschmidt, H.: Partial matchmaking using approximate subsumption. In: Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07). (2007)
3. Groot, P., Stuckenschmidt, H., Wache, H.: Approximating description logic classification for semantic web reasoning. In Gómez-Pérez, A., Euzenat, J., eds.: ESWC. Volume 3532 of Lecture Notes in Computer Science, Springer (2005) 318–332
4. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F., eds.: The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press (2003)