=Paper=
{{Paper
|id=Vol-350/paper-4
|storemode=property
|title=Reasoning with Large A-Boxes in Fuzzy Description Logics using DL reasoners: An experimental evaluation
|pdfUrl=https://ceur-ws.org/Vol-350/paper4.pdf
|volume=Vol-350
}}
==Reasoning with Large A-Boxes in Fuzzy Description Logics using DL reasoners: An experimental evaluation==
<pdf width="1500px">https://ceur-ws.org/Vol-350/paper4.pdf</pdf>
<pre>
      Reasoning with Large A-Boxes in Fuzzy
     Description Logics using DL reasoners: An
              experimental evaluation

P. Cimiano1 , P. Haase1 , Q. Ji1 , T. Mailis2 , G. Stamou2 , G. Stoilos2 , T. Tran1 ,
                                   V. Tzouvaras2
                 1
                     Institute AIFB, Universität Karlsruhe, Germany
                     {pci,pha,qiji,dtr}@aifb.uni-karlsruhe.de
                  2
                    National and Technical University of Athens
              {theofilos,gstam,gstoil,tzouvaras}@image.ntua.gr


      Abstract. While knowledge representation languages developed in the
      context of the Semantic Web build mainly on crisp knowledge, fuzzy
      reasoning seems to be needed for many applications, e.g. for retrieval
      of images or other resources. However, techniques for fuzzy reasoning
      need to scale to large A-Boxes for practical applications. As a first step
      towards clarifying whether fuzzy reasoning techniques can indeed scale,
      we examine a specific point in space. Earlier research has shown that
      fuzzy description logics can be transformed to crisp description logics
      in a satisfiability-preserving fashion. This thus opens the possibility of
      using standard description logic reasoners for reasoning with fuzzy OWL
      ontologies. As the transformation produces a quadratic blow-up of the
      T-Box, a crucial question is if such an approach is feasible from a perfor-
      mance point of view. We provide an empirical evaluation on four different
      ontologies of varying complexity and with generated A-Boxes of up to a
      million individuals. To our knowledge, we thus provide the first system-
      atic and empirical evaluation of a fuzzy reasoning approach based on a
      reduction to standard description logics.


1   Introduction

Current knowledge representation formalisms for the Semantic Web such as
OWL or RDF(S) rely on crisp logics. However, it has become clear that the
assumption that all knowledge is crisp is unsustainable. There are various rea-
sons for this. First of all, the large amount of knowledge needed to bootstrap the
semantic web can neither be acquired by hand nor by simply importing legacy
data. Thus, knowledge acquisition techniques are needed. However, knowledge
acquisition techniques building on machine learning or natural language pro-
cessing techniques are inherently error-prone. Consequently, the need arises to
represent the certainty or confidence of the extracted information. Further, for
the purpose of retrieval, e.g. of images or other data, we need ways to assess
the degree to which a certain answer matches the information needs of a query
(see [1]). Restricting ourselves to the crisp case would yield very brittle systems
which only return answers in case information is perfectly modeled and perfectly
matches the user query. Finally, there are concepts which are inherently fuzzy
and which can not be represented in a crisp way. These are concepts represent-
ing linguistic variables in the sense of Zadeh [2], e.g. such as hot, young, cheap
etc. Thus, various extensions to Semantic Web languages have been presented to
deal with imperfect information, i.e. probabilistic approaches such as [3], [4] for
OWL, [5] for RDF as well as approaches based on Fuzzy Logics such as [6] and
[7]. For an overview of non-crisp ontology formalisms the reader is referred to [8].
Whatever the formalism used is, one requirement is of crucial importance: the
availability of scalable reasoning procedures to deal with imperfect knowledge.
     In this paper, we thus ask ourselves a straightforward question with a not so
straightforward answer: are current reasoning procedures mature and scalable
enough to reason with a large amount of imperfect (e.g. fuzzy) information?
Giving a principled and empirically grounded answer to this problem is certainly
out of the scope of this and many articles to be published. In fact, we are
convinced that it is only through systematic experimental evaluation that we
will find out which techniques can scale towards real scenarios with millions of
facts to be stored and retrieved.
     As a first step towards answering this question, the contribution of this paper
is to examine a very specific setting, i.e. we assume that knowledge is represented
using fuzzy description logics and analyze the scalability of the KAON2 reasoner
on the knowledge bases resulting from reducing Fuzzy Description Logics (see [6,
7]) to standard description logics based on the idea described in [9]. KAON2 is a
resolution-based SHIQ(D)-reasoner which transforms a SHIQ(D) knowledge
base into a Disjunctive Datalog Program. This allows to reuse optimization
techniques from database technology to reason with large A-Boxes. The aim of
our paper is to explore if one can indeed reason with large Fuzzy A-Boxes, a
necessary requirement for large-scale knowledge management applications. In
this sense this paper can be seen as a contribution towards a larger goal: the one
of clarifying whether fuzzy knowledge representation can indeed scale towards
meeting the size of large-scale applications.
     The structure of this paper is as follows: in Section 2, we introduce the logic
fKD -SHIN as well as the reduction to crisp OWL presented already in [7]
in order to make the paper self-contained. In Section 3 we briefly describe the
OWL-DL reasoner used in our experiments, i.e. the KAON2 reasoner. In Section
4 we present our experiments and results. Finally, in Section 5 we discuss some
related work and conclude.


2   Fuzzy Description Logics and their reduction to
    Classical DLs

In this section we provide a brief introduction to the fuzzy Description Logic
(DL) fKD -SHIN [10]. fKD -SHIN is a fuzzy extension of the SHIN DL [11]
  which uses the fuzzy operators 1 − x, max and min for performing the fuzzy set
  theoretic operations. For a general fuzzy extension of SHIN please refer to [6].
       Fuzzy Description Logics [12] have been proposed as powerful knowledge rep-
  resentation languages capable of capturing vague (fuzzy) knowledge that exists
  in many applications. Fuzzy DLs and ontologies usually keep the same syntax
  as their crisp (classical) counterpart, without adding any additional degrees to
  them; notable exceptions are [6] and [13]. Thus, using concepts and roles we
  can build concept descriptions in the usual way by using conjunctions (C u D),
  disjunctions (C t D), negation (¬C), existential (∃R.C) and universal quan-
  tification (∀R.C) as well as cardinality restrictions (≥ nR, ≤ nR). Similarly,
  concept and role axioms, like concept equivalence (C ≡ D), concept subsumption
  (C v D), role inclusions (R v S) and transitivity (Trans(R)) can be defined in
  the usual way. Thus, the definitions of a TBox and an RBox are as usual.


                     Table 1. Semantics of concepts descriptions and axioms
   Constructor          Syntax                            Semantics
top                       >         >I (a) = 1
bottom                    ⊥         ⊥I (a) = 0
general negation          ¬C        (¬C)I (a) = 1 − C I (a)
conjunction             CuD         (C u D)I (a) = min(C I (a), D I (a))
disjunction             CtD         (C t D)I (a) = max(C I (a), D I (a))
exists restriction       ∃R.C       (∃R.C)I (a) = supb∈∆I {min(RI (a, b), C I (b))}
value restriction        ∀R.C       (∀R.C)I (a) = inf b∈∆I {max(1 − RI (a, b), C I (b))}
                                                                                 p+1
at-most                 ≤ pR        (≤ pR)I (a) =         inf          max(1 − max RI (a, bi ), max{bi = bj })
                                                    b1 ,...,bp+1 ∈∆I              i=1             i<j
                                                                           p
at-least                ≥ pR        (≥ pR)I (a) =        sup         min(min RI (a, bi ), min{bi 6= bj })
                                                                          i=1             i<j
                                                    b1 ,...,bp ∈∆I
                           −           − I            I
inverse role              R         (R ) (b, a) = R (a, b)
equivalence             C≡D         ∀a ∈ ∆I .C I (a) = D I (a)
sub-concept             CvD         ∀a ∈ ∆I .C I (a) ≤ D I (a)
transitive role       Trans(R)      ∀a, b ∈ ∆I .RI (a, b) ≥ supc∈∆I {t(RI (a, c), RI (c, b))}
sub-role                RvS         ∀a, b ∈ ∆I .RI (a, b) ≤ S I (a, b)
concept assertions   (a : C)./n     C I (aI )./n, ./ ∈ {≥, >, ≤, <}
role assertions    ((a, b) : R) B n RI (aI , bI ) B n, B ∈ {≥, >}


      Fuzzy DLs extend the syntax of assertions with membership degrees, thus
  allowing to create fuzzy assertions (see [12, 10, 14, 15]) as well as to annotate
  individual axioms (assertions) with degrees of membership. More formally, a
  fuzzy assertion [12] is of the form (a : C)./n or ((a, b) : R) B n, where ./ ∈ {≥
  , >, ≤, <}, B ∈ {≥, >} and a, b are individuals, and n ∈ (0, 1]. In many cases we
  write (a : C) = n instead of writing two fuzzy assertions of the form (a : C) ≥ n
  and (a : C) ≤ n. For example, one is able to state that grass is Green to a degree
  of at least 0.7, writing (grass : Green) ≥ 0.7, or that athens is nearTo rome to
  a degree of at least 0.8, by the axiom ((athens, rome) : nearTo) ≥ 0.8. A set of
  fuzzy assertions defines a (fuzzy) ABox.
      The semantics of fuzzy DLs are provided by fuzzy interpretations [12], which
  extend classical interpretations to the unit interval [0, 1]. A fuzzy interpretation
is a pair I = (∆I , ·I ) where the domain ∆I is a non-empty set of objects and
·I is a fuzzy interpretation function, which maps
 – an individual name a ∈ I to an element aI ∈ ∆I ,
 – a concept name A ∈ C to a membership function AI : ∆I → [0, 1],
 – a role name R ∈ R to a membership function RI : ∆I × ∆I → [0, 1].
Intuitively, an object (pair of objects) can now belong to a fuzzy concept (role)
to any degree between 0 and 1. For example, HotPlaceI (RomeI ) = 0.7, means
that RomeI is a hot place to a degree equal to 0.7. Fuzzy interpretations can be
extended to interpret fKD -SHIN -concepts and roles with the aid of the fuzzy
operators 1 − x, min and max. The complete semantics for concept descriptions,
concept and role axioms are depicted in Table 1.
    We can now define the inference problems for fuzzy DLs. A fuzzy knowledge
base Σ is satisfiable (unsatisfiable) iff there exists (does not exist) a fuzzy inter-
pretation I which satisfies all axioms in Σ. An f-SHIN -concept C is n-satisfiable
w.r.t. Σ iff there exists a model I of Σ for which there is some a ∈ ∆I such that
C I (a) = n, and n ∈ (0, 1]; C subsumes D w.r.t. Σ iff for every model I of Σ
we have ∀d ∈ ∆I , C I (d) ≤ DI (d); a fuzzy ABox A is consistent (inconsistent)
w.r.t. a fuzzy TBox T and RBox R if there exists (does not exist) a model I
of T and R that satisfies each assertion in A. Given a fuzzy concept axiom, a
fuzzy role axiom or a fuzzy assertion φ, Σ entails φ, written Σ |= φ, iff for all
models I of Σ, I satisfies φ.
    Currently, there are many proposals for reasoning in various different frag-
ments of fuzzy Description Logics ranging from tableaux-based [12, 10] to
optimization-based [16, 17]. Straccia proposed in [9] a method for reducing an
fKD -ALCH knowledge base to a classical ALCH one. The purpose of the re-
duction is to reduce the reasoning problem from f-DLs to crisp DLs in order to
allow to use existing optimized reasoners, like FaCT [18], Pellet [19] or KAON2
(see Section 3) to perform the inference tasks of fuzzy DLs. Later, Bobillo et
al. extended the method [13] to cover the full fuzzy OWL language, while quite
recently a number of optimisations of the method have been also proposed [20].
In the following we sketch the main idea and refer the interested reader to [9,
13, 20] for a detailed presentation.
    The main idea behind the reduction is that a fuzzy assertion of the form
(a : C) ≥ n, where a is an individual and n ∈ [0, 1] can be represented by a
crisp assertion of the form a : C≥n , where C≥n is a new concept. Intuitively,
C≥n stands for the set of objects that belong to C to a degree greater or equal
than n. Then there are two important points for the reduction:

 1. The number of different membership degrees we have to consider in the
    reduction is finite (this is a property of fKD -DLs) and more precisely defined
    by the set: N Σ = {0, 0.5, 1} ∪ {n, 1 − n | (a : C)./n ∈ A or ((a, b) : R)./n ∈
    A}.
 2. In order for the reduction to be satisfiability-preserving, additional concept
    and role axioms need to be added. For example, if n1 , n2 are two degrees from
    N Σ and n1 ≤ n2 , then it obviously holds that A≥n2 v A≥n1 . In summary
     for each atomic concept in the fuzzy KB the following axioms are added [9,
     13]:
            (1)         A≥ni+1 v A>ni ,              A>ni v A≥ni
            (2)           A<nj v A≤nj ,              A≤ni v A<ni+1
            (3)    A≥nj u A<nj v ⊥,           A>ni u A≤ni v ⊥
            (4)             > v A≥nj t A<nj ,          > v A>ni t A≤ni

     where 1 ≤ i ≤ |N Σ | − 1 and 2 ≤ j ≤ |N Σ |. Note that axioms with concepts
     A≤n and A<n are necessary since the fuzzy ABox might contain fuzzy as-
     sertions of the form (a : A) ≤ n or (a : A) < n. Similarly for each atomic
     roles we have: R≥ni+1 v R>ni , R>ni v R≥ni .
    Finally, concept and role axioms in the fuzzy KB should also be reduced [9,
13]. Thus, if |C| and |R| is the number of the different atomic concepts and
roles in the original fuzzy KB, respectively, and |N Σ | the number of different
membership degrees that appear in the fuzzy ABox, then the reduced crisp KB
would contain as much as 8|C|(|N Σ | − 1) concept axioms and 2|R|(|N Σ | − 1)
role inclusion axioms of this kind of satisfiability-preserving axioms. On the other
hand, as shown in [9], there are additionally 4|T ||N Σ | concept and 2|R||N Σ | role
inclusion axioms, where |T | and |R| is the number of concept and role axioms
in the original fuzzy KB. Due to the fact that the number of axioms increases
quadratically, the performance of DL reasoners can heavily deteriorate. For this
reason, Bobillo et. al. [20] have proposed a number of optimisations. In particular,
they note the concept equivalences A≤n ≡ ¬A>n and A<n ≡ ¬A≥n , thus they
map fuzzy assertions of the form (a : A) ≤ n and (a : A) < n in the ABox to
a : ¬A>n and a : ¬A≥n respectively. Then, using the laws of contradiction and
excluded middle, it can be proved that all axioms of the forms (2) to (4) above are
unnecessary. Roughly, speaking this optimisation reduces the number of concept
axioms in the reduced TBox to about a quarter compared to the unoptimised
one. In our experiments we will in particular also compare the performance of
the optimised vs. the unoptimised reduction.

3     OWL DL Reasoning with KAON2
Reasoning with KAON2 is based on special-purpose algorithms which have been
designed for dealing with large ABoxes. They are described in more detail in [21],
such that we only present a birds’ eyes perspective here. The underlying ratio-
nale of the reasoning procedures is that algorithms for deductive databases have
proven to be efficient in dealing with large numbers of facts. The KAON2 ap-
proach utilises this by transforming OWL DL ontologies to disjunctive datalog,
and by the subsequent application of the mentioned and established algorithms
for dealing with disjunctive datalog [22].
    A birds’ eyes perspective on the KAON2 approach is depicted in Figure 1.
KAON2 can handle SHIQ(D)1 description logic ontologies, which corresponds
1
    SHIQ(D) provides datatypes and qualified number restrictions in addition to the
    SHIN description logic used for our fuzzy reasoning.
                      OWL DL TBox
       Query                                                    OWL DL ABox
                      (no nominals)

 suffices for
 some queries
 e.g. instance   Transformation to
 retrieval for
 named classes   Disjunctive Datalog
                 [ExpTime]


        Disjunctive Datalog Reasoning Engine [coNP]

                     Answer

                        Fig. 1. KAON2 approach to reasoning

roughly to OWL DL without nominals. The TBox is processed together with a
query by the transformation algorithm, which is described in more detail below
and which returns a disjunctive datalog program. This, together with an ABox,
is then fed into a disjunctive datalog reasoner which eventually returns an answer
to the query. In some cases, e.g. when querying for instances of named classes,
the query does not need to be fed into the transformation algorithm but instead
needs to be taken into account only by the datalog reasoner.
    The transformation algorithm accepts a SHIQ (or SHIQ(D)) TBox and
returns a disjunctive datalog program. Note that the returned program is in
general not logically equivalent to the input TBox; the exact relationship is
given below in Theorem 1.
    The steps of the algorithm can roughly be described as follows. (1) Transitiv-
ity axioms for roles S are replaced by adding axioms of the form ∀R.C v ∀S.∀S.C
whenever S v R. This is a standard method for eliminating transitivity axioms,
such that the resulting knowledge base is satisfiable if and only if the original
knowledge base is. This ensures that the translation can be used to solve typi-
cal SHIQ reasoning problems by reducing them to unsatisfiability of a SHIQ
knowledge base.
    Employing the fact that SHIQ can be regarded as a subset of first-order
logic, step (2) uses standard algorithms to transform the knowledge base into
conjunctive normal form. This involves eliminating existential quantifiers by
Skolemization, such that function symbols must be introduced into the knowl-
edge base.
    Next, in step (3), the obtained set of clauses is partially saturated by adding
logical consequences. This is the crucial step of the algorithm where one has to
compute enough consequences to allow for a reduction to function-free Datalog.
Since the computational complexity is ExpTime for SHIQ but only NP for
disjunctive Datalog, it should not come as a surprise that this transformation
step can be exponential in the size of the input. The details of this step are
rather sophisticated such that we refer to [21] for details and proofs.
    Now function symbols can safely be eliminated in step (4). To ensure that
this process still preserves satisfiability of the knowledge base, one has to add a
linear number of auxiliary axioms. Finally, it is easy to syntactically transform
the resulting set of clauses into a disjunctive Datalog program in step (5).
    Due to the transformations in steps (1) and (2), the output of the algorithm
is in general not logically equivalent to the input. Since all major reasoning
tasks for SHIQ can be reduced to satisfiability checking, it is sufficient for the
transformation to preserve satisfiability, as shown in the following theorem from
[21].

Theorem 1. Let K be a SHIQ(D) knowledge base and D(K) be the datalog
output of the KAON2 transformation algorithm on input K. Then the following
claims hold.

 – K is unsatisfiable if and only if D(K) is unsatisfiable.
 – K |= α if and only if D(K) |= α, where α is of the form A(a) or R(a, b), for
   A a named concept and R a role.
 – K |= C(a) for a nonatomic concept C if and only if, for Q a new atomic
   concept, D(K ∪ {C v Q}) |= Q(a).

   A performance evaluation of KAON2 is reported in [23]. It shows that
KAON2 is indeed superior to other reasoners in most cases where size of the
ABox dominates compared to the size of the TBox. This is exactly the reason
why we have decided to use KAON2 in the context of our experiments with large
A-Boxes.


4   Experiments

In this section, we present the results of our experimental evaluation. First of
all, we describe the datasets used, i.e. the ontologies, in more detail (see Sub-
section 4.1). These belong to different complexity classes in order to be able to
analyse performance with respect to different classes of ontologies. Most stan-
dard benchmark datasets do not focus on average performance (over a larger
amount of queries) nor on performance with respect to data size. Thus, we made
two important decisions in the context of our experimental evaluation. First, we
decided to automatically generate larger A-Boxes ranging from 100 to 1 Mio. in-
dividuals. In order not to bias our results by some generation strategy, we apply
three different ways of generating A-Boxes of varying sizes for a given ontology.
These strategies are described in more detail in Subsection 4.2. Second, instead
of relying on a limited number of predefined queries as done in most bench-
marks, we decided to also randomly generate a number of n queries of varying
complexity. This will then allow us to analyze the performance of our approach
with respect to an average conjunctive query.
4.1   Datasets

We have chosen some popular ontologies which have been used in previous bench-
marks [23]. These are described in what follows:

 – VICODI: The VICODI ontology2 was manually created in the EU-funded
   VICODI project and is a representative of the RDFS(DL) fragment. This
   ontology is relatively small and simple since it does not contain any disjunc-
   tions, existential quantification or number restrictions.
 – LUBM: The Lehigh University Benchmark (LUBM)3 was explicitly de-
   signed for OWL benchmarks. It describes the organizational structure of
   universities. Due to existential restriction on the right side of class expres-
   sions, the LUBM ontology is in OWL Lite.
 – Semintec: The Semintec ontology4 is about financial services and was cre-
   ated at the University of Poznan. It is also relatively simple like the VICODI
   ontology without existential quantifiers or disjunctions. But it is a represen-
   tative of the OWL DL fragment as it contains functional properties and
   disjointness constraints.
 – Wine: The Wine ontology5 is a prominent example of an OWL DL ontology.
   More precisely, we used a version for the Wine ontology without nominals,
   as it has been used in previous benchmarks (cf. [23]).


4.2   Generating A-Boxes

While the ontologies mentioned above already contain some A-Box assertions,
for most of them the A-Boxes are too small for our intended performance eval-
uations. Therefore, for each of the ontologies, we generated different extended
datasets with increasing A-Box size to be able to quantify the relationship be-
tween query time and the size of the A-Box. In particular, in order not to bias our
quantitative analysis, we experimented with three different ways of generating
A-Boxes: random, structural and proportional. We briefly describe these different
generation strategies, omitting details due to space limitations. For simplicity,
from now on, the individuals always include concept individuals and property
individuals if we do not specify which kind of individuals.

 – random generation: according to this strategy, a number n of individuals
   are generated by first randomly selecting m properties, generating new in-
   dividuals of these and randomly deciding whether to create a new domain
   (range) individual or use an existing one. This yields at most m property
   and 2m concept individuals. Then, the rest to n is filled up with random
   concept individuals.
2
  http://www.vicodi.org
3
  http://swat.cse.lehigh.edu/projects/lubm/index.htm
4
  http://www.cs.put.poznan.pl/alawrynowicz/semintec.htm
5
  http://www.schemaweb.info/schema/SchemaDetails.aspx?id=62
 – structural generation: according to this strategy, the existing A-Box is
   copied m times until the final number of individuals is reached, whereby
   concept and property individuals are renamed. This strategy thus aims at
   preserving the structure of the original ontology.
 – proportional: according to this strategy, the conditional probability
   P (cj |ic ), i.e. the probability that given that ic is a concept individual it be-
   longs to concept cj is calculated in the original A-Box. Further, also P (pj |ip )
   is calculated for properties. The relative frequency of property versus concept
   individuals is also calculated. Then, new individuals are randomly generated
   according to the above probabilities. This strategy aims at preserving the
   distribution in the original ontology.

    For each of the ontologies mentioned above, we have thus generated ontologies
with 100, 1.000, 10.000, 100.000 and 1 Mio. individuals. Further, in order to
generate fuzzy ontologies out of these, a random fuzzy degree has been generated
for each of the individuals in the different datasets. The ontologies with the fuzzy
degree values have then been reduced using the optimised and the unoptimised
reduction.

4.3   Queries
To generate various sorts of queries for tests, we use 5 query patterns which vary
from the simplest one to retrieve all the individuals of a concept to the complex
ones such as:
SELECT ?V1 ?V2 ?V3 ?V4 ?V5 ?V6
WHERE {?V1 rdf:type C1 . ?V2 rdf:type C2 . ?V3 R1 ?V4 . ?V5 R2 ?V6}
where, V i (i = 1...6) indicates variables. C1 and C2 means concepts and R1 and
R2 are relations.
    When generating a query, we randomly select one query pattern. For each
chosen pattern, we replace each variable in the pattern by randomly choosing one
from several distinct variables given beforehand. As for each relation (concept)
in the pattern, we randomly choose one relation (concept) from the ontology to
be queried.
    This way of creating queries is obviously flexible enough to generate an arbi-
trary number of queries of varying complexity. It is important to mention that,
as we only use named classes in our queries, the reduction to Datalog does not
need to be performed for each query (see Section 3).

4.4   First observations: number of axioms
In order to generate ontologies with Fuzzy ABoxes, we randomly generated fuzzy
degrees for the A-Boxes generated as described above. After applying the reduc-
tion, we obtained ontologies with a number of axioms (TBox and RBox) as
reported in Table 2. It shows the number of axioms for the original ontologies as
well as those resulting from the reduction to crisp OWL-DL for different degrees,
i.e. 3 (0, 0.5 1), 6 (0, 0.2, 0.4, 0.6, 0.8, 1) and 11 (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
Ontology                                         # Axioms
                                Original                               Reduced Ontologies
                                                                 Unoptimized         Optimized
         Class Prop. TBox Subcl. Disj. Subpr. Func. Dom. Ran.     3      6     11     3     6    11
VICODI    194     10 223    193     0      9     10   10    0 5.110 12.196 24.006 1.624 4.872 8.120
LUBM        43    25   94    36     0      5     25   18    0 1.248 2.994 5.904 480 1.440 2.400
Semintec    60    16 222     55 113        0     16   16   12 1.808 4.016 7.696 770 2.310 3.850
Wine      141     13 218    126     1      5      6    9    6 4.204 9.946 19.516 1.402 4.206 7.010

   Table 2. Number of axioms for the original ontologies and after the reduction


0.7, 0.8, 0.9, 1). We can certainly see that the reduction increases the number
of axioms dramatically. Interestingly, the optimized reduction produces about a
third of the axioms produced by the unoptimized method.


4.5   Results

We have carried out performance evaluation tests for the above mentioned four
ontologies comparing the query performance for standard query answering with
respect to the original ontology as well as fuzzy query answering with respect
to the fuzzy ontologies after the reduction using the optimised and unoptimised
methods. The fuzzy queries are also randomly generated using the patterns de-
scribed above, inserting the concepts and relations produced by the reduction,
thus yielding ”fuzzy” conjunctive queries which are actually crisp queries with
respect to the transformed ontologies.
    Figure 2 shows the average time for query answering with respect to the
automatically generated datasets for the crisp ontologies containing 100 to 1.
Mio individuals for the different generation strategies. A first interesting result
is that the different generation strategies do not have a major effect on the query
time. For this reason, we decided to carry out the remaining experiments using
only the proportional generation strategy.
    In particular, we tested ontologies with 3, 6 and 11 fuzzy degrees. For all
queries, we imposed a time limit of 20 minutes. After that, query evaluation was
stopped and counted as 20 minutes. All results are reported averaged over 100
queries. Our first conclusions were the following:

 – The unoptimised reduction leads to ontologies which are completely in-
   tractable, such that almost none of the queries can be answered within 20
   minutes.
 – The Wine ontology showed to be completely intractable both with the opti-
   mised and unoptimised reduction even using only 3 degrees! This shows that
   highly expressive (fuzzy) ontologies lead to a high number of axioms when
   they are reduced to crisp ontologies, thus becoming intractable for state-of-
   the-art reasoners. Thus, we do not show the detailed results for the Wine
   ontology from now on.
 – Using the optimised reduction, the query time with respect to the original
   crisp ontologies increases a factor of between 2 and 100 depending on the
                      10000


Response time (sec)   1000

                       100

                        10

                         1

                        0,1

                       0,01                                             vicodi_100000
                              vicodi_100

                                           vicodi_1000

                                                         vicodi_10000


                                                                                        vicodi_1000000


                                                                                                                                                                                                                                                               wine_100

                                                                                                                                                                                                                                                                          wine_1000

                                                                                                                                                                                                                                                                                      wine_10000

                                                                                                                                                                                                                                                                                                   wine_100000

                                                                                                                                                                                                                                                                                                                 wine_1000000
                                                                                                         lubm_100

                                                                                                                    lubm_1000

                                                                                                                                lubm_10000

                                                                                                                                             lubm_100000

                                                                                                                                                           lubm_1000000


                                                                                                                                                                                         semintec_1000

                                                                                                                                                                                                         semintec_10000
                                                                                                                                                                          semintec_100


                                                                                                                                                                                                                          semintec_100000

                                                                                                                                                                                                                                            semintec_1000000
                                                                                                         Data sets with different individual size

                                                                                                         ProportionalWay                                   RandomWay                                     StructuralWay

Fig. 2. Average query time for the four (crisp) ontologies (VICODI, LUBM, Semintec
and OWL) for the three A-Box generation strategies: random, structural and propor-
tional


                      number of degrees used. We discuss this increase for the ontologies trans-
                      formed using the optimised reduction in more detail below with respect to
                      the different ontologies.

   Figure 3 shows the average querying time for the three ontologies (VICODI,
LUBM and Semintec) transformed with the optimised reduction using 3, 6 and
11 degrees. We discuss the results separately for each ontology:

        – VICODI (RDFS-DL): The results are as expected as there is a clear increase
          of query time from the original ontology to the transformed ontology with
          11 degrees. For 3 degrees, there seems to be an increase of factor 2 of the
          querying time, while for 6 degrees the increase corresponds to a factor of 10.
          The increase for 11 degrees is a factor between 10 and 100 (for the dataset
          with 100.000 individuals).
        – LUBM (OWL Lite): For LUBM, the results are also more or less as ex-
          pected. The increase is quite consistently a factor of 10 with the only excep-
          tion of the dataset with 10.000 individuals
        – Semintec (OWL-DL): The Semintec ontology shows a similar pattern as
          the other ontologies: using 3 degrees leads to an increase of query time with
          a factor of between 1-2; while using 6 degrees and 11 degrees leads to an
          increase within a factor of around 10 and 100, respectively.

    Overall, we can conclude that the increase in the time for answering a query
is considerable when considering the transformed ontologies, i.e. roughly a factor
                      10000


Response time (sec)   1000

                       100

                        10

                         1

                        0,1

                       0,01

                                                                        vicodi_100000
                              vicodi_100

                                           vicodi_1000

                                                         vicodi_10000


                                                                                          vicodi_1000000

                                                                                                           lubm_100

                                                                                                                      lubm_1000

                                                                                                                                  lubm_10000

                                                                                                                                               lubm_100000

                                                                                                                                                             lubm_1000000

                                                                                                                                                                            semintec_100

                                                                                                                                                                                           semintec_1000

                                                                                                                                                                                                           semintec_10000

                                                                                                                                                                                                                            semintec_100000

                                                                                                                                                                                                                                              semintec_1000000
                                                                                        Data sets with different individual size

                                                                                          crisp             3 degrees             6 degrees                  11 degrees


Fig. 3. Average query time for the three ontologies (VICODI, LUBM and Semintec)
transformed with the optimised reduction using 3, 6 and 11 degrees


between 2 and 100 compared to query answering with respect to the original crisp
ontologies. The pattern which emerges is quite clear. For the reduced ontologies
with 3 degrees, the query time seems to double compared to standard query
answering with respect to the original ontologies. When using 6-11 degrees, the
increase is in most cases a factor of at least 10 compared to the time for answering
queries with respect to the original crisp ontology. For 11 degrees, the increase
can even reach a factor of 100 (e.g. for VICODI with 100.000 individuals), but in
most cases there is an increase with a factor of 10. The good news is certainly that
for smaller ontologies from 100 to 10.000 individuals, the query time is at most a
few (<4) seconds regardless of the number of degrees used. This shows that for
smaller ontologies the approach considered is definitely feasible regardless of the
degrees considered. For bigger ontologies, using more degrees (6-11) comes at a
high price. Thus, applications will clearly need to balance the amount of degrees
they need to distinguish with the maximum querying time they can afford. In
any case, it seems that fuzzy query answering is feasible in real time (where real
time is defined as under a few seconds) in exactly those cases in which query
answering with respect to the original crisp ontology is also. In all those cases
where fuzzy query answering is not feasible in real time (let’s say > 100 seconds),
neither standard query answering with respect to the crisp ontologies is. In such
a case, one could argue, it is not anymore important whether the queries can
be answered in 100 seconds (1.6 minutes) or 1000 seconds (16 minutes) as these
times are both clearly beyond a real-time behaviour.
5   Related Work and Conclusion

In the last couple of years it has become evident that fuzzy Description Logics
could play an important role in several Semantic Web applications that face a
significant amount of vague knowledge, like multimedia analysis and retrieval.
In order to support inference services for fuzzy-DLs, several different reasoning
algorithms have been proposed such as tableaux-based [12, 10], optimization-
based [16, 17] as well as techniques that reduce fuzzy-DL reasoning to crisp DL
reasoning [9, 13, 20].
    Although there has been a significant amount of work on the reduction based
DL reasoning methods, there is currently no systematic and complete evalua-
tion of the method6 . In particular, Straccia was the first to propose the method,
showing that reasoning in fKD -ALCH can be reduced to reasoning in ALCH,
while later Bobillo et. al. [13] extended the method to reduce fKD -SHOIN
(together with fuzzy subsumption axioms [6]) to SHOIN . Then Bobillo [20]
extended the technique to SROIQ also proposing a number of optimisation
methods that reduced the number of created axioms. Additionally, in [20] there
is a first report on an implementation accompanied with a preliminary evalu-
ation. More precisely, the authors have implemented their proposed optimised
reduction technique which then they evaluated over a fuzzified version of the
Koala ontology7 . The Koala ontology contains 20 named classes, 15 anonymous
classes, 4 object properties, 1 datatype property and 6 individuals. The axioms of
the ontology where randomly extended with membership degrees, using 3, 5, 7,
9 and 11 different degrees, thus creating a set of five fuzzy ontologies. Then, the
reduction method was evaluated using the Pellet reasoner providing the times
for knowledge base satisfiability checking. In contrast, we have mainly focused
on the task of query answering. Instead of evaluating with respect to only one
small ontology, we have tested four different ontologies corresponding to different
complexity classes (i.e. the RDFS DL fragment, OWL Lite and OWL-DL) with
respect to different A-Box sizes of up to a million of individuals. Our focus has
in fact been to test the scalability of the reduction-based approach depending
on A-Box size. This is also the reason why we have used KAON2 as reasoner,
which was particularly developed to handle large A-Boxes. Our experimental
evaluation has shown that for quite expressive ontologies, especially the Wine
ontology, the approach based on the reduction to crisp DL is not feasible for
fuzzy query answering. Arguably, the wine ontology was never conceived as a
realistic ontology for A-Box reasoning but rather for T-Box reasoning. In any
case, in our experiments the ontologies resulting from the reduction to crisp DL
where only tractable when using the optimised reduction, which reduced the
number of axioms by around 66% compared to the unoptimised reduction.
    Overall, the increase in query time is moderate for small numbers of degrees
(e.g. 3-6). Distinguishing 3-6 degrees leads to increased query times with a factor
6
  Note that the landscape is not different with the other reasoning methods for fuzzy-
  DLs
7
  http://http://protege.cim3.net/file/pub/ontologies/koala/koala.owl
between 2-10 for smaller A-Boxes (up to 10.000 individuals) and to a factor of
100 for larger A-Boxes with around 100.000 individuals and more. The inter-
esting conclusion is nevertheless that for rather inexpressive ontologies such as
VICODI and LUBM, query answering with the approach analyzed here is defi-
nitely feasible for a few degrees (3-6) even with very large A-Boxes as the time
increase is below a factor of 10. For applications which really need fuzzy query
answering, this increase might be acceptable. Finally, the encouraging result is
that for smaller A-Boxes with a number of individuals between 100 and 10.000
query answering takes below a few seconds irrespectively of the degrees used.
The interesting conclusion that real time query answering seems to be feasible
for the fuzzy ontologies exactly for those A-Box sizes for which it is also feasible
for the crisp ontologies is an encouraging result from our point of view.
    There are obvious avenues for future work. First, it would be appropriate to
experiment with other reasoners such as FaCT or Pellet for reasoning with the
reduced ontologies. Further, a necessary next step is to compare with respect to
other techniques such as tableaux-based reasoners such as Fire (see [24]). While
our envisioned application is retrieval and thus querying, the investigation of
several issues from the application point of view seem interesting. First of all,
we expect that, in contrast to what we have assumed in this paper, not all con-
cepts in an ontology should be considered as inherently fuzzy. In fact, we expect
applications to pose clear requirements on what concepts should be understood
as fuzzy and which not. By this, the large amount of axioms produced by the re-
duction could be dramatically constrained, thus leading to a better performance.
In this context it will be necessary to clarify the interactions between fuzzy and
non-fuzzy concepts in query answering. Overall, we think that this paper offers
a clear step forward towards clarifying whether techniques for reasoning with
fuzzy knowledge can be expected to scale up.


References
 1. Meghini, C., Sebastiani, F., Straccia, U.: A model of multimedia information
    retrieval. Journal of the ACM 48(5) (2001) 909–970
 2. Zadeh, L.: The concept of a linguistic variable and its application to approximate
    reasoning. Information Sciences 8-9 (1975)
 3. Ding, Z., Peng, Y., Pan, R. Studies in Fuzziness and Soft Computing. In:
    BayesOWL: Uncertainty Modeling in Semantic Web Ontologies. Springer-Verlag
    (2005) 27
 4. da Costa, P., Laskey, K.: PR-OWL: A framework for probabilistic ontologies. In:
    Proceedings of the International Conference on Formal Ontology in Information
    Systems. (2006)
 5. Udrea, O., Subrahmanian, V.S., Majkic, Z.: Probabilistic rdf. In: Proc. of IRI’06.
    (2006) 172–177
 6. Straccia, U.: Towards a fuzzy description logic for the semantic web. In: Proceed-
    ings of the 2nd European Semantic Web Conference. (2005)
 7. Stoilos, G., Stamou, G., Tzouvaras, V., Pan, J., Horrocks, I.: Fuzzy OWL: Uncer-
    tainty and the semantic web. In: Proc. of the International Workshop on OWL:
    Experiences and Directions. (2005)
 8. Lukasiewicz, T., Straccia, U.: An overview of uncertainty and vagueness in de-
    scription logics for the semantic web. Technical Report INFSYS Research Report
    1843-06-07, Technische Universität Wien (2006)
 9. Straccia, U.: Transforming fuzzy description logics into classical description logics.
    In: Proceedings of the 9th European Conference on Logics in Artificial Intelligence
    (JELIA-04). Number 3229 in Lecture Notes in Computer Science, Lisbon, Portugal,
    Springer Verlag (2004) 385–399
10. Stoilos, G., Stamou, G., Tzouvaras, V., Pan, J.Z., Horrocks, I.: Reasoning with
    very expressive fuzzy description logics. Journal of Artificial Intelligence Research
    30(8) (2007) 273–320
11. Baader, F., McGuinness, D., Nardi, D., Patel-Schneider, P.: The Description Logic
    Handbook: Theory, implementation and applications. Cambridge University Press
    (2002)
12. Straccia, U.: Reasoning within fuzzy description logics. Journal of Artificial Intel-
    ligence Research 14 (2001) 137–166
13. Bobillo, F., Delgado, M., Gómez-Romero, J.: A crisp representation for fuzzy
    SHOIN with fuzzy nominals and general concept inclusions. In: Proc. of the 2nd
    International Workshop on Uncertainty Reasoning for the Semantic Web (URSW
    06), Athens, Georgia. (2006)
14. Straccia, U.: Answering vague queries in fuzzy DL-Lite. In: Proceedings of the
    11th International Conference on Information Processing and Management of Un-
    certainty in Knowledge-Based Systems, (IPMU-06). (2006) 2238–2245
15. Pan, J., Stamou, G., Stoilos, G., Thomas, E.: Expressive querying over fuzzy
    DL-Lite ontologies. In: Proceedings of the International Workshop on Description
    Logics (DL 2007). (2007)
16. Straccia, U.: Description logics with fuzzy concrete domains. In: 21st Conf. on
    Uncertainty in Artificial Intelligence (UAI-05), Edinburgh (2005)
17. Bobillo, F., Straccia, U.: A fuzzy description logic with product t-norm. In: Pro-
    ceedings of the IEEE International Conference on Fuzzy Systems (Fuzz-IEEE-07),
    London. (2007)
18. Horrocks, I.: Using an Expressive Description Logic: FaCT or Fiction? In: Proceed-
    ings of the Sixth International Conference on Principles of Knowledge Represen-
    tation and Reasoning (KR’98), Trento, Italy, June 2-5, 1998, Morgan Kaufmann
    (1998) 636–649
19. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A practical
    OWL-DL reasoner. Journal of Web Semantics 5 (2007) 51–53
20. Bobillo, F., Delgado, M., Gómez-Romero, J.: Optimising the crisp representation of
    the fuzzy dl SROIQ. In: Proc. of the 3nd International Workshop on Uncertainty
    Reasoning for the Semantic Web (URSW 07), Busan, Korea. (2007)
21. Motik, B.: Reasoning in Description Logics using Resolution and Deductive
    Databases. PhD thesis, Universität Karlsruhe (2006)
22. Cumbo, C., Faber, W., Greco, G., Leone, N.: Enhancing the magic-set method for
    disjunctive datalog programs. In Demoen, B., Lifschitz, V., eds.: ICLP. Volume
    3132 of Lecture Notes in Computer Science., Springer (2004) 371–385
23. Motik, B., Sattler, U.: A comparison of reasoning techniques for querying large
    description logic ABoxes. In: Proceedings of the 13th International Conference on
    Logic for Programming Artificial Intelligence and Reasoning (LPAR 2006), Phnom
    Penh, Cambodia, November, 2006. (2006)
24. Simou, N., Kollias, S.: Fire : A fuzzy reasoning engine for impecise knowledge,
    K-Space PhD Students Workshop, Berlin, Germany, 14 September 2007 (2007)

</pre>