Benchmarking Reasoners for Multi-Ontology Applications

Ameet N Chitnis, Abir Qasem and Jeff Heflin
Lehigh University, 19 Memorial Drive West, Bethlehem, PA 18015
{anc306, abq2, heflin}@cse.lehigh.edu

Abstract. We describe an approach to create a synthetic workload for large scale extensional query answering experiments. The workload comprises multiple interrelated domain ontologies, data sources that commit to these ontologies, synthetic queries, and map ontologies that specify a graph over the domain ontologies. Among the important parameters of the system are the average number of classes and properties of a source ontology that are mapped to terms of the target ontology, and the number of data sources per ontology. The ontology graph is described by parameters such as its diameter, the number of ontologies and the average out-degree of an ontology node. These parameters give a significant degree of control over the graph topology. This graph of ontologies is the central component of our synthetic workload and effectively represents a web of data.

1 Introduction

One of the primary goals of the Semantic Web is to be able to integrate data from diverse sources irrespective of the ontologies to which those sources commit. Unfortunately, it is difficult to measure progress against this goal. Although there are a large number of ontologies, few have data associated with them, making it difficult to execute large scale integration experiments. The aim of this paper is to provide a benchmark with a synthetic workload that can easily be scaled to a desired configuration for executing large scale extensional query answering experiments.

The benchmark described here was originally developed for evaluating our OBII system [1]. However, our approach could be applied to evaluate other Semantic Web systems in general. In this paper we present the workload components that are of general interest, and we discuss, wherever applicable, how they can be further generalized. Specifically, we make the following two technical contributions:

1. We design and implement an algorithm to generate a graph of ontologies defined by parameters such as the diameter, the average out-degree of an ontology node, the number of diameter-length paths, the number of terminal ontologies and the number of maps. We then generate mapping ontology axioms that conform to a subset of OWL DL.
2. We use these in conjunction with an approach to generate synthetic domain ontologies, synthetic data sources and synthetic queries in order to provide a complete Semantic Web workload.

The rest of the paper is organized as follows. In Section 2 we provide background on related work. In Section 3 we define the steps of the data generation process: generation of domain ontologies, data sources, queries, and the graph of ontologies used to create mapping axioms; we also introduce a mapping language for describing maps. In Section 4 we describe the methodology for carrying out an experiment and the performance metrics that can be used for evaluation. In Section 5 we conclude and discuss future work.

2 Background

The LUBM [2] is an example of a benchmark for Semantic Web knowledge base systems with respect to use in large OWL applications. It makes use of a university domain workload for evaluating systems with different reasoning capabilities and storage mechanisms.
Li Ma et al. [3] extend the LUBM so that it can support both OWL Lite and OWL DL (except TBoxes with cyclic definitions and ABoxes with inequality definitions). However, LUBM and the extended LUBM use a single domain/ontology, namely the university domain comprising students, courses, faculty, etc. We need workloads comprising multiple interrelated ontologies.

Tempich and Volz [4] perform a statistical analysis of the available Semantic Web ontologies and derive important parameters that could be used to generate synthetic ontologies. T. D. Wang et al. [5] have conducted a more recent survey of OWL ontologies and RDFS schemas, analyzing the statistical data to determine whether there are interesting trends in modeling practices, OWL construct usage and OWL species utilization. These works can be used in determining reasonable parameters for Semantic Web benchmarks but do not present benchmarks in themselves.

There has been some prior work on benchmarking DL systems. Horrocks and Patel-Schneider [6] use a benchmark suite comprising four kinds of tests: concept satisfiability tests, artificial TBox classification tests, realistic TBox classification tests and synthetic ABox tests. The TBox refers to the intensional knowledge of the domain (similar to an ontology) and the ABox contains extensional knowledge. Elhaik et al. [7] provide the foundations for generating random TBoxes and ABoxes. The satisfiability tests compute the coherence of large concept expressions without reference to a TBox. However, these approaches neither create OWL ontologies nor SPARQL queries, and they only focus on a single ontology at a time.

Garcia-Castro and Gomez-Perez [8] provide a benchmark suite primarily for evaluating the performance of the methods provided by the WebODE ontology management API. Although their work is very useful for evaluating ontology-based tools, it provides less information on benchmarking knowledge base systems.

Winick and Jamin [9] present an Internet topology generator that creates topologies whose degree distributions and minimum vertex covers more accurately match those of actual Internet topologies. Connectivity is one of the fundamental characteristics of these topologies. In contrast, when considering a Semantic Web of ontologies, some ontologies may not map to any other ontology and thus remain disconnected from the graph.

3 Data Generation

We now describe the process of generating several types of synthetic workloads that represent a wide variety of situations. While generating the data set, the user is free to modify the independent parameters, while the rest essentially serve as controls whose values depend on the nature of the application, such as information integration. The characteristics of a domain ontology and a map ontology are clearly demarcated: the former does not have any import statements, whereas a map inherits the axioms of the two ontologies being mapped. This approach is equivalent to having a set of ontologies, some of which inherit the axioms of others, but it is very useful for creating the graph of ontologies.

3.1 Generation of Domain Ontologies

We implemented a workload generator that allows us to control several characteristics of our dataset. In generating the synthetic domain ontologies we decided to have, on average, 20 classes and 20 properties (influenced by the dominance of small ontologies in the current Semantic Web). Due to restrictions placed on our OBII system, our existing implementation only generates domain ontologies comprising subClassOf and subPropertyOf axioms, in order to support taxonomic reasoning. Also, following the statistical analysis of the DAML ontology library [4], we maintain more subClassOf axioms than subPropertyOf axioms. We designate these ontologies as simple ontologies. However, we can easily increase the expressivity to OWL Lite or OWL DL by including more complex axioms such as unionOf, intersectionOf and inverseOf, because the classes and properties used in our ontologies are synthetic and do not possess any intuitive semantics. There is also related work, such as the artificial TBox classification tests of Horrocks and Patel-Schneider [6], on benchmarking DL systems.

To create a domain ontology, we randomly establish subClassOf and subPropertyOf relationships across classes and properties respectively. The class and property taxonomies have an average branching factor of 4 and an average depth of 3.
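To make this concrete, the following is a minimal sketch of how such a simple ontology could be generated. It is our own illustration, not the OBII workload generator: the function names, the frontier-based tree construction and the choice to place only half of the properties in the subPropertyOf taxonomy (so that subClassOf axioms dominate, as in [4]) are assumptions made for the example.

```python
import random


def make_taxonomy(names, branching=4, depth=3, rng=random):
    """Arrange the given term names into a random subsumption tree with the
    stated average branching factor and maximum depth.
    Returns (child, parent) pairs, i.e. subClassOf/subPropertyOf axioms."""
    axioms, frontier, nxt = [], [names[0]], 1   # names[0] acts as the root term
    for _ in range(depth):
        next_frontier = []
        for parent in frontier:
            # vary the number of children around the average branching factor
            for _ in range(rng.randint(branching - 1, branching + 1)):
                if nxt >= len(names):
                    return axioms
                child = names[nxt]
                nxt += 1
                axioms.append((child, parent))
                next_frontier.append(child)
        frontier = next_frontier
    return axioms


def make_domain_ontology(uri, n_classes=20, n_properties=20, rng=random):
    """A 'simple' synthetic ontology with only subClassOf/subPropertyOf axioms."""
    classes = [f"{uri}#Class{i}" for i in range(n_classes)]
    properties = [f"{uri}#Prop{i}" for i in range(n_properties)]
    return {
        "classes": classes,
        "properties": properties,
        "subClassOf": make_taxonomy(classes, rng=rng),
        # use only half the properties so subClassOf axioms dominate (assumption)
        "subPropertyOf": make_taxonomy(properties[: n_properties // 2], rng=rng),
    }


onto = make_domain_ontology("http://example.org/ont0", rng=random.Random(0))
print(len(onto["subClassOf"]), "subClassOf axioms,",
      len(onto["subPropertyOf"]), "subPropertyOf axioms")
```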
3.2 Generation of the graph of interlinked ontologies

We consider a directed graph of interlinked ontologies, where every edge is a map from a source ontology to a target ontology. This map ontology comprises a set of mapping axioms. We use the following terms when discussing such a graph:

- Diameter: the length of the longest path in the graph.
- Terminal node: a node with zero out-degree. Before the maps are created, we determine the number of terminal nodes and randomly mark that many domain ontologies as terminal. The algorithm is designed so that a non-terminal node can never end up with zero out-degree. There may also be terminal nodes with zero in-degree, which are therefore disconnected from the graph.
- Out-path length: the length of the longest outgoing path from a node.
- In-path length: the length of the longest incoming path to a node.

The inputs to the graph creation algorithm are the number of ontologies, the average out-degree of the nodes and the diameter of the graph. A further parameter, longPaths, indicates the number of paths of diameter length; it has been hard coded to 1 because we need at least one path of diameter length, and the algorithm usually creates additional diameter-length paths anyway. Another important parameter is the total number of maps. We show how it can be calculated from the other parameters. Let

- maps: total number of maps
- out: average out-degree
- onts: total number of ontologies
- term: total number of terminal ontologies

The parameters maps, out and term are interrelated in that maps is approximately equal to the product of the number of non-terminal ontologies and out. Hence we have

(onts − term) × out ≈ maps    (1)

However, we do not provide term as an input parameter; a reasonable value can be computed from the other parameters. We can express maps as the product of term and diameter: the number of maps is at least equal to this product, because the in-path length of a terminal node is equal to the diameter. There can be more maps when more than one diameter-length path leads to a terminal node. For example, a terminal node T may have 4 paths of diameter length (with diameter 3) leading to it, effectively yielding more maps. Hence the equation below is desirable but not a requirement.

Given that we prefer graphs that branch out, we will use

term ≈ maps / diameter    (2)

Substituting (2) in (1) we get

(onts − maps / diameter) × out ≈ maps
onts × out − (maps × out) / diameter ≈ maps
onts × out ≈ maps + (maps × out) / diameter
onts × out × diameter ≈ maps × diameter + maps × out
onts × out × diameter ≈ maps × (diameter + out)
maps ≈ (onts × out × diameter) / (diameter + out)    (3)

Also, by substituting (3) in (2),

term ≈ (onts × out) / (diameter + out)    (4)
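As a quick sanity check on equations (3) and (4), the following few lines compute maps and term for one of the configurations used later in the paper (10 ontologies, average out-degree 2, diameter 5). The numbers are derived directly from the formulas above.

```python
# Worked example for the 10-2-5 configuration (onts = 10, out = 2, diameter = 5)
onts, out, diameter = 10, 2, 5
maps = onts * out * diameter / (diameter + out)   # eq. (3): 100 / 7 ~= 14.3
term = onts * out / (diameter + out)              # eq. (4): 20 / 7  ~=  2.9
print(round(maps), round(term))                   # -> 14 3
# consistent with eq. (1): (onts - term) * out = (10 - 3) * 2 = 14
```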
Steps of the Algorithm (a simplified sketch follows the listed steps)

1. At the outset, determine the number of terminal nodes using equation (4) above. Then randomly mark that many domain ontologies as terminal.
2. Thereafter, create a path of diameter length. This ensures that there is at least one path of length equal to the diameter.
3. For every non-terminal ontology, randomly select its out-degree from a range around the specified average out-degree; this range extends by one half of the average out-degree on either side, and we draw from it uniformly. Thereafter, randomly select as many target ontologies as the chosen out-degree. A target ontology can be either terminal or non-terminal. The source and target ontologies will eventually be used for creating mapping axioms.
4. While creating a map ontology between a source and a target, the following constraints have to be satisfied:
   i. The in-path length of the source must be less than the diameter, in order to prevent the creation of a path longer than the diameter.
   ii. The target must be different from the source.
   iii. There must not already exist a direct mapping between the source and the target.
   iv. The target must not be among the ontologies from which the source can be visited. This prevents the creation of any cycles in the graph. It is a requirement for OBII that could be relaxed for other systems.
   v. The given source and selected target must not create a transitive path of length greater than the diameter. This means that the in-path length of the source + the out-path length of the target + 1 must not be greater than the diameter.
   vi. If the target is a non-terminal node and creating a map between the source and the target would turn the target or any of its non-terminal descendants into a terminal node, the map should be avoided. This happens when the in-path length of the source is one less than the diameter.
   vii. Sometimes none of the existing nodes can satisfy the above constraints. This can happen for large diameters and large out-degrees, or when the diameter is equal to the number of ontologies. When such a situation arises, a new ontology is created to serve as the target. Ontologies that are dynamically created in this way are termed fresh ontologies, so the total number of ontologies at the end may be greater than the number of ontologies with which the algorithm began.
5. Once a map ontology is created, the attributes of the source and the target have to be updated as follows:
   i. The source and the set of ontologies from which it can be reached must be added to the set of ontologies from which the target and its descendants can be reached.
   ii. The out-degree of the source has to be updated.
   iii. The source must be made a parent of the target.
   iv. The target must be made a child of the source.
   v. The out-path length of the source and all its ancestors has to be updated, if applicable.
   vi. The in-path length of the target and all its descendants has to be updated, if applicable.
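The sketch below illustrates how these steps could fit together. It is our own simplified reconstruction, not the OBII generator: the class and function names are invented, constraint vi is omitted, path lengths are recomputed from scratch after every map for clarity, and only basic error handling is included.

```python
import random


class Node:
    """One ontology node in the map graph, with the bookkeeping of step 5."""

    def __init__(self, name, terminal=False):
        self.name = name
        self.terminal = terminal
        self.targets = set()         # direct maps out of this node (children)
        self.parents = set()         # direct maps into this node
        self.reachable_from = set()  # every ancestor; used to forbid cycles
        self.in_path = 0             # longest incoming path length
        self.out_path = 0            # longest outgoing path length


def build_graph(onts, out, diameter, seed=0):
    rng = random.Random(seed)
    term = max(1, round(onts * out / (diameter + out)))     # eq. (4)
    nodes = [Node(i) for i in range(onts)]
    for n in rng.sample(nodes, term):                       # step 1
        n.terminal = True
    non_term = [n for n in nodes if not n.terminal]
    assert len(non_term) >= diameter, "too few non-terminal nodes for the diameter path"
    edges = []

    def refresh_paths():
        # recompute longest in/out paths over the acyclic graph (steps 5.v-vi)
        out_memo, in_memo = {}, {}

        def out_len(n):
            if n.name not in out_memo:
                out_memo[n.name] = max((1 + out_len(t) for t in n.targets), default=0)
            return out_memo[n.name]

        def in_len(n):
            if n.name not in in_memo:
                in_memo[n.name] = max((1 + in_len(p) for p in n.parents), default=0)
            return in_memo[n.name]

        for n in nodes:
            n.out_path, n.in_path = out_len(n), in_len(n)

    def can_map(src, tgt):
        # constraints i-v of step 4 (constraint vi omitted for brevity)
        return (src is not tgt
                and tgt not in src.targets
                and tgt not in src.reachable_from                 # no cycles
                and src.in_path < diameter
                and src.in_path + tgt.out_path + 1 <= diameter)

    def add_map(src, tgt):
        src.targets.add(tgt)                                      # steps 5.ii-iv
        tgt.parents.add(src)
        edges.append((src.name, tgt.name))
        # step 5.i: propagate ancestors to the target and all its descendants
        stack = [(tgt, src.reachable_from | {src})]
        while stack:
            n, anc = stack.pop()
            new = anc - n.reachable_from
            if new:
                n.reachable_from |= new
                stack.extend((t, new) for t in n.targets)
        refresh_paths()

    # step 2: one guaranteed path of diameter length, ending at a terminal node
    path = rng.sample(non_term, diameter) + rng.sample([n for n in nodes if n.terminal], 1)
    for a, b in zip(path, path[1:]):
        add_map(a, b)

    # steps 3-4: give every non-terminal node an out-degree near the average
    fresh = 0
    for src in non_term:
        degree = rng.randint(max(1, out - out // 2), out + out // 2)
        while len(src.targets) < degree:
            candidates = [n for n in nodes if can_map(src, n)]
            if candidates:
                add_map(src, rng.choice(candidates))
            elif src.in_path < diameter:
                fresh += 1                                        # step 4.vii: fresh ontology
                tgt = Node(f"fresh-{fresh}", terminal=True)
                nodes.append(tgt)
                add_map(src, tgt)
            else:
                break  # no legal target can exist for this source
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_graph(onts=10, out=2, diameter=5, seed=7)
    print(len(nodes), "ontologies,", len(edges), "maps")
```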
3.3 Generation of mapping axioms

Once the source and the target ontologies have been identified, mapping axioms need to be established. A specific number of terms (classes and properties) from the source ontology are mapped to terms in the target ontology. Since the domain ontologies paired by a map are chosen randomly, we expect the map to reflect only a partial overlap between the two. Hence this value has been hard coded to 20% of the total number of classes and properties in the source ontology.

OBII uses the language OWLII [1], which is a subset of OWL DL. This language is defined as follows.

Definition (OWLII)
i. Let Lac be a DL language where A is an atomic class, and if C and D are classes and R is a property, then C⊓D and ∃R.C are also classes.
ii. Let La include all classes in Lac. Also, if C and D are classes then C⊔D is also an La class.
iii. Let Lc include all classes in Lac. Also, if C is a class and R is a property then ∀R.C is also an Lc class.
iv. OWLII axioms have the form C⊑D, A≡B, P⊑Q, P≡Q, P≡Q⁻, where C is an La class, D is an Lc class, A and B are Lac classes, and P and Q are properties.

At present we generate mapping axioms that fall strictly within OWLII. The limited expressivity of OWLII prevents the generation of inconsistent axioms, but when we extend to more expressive axioms we can incorporate a consistency check into the ontology generation process.

In what follows we describe how this is implemented in our current system and how it can easily be extended to OWL DL. We create each mapping axiom by generating an OWL parse tree whose root node is a subclass operator. Then, based on a user-supplied frequency table of the various OWL constructors, the tree is expanded using named classes or OWL constructors. The frequency table allows users to specify the ratio of the various OWL constructors they expect to have in their mapping axioms. Our algorithm recursively builds the parse tree in this way and terminates by choosing named classes for all remaining operands when the maximum length of a mapping axiom is reached. A sketch of this procedure is given below.

If there is no mapping between a pair of ontologies, it simply means that the two are not related and represent different domains. Such a landscape truly reflects the nature of the Semantic Web, which comprises groups of interrelated ontologies as well as lone ontologies. Thus, answering a query demands being selective about particular data sources instead of scanning the entire data set. Our OBII system [1] uses the concept of "rel-files" in order to select only those data sources that contain relevant information.

Note: In the above approach we restrict axiom generation to remain within OWLII by allowing only certain constructors in the subject or object position of an axiom. This is done because our current implementation is geared towards producing data for the OBII system. However, if we lift these restrictions and allow any constructor on either side of the tree, we can generate axioms that are OWL DL.
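The following is a minimal sketch of the parse-tree idea under stated assumptions: the constructor names, the weight tables and the tuple representation of axioms are ours, chosen so that the left-hand side stays in La and the right-hand side in Lc (and hence every axiom stays within OWLII); the actual generator and its frequency-table format may differ.

```python
import random

# Illustrative frequency tables: relative weights for expanding a tree node with
# a constructor versus terminating it with a named class.
LEFT_CONSTRUCTORS = {"named": 4, "and": 2, "or": 1, "some": 1}   # La side (subject)
RIGHT_CONSTRUCTORS = {"named": 4, "and": 2, "all": 1}            # Lc side (object)


def expand(classes, properties, weights, depth, rng):
    """Recursively build one side of a subClassOf axiom as a nested tuple."""
    if depth == 0:
        choice = "named"          # maximum length reached: stop at a named class
    else:
        names = list(weights)
        choice = rng.choices(names, [weights[n] for n in names])[0]
    if choice == "named":
        return rng.choice(classes)
    if choice in ("and", "or"):
        return (choice,
                expand(classes, properties, weights, depth - 1, rng),
                expand(classes, properties, weights, depth - 1, rng))
    # "some" (existential) or "all" (universal) restriction on a random property
    return (choice, rng.choice(properties),
            expand(classes, properties, weights, depth - 1, rng))


def mapping_axiom(src, tgt, max_depth=2, rng=random):
    """One OWLII axiom: an La expression over the source ontology subsumed by
    an Lc expression over the target ontology."""
    left = expand(src["classes"], src["properties"], LEFT_CONSTRUCTORS, max_depth, rng)
    right = expand(tgt["classes"], tgt["properties"], RIGHT_CONSTRUCTORS, max_depth, rng)
    return ("subClassOf", left, right)


# Example: map 20% of a toy source ontology's terms into a toy target ontology.
src = {"classes": [f"src:C{i}" for i in range(20)],
       "properties": [f"src:P{i}" for i in range(20)]}
tgt = {"classes": [f"tgt:C{i}" for i in range(20)],
       "properties": [f"tgt:P{i}" for i in range(20)]}
rng = random.Random(1)
n_axioms = round(0.2 * (len(src["classes"]) + len(src["properties"])))
axioms = [mapping_axiom(src, tgt, rng=rng) for _ in range(n_axioms)]
print(axioms[0])
```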
3.4 Generation of data sources

A specified number of data sources are generated for every domain ontology. Every data source comprises ABox assertions over named classes and properties. For every source, a particular number of classes and properties are used for creating triples, and these triples are added to the data source being created. The number of classes and properties used for creating triples can be controlled through the relevant parameters. With our current configuration the average data source has 75 triples. Considering how sparsely the classes and properties of an ontology are actually instantiated [10], and given the lack of knowledge about how the actual Semantic Web will eventually manifest itself, we currently choose to instantiate 50% of the classes and 50% of the properties of the domain ontology. However, this can easily be modified to suit the nature of the application.

3.5 Generation of Queries

Our query generation process generates SPARQL queries from a given set of ontologies. Currently we support single-ontology queries, i.e. queries whose predicates come from a single namespace; the approach can be extended to multi-ontology queries quite easily. In our current approach we randomly choose an ontology from the set of ontologies to be the query ontology. The queries are conjunctive, as in the conjunctive query language of Horrocks and Tessaris [11]. We then randomly generate a set of query predicates; the number of predicates for each query is determined by a user-specified parameter. We generate the queries based on the following policies (a sketch of the procedure follows the list):

1. We choose the first predicate from the classes of the query ontology.
2. We bias the next predicate to have a 75% (modifiable) chance of being one of the properties of the query ontology, in order to achieve some degree of control over query selectivity.
3. In order to generate interesting queries that require joins between query predicates, we need variables that are shared by at least two predicates of a given query. To guarantee such a shared variable, when generating a new predicate we reuse one variable from the previously generated predicate. If the new predicate is unary we use the variable from the previous predicate; if it is binary, then in addition to the "used" variable we also create a fresh one. Furthermore, when choosing the position of the "used" variable in a new binary predicate, we place it in the subject position 50% of the time and in the object position 50% of the time, so that it is equally likely to appear as the subject or the object of connected triples.
4. If the query we generate is a single-predicate query, we make all the variables distinguished. For any other query we make, on average, two thirds of the variables distinguished and the rest non-distinguished.
5. We bias the introduction of a constant into a query predicate with a chance of 10%.

The above policy reflects our desire for a simple query generation approach that produces queries useful for measuring a system's performance. It allows us to generate queries with a reasonable mix of classes, properties and individuals.

Note: Every conjunct or constant added to a query makes it more selective. With a diverse data set and randomly generated queries we obtain a wide range in the degree of query selectivity.
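A minimal sketch of these policies is given below. It is illustrative only: the function name, the use of rdf:type atoms for class predicates, the URI chosen as the example constant and the restriction of constants to binary predicates are our assumptions, not part of the original generator.

```python
import random


def generate_query(classes, properties, n_predicates, rng=random):
    """Conjunctive SPARQL query over one query ontology, following policies 1-5."""
    variables, triples = ["?x0"], []
    triples.append((variables[0], "rdf:type", rng.choice(classes)))   # policy 1
    for _ in range(n_predicates - 1):
        shared = variables[-1]                  # policy 3: reuse the last variable
        if rng.random() < 0.75:                 # policy 2: 75% chance of a property atom
            if rng.random() < 0.10:             # policy 5: constant instead of a fresh variable
                other = "ex:individual0"
            else:
                other = f"?x{len(variables)}"
                variables.append(other)
            # policy 3: shared variable is subject or object with equal probability
            s, o = (shared, other) if rng.random() < 0.5 else (other, shared)
            triples.append((s, rng.choice(properties), o))
        else:
            triples.append((shared, "rdf:type", rng.choice(classes)))
    # policy 4: all variables distinguished for single-predicate queries,
    # otherwise roughly two thirds of them
    if len(triples) == 1:
        distinguished = variables
    else:
        distinguished = rng.sample(variables, max(1, round(len(variables) * 2 / 3)))
    body = " . ".join(f"{s} {p} {o}" for s, p, o in triples)
    return f"SELECT {' '.join(distinguished)} WHERE {{ {body} }}"


# Example over a toy query ontology
rng = random.Random(3)
classes = [f"ex:C{i}" for i in range(20)]
properties = [f"ex:P{i}" for i in range(20)]
print(generate_query(classes, properties, n_predicates=3, rng=rng))
```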
4 Experimental Methodology

We present here our methodology for setting up an experiment for OBII, along with the performance metrics that could be used for evaluation. We feel that the most significant parameters to investigate are the number of ontologies, the number of data sources, the out-degree and the diameter. A configuration is denoted as nO-nD-nS, where nO is the number of ontologies, nD is the diameter and nS is the number of sources that commit to an ontology.

Metrics such as load time, repository size, query response time, and query completeness and soundness can serve as good candidates for performance evaluation [2].

Load Time: the time taken to load the Semantic Web space, i.e. the domain and map ontologies and the selected data sources.

Repository Size: the resulting size of the repository after loading the benchmark data into the system. Size is only measured for systems with persistent storage and is calculated as the total size of all files that constitute the repository. Instead of specifying the occupied disk space, we could express it in terms of the configuration size.

Query Response Time: we recommend basing this on the process used in database benchmarks, where every query is executed consecutively on the repository 10 times and the average response time is calculated.

Query Completeness and Soundness: we say a system is complete with respect to a query if it generates all answers that are entailed by the knowledge base. However, on the Semantic Web partial answers will also be acceptable, and hence we measure the degree of completeness of each query as the percentage of the entailed answers that are returned by the system. Along similar lines, we measure the degree of soundness of each query as the percentage of the answers returned by the system that are actually entailed. For small data configurations, the reference set of query answers can be calculated using state-of-the-art DL reasoners such as Racer and FaCT. For large configurations we can use partitioning techniques such as those of Guo and Heflin [12].
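For concreteness, the short sketch below shows one way these last two metrics could be computed; the function names and the callable used to run a query are illustrative placeholders, not part of any benchmark API.

```python
import time


def query_response_time(execute, query, runs=10):
    """Average wall-clock time over consecutive executions, as recommended above.
    `execute` is any callable that runs the query against the repository."""
    start = time.perf_counter()
    for _ in range(runs):
        execute(query)
    return (time.perf_counter() - start) / runs


def completeness_and_soundness(returned, entailed):
    """Degree of completeness and soundness (as percentages), given the answers
    returned by the system and the reference set of entailed answers."""
    returned, entailed = set(returned), set(entailed)
    correct = len(returned & entailed)
    completeness = 100.0 * correct / len(entailed) if entailed else 100.0
    soundness = 100.0 * correct / len(returned) if returned else 100.0
    return completeness, soundness


# Example with made-up answer sets: 50% complete, ~66.7% sound
print(completeness_and_soundness({"a", "b", "x"}, {"a", "b", "c", "d"}))
```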
5 Conclusion and Future Work

Initial inspection has shown that our approach creates reasonable ontology graphs, in that they are consistent with our input parameters and have a good distribution of path lengths from 0 to the diameter. Fig. 1 shows the graph topologies for two configurations; the nodes represent ontologies and the links represent mappings. Here a configuration is denoted by the triple "number of ontologies - out-degree - diameter".

Fig. 1a. 10-2-2    Fig. 1b. 10-2-5

Some nodes in Fig. 1a are disconnected from the graph; these are terminal nodes with zero in-degree. In the actual Semantic Web there could likewise be ontologies that do not map to any other ontology and remain isolated.

In this paper we have discussed our approach to developing a benchmark with a complete synthetic workload. In any kind of benchmark there is some tradeoff between realism and being simple and sufficient. Our approach is simple but can easily be generalized to support more expressive domain ontologies.

We have also introduced a new methodology for creating a graph of multiple interrelated ontologies that can be used by distributed query systems such as OBII. The graph can be controlled effectively by parameters like the diameter and the average out-degree of the nodes. We could incorporate additional variables to represent in-degree and out-degree distributions, where a few ontologies serve as "hubs" with very high out-degree or as "authorities" with very high in-degree.

A single workload is incapable of evaluating different knowledge base systems, but our workload can easily be scaled to various configurations for the purpose of evaluation. This might encourage the development of more scalable reasoners in the near future. It would also be useful to allow the user to specify the distribution of RDFS, OWL Lite and OWL DL ontologies. Furthermore, we intend to conduct an initial experiment comparing OWL reasoners such as Sesame, KAON2, Minerva and OWLIM.

References

1. A. Qasem, D. A. Dimitrov and J. Heflin. Efficient Selection and Integration of Data Sources for Answering Semantic Web Queries. In New Forms of Reasoning Workshop, ISWC 2007.
2. Y. Guo, Z. Pan and J. Heflin. LUBM: A Benchmark for OWL Knowledge Base Systems. Journal of Web Semantics, 3(2):158-182, 2005.
3. L. Ma, Y. Yang, Z. Qiu, G. Xie and Y. Pan. Towards a Complete OWL Ontology Benchmark. In Proc. of the 3rd European Semantic Web Conference (ESWC 2006), 2006.
4. C. Tempich and R. Volz. Towards a Benchmark for Semantic Web Reasoners - an Analysis of the DAML Library. In Workshop on Evaluation of Ontology-based Tools, ISWC 2003.
5. T. D. Wang, B. Parsia and J. Hendler. A Survey of the Web Ontology Landscape. In Proc. of the 5th International Semantic Web Conference (ISWC 2006), 2006.
6. I. Horrocks and P. Patel-Schneider. DL Systems Comparison. In Proc. of the 1998 Description Logic Workshop (DL'98), 1998.
7. Q. Elhaik, M.-C. Rousset and B. Ycart. Generating Random Benchmarks for Description Logics. In Proc. of the 1998 Description Logic Workshop (DL'98), 1998.
8. R. Garcia-Castro and A. Gomez-Perez. A Benchmark Suite for Evaluating the Performance of the WebODE Ontology Engineering Platform. In Proc. of the 3rd International Workshop on Evaluation of Ontology-based Tools, 2004.
9. J. Winick and S. Jamin. Inet-3.0: Internet Topology Generator. University of Michigan Technical Report CSE-TR-456-02.
10. Z. Pan, A. Qasem and J. Heflin. An Investigation into the Feasibility of the Semantic Web. In Proc. of the 21st National Conference on Artificial Intelligence (AAAI 2006), Boston, USA, 2006, pp. 1394-1399.
11. I. Horrocks and S. Tessaris. A Conjunctive Query Language for Description Logic ABoxes. In AAAI/IAAI, pp. 399-404, 2000.
12. Y. Guo and J. Heflin. Document-Centric Query Answering for the Semantic Web. In 2007 IEEE/WIC/ACM International Conference on Web Intelligence (WI'07), 2007.
13. B. Grosof, I. Horrocks, R. Volz and S. Decker. Description Logic Programs: Combining Logic Programs with Description Logic. In Proc. of WWW 2003, Budapest, Hungary, May 2003.