<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OWL2Bench: Towards a Customizable Benchmark for OWL 2 Reasoners</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gunjan Singh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ashwat Kumar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kanav Bhagat</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sumit Bhatia</string-name>
          <email>sumitbhatia@in.ibm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raghava Mutharaju</string-name>
          <email>raghava.mutharaju@iiitd.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Research AI</institution>
          ,
          <addr-line>New Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Knowledgeable Computing and Reasoning Lab, IIIT-Delhi</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>In the past decade, there has been remarkable progress in the development of reasoners that support expressive ontology languages such as OWL 2 [3]. However, they still do not scale well on the most expressive language profile (OWL 2 DL). To build better reasoners, developers need to find and fix the performance bottlenecks of their existing systems [11]. A reasoner benchmark helps reasoner developers evaluate their system's performance and address its limitations. Furthermore, it paves the way for further research to improve performance and functionality. In particular, a reasoner needs to be evaluated on several aspects, such as support for different language constructs and their combinations, their effect on reasoning performance, the ability to handle large ontologies, and the capability to handle queries that involve reasoning. Although there are some existing ontology benchmarks, they are limited in scope. LUBM [2] and UOBM [7] are based on the older version of OWL (OWL 1). OntoBench [6] supports the OWL 2 profiles but does not evaluate reasoner performance. The ORE benchmark framework does not consider evaluation in the context of varying ontology sizes. In essence, no existing benchmark covers all of the above-mentioned aspects of reasoner evaluation. Here, we describe our ongoing efforts towards building a customizable ontology benchmark for OWL 2 reasoners named OWL2Bench (to be presented at the ISWC 2020 Resources Track) [8]. We also briefly discuss the planned future extensions to the benchmark.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>OWL2Bench</title>
      <p>
        OWL2Bench is an extension of the well-known University Ontology Benchmark
(UOBM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It consists of three major components: a fixed TBox for each OWL
2 profile (EL, QL, RL, and DL), an ABox generator that can generate ABoxes of
varying sizes for the corresponding TBox, and a fixed set of 22 SPARQL queries
that involve reasoning. Thus, it allows users to benchmark three aspects of
reasoners: support for the different OWL 2 profiles, scalability in terms of ABox
size, and query performance. Moreover, the set of SPARQL queries also enables
benchmarking of SPARQL query engines that support OWL 2 reasoning. The
TBox for each profile was created by enriching UOBM's university ontology
with the constructs supported by that profile. To generate ABoxes of varying size, two user
inputs are required: the number of universities and the OWL 2 profile (EL, QL,
RL, or DL) of interest. The generated instance data complies with the schema
defined in the TBox of the selected profile, and its size depends on the number of
universities. For one university, by default, approximately 50,000 ABox axioms
are generated.
      </p>
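      <p>As a rough illustration of the two generator inputs described above, the following minimal sketch validates a configuration and estimates the resulting ABox size. The class and function names are hypothetical and are not part of OWL2Bench; the estimate assumes that the size grows linearly with the number of universities, which the text does not state explicitly.</p>
      <preformat>
```python
# Illustrative sketch only; names and structure are assumptions, not OWL2Bench's API.
from dataclasses import dataclass

PROFILES = ("EL", "QL", "RL", "DL")
AXIOMS_PER_UNIVERSITY = 50_000  # approximate default reported in the text


@dataclass(frozen=True)
class GeneratorConfig:
    universities: int
    profile: str

    def __post_init__(self):
        # Accept only a positive university count and one of the four profiles.
        if self.universities >= 1 and self.profile in PROFILES:
            return
        raise ValueError("need at least one university and a profile in " + ", ".join(PROFILES))

    def estimated_abox_axioms(self) -> int:
        """Rough size estimate, assuming linear growth with the number of universities."""
        return self.universities * AXIOMS_PER_UNIVERSITY


config = GeneratorConfig(universities=5, profile="EL")
print(config.estimated_abox_axioms())  # 250000
```
      </preformat>
      <p>Under this linear assumption, five universities would correspond to roughly 250,000 ABox axioms; the actual count produced by the generator may differ.</p>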
      <p>
        To demonstrate the utility of OWL2Bench, we ran our benchmark on six
reasoners, ELK [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], HermiT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], JFact<sup>6</sup>, Konclude [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Openllet<sup>7</sup>, and Pellet [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
for three reasoning tasks: consistency checking, classification, and realisation.
We also evaluated two SPARQL query engines, Stardog<sup>8</sup> and GraphDB<sup>9</sup>, on the
SPARQL queries in terms of their loading time and query response time. During
our evaluation, we identified possible issues with these systems (some of which
have already been communicated to the developers) that need to be fixed and
that could pave the way for further research in the development of reasoners and
query engines. The performance of the reasoners on OWL2Bench is shown in
Figure 1.
      </p>
      <p>
        For our experiments, we set the heap space to 24 GB and the time-out to 90
minutes. Most of the reasoners timed out even for a small number of universities
(except on the QL profile). Although Konclude is much faster, it requires a lot of
memory and could not perform any reasoning task beyond 50 universities. For the
EL profile, both Konclude and ELK performed exceptionally well in terms of
time taken, but ELK is preferable because of its lower memory requirements. For the RL
profile, most reasoners timed out on larger ontologies. In the case of OWL 2 DL,
Konclude, HermiT, and Pellet were able to complete only the consistency checking
task (for 1, 2, and 5 universities, respectively). However, we observed some
inconsistency in the results of Pellet. All other evaluations timed out. More
details about the benchmark and the results are available in the full version of
our paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
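      <p>The experimental limits mentioned above (24 GB of heap and a 90-minute time-out) could be enforced by a small harness along the following lines. This is only a sketch under stated assumptions: the jar and ontology file names are hypothetical, the heap cap via the standard -Xmx flag applies only to JVM-based reasoners, and this is not the harness used for the reported experiments.</p>
      <preformat>
```python
# Illustrative sketch; the jar and ontology file names are hypothetical.
import subprocess
import time

HEAP_GB = 24            # heap space used in the experiments
TIMEOUT_S = 90 * 60     # 90-minute time-out, in seconds


def jvm_command(jar_path, args, heap_gb=HEAP_GB):
    """Build a command line that caps the JVM heap with the standard -Xmx flag."""
    return ["java", f"-Xmx{heap_gb}g", "-jar", jar_path, *args]


def run_task(cmd, timeout_s=TIMEOUT_S):
    """Run one reasoning task; report its status and wall-clock time in seconds."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        status = "ok" if proc.returncode == 0 else "error"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        status = "timeout"
    return {"status": status, "seconds": time.monotonic() - start}


# Only builds the command; nothing is executed here.
print(jvm_command("reasoner.jar", ["university.owl"]))
```
      </preformat>
      <p>Recording a time-out as a distinct status, rather than as a very large running time, keeps it separate from the runs that completed when results are aggregated.</p>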
    </sec>
    <sec id="sec-3">
      <title>Proposed Extensions</title>
      <p>When selecting a reasoner, there can be many considerations. One
approach is to select the best possible reasoner in terms of its efficiency and
scalability; OWL2Bench can be used for this requirement. Another approach is
to check the performance of a reasoner on a set of language constructs that is
of interest to the user. A third possibility is to check the performance
of a reasoner under a given set of constraints, for example, the time taken or the memory
consumed during the reasoning process. We propose the following two extensions,
which address the last two requirements.</p>
      <p>[Figure 1: Performance of the reasoners on OWL2Bench. Legend: ELK, HermiT, Konclude, Openllet, Pellet. Surviving panel titles: (a) OWL 2 EL (RT), (d) OWL 2 RL (CC), (e) OWL 2 DL (CC). x-axis: No. of Universities.]</p>
      <p>6 http://jfact.sourceforge.net/
7 https://github.com/Galigator/openllet
8 https://www.stardog.com/
9 http://graphdb.ontotext.com/</p>
      <p>3.1 Customizable Selection of OWL 2 Language Constructs</p>
      <p>
The idea behind this approach is inspired by OntoBench [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. OntoBench provides a web
interface in which users select the constructs of their choice, and it generates an
ontology according to the selected constructs. However, the primary purpose
of OntoBench is to test reasoner coverage, not to evaluate the
performance and scalability of reasoners. We propose a customizable TBox
generator. Here, along with the choice of constructs, the user can specify a
separate count for each selected construct. Note that, instead of continuing
with OntoBench's general approach and its naming convention for the generated
classes and properties, we propose to use our benchmark's university ontology.
The axioms generated by OntoBench lack the interconnections that are necessary
to test the efficiency of the reasoners. In contrast, a university ontology consists
of concepts that describe a university (college, department, faculty, etc.) and
the relationships between them. Thus, the axioms in the generated ontology
would have sufficient interconnections for performance evaluation. Moreover, a
domain-specific ontology improves the readability of large ontologies.
      </p>
      <p>In this approach, we first create a bucket for each OWL 2 construct. The
TBox axioms of our benchmark already cover all the major OWL 2 constructs.
We place each of these axioms into the bucket of the construct it involves. For
example, axioms like Faculty ⊑ ∃worksFor.College would be put into the
bucket for existential restriction. If the user chooses existential restriction, then
axioms from this bucket would be picked. Note that, along with each axiom,
some related axioms also get generated. For example, along with Faculty ⊑
∃worksFor.College, axioms such as Faculty ⊑ Employee, College ⊑
University, and University ⊑ Organization would also get generated. If the requested count
exceeds the number of axioms in the bucket, then the axioms can be repeated with
a different naming convention, such as Faculty_1 ⊑ ∃worksFor.College_1 and
Employee_1 ⊑ ∃worksFor.Organization_1. Furthermore, we would also
consider the interactions and the interplay between the axioms. It is possible that
reasoners can handle a particular set of axioms generated for a certain set of
constructors well. However, with the same set of constructors, a different set
of axioms could cause a blow-up for the reasoner because these axioms result
in interactions that did not occur in the previous case. Therefore, we plan to
generate different sets of axioms (more than one ontology) for the user-selected
constructors. Since the bucket for each constructor consists of several axioms, it
is feasible to generate multiple ontologies. However, some design
decisions still need to be made, such as whether both Faculty and Faculty_1 should be
subclasses of Employee, or whether they should be subclasses of Employee and Employee_1,
respectively. This needs to be investigated so that the benchmark does not
generate a large number of unnecessary side axioms.</p>
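      <p>The bucket idea described above can be sketched as follows. This is a minimal illustration rather than the benchmark's implementation: the data model, the function names, and the "_k" renaming suffix are assumptions, axioms are printed in a Manchester-style ASCII form ("SubClassOf ... some ...") instead of description logic notation, and the related side axioms are not generated.</p>
      <preformat>
```python
# Illustrative sketch; the data model and the "_k" suffix are assumptions.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Existential:
    """An axiom stating that `sub` is subsumed by (prop some filler)."""
    sub: str
    prop: str
    filler: str

    def renamed(self, k: int) -> "Existential":
        # Copy 0 is the original axiom; later copies get suffixed class names.
        if k == 0:
            return self
        return Existential(f"{self.sub}_{k}", self.prop, f"{self.filler}_{k}")

    def __str__(self) -> str:
        return f"{self.sub} SubClassOf {self.prop} some {self.filler}"


def build_buckets(tagged_axioms):
    """Group (construct, axiom) pairs into one bucket per construct."""
    buckets = defaultdict(list)
    for construct, axiom in tagged_axioms:
        buckets[construct].append(axiom)
    return dict(buckets)


def pick(buckets, construct, count):
    """Return `count` axioms for `construct`, reusing the bucket with renamed copies."""
    bucket = buckets.get(construct, [])
    if not bucket:
        raise KeyError(f"no axioms available for construct {construct!r}")
    size = len(bucket)
    return [bucket[i % size].renamed(i // size) for i in range(count)]


buckets = build_buckets([
    ("existential restriction", Existential("Faculty", "worksFor", "College")),
    ("existential restriction", Existential("Employee", "worksFor", "Organization")),
])
for axiom in pick(buckets, "existential restriction", 4):
    print(axiom)
```
      </preformat>
      <p>With a bucket of two axioms and a requested count of four, the sketch returns the two original axioms followed by their renamed copies (Faculty_1, Employee_1). Whether the renamed copies should share superclasses with the originals is exactly the open design decision noted above, so the sketch leaves it out.</p>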
      <p>3.2 Generating Ontologies Based on Their Hardness</p>
      <p>
        We define three levels (easy, medium, and hard) to categorize an ontology. These
are defined based on reasoner performance metrics such as the time taken and
the memory consumed while reasoning over an ontology. The first step is to
determine the distinguishing features of an ontology that can help in defining
the three levels. For this purpose, we plan to make use of existing work on
determining the different aspects of the size and structural characteristics of an
ontology that affect the reasoner's performance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Several ontology features,
such as the number of named entities (classes, properties), class unions, the depth of
the class and property hierarchies, and the ratio of object to data properties, have
been studied for their impact on reasoning performance. A large number of
ontologies from different repositories (ORE dataset<sup>10</sup>, AberOWL<sup>11</sup>) would be
run on different OWL 2 reasoners for the three reasoning tasks: classification,
consistency checking, and realisation. The results obtained, in terms of the time
taken and the memory consumed, would serve as the basis for clustering these ontologies
into the three levels, i.e., easy, medium, and hard. Each cluster would have a
different range of values for the ontology features.
      </p>
      <p>10 http://doi.org/10.5281/zenodo.18578
11 http://aber-owl.net/ontology/
      </p>
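      <p>One simple way to turn measured reasoning times into the three levels is sketched below. The text does not specify a clustering method, so the rank-based split into three equally sized groups is purely an assumption for illustration, as are the function names; memory consumption could be handled in the same way.</p>
      <preformat>
```python
# Illustrative sketch; rank-based tertiles are an assumption, not the authors' method.
import math

LEVELS = ("easy", "medium", "hard")


def hardness_levels(times_s):
    """Map ontology name -> level, given reasoning times in seconds.

    A time-out can be recorded as math.inf so that it ranks last.
    """
    ranked = sorted(times_s, key=times_s.get)
    n = len(ranked)
    # Rank r falls into tertile (3 * r) // n, which is 0, 1 or 2.
    return {name: LEVELS[(3 * rank) // n] for rank, name in enumerate(ranked)}


def feature_ranges(levels, feature_values):
    """Per level, the (min, max) of one ontology feature over its members."""
    grouped = {}
    for name, level in levels.items():
        grouped.setdefault(level, []).append(feature_values[name])
    return {level: (min(vals), max(vals)) for level, vals in grouped.items()}


# Hypothetical measurements for six ontologies (seconds; inf marks a time-out).
times = {"a": 1.2, "b": 30.0, "c": 0.4, "d": 900.0, "e": math.inf, "f": 75.0}
named_classes = {"a": 120, "b": 900, "c": 80, "d": 15000, "e": 40000, "f": 2500}

levels = hardness_levels(times)
print(levels)
print(feature_ranges(levels, named_classes))
```
      </preformat>
      <p>The per-level feature ranges computed in this way correspond to the ranges of feature values that each cluster is expected to have; a real analysis would use many features and a proper clustering procedure.</p>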
      <p>OWL2Bench users would be given the option to choose one
of the hardness levels, along with the total number of axioms in the ontology.
Based on these inputs and the values of the ontology features associated with that
hardness level, OWL2Bench would automatically generate an ontology that can be
used to benchmark OWL 2 reasoners. The approach would be similar
to that of Section 3.1. The only difference is that, instead of directly taking the type
and the number of each construct from the user as inputs, we would balance the
axioms in such a way that the values of the features in the generated ontology
comply with the ranges specified for that particular hardness level.</p>
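      <p>The compliance condition described above can be expressed as a simple range check, which a generator could use to decide which features still need rebalancing. The feature names and the ranges below are hypothetical placeholders, not values derived from any experiment.</p>
      <preformat>
```python
# Illustrative sketch; feature names and ranges are hypothetical placeholders.
def out_of_range(features, ranges):
    """Return the features whose values fall outside the level's (low, high) range."""
    return {
        name: value
        for name, value in features.items()
        if name in ranges and not (ranges[name][1] >= value >= ranges[name][0])
    }


HARD_RANGES = {"named_classes": (15000, 40000), "hierarchy_depth": (8, 20)}
candidate = {"named_classes": 18000, "hierarchy_depth": 5}

print(out_of_range(candidate, HARD_RANGES))  # {'hierarchy_depth': 5}
```
      </preformat>
      <p>An empty result would mean that the candidate ontology complies with every range of the chosen hardness level; here the hierarchy depth would still have to be increased.</p>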
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>HermiT: An OWL 2 Reasoner</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          .
          <volume>53</volume>
          (
          <issue>3</issue>
          ),
          <fpage>245</fpage>
          –
          <lpage>269</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heflin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>LUBM: A Benchmark for OWL Knowledge Base Systems</article-title>
          .
          <source>Journal of Web Semantics</source>
          .
          <volume>3</volume>
          (
          <issue>2-3</issue>
          ),
          <fpage>158</fpage>
          –
          <lpage>182</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Krotzsch,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Parsia</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Rudolph</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.:</surname>
          </string-name>
          <article-title>OWL 2 Web Ontology Language Pro les</article-title>
          (
          <issue>Second Edition)</issue>
          (
          <year>2012</year>
          ), https://www.w3.org/ TR/owl2-primer/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>Y.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnaswamy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sawangphol</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.F.</given-names>
          </string-name>
          :
          <article-title>Understanding and improving ontology reasoning efficiency through learning and ranking</article-title>
          .
          <source>Information Systems</source>
          <volume>87</volume>
          ,
          <issue>101412</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , Krotzsch,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Simanc k</surname>
          </string-name>
          , F.:
          <article-title>The Incredible ELK</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          .
          <volume>53</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>61</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Link</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lohmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haag</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>OntoBench: Generating Custom OWL 2 Benchmark Ontologies</article-title>
          . In: International Semantic Web Conference. pp.
          <fpage>122</fpage>
          –
          <lpage>130</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards a Complete OWL Ontology Benchmark</article-title>
          .
          <source>In: The Semantic Web: Research and Applications</source>
          . pp.
          <fpage>125</fpage>
          –
          <lpage>139</lpage>
          . Springer Berlin Heidelberg (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhatia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mutharaju</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>OWL2Bench: A Benchmark for OWL 2 Reasoners</article-title>
          .
          <source>In: The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference, ISWC 2020. Lecture Notes in Computer Science</source>
          , Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Pellet: A practical OWL-DL reasoner</article-title>
          .
          <source>Journal of Web Semantics</source>
          .
          <volume>5</volume>
          (
          <issue>2</issue>
          ),
          <fpage>51</fpage>
          –
          <lpage>53</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Steigmiller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liebig</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Konclude: System description</article-title>
          .
          <source>Journal of Web Semantics</source>
          .
          <volume>27</volume>
          ,
          <fpage>78</fpage>
          –
          <lpage>85</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Yadav</surname>
            ,
            <given-names>R.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mutharaju</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhatia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards a Concurrent Approximate Description Logic Reasoner</article-title>
          .
          <source>In: Proceedings of the ISWC 2019 Satellite Tracks (Posters &amp; Demonstrations, Industry, and Outrageous Ideas)</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2456</volume>
          , pp.
          <fpage>145</fpage>
          –
          <lpage>148</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>