<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Empirical Robustness of Description Logic Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafael S. Gonc¸alves</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Matentzoglu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijan Parsia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Uli Sattler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, University of Manchester</institution>
          ,
          <addr-line>Manchester</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In spite of the recent renaissance in lightweight description logics (DLs), many prominent DLs, such as that underlying the Web Ontology Language (OWL), have high worst case complexity for their key inference services. Modern reasoners have a large array of optimization, tuned calculi, and implementation tricks that allow them to perform very well in a variety of application scenarios, even though the complexity results ensure that they will perform poorly for some inputs. For users, the key question is how often they will encounter those pathological inputs in practice, that is, how robust are reasoners. We attempt to determine this question for classification of existing ontologies as they are found on the Web. It is a fairly common user task to examine ontologies published on the Web as part of their development process. Thus, the robustness of reasoners in this scenario is both directly interesting and provides some hints toward answering the broader question. From our experiments, we show that the current crop of OWL reasoners, in collaboration, is very robust against the Web.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>
        A serious concern about both versions 1 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and 2 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] of the Web Ontology Language
(OWL) is that the underlying description logics (SHOIQ and SROIQ) exhibit
extremely bad worst case complexity (NEXPTIME and 2NEXPTIME) for their key
inference services. While since the mid-1990s, highly optimized description logic
reasoners have been exhibiting rather good performance in real cases, even in those more
constrained cases there are ontologies (such as Galen) which have proved impossible to
process for over a decade. Indeed, concern with such pathology stimulated a renaissance
of research into tractable description logics with the E L family [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the DL Lite [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
family being incorporated as special “profiles” of OWL 2. However, even though the
number of ontologies available on the Web has grown enormously since the
standardization of OWL, it is still unclear how robust modern, highly optimized reasoners are to
such input. Anecdotal evidence suggests that pathological cases are common enough to
cause problems, however, systematic evidence has been scarce.
      </p>
      <p>
        In this paper we investigate the question of whether modern, highly-optimized
description logic reasoners are robust over Web input. The general intuition of a robust
system is that it is resistant to failure in the face of a range of input. For any particular
robustness determination, one must decide: 1) the range of input, 2) the functional or
non-functional properties of interest, and 3) what counts as failure. The instantiation
of these parameters strongly influences robustness judgements, with the very same
reasoner being highly robust under one scenario and very non-robust under another. For our
current purposes, the key scenario is that an ontology engineer, using a tool like Prote´ge´
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], is inspecting ontologies published on the Web with an eye to possible reuse, and,
as is common, they wish to classify the ontology using a standard OWL 2 DL reasoner
as part of their evaluation. This scenario yields the following constraints: 1) for input,
we examine Web-based corpora, 2) functional: acceptance (will the reasoner load and
process the ontology); non-functional: performance (i.e., will the reasoner complete
classification before the ontology engineer gives up), 3) w.r.t. acceptance, failure means
either rejecting the input or crashing while processing, and we might reasonably expect
an engineer to wait up to 2 hours if the ontology seems “worth it”. If a reasoner (or a set
of reasoners) is successful for 90% of a corpus, we count that reasoner as robust over
that corpus, with 95% and 99% indicating “strong” and “extreme” robustness. While
these levels are clearly arbitrary (as is the timeout), they provide a framework to set
expectations. Robustness under these assumptions does not ensure robustness under other
assumptions (e.g., over subsets of these ontologies as experienced during development
or over a more stringent time constraint), yet they are challenging enough that it was
unclear to us ex ante whether any reasoner would be robust for any corpus.
      </p>
      <p>In fact, we find that the reasoners are robust or near robust for most of the cases
we examine, including for lower timeouts. More significantly, if we take the best result
for each ontology (which represents a kind of “meta-reasoner”, where our test
reasoners are run in parallel), then the set of reasoners is extremely robust over all corpora.
Thus, in a fairly precise, if limited, sense, we demonstrate that classification over OWL
ontologies (even those based on highly expressive description logics, such as SHOIQ
and SROIQ) is practical, even despite the worst case being intractable in some cases.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>
        Overall we have processed a total of 1,071 ontologies, the largest such reasoner
benchmark (similar benchmarks typically use at most a few hundred ontologies, e.g., the
recent study in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), having found that amongst the 4 tested reasoners Pellet is the most
robust of all (see Table 1). Surprisingly, Pellet is followed by JFact on our robustness
test, due to having far less errors than FaCT++. HermiT and FaCT++ have the same
overall robustness, but FaCT++ has less errors and higher impatient robustness.
      </p>
      <p>While Pellet is the most robust reasoner, we urge some caution in that reading. In
particular, this does not mean that Pellet will always do best or even perform reasonably.
In fact, it may timeout where other reasoners finish reasonably fast. The set of reasoners
(taken together and considering the best results) is extremely robust across the board
(for each reasoner’s contribution to the best case reasoner, see Figure 1). Thus, we have
strong empirical evidence that the ontologies on the Web do not supply any in principle
intractable cases, but only cases which are difficult for particular reasoners.</p>
      <p>Note that FaCT++ and JFact fail to process several ontologies due to poor support
for OWL 2 datatypes. Both of these reasoners, as well as HermiT, seem to have little
support for OWL 1 datatypes. By removing the non OWL 2 datatype errors, we would
end up with FaCT++ being the most robust w.r.t. OWL 2, followed by HermiT and
Pellet. That is, if we restrict the test corpus to those ontologies that use only datatypes
from the OWL 2 datatype map, then FaCT++ would be the most robust reasoner.</p>
      <p>Very Easy</p>
      <p>Easy
Medium</p>
      <p>Hard
Very Hard
Timeout
Errors</p>
      <p>Pellet HermiT JFact FaCT++ Best Combo Worst Combo
787 (73%) 706 (66%) 741 (69%) 784 (73%) 878 (82%) 645 (60.2%)
116 (11%) 112 (10%) 103 (10%) 55 (5%) 73 (7%) 102 (9.5%)
83 (8%) 65 (6%) 43 (4%) 94 (9%) 101 (9%) 45 (4.2%)
24 (2%) 53 (5%) 76 (7%) 7 (1%) 6 (1%) 75 (7.0%)
6 (1%) 4 (0%) 1 (0%) 4 (0%) 4 (0%) 3 (0.3%)
29 (3%)
14 (1%)
16 (1%)</p>
      <p>15 (1%)
26 (2%) 117 (11%) 91 (8%) 112 (10%)
Total (excl. Errors)
Total (incl. Errors)
1016
1071
940
1071
964
1071
944
1071
Impatient Robustness 92% 82% [90%] 83% 87% [96%] 98% 74% [87%]
Overall Robustness 95% 88% [96%] 90% 88% [97%] 99% 81% [96%]
Table 1: Binning of all three corpora: BioPortal, NCIt (2013), and Web crawl. Under
robustness rows, values in square brackets indicate robustness w.r.t. OWL 2 alone.
9 (1%)
0 (0%)
1062
1071
25 (2.3%)
176 (16.4%)
870
1071</p>
      <p>From Figure 1 we see that FaCT++ outperforms other reasoners on many occasions,
but, due to the high number of errors thrown, its robustness w.r.t. our input data is not
as good as this figure might indicate. In Figure 2 we show the frequency with which
reasoners are the worst case in each corpus: Notice that FaCT++ is, overall, less often
the worst reasoner, followed by HermiT. However, HermiT and JFact both dominate the
worst cases in the NCIt corpus. Pellet, while being most often the worst case reasoner in
the Web Crawl corpus, is so (in many cases) by a mere fraction of a second; as pointed
out it is the most robust for that corpus.</p>
      <p>It is clear that deriving a sensible ranking even simply using average or total time
is not straightforward. Our results have rather strong implications for reasoner
experi500  
400  
100  </p>
      <p>0  
ments, especially those purporting to show the advantages of an optimisation or a
technique or an implementation: The space is very complex and it is very easy to
simultaneously generate a biased sample for one system and against another. Even simple,
seemingly innocuous things like timeouts and classification failures require tremendous
care in handling. If results are going to be meaningful across papers we need to converge
on experimental inputs, methods, and reporting forms.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Pushing the EL envelope</article-title>
          .
          <source>In: Proc. of IJCAI-05</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>J. of Automated Reasoning</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Parsia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.F.</given-names>
            ,
            <surname>Sattler</surname>
          </string-name>
          ,
          <string-name>
            <surname>U.</surname>
          </string-name>
          :
          <article-title>OWL 2: The next step for OWL</article-title>
          .
          <source>J. of Web Semantics</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <string-name>
            <surname>From</surname>
            <given-names>SHIQ</given-names>
          </string-name>
          and
          <article-title>RDF to OWL: The making of a web ontology language</article-title>
          .
          <source>J. of Web Semantics</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>7</fpage>
          -
          <lpage>26</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>Y.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnaswamy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Predicting reasoning performance using ontology metrics</article-title>
          .
          <source>In: Proc. of ISWC-12</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Knublauch</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergerson</surname>
            ,
            <given-names>R.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>The Prote´ge´ OWL plugin: An open development environment for semantic web applications</article-title>
          .
          <source>In: Proc. of ISWC-04</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>