<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Empirical Investigation of Di culty of Modules of Description Logic Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicolas Matentzoglu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijan Parsia</string-name>
          <email>bparsia@cs.manchester.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Uli Sattler</string-name>
          <email>sattler@cs.manchester.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Manchester Oxford Road</institution>
          ,
          <addr-line>Manchester, M13 9PL</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Very expressive Description Logics in the SH family have worst case complexity ranging from EXPTIME to double NEXPTIME. In spite of this, they are very popular with modellers and serve as the foundation of the Web Ontology Language (OWL), a W3C standard. Highly optimised reasoners handle a wide range of naturally occurring ontologies with relative ease, albeit with some pathological cases. A recent optimisation trend has been modular reasoning, that is, breaking the ontology into hopefully easier subsets with a hopefully smaller overall reasoning time (see MORe and Chainsaw for prominent examples). However, it has been demonstrated that subsets of an OWL ontology may be harder { even much harder { than the whole ontology. This introduces the risk that modular approaches might have even more severe pathological cases than the normal monolithic ones. In this paper, we analyse a number of ontologies from the BioPortal repository in order to isolate cases where random subsets are harder than the whole. For such cases, we then examine whether the module nearest to the random subset also exhibits the pathological behaviour.</p>
      </abstract>
      <kwd-group>
        <kwd>Modular Reasoning</kwd>
        <kwd>Ontologies</kwd>
        <kwd>Modules</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Reasoning in popular, very expressive Description Logics is very di cult (e.g.,
SROIQ is N2Exptime-complete) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Modern reasoners such as FaCT++ [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ],
Pellet [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and HermiT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] generally perform well against naturally occurring
inputs such as ontologies downloadable from the Web. However, due to the poor
performance in some cases, the quest for optimisations is ongoing.
      </p>
      <p>
        Modular reasoning techniques have experienced a resurgence in recent years.
Intuitively, breaking the input problem into smaller pieces is appealing. From a
naive perspective, it seems that keeping the inputs small is a way to deal with
high worst case complexity. Furthermore, if there are especially di cult parts
of the ontology, perhaps they can be isolated to reduce their impact. However,
classifying a subset S of an ontology O can be, in fact, much more expensive
than classifying O entirely [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. There are a number of reasons why classi cation
time could vary in this way. For example, a disjointness axiom may be missing
from the random set and thus result in subsumption tests that are computed
needlessly (in the worst case almost the entire power set of possible atomic
subsumptions of entities in a signature, if the disjointness happens high up in
the hierarchy).
      </p>
      <p>This raises a question for modular reasoning: Do modules ever exhibit such
a pathological behaviour? The modules currently used in modular reasoners are
depleting, that is, contain every axiom that could participate in a derivation of an
entailment from that module. They are also subsets of O. Our conjecture is that
because of this, reasoners should never perform worse on a module than on the
whole O. In this paper, we test this conjecture on ontologies from the BioPortal
repository. Our results indicate that while generally modules are \easy", there
are cases where a module has a classi cation time worse than that of O. While
our current results suggest that some modular reasoning techniques are not
vulnerable to this phenomenon, others require further investigation.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>Understanding our experiment does not more than a cursory understanding of
the syntax, semantics, and proof theories implemented: For the most part, we
are treating these as black boxes. Some key points:
{ The reasoners under investigations are designed to implement key reasoning
services for the Web Ontology Language (OWL).
{ The services we test ignore all non-logical aspects of OWL 2, thus OWL 2
ontologies can be regarded as notational variants of SROIQ theories (with
some minor syntactic caveats, e.g., OWL ontologies are sets of axioms).
{ Given an ontology (a set of axioms) O, the signature of an ontology Oe is the
set of non-prede ned entities (classes, individuals, object properties, data
properties) appearing in the axioms in O. We use CL(O) to denote the task
of computing the set of entailed atomic subsumptions (i.e., statements of
the form A v B and A B, where A and B are elements of the signature)
or classi cation of O. We use T (X) to denote the time taken for a given
process to terminate. For example T (CL(O)) denotes the time to classify O
(for a given reasoner).</p>
      <p>
        There are many sorts of \logically respectable" modules presented in the
literature, most based on deductive conservative extensions. Here we restrict our
attention to modules based on syntactic locality [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], because they are logically
sound, work for all of OWL and often behave well in practice [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], though not
always [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Among these kinds of modules, we are particularly interested in
?modules (bottom-modules), because they preserve all subsumptions over their
signatures with an element of the signature on the left hand side.
      </p>
      <p>As a consequence, if the union of the signatures of a set of ?-modules equals
the signature of O, then the union of the individual classi cation results equals
the classi cation of the whole ontology.</p>
      <p>Hereafter, we will use M to refer to syntactic locality-based ?-modules.
Additionally, we will use S to denote an arbitrary random subset of O and H to
denote a random subset whose reasoning time is greater than that of O.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Modern Modular Reasoning</title>
      <p>
        We focus on TBox reasoning in locality module based systems for several reasons.
First, at least one system (Pellet) has been using them for incremental reasoning
in production systems successfully for years [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Second, two new systems, MORe
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and Chainsaw [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], make use of them for non-incremental classi cation and
have seen at least some success. (e.g., each won a category at the 2013 Ontology
Reasoner Evaluation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].)
      </p>
      <p>The overarching rationale for exploiting the modular structure of an ontology
for reasoning is the potential in (1) reducing the number of necessary
subsumption tests and (2) reducing the hardness of each individual subsumption test, thus
improving performance (reasoning time and potentially memory consumption).
Additionally, modularisation may make exploiting parallelism easier, though this
has not been seriously explored to our knowledge.</p>
      <p>
        Chainsaw and MORe use quite di erent strategies in exploiting modules.
MORe works by building two modules: the largest easy-to- nd E L module and
the \complement" module which covers the rest of the signature. The E L
module is fed to a fast E L speci c reasoner (typically ELK[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]) while the
complement module is fed to a general purpose OWL 2 reasoner (HermiT or Pellet). If
the complement module covers most of the ontology, then there is a signi cant
amount of redundant computation in addition to the overhead and thus MORe
will perform comparatively poorly to the monolithic reasoner.
      </p>
      <p>
        In contrast, Chainsaw employs the \Atomic Decomposition" (AD) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] of
an ontology. The AD is a partitioning of the ontology into so called genuine
modules (modules that cannot further be decomposed into the union of two
\ "-incomparable modules). The AD is represented as a directed graph where
each node corresponds to an atom and determines the corresponding genuine
module. The set of genuine modules covers the signature of O, thus is a suitable
candidate for modular classi cation. While decomposing an ontology can be done
in polynomial (quadratic) time, the reality is that the overhead introduced by
such a partitioning can be substantial in some cases. Furthermore, the existing
implementation can also experience considerable redundancy due to overlapping
genuine modules.
3.1
      </p>
      <sec id="sec-3-1">
        <title>Module Hardness</title>
        <p>
          While the existence for example of Hot Spots [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] suggests that decomposition
might have dramatically bene cial e ects, it was also empirically observed in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
(and has been known anecdotally for years) that subsets of an ontology might be
dramatically harder than the whole. This fact is not terribly surprising if one
recalls the easy-hard-easy pattern for satis ability of random KCNF propositional
formulae where the hard region corresponds to the phase transition from
satisability to unsatis ability. As the generated formulae move from low \density"
(i.e., a low ratio of number of clauses (L) to number of distinct variables (N)) to
high density, the size of the formulae grows (for a xed N). Thus, we can easily
see systems where random subsets of a large problem are much harder than the
whole. While there is, as yet, no detailed theory (or even sketch of a theory) for
hardness in naturally occurring Description Logic ontologies, we can expect, to
a certain extent, that similar mechanisms may be at work.
        </p>
        <p>Obviously, given an ontology O, if T (CL(Mi)) in a decomposition Dec(O) is
larger than T (CL(O)), then modular reasoning is counterproductive in that case
and thus e orts to tune the decomposition and other overhead is pointless. The
goal of this paper is to investigate the general case: Are modules of an ontology
ever (or likely) to be harder than O itself? In contrast with random subsets,
there are some prima facie reasons for thinking the answer is no if we consider
some possible sources of hardness: (1) If the reasoner is using standard enhanced
traversal to classify the ontology, a random subset might omit an explicitly
asserted subsumption or, perhaps more importantly, disjointness axiom which
considerably prunes the search space. Modules must contain all such axioms over
their signature in order to capture those entailments. (2) A random subset might
break some justi cation of an atomic subsumption over its signature leaving
either only harder ones or leaving enough of a justi cation that the reasoner has
to do a lot of work before it can determine it will not pan out.</p>
        <p>In general, since a module contains everything from the ontology relevant
to entailments over its signature, it seems reasonable to think that reasoning
should be no harder. No shortcuts can be missing and we have removed a lot
of potential distractions. However, these considerations are merely speculative.
In this paper, we attempt to establish whether there are hard subsets that are
modules and to start to sketch their prevalence and other properties. If we can
establish the phenomenon, then we have laid the groundwork for causal
investigations. We would like to note that we are not engaged in either MORe or
Chainsaw algorithm optimisation at this point. It could well be that, if there are
hard modules, that they are not the sort of modules that either technique uses.
Similarly, even if there are no hard modules, that does not mean that modules
generally will be \easy enough" to overcome the overhead challenges.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Materials and Experimental Design</title>
      <p>The conjecture under test is that given an ontology, O, reasoner R, then there
is no module M from that ontology such that the T (CL(O)) &lt; T (CL(M)) for
R. It's trivially the case that there is a M such that T (CL(O)) = T (CL(M))
since O is a module for itself.</p>
      <p>
        The number of modules in an ontology can be very large (in principle,
exponential in the number of axioms or the signature; this has been veri ed
empirically [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]), thus pure uniform sampling is unlikely to detect counterexamples
to our conjecture if they are rare. Given that empirically hard arbitrary subsets
are not particularly common either within ontologies (i.e., most subsets are not
harder than the whole) or across ontologies (i.e., most ontologies do not have
hard subsets), we need a more directed approach. Our basic approach is to look
for hard arbitrary subsets using the techniques described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We then extract
the module for the signature of that hard subset, that is, the nearest superset
of the hard subset that is a module. If the speculations on sources of hardness
are right, at least in spirit, we should see a big drop in classi cation time going
from the hard subset to the module. Of course, this procedure cannot rule out
that other modules might be hard (it is, intentionally, a biased sample), but at
least the modules we extract are in a known region of hard subsets.
4.1
      </p>
      <sec id="sec-4-1">
        <title>Corpus Selection</title>
        <p>
          We conducted our study on a corpus of 363 BioPortal ontologies (August 2013),
serialised into OWL/XML by the OWL API (3.4.8) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. BioPortal [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is a open
repository for accessing and sharing biomedical ontologies, developed in
languages such as OWL, RDF or OBO. It is the most prominent and active
collection of naturally occurring ontologies on the web, and serves as the go-to point
for many empirical investigations, such as studies on modularity or reasoner
proling [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. BioPortal contains a range of interesting sizes and expressivities (see
Figure 1).
As we are primarily interested in classi cation, we considered only the TBoxes
of the ontologies (we stripped O of all ABox axioms). In order to handle large
numbers of subsets, we indexed the axioms using annotation properties and
physically manifested the subsets (or modules) only as lists of axiom identi ers.
We classi ed the entire set of 363 ontology TBoxes using Pellet 2.3.1 and
HermiT 1.3.8, and then ltered them, separately for each reasoner. We excluded
ontologies that (1) not OWL DL, (2) for a given reasoner classi ed in less than
5 seconds (relatively easy), (3) for a given reasoner did not classify within 20
minutes (feasible), (4) that merely contained a class hierarchy (only atomic
SubClassOf axioms and EquivalentClasses axioms) (expressive).
        </p>
        <p>
          43 ontologies were ltered out as being OWL Full using the pro le checker in
the OWL API tools ([
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]). In order to classify an interesting amount of subsets,
and given our informed assumption that there are subsets that are harder than
the whole, an infeasible amount of resources would be required if ontologies were
included that were signi cantly harder than 20 minutes.
4.3
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Finding Hard Subsets</title>
        <p>
          In order to nd hard subsets of ontologies, we employed the technique described
in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. We randomly sampled 3 paths from each of the ontologies in order to
identify ontologies with hard regions. A path was sampled by slicing the ontology
O (the set of logical axioms) into size regions relative to the size of O, in our
case eighths. The rst slice corresponds to 1/8 of the ontology, the second slice
to 2/8 and so on, the full TBox to 8/8. A random eighth of the set of (logical)
axioms was picked from the ontology and exported into a new ontology. Then a
second eighth of the remaining axioms was drawn (randomly) and added to the
rst eighth to get the 2/8 slice of the ontology, and so on. A path thus consists
of eight cumulatively grown subsets.
        </p>
        <p>We then classi ed all the subsets generated this way to identify hard subsets
(subsets for which the reasoning time exceeds the reasoning time of the whole
TBox). Our assumption was that if we nd such a hard subset, it is an indicator
for a hard region. For example, if we nd a hard subset that correspond to 6/8
of the ontology, we assume that there might be more hard subsets to be found
in that region. Once we identi ed hard subsets, we sampled 50 random subsets
from their corresponding regions and obtained their classi cation time.
4.4</p>
      </sec>
      <sec id="sec-4-3">
        <title>Classifying the Modules</title>
        <p>For each hard subset H identi ed in the previous step, we extracted a ?-module
M with the syntactic locality-based module extractor embedded in FaCT++,
using the signature of the hard subset as seed. We then obtained the classi cation
time for M. In principle M should be a superset or equal to H. However, if H
contains axioms that are tautologies, they generally do not make it into the
module. The most prominent cases are axioms of the form A v ( 0R:C), or
simply A v &gt;.
4.5</p>
      </sec>
      <sec id="sec-4-4">
        <title>Experiment Machine Speci cation</title>
        <p>A Dell Optiplex 790, with Ubuntu Release 12.10 (quantal) 64-bit, Memory: 15.6
GiB (1333MHz, DDR3), Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz x 4 was
used for the actual benchmarking (obtaining the classi cation times). Every
single classi cation was done in a separate isolated virtual machine (Java 7,
-Xms2G, -Xmx12G). Network and other unnecessary system services were
disabled to prevent background processes from a ecting reasoning performance.
5
5.1</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <sec id="sec-5-1">
        <title>Hard Subset Finding</title>
        <p>The ltering process described in Section 4.2 resulted in 21 ontologies for HermiT
and 15 ontologies for Pellet from the BioPortal corpus. Out of these, we found
hard subsets using the technique described in Section 4.3 for three ontologies
with Pellet and three ontologies for HermiT. For Pellet we had to check a total
of 315 subsets to nd 4 hard subsets (all in region 7), for HermiT a total of 441
to nd 3 hard subsets (two in region 7, one in region 6). It should be noted that
there were a minor amount of failed classi cations (6 for Pellet, 1 for HermiT),
which were either due to a stack over ow or a timeout ( xed at 20 min, wall
clock time). They were excluded from the analysis. There are two things to
observe at this point: (1) hard subsets are relatively rare and (2) they are found
in the higher regions that correspond to 6/8 or 7/8 of the ontology. As for the
rst observation, it should be noted that a lot of subsets under consideration
corresponded to less than half of the size of the ontology (252 for HermiT, 180
for Pellet). Although it was not clear whether hard subsets might be hidden
in these regions, it was also not surprising that they did not turn out to be.
In particular, a random sample is likely to select unrelated axioms, thus small
subsets are unlikely to be tightly constrained.
5.2</p>
      </sec>
      <sec id="sec-5-2">
        <title>Performance Heterogeneity</title>
        <p>
          Goncalves et al [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] observed that ontologies which have non-linear path pro les
are highly likely to be performance heterogenous, and thus have Hot Spots.
Performance heterogeneity also seems to correlate with having hard subsets. We
observed the performance pro les exhibited by the paths (i.e. the classi cation
times of the cumulative grown 1/8, 2/8, ..., 8/8 of O). In principle we could
count a local peak in a path that grows non-monotonically as a candidate for a
region of hard subsets, but for experiment resource related reasons we restricted
our attention to regions were a path had a peak that was harder than the whole
ontology { in such cases we knew that there was at least one hard subset.
        </p>
        <p>For HermiT, 3 out of the 63 paths (3 for each of the 21 ontologies in the set)
contain hard subsets and 5 have some non-monotonic characteristics (i.e. there
was a drop in reasoning time from one subset to its successor). The pro les of
the ontologies that contained hard subsets can be seen in Figure 2. 16 paths have
a Pearson product-moment correlation coe cient higher than 0.85 (between the
relative size of the subset and the reasoning time), i.e. T (CL(S)) and jSj have
a strong linear dependence, while a total of 7 paths have a coe cient between
0.5 and -0.5, indicating a relatively weak linear relationship. For Pellet, 4 out of
the 45 paths have hard subsets, 10 of the respective performance graphs have
non-monotonic characteristics, 11 paths have a high Pearsons coe cient and 9
a low one. Figure 2 shows the pro les of the three ontologies that contain hard
subsets.</p>
        <p>SIHPO</p>
        <p>Cell Ontology
We selected a sample of 50 subsets from the identi ed hard regions for each
ontology/reasoner pair for a total of 300 subsets. We then tested each subset for
hardness. For HermiT, a total of 25 hard subsets were identi ed, for Pellet 42.
We subsequently classi ed these subsets and their corresponding modules. To
mitigate unanticipated experimental e ects, we classi ed the subset / module
pairs three times, as well as the ontology. Details can be found in Figure 5.3.
The timeout was set to 30 minutes.</p>
        <p>Each grouped column on the x-axis in the gures represents the classi cation
times of a pair (the subset and its corresponding module). The y-axis represents
the classi cation time in seconds. The red line CT (O) (CT(X) as an equivalent
shorter form of T (CL(X))) represents the median classi cation time of the entire
ontology measured in the three runs mentioned earlier. If one of the other runs for
classi cation of O deviated from the median by more than 5%, we included the
value in the plot (CT(O)+ standing for the highest classi cation time obtained
for the ontology, and CT(O)- the lowest).
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>We call a case (a subset/module/ontology triplet) strictly con rmatory of our
hypothesis if the maximum classi cation time measured for a module is lower
than the minimum classi cation time measured for the entire TBox. We call a
case tendentiously con rmatory if the median classi cation time measured for
a module is lower than the median classi cation time measured for the entire
TBox.</p>
      <p>The rst, and most striking, observation is that only three of the six
ontology/reasoner pairs are strictly con rmatory of our hypothesis as a whole (contain
only strictly con rmatory cases, Biomodels, MDA, SIHPO) and only one other
ontology is tendentiously con rmatory as a whole (Biotop). Biotop/HermiT
shows modules that are easier (at least for the median value) than their
generating subset, but one measurement was taken were the subset was classi ed
faster. Cell ontology/Pellet was strictly con rmatory in 21 out of 27 cases, and
tendentiously con rmatory in another 4 (25/27 in total).</p>
      <p>Gazetteer/HermiT clearly contradict the conjecture. We must thus, barring
experiment error, regard it as false. We found this surprising.</p>
      <p>In all cases, both the hard subset and the module were signi cant fractions
of the ontology, as expected. This makes the results more relevant for reasoners
like MORe which tries to generate large modules. Hard large modules suggest
that MORe may have surprisingly pathological cases. In conversation with the
MORe developers, the threat to performance that they saw was that the
nonE L reasoner might essentially perform as it would on the whole ontology (or
perhaps a little better, but still close) but MORe would also have the cost of
running the E L reasoner. That is, they were concerned with duplicated work.
Our results show that the complement module might take much longer than
the whole ontology. Determining how often this sort of pathological behaviour
may occur is part of our future work. One possibility to explain the hardness of
subsets over their generating module is the presence or absence of search space
pruning axioms. It is at least possible that the existence of an axiom or a group
of axioms in the module reduces the amount of necessary subsumption tests,
either by pruning the traversal space (for example through a disjointness axiom)
or by asserting a subsumption that makes a lot of tests redundant (for example
by making a concept unsatis able) or reduces the amount of non-deterministic
choices. We tried to gather some very initial evidence for these kinds of
explanations by counting the number of SAT calls for a given reasoner/ontology
pair. For Biomodels/HermiT we can con rm that the subset requires between
11057 and 25271 more SAT calls then its module for a complete classi cation
CT(HS)
CT(M)
CT(O)</p>
      <p>CT(O)+</p>
      <p>CT(HS)
CT(M)
CT(O)
60
50
40
30
20
10
0
14
12
10
8
6
4
2
0</p>
      <p>CT(HS)
CT(M)
CT(O)
CT(HS)
CT(M)
CT(O)
CT(HS)
CT(M)
CT(O)
CT(O)(Median 15066.5). In the Biotop/HermiT case, we have 110, in the MDA/Pellet
case 9 and in the SIHPO/Pellet case between 7606 and 42296 (Median 28053.5)
more SAT calls. All of the former were at least tendentiously con rming of our
hypothesis. In the Gazetteer case, where in all but one cases, T (CL(M )) was
higher than T (CL(S)) (strongly contradicting the hypothesis), we can observe
exactly the reverse: we have at between 541 and 624 (Median 578) fewer SAT
calls during the subset classi cation, which at least strengthens the evidence
that the amount of SAT calls correlates somehow with the overall classi cation
time. The CO/Pellet combination appears to draw a more mixed picture. For a
random third (9) of the subset/module pairs, all of which are strictly con
rmatory of our hypothesis, the di erences in SAT calls tend from 6065 more to 6009
less calls, with a Median of 2740 less (ie, does not support this explanation).</p>
      <p>The primary threat to internal validity is that we occasionally observed
nondeterministic reasoning times for the same input. While we believe our current
results are stable, further work needs to be done to ensure that the way we feed
inputs to the reasoner ensures a like behaviour. It could be that we accidentally
have modules triggering a \hard state" of the reasoner while the ontology is
triggering an easy state due to irrelevant features such as order of axioms in the
le (which can have an e ect on absorption). If the reasoner can be kicked into
a state where the reasoning times are reversed, then it is possible that a re ned
version of our conjecture is true.</p>
      <p>The primary threat to external validity is the relatively small sample sets and
ontologies examined as well as the inherent biasing of our sampling strategy. We
will continue to cover more hard subsets and candidate ontology reasoner pairs to
get a fuller picture even though our current work su ces to refute the conjecture.
Given that we start with random, thus not coherent, subsets, it may be the case
that we can nd `interestingly hard" modules by looking at genuine modules or
other guided seed signature strategies. While not telling for the conjecture, they
might help provide insight into sources of di culty.</p>
      <p>With respect to eld signi cance, it seems that it is not unreasonable for
modular reasoner developers to neglect this threat. The comparatively low
number of hard subsets in general and of hard modules suggests that other issues
still dominate.</p>
      <p>In addition to broadening our sampling as wide as possible, the key future
task is to investigate the hard subsets and their modules to see if we can isolate
the sources of pathological or benign behaviour. One strategy is to search for
minimal supersets of hard subsets or modules that are easy. In this way, we can
provide minimal test cases for pathological hardness to reasoner developers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. owlapitools
          <article-title>- a set of independent add-ons for owl api</article-title>
          . https://github.com/ owlcs/owlapitools. Accessed:
          <fpage>2014</fpage>
          -06-09.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B.</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Halaschek-Wiener</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazakov</surname>
          </string-name>
          .
          <article-title>History matters: Incremental ontology reasoning using modules</article-title>
          .
          <source>In Lecture Notes in Computer Science (including subseries Lecture Notes in Arti cial Intelligence and Lecture Notes in Bioinformatics)</source>
          , volume
          <volume>4825</volume>
          LNCS, pages
          <volume>183</volume>
          {
          <fpage>196</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Del Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Klinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , U. Sattler,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          .
          <article-title>Empirical study of logic-based modules: Cheap is cheerful</article-title>
          .
          <source>In Lecture Notes in Computer Science (including subseries Lecture Notes in Arti cial Intelligence and Lecture Notes in Bioinformatics)</source>
          , volume
          <volume>8218</volume>
          LNCS, pages
          <volume>84</volume>
          {
          <fpage>100</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bail</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Matentzoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Glimm</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazakov</surname>
          </string-name>
          .
          <article-title>Owl reasoner evaluation (ore) workshop 2013 results: Short report</article-title>
          .
          <source>In ORE</source>
          , pages
          <volume>1</volume>
          {
          <fpage>18</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Matentzoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>The empirical robustness of description logic classi cation</article-title>
          .
          <source>In CEUR Workshop Proceedings</source>
          , volume
          <volume>1014</volume>
          , pages
          <fpage>197</fpage>
          {
          <fpage>208</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Performance heterogeneity and approximate reasoning in description logic ontologies</article-title>
          .
          <source>In Lecture Notes in Computer Science (including subseries Lecture Notes in Arti cial Intelligence and Lecture Notes in Bioinformatics)</source>
          , volume
          <volume>7649</volume>
          LNCS, pages
          <volume>82</volume>
          {
          <fpage>98</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Grau</surname>
          </string-name>
          , I. Horrocks,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazakov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <source>Modular Reuse of Ontologies: Theory and Practice. J. Artif. Intell. Res.(JAIR)</source>
          ,
          <volume>31</volume>
          :
          <fpage>273</fpage>
          {
          <fpage>318</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Horridge</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          .
          <article-title>The OWL API: A Java API for OWL ontologies</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>2</volume>
          :
          <fpage>11</fpage>
          {
          <fpage>21</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazakov</surname>
          </string-name>
          .
          <article-title>RIQ and SROIQ are harder than SHOIQ</article-title>
          .
          <source>In Principles of Knowledge Representation and Reasoning: Proceedings of the Eleventh International Conference, KR</source>
          <year>2008</year>
          , Sydney, Australia,
          <source>September 16-19</source>
          ,
          <year>2008</year>
          , pages
          <fpage>274</fpage>
          {
          <fpage>284</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kazakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Krotzsch, and</article-title>
          <string-name>
            <given-names>F.</given-names>
            <surname>Simancik</surname>
          </string-name>
          .
          <article-title>The incredible elk - from polynomial procedures to e cient reasoning with ;; ontologies</article-title>
          .
          <source>J. Autom. Reasoning</source>
          ,
          <volume>53</volume>
          (
          <issue>1</issue>
          ):1{
          <fpage>61</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Model-theoretic inseparability and modularity of description logic ontologies</article-title>
          .
          <source>Artif</source>
          . Intell.,
          <volume>203</volume>
          :
          <fpage>66</fpage>
          {
          <fpage>103</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shearer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          .
          <article-title>Hypertableau reasoning for description logics</article-title>
          .
          <source>J. Artif. Intell. Res. (JAIR)</source>
          ,
          <volume>36</volume>
          :
          <fpage>165</fpage>
          {
          <fpage>228</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. H.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. L.</given-names>
            <surname>Whetzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorf</surname>
          </string-name>
          , N. Gri th, C. Jonquet,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Storey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Chute</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <article-title>BioPortal: Ontologies and integrated data resources at the click of a mouse</article-title>
          .
          <source>Nucleic Acids Research</source>
          ,
          <volume>37</volume>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Schneider</surname>
          </string-name>
          .
          <article-title>The modular structure of an ontology: An empirical study</article-title>
          .
          <source>In KR</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <surname>and I. Horrocks.</surname>
          </string-name>
          <article-title>MORe: Modular Combination of OWL Reasoners for Ontology Classi cation</article-title>
          .
          <source>In International Semantic Web Conference (1)</source>
          , pages
          <fpage>1</fpage>
          {
          <fpage>16</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. E.
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B. C.</given-names>
          </string-name>
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kalyanpur</surname>
            , and
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Katz. Pellet</surname>
          </string-name>
          :
          <article-title>A practical OWL-DL reasoner</article-title>
          .
          <source>Web Semantics</source>
          ,
          <volume>5</volume>
          :
          <fpage>51</fpage>
          {
          <fpage>53</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          . Fact+
          <article-title>+ description logic reasoner: System description</article-title>
          .
          <source>In IJCAR</source>
          , pages
          <volume>292</volume>
          {
          <fpage>297</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsarkov</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Palmisano. Divide</surname>
          </string-name>
          et Impera:
          <article-title>Metareasoning for Large Ontologies</article-title>
          .
          <source>In OWLED</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>C. D. Vescovo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <string-name>
            <surname>Sattler</surname>
            , and
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Schneider</surname>
          </string-name>
          .
          <article-title>The Modular Structure of an Ontology: Atomic Decomposition</article-title>
          .
          <source>In IJCAI</source>
          , pages
          <volume>2232</volume>
          {
          <fpage>2237</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>