=Paper=
{{Paper
|id=Vol-3263/paper-18
|storemode=property
|title=Next Steps for ReAD: Modules for Classification Optimisation
|pdfUrl=https://ceur-ws.org/Vol-3263/paper-18.pdf
|volume=Vol-3263
|authors=Haoruo Zhao,Bijan Parsia,Uli Sattler
|dblpUrl=https://dblp.org/rec/conf/dlog/ZhaoPS22
}}
==Next Steps for ReAD: Modules for Classification Optimisation==
<pdf width="1500px">https://ceur-ws.org/Vol-3263/paper-18.pdf</pdf>
<pre>
Next Steps for ReAD: Modules for Classification
Optimisation
Haoruo Zhao1 , Bijan Parsia1 and Uli Sattler1
1
    The University of Manchester, Oxford Rd, Manchester, UK M13 9PL


                                         Abstract
                                         Ontology Classification is a central DL reasoning task and supported by several highly-optimised
                                         reasoners for OWL ontologies. Different notions of modularity, including the atomic decomposition (AD),
                                         have already been exploited by different modular reasoners. In our previous work, we have designed and
                                         implemented a new AD-informed and MORe-inspired algorithm that uses Hermit and ELK as delegate
                                         reasoners, but avoids any duplicate subsumption tests between these two reasoners. In this paper, we
                                         push the algorithm further with easyfication (checking subsumption tests in a module rather than in the
                                         whole ontology) and parallelization. We empirically evaluate our algorithm with a set of SNOMED CT
                                         extensions ontologies and a corpus of BioPortal ontologies. We also design, implement and empirically
                                         evaluate a new modular reasoner, called Crane, which works with “coarsened” AD.

                                         Keywords
                                         Classification, Delegate Reasoner, Modular Reasoning


1. Introduction
With the development of syntactic locality-based modules [1, 2] there was a surge of interest in
using them to optimise classification time of description logic (DL) [3, 4] TBoxes [5, 6, 7, 8, 9, 10].
They, especially in their so-called ⊥ form, have a lot of properties which seem attractive for
reasoner optimisation: They are subsumer complete (i.e., a ⊥-module entails all subsumers for its
signature), can be extracted in polynomial time, and are subsets of the original ontology. These
features make “black-box" approaches very straightforward: first decompose the ontology into
some covering set of ⊥-modules, classify each ⊥-module as if it were an independent ontology,
then union the results. If you have dispatch criteria, one can build a coalition reasoner by adding
a dispatching function to several independent reasoners. The MORe [5] family of reasoners
took the latter approach, breaking their input into two modules1 : an ℰℒ++ -only [11, 12] module
which was delegated to an ℰℒ++ -reasoner [13, 14] and the “remainder" module which was
delegated to a 𝒮ℛ𝒪ℐ𝒬 [15] (or similar) reasoner [16, 17, 18]. Chainsaw [6], contrariwise, only
used one reasoner but broke the input into the smallest possible module.


    DL 2022: 35th International Workshop on Description Logics, August 7–10, 2022, Haifa, Israel
$ haoruo.zhao@manchester.ac.uk (H. Zhao); bijan.parsia@manchester.ac.uk (B. Parsia);
uli.sattler@manchester.ac.uk (U. Sattler)
 http://www.cs.man.ac.uk/~bparsia/ (B. Parsia); http://www.cs.man.ac.uk/~sattler/ (U. Sattler)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR

          CEUR Workshop Proceedings (CEUR-WS.org)
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073


                  1
     There are variants which have a more complex approach to the remainder, but the most prominent releases
versions are close to this description.
   Both approaches had successes but, even excluding decomposition time, they were not robust.
While for certain ontologies they showed worthwhile improvement, sometimes they make
things significantly worse. One hypothesis is that locality-based modules tend to overlap a
lot thus potentially inducing repeated work. The only information communicated between
classified modules is their final class hierarchy. To take an extreme example, if the ℰℒ++ module
and the remainder module have 90% overlap, the 𝒮ℛ𝒪ℐ𝒬-reasoner might take nearly as long
on that as it would on the whole ontology and it would still have the cost of the ℰℒ++ -reasoner
on top. If the whole ontology took infeasibly long for the 𝒮ℛ𝒪ℐ𝒬-reasoner to classify, then
the modularisation was pointless.
   In [19], different strategies for “chunking" a set of Chainsaw like modules were explored,
attempting to mitigate the duplicated work problem. However, none of the strategies proved
very robust or provide reliable improvements over a whole ontology approach.
   Alternatively, one can abandon the black-box approach and modify a reasoner to communicate
partial results between classifications of different modules. In [10], we demonstrated a prototype
of a MORe style reasoner, ReAD, that modified HermiT’s traversal algorithm [20, 21] both to
avoid repeating work that the ℰℒ++ reasoner has already done and also to use information
from a modular structure over the ontology (that is, the Atomic Decomposition [22]) to avoid
fruitless subsumption tests.
   In this paper, we report on several new variants of the basic light-touch glass-box approach.
In particular, we apply a (glass-box) Chainsaw like algorithm to the remainder module with
and without parallelisation both for the most fine grained modules (Granular ReAD) possible
and for a slightly coarsened set of modules (Crane). Somewhat surprisingly, we show that
Crane, excluding decomposition time, shows near dominance over all our variants and both
HermiT and a HermiT-MORe style system, often dramatically so. We also show that Crane is
comparable with Konclude even including decomposition time on versions of SNOMED CT
that has been enriched with disjunctive axioms.
   These results suggest that glass-box approaches can realise the promise of modular reasoning.


2. Preliminaries
We assume people are familiar with Description Logic [3] and basic inference services such as
consistency checking and classification.
  In this paper, we use 𝒪 for an ontology, NC for a set of concept names and NR for a set of
role names. A signature is a set Σ ⊆ NC ∪ NR of terms. For an axiom, a concept expression or
an ontology 𝑋, we use 𝑋 ̃︀ to denote its signature, i.e. the set of concept and role names. We use
C(𝒪) for the set of concept names in 𝒪, and C+(𝒪) = C(𝒪) ∪ {⊥, ⊤} for concept names in 𝒪
together with ⊤, ⊥.
  Modules based on syntactic or semantic locality [1] have been introduced as logically well-
behaved, tractable approximations of Conservative Extension (CE) modules [23, 24, 25]. These
modules are subsets of the ontology and we focus here on syntactic ⊥-locality in this paper (we
use module to represent that). These modules are computable polynomial time, and satisfy a
range of other useful properties [2, 26], in particular
   1. preserve all entailments of 𝒪 over Σ,
   2. preserve all entailments of 𝒪 over ℳ, ̃︁
   3. are unique, i.e., there is no other ⊥-locality based module of 𝒪 for Σ,
   4. are subsumer-preserving, i.e., for concept names 𝐴 ∈ ℳ,  ̃︁ 𝐵 ∈ 𝒪,̃︀ 𝒪 |= 𝐴 ⊑ 𝐵 implies
      ℳ |= 𝐴 ⊑ 𝐵 for ⊥-locality based module, and
   5. are monotonic, i.e., if Σ1 ⊆ Σ2 , then ℳ(Σ1 , 𝒪) ⊆ ℳ(Σ2 , 𝒪).
  The atomic decomposition (AD) A(𝒪) [27, 22] partitions an ontology into logically inseparable
sets of axioms, so-called atoms a, and relates these atoms via a dependency relation ⪰. The
principal ideal of a, defined as ↓ a = {𝛼 ∈ b | a ⪰ b}, is a genuine module [27, 22].

2.1. An AD Derived Todo List
In [10], we explored the foundations of using the AD for avoiding STs during classification.
Using the AD, we identify a (hopefully small) set of subsumption tests (STs) Subs(𝒪) that are
necessary and sufficient for classification of 𝒪. All other tests are unnecessary because they are
known not to hold since their signatures are never subsets of a common genuine ⊥-module
signature. Consider the following sets:

           Ats(𝐴) := {a ∈ A(𝒪) | 𝐴 ∈ ̃︀
                                     a}                                          the atoms of 𝐴
    MinAts(𝐴) := {a ∈ Ats(𝐴) |̸ ∃b ∈ Ats(𝐴) with a ≻ b}                          the lowest atoms of 𝐴
         CanS(a) := {𝐴 | a ∈ MinAts(𝐴) and #MinAts(𝐴) = 1} an atom’s candidate set
        BTop(𝒪) := {𝐴 | 𝐴 ∈ 𝒪
                            ̃︀ ∩ NC and #MinAts(𝐴) > 1}                          concept names below ⊤

The candidate set CanS(a) of an atom a are those concept names for which STs need to be run
for a, and concepts in BTop(𝒪) have only trivial subsumers.

Definition 1. [10] The set of STs Subs(a) of an atom a is defined as follows:

                      Subs(a) := {(𝐴, 𝐵) | 𝐴 ∈ CanS(a), 𝐵 ∈ ↓̃︁a, and 𝐴 ̸= 𝐵} ∪
                                 {(𝐴, ⊥) | 𝐴 ∈ CanS(a)} ∪
                                 {(⊤, 𝐵) | 𝐵 ∈ CanS(a)}.
                 ⋃︀
Subs(𝒪) :=            a∈A(𝒪) Subs(a) ∪ {(⊤, 𝐴) | 𝐴 ∈ BTop(𝒪)} ∪ {(⊤, ⊥)}.

   As a consequence of the following theorem, a reasoner that tests only STs in Subs(𝒪) during
classification will (a) test all required, non-trivial2 STs and (b) never duplicate a test.3

Theorem 1. [10] For 𝐴, 𝐵 ∈ 𝒪 ̃︀ ∩ NC ∪ {⊤, ⊥} with ⊥ =   ̸ 𝐴 ̸= 𝐵 ̸= ⊤. If 𝒪 |= 𝐴 ⊑ 𝐵 then
(𝐴, 𝐵) ∈ Subs(𝒪), and (𝐴, 𝐵) is either in exactly one Subs(a) or of the form (⊤, 𝐴) or (⊤, ⊥).


   2
       Of course we avoid testing tautologies.
   3
       It may, though, include a test (𝐴, 𝐶) in addition to (𝐴, 𝐵) and (𝐵, 𝐶).
2.2. Enhanced KP Algorithm, MORe and ReAD
HermiT uses the Enhanced KP (EKPA) classification algorithm [28, 21]. It builds a completion
graph with two sets of concepts pairs: One is known (K) subsumer pairs which records positive
subsumption relations. The other one is remaining possible (P) subsumer pairs which contains
remaining STs. The whole algorithm aims to empty P and increase K. HermiT aggressive strives
to fill K whenever it performs a satisfiability check, whether directly on a concept name or
as part of ST. For example, if the structures HermiT builds to test whether 𝐴 ⊑ 𝐵 allow it to
determine that 𝐸 ⊑ 𝐹 it will add 𝐸 ⊑ 𝐹 to K and remove it from P.
   MORe [5] is a black-box modular reasoning approach and splits the ontology into two
modules of different expressivity, and then uses a fast, delegate reasoner on the inexpressive
module. In its empirical evaluation, MORe uses ELK [14] to classify the module in ℰℒ++
[12, 11] and HermiT [18] for the remaining OWL 2 module.
   ReAD [10] is a glass-box and MORe-inspired approach. Compared to MORe, ReAD uses
ELK to classify a union of all ℰℒ++ modules,4 called 𝒯EL , by exploiting AD and uses a modified
HermiT to classify a union of remaining modules. called 𝒯RAs . The EKPA in HermiT is modified
to ensure 1) we do not check Subs(a) if a ⊆ 𝒯EL ; 2) we remove pairs (𝐴, 𝐵) from P if there is
no a ⊆ 𝒯RAs with (𝐴, 𝐵) ∈ Subs(a).


3. Chainsawing the Remainder Module
ReAD avoids repeating work done by the EL reasoner and exploits some information from the
AD in HermiT. But HermiT still remains a significant bottleneck. Thus, the natural next step is
to try to break down the remainder module into smaller, we hope in aggregate, easier modules.

3.1. Granular ReAD (ReADG )
Since we pre-compute an AD, we can use it to reason over a covering set of minimal modules
of 𝒯RAs . Rather than checking 𝒯RAs |=? 𝐴 ⊑ 𝐵 with (𝐴, 𝐵) ∈ Subs(a) and a ⊆ 𝒯RAs , we
check ↓ a |=? 𝐴 ⊑ 𝐵 with (𝐴, 𝐵) ∈ Subs(a). For every genuine module ↓ a with a ⊆ 𝒯RAs , we
initialize a modified HermiT which only checks Subs(a) and classifier ↓ a. One benefit for that
is that we can parallelize this procedure, called ReADGP . Algorithm 1 shows how ReADG works.
Compared to ReAD’s algorithm in [10], the modification is in lines 12-14. The parallelization is
also built for lines 12-14. Compared to ReAD’s algorithm, we modify less in EKPA algorithm.
We only ensure we check (𝐴, 𝐵) if 𝐴 ∈ CanS(a) but we do not remove any pairs (𝐴, 𝐵) from
P.
   Since ontologies tend to have lots of atoms, there is a risk that the overhead for “reasoner
swap” (building the reasoner, initializing, and deleting the reasoner instances) will be significant.
There are also three potential limitations for this approach due to interactions with other
optimizations for classification. First, HermiT uses a Hypertableau algorithm as its ST engine
and may get some "free" inferred subsumption relations from each ST.
    4
      The differences in deriving the EL and Remainder modules in the direct MORe or via a union of appropriate
modules does not seem significant for classification. In our case, we have the AD so it is convenient to use the union
technique.
Algorithm 1 ReADG
Require: an ontology 𝒪
 1: Initialize a hierarchy ℋ := {(⊥, ⊤)}
 2: use MORe approach to compute a (potentially large) ℰℒ++ module ℳEL and a remaining
    module ℳRAs = ℳ(C+(𝒪) ∖ ℳ          ˜︁EL , 𝒪)
 3: Compute the ⊥-AℳRAs
 4: ELAtoms := {a ∈ ⊥-A(𝒪) | ↓ a is in ℰℒ++ }                           {Find all ℰℒ++ modules}
 5: 𝒯EL := ℳEL ∪a∈ELAtoms a                                  {Compute union of ℰℒ++ modules}
 6: ℋ := Classify(𝒯EL )                                                        {use ELK for this}
 7: for atom a ∈ RemainingAtoms            {build a HermiT for ↓ a} do
 8:    ℋ := ℋ∪ Classify ↓ a                           {use modified HermiT only checks Subs(a)}
 9: end for
10: return ℋ              {If a call to Classify found inconsistency, we would have exited early}


   Some optimizations for STs like pseudo-model merging are not shared between reasoner
instantiation.
   Finally, ET-based algorithms avoid STs by exploring the transitivity of the (inferred) sub-
sumption relation but these “transitive shortcuts” are included in Subs(𝒪). Hence we may lose
this benefit of ET for these inferred subsumption relation.
   Going more glass-box (e.g., caching completion graphs or pseudomodels) might help in some
cases, but it may be that smaller is not always better. More fundamentally, computing the AD
can be very expensive, often dwarfing classification time. While work is ongoing to optimise AD
computation, for any AD approach to be reasonable, we have to assume that the AD is computed
offline and perhaps maintained through updates. While this is not particularly challenging, it is
a change to the basic infrastructure.

3.2. “Coarsened” AD and Crane
Motivated by these issues, we have experimented with using a “coarser" version of the AD.
In [22], we find ADs are often very fine-grained and computing AD is, in particular, very
costly despite it being in PTime. In our new system, Crane, we build “coarsened” AD as an
approximation of the AD and use this for AD-based traversal and classification. Furthermore,
we compute this coarsened AD on the fly interleved with classification call. This could be
changed to a pre-computation strategy.
   The general idea is to get coarsened AD and thus larger atoms and modules and classify them.
When we classify a coarsened module, we check the subsumers of the concept names in this
coarsened module. We have a global set ClassifiedC of concept names to record the concept
names which are already checked for their subsumers.
   We first compute the signature ΣEL from the MORe approach and use it to compute an
ℰℒ++ module ℳ1 and a remaining module ℳ2 . We use ELK to classify ℳ1 . Now we already
checked all subsumers of concept names in ℳ1 . We add these concept names into ClassifiedC
in line 4. In Algorithm 2, we make this genuine module coarsened so we get the module w.r.t
the signature of several axioms. Here picking how many axioms to get the coarsened module is
crucial and a parameter to Crane. In our experiments, we have set this parameter to 𝑛 = 100.
Then we can also further decompose these modules. Here we choose to further decompose
                                               𝑛
these modules once. So we use the threshold 10   = 10 to extract smaller modules and then
classify them.

Algorithm 2 Crane
Require: an ontology 𝒪, a parameter 𝑛
 1: compute ΣEL                                           {use the code from MORe for this}
 2: compute ℳ1 = ℳ(ΣEL , 𝒪) and ℳ2 = ℳ(𝒪       ̃︀ ∖ ΣEL , 𝒪)
 3: ℋ := ℋ∪ Classify(ℳ1 )                                                  {use ELK for this}
                           +
 4: ClassifiedC := ℳ1 ∩ C (𝒪)
                    ̃︂
 5: UnclassifiedAxioms := 𝒪 ∖ ℳ1
 6: while UnclassifiedAxioms ̸= ∅ do
 7:   get 𝑛 axioms 𝒯 from UnclassifiedAxioms
 8:   compute ℳ𝐶𝑂𝐴𝑅 = ℳ(𝒯̃︀ , ℳ2 )
 9:   UnclassifiedModuleAxioms := ℳ𝐶𝑂𝐴𝑅 ∩ UnclassifiedAxioms
10:   while UnclassifiedModuleAxioms ̸= ∅ do
              𝑛
11:      get 10 axioms 𝒯1 from UnclassifiedModuleAxioms
12:      ℳ𝐶𝑂𝐴𝑅𝐹 𝑢𝑟𝑡ℎ𝑒𝑟 := ℳ(𝒯̃︀1 , ℳ𝐶𝑂𝐴𝑅 )
13:      CanS := (ℳ𝐶𝑂𝐴𝑅𝐹˜︂𝑢𝑟𝑡ℎ𝑒𝑟 ∩ C+(𝒪)) ∖ ClassifiedC
14:      ℋ := ℋ∪ Classify ℳ𝐶𝑂𝐴𝑅𝐹 𝑢𝑟𝑡ℎ𝑒𝑟 with CanS          {use modified HermiT which only
         checks (𝐴, 𝐵) with 𝐴 ∈ CanS}
15:      UnclassifiedAxioms := UnclassifiedAxioms ∖ ℳ𝐶𝑂𝐴𝑅𝐹 𝑢𝑟𝑡ℎ𝑒𝑟
16:      UnclassifiedModuleAxioms := UnclassifiedModuleAxioms ∖ ℳ𝐶𝑂𝐴𝑅𝐹 𝑢𝑟𝑡ℎ𝑒𝑟
17:                                        ˜︂𝑢𝑟𝑡ℎ𝑒𝑟 ∩ C+(𝒪))
         ClassifiedC := ClassifiedC ∪ (ℳ𝐶𝑂𝐴𝑅𝐹
18:   end while
19: end while
20: return ℋ


4. Experiment Setting
We use HermiT version 1.3.8 5 , Konclude [29] version v0.7.0-1135 6 as base reasoners in this
paper. We use our own implementation of a MORe-like approach (see [10] page 11) which
classifies 𝒯EL by ELK and 𝒯RAs by HermiT described in [10]. This makes the comparison of the
algorithms more precise as they all build off the same code base.
   We ran three experiments. The first two experiments are essentially expanded versions
of those in [10] and demonstrate the differences of the ReADG and Crane approaches over
ReAD and the baseline systems. In these two experiments, all the ontologies are classifiable
using normal HermiT, thus the experiments explore glass-box modular reasoning as a potential
marginal improvement over monolithic and black-box MORe systems.
   5
       http://www.hermit-reasoner.com
   6
       https://github.com/konclude/Konclude
   In the final experiment, we look at several versions of SNOMED CT which enrich the ℰℒ++
version to include disjunctions. These ontologies prove difficult to impossible for HermiT and
MORe-like so we expand our comparison to include Konclude, which is capable of classifying
all version. This experiment explores whether our modular optimisations applied to HermiT
can be competitive with Konclude in an otherwise impossible for HermiT case.
   We use OWL API [30] version 3.4.3, ELK version 0.4.2 andHermiT (modified as indicated)
version 1.3.8 in all our systems. Crane also uses the module extraction code from MORe 7 .
   Most of experiments have been performed on Intel(R) Core(TM) i7-6700HQ CPU 2.60GHz
RAM 8GB, called Mach1 , running Java version 1.8 with an initial heap size of 1GB and a maximal
heap size of 8GB. Time is measured in CPU time. All experiments related to Konclude have been
performed on Intel(R) Core(TM) i7-5820K CPU 3.30GHz RAM 62GB, called Mach2 , running
Java version 1.8 JDK with an initial heap size of 1GB and a maximal heap size of 60GB.


5. Experiments
Throughout these experiments, we report classification times excluding the time to compute
modules or the AD even when interleaved in Crane, though we do report the Crane modularising
time separately. This introduces a potentially misleading bias in our reporting that we strongly
caution against. Indeed, if one is starting “cold", i.e., from an unmodularised input, one should
presume that all systems with an unreported modularisation time might have taken longer,
even much longer. While that is not always the case, it is often enough, especially for AD-based
systems that it is the right bet.
   We do this because we are trying to understand how exploiting modules affects the clas-
sification process itself. If an approach to exploiting modules does not win then the issue of
modularisation overhead is moot.
   Furthermore, there is an in principle solution to high modularisation cost: modularise “offline"
and amortise the cost. Since an AD based storage format and, indeed, AD-maintenance algo-
rithms are both straightforward, it is reasonable to focus on classification itself. Plus, speeding
up AD computation is ongoing with some promising efforts.
   We report on Crane’s modularisation time since we had originally hoped that this coarsening
time would reduce the modularisation overhead to negligible. It did not do that overall, but
there are some interesting results aroudn that.

5.1. Bioportal and HermiT
In this section, we re-use a corpus from 2017 NCBO BioPortal ontology already described in
[10] which contains 438 ontologies. We removed ABox axioms from all these ontologies, those
ontologies that are empty and those ontologies for which we cannot compute an AD (6) or
which HermiT cannot handle (37);8 this leaves us with 308 ontologies.


    7
     https://github.com/anaphylactic/MORe
    8
     HermiT threw OutOfMemory exceptions or timed-out after 10 hours for 11 ontologies; it failed to handle 26
ontologies due to unsupported syntax or syntax errors.
  We further discarded the 164 ontologies that are either purely ℰℒ++ (122 ontologies) or
have no ℰℒ++ modules (42 ontologies)9 which leaves us 63 ontologies with non-deterministic
tableaux graphs and 81 ontologies with deterministic tableaux graphs; for the latter, HermiT does
not check STs as the concept name satisfiability tests produce, as a side-effect, all subsumers of
each concept name. We use CT to represent the classification time and H, M, R, RG, RGP, C
for HermiT, MORe-like, ReAD, ReADG , ReADGP and Crane respectively.

5.2. Results
We show the results for deterministic ontologies in Table 1.
  For the first three columns, we have roughly what we might expect: A MORe like approach
shows, with some noise, some improvement over monolithic HermiT. ReAD does not show
any systematic improvement over MORe. ReADG shows some dramatic improvements but also
some odd reversals, especially for CHEBI and HRDP where it is an order of magnitude worse
than the baseline! Parallelisation improves matters greatly but not quite enough to ensure
dominance over baseline, at least, not with the number of cores we had available. Crane, on the
other hand, dominates, and typically by two orders of magnitude. The catch is that if we include
the modularisation time, Crane is hopeless. Thus, with current modularisation technology,
Crane needs to offline modularisation.

Table 1
The classification time among 7 deterministic ontologies in bins (2) and (3). The time is shown in seconds.
PrT is short for pre-processing time (including computing coarsened AD). Problematic times are bolded.
 Ontology           CTH(𝒪)       CTM(𝒪)        CTR(𝒪)        CTRG(𝒪)       CTRGP(𝒪)         CTC(𝒪)        C PrT
 FTC                 1,445.94     1645.75       1,322.09        225.34         40.68           2.54      13,335
 GO*                   834.66       695.42        618.84        233.24         48.90              2      14,438
 CHEBI*                 66.70        47.78         49.02       530.26          96.78              4      31,870
 FYPO                   24.88        11.25          8.27         17.76          3.74           0.14         296
 VTO                     4.68         0.31          0.34          1.02          1.01           0.19        2.17
 HRDO                    3.88         4.04          3.86         39.91          7.73           3.31      12,579
 ONL-MR-DA               2.75         2.86             3          6.11          1.33           0.06        0.19

   We might expect deterministic ontologies are the best case for HermiT and the worst case
for any granular approach, but the results for non-deterministic ontologies, shown in Table 2,
are startling.
   HermiT, MORe, and ReAD show their usual pattern with somewhat less noise. But, ReADG
shows quite bad behavior which parallelisation only mitigates in most cases. While throwing
threads at a problem to get faster overall performance is reasonable, throwing lots of threads to
just catch up with baseline is not.
   Crane, excluding pre-processing time, continues its near dominance, only falling behind
baseline with the ONL-MSA and behind ReAD with PHAGE. While dominant, the degree of

    9
      These are, indeed, comparable with a MORe system. . . but it would reduce to a comparison with the expressive
reasoner component and thus would not test glass vs. black-box issues. Testing against these ontologies is future
work.
dominance is less robustly significant. While sometimes it is 2 orders of magnitude faster, there
are cases where it is merely a nice speed up or, for CAO, barely an improvement.

Table 2
The classification time among 16 non-deterministic ontologies in bins (2) and (3). The time is shown in
seconds.
 Ontology             CTH(𝒪) CTM(𝒪) CTR(𝒪) CTRG(𝒪) CTRGP(𝒪) CTC(𝒪) C PrT
 CAO                   1071.13    1057.46    666.68       2392.33          1826.35        1,039    0.02
 PHAGE*                 606.36     475.67    449.10         120.05           50.23             2 20,699
 DCO-DEBUGIT            386.80     431.75    382.28       3150.41          1523.17       109.32    0.22
 VO                      78.92      94.19      85.86         35.29            9.61         0.45      1.7
 NTDO                    76.65      84.92      86.61        248.09          100.52        10.12    0.49
 NCIT                    75.73      57.38      57.25      4068.86          1349.88        38.81 857.42
 ONL-MSA                  9.02       9.21       9.14         33.85           18.71        19.32    0.26
 CCONT                    8.28       8.08       7.61         37.81           16.11         0.46   19.98
 FB-CV                    5.09       5.45       5.43          8.02            7.28           0.1  0.484
 FOODON                   2.11       0.30       0.21          0.49            0.27        0.015    3.52
 NEMO                     2.09       3.09       2.26          5.47            2.47        0.015    0.56
 TOK                      1.95       2.28       2.31          2.48            2.25             0  0.038
 ECSO                     1.39       1.28       1.28          4.06            2.13        0.079    1.61
 HUPSON                   1.30       1.34       1.16          8.72            4.05        0.047    0.28
 MHC                      1.23       1.14       1.14         10.81            4.36        0.046    1.95
 EPO                      1.11       1.02       1.07         12.39            4.57        0.329    0.13

  By itself, these results do not clearly establish a case for requiring the infrastructure overhaul
needed to mitigate the modularisation time, even for Crane. However, the fact that coarse-grain
modules tend to dominate performance and often produce orders of magnitude speed ups
suggests that investigating more ways to share information between classification of modules
would be fruitful. Clearly, the finest grain modules disrupt something. but Crane shows that
some level of module by module classifying is a big win.

5.3. SNOMED CT Experiment
SNOMED CT [31] is a medical terminology ontology that describes the knowledge of health
information and is used in medical records as a source of unambiguous, medicine processable
terms.10 The official development of SNOMED CT is constrained to be in ℰℒ++ and is well-
supported by the reasoner ELK, e.g. the 2021-Jan-28 version SNOMED CT ontology is classified
by ELK in only 11 seconds.
   Extending the SNOMED CT ontology to be an expressivity beyond ℰℒ++ draws a lot of
interest in both academic research [5] and industry but generally has been prohibited by practical
concerns about computational difficulty.
   From conversation with a SNOMED CT developer, we found that 4 minutes to reclassify
was about as much as they were willing to pay. They also helped derive three sets of more
expressive axioms that they found desirable from a modelling perspective. We combined these

   10
        https://www.snomed.org/
with two versions of ℰℒ++ SNOMED CT to produce a test set.11 Note that this is the basic
scenario that MORe was designed for: A mostly ℰℒ++ ontology with some more expressive
additions. The base versions were from 2020-July (S20July) and 2021-Jan (S21Jan). We add the
9-axiom expansion to S20July and the other two to S21Jan. For some details of all versions,
see Table 5.3. Note that the length of these ontologies is much larger than its “size" indicating a
large average length of axiom.
   The 9 axiom extension have axioms of two forms and was part of an initial proof of concept
effort:
         3 axioms of the form 𝐴1 ≡ 𝐴2 ⊔ 𝐴3 ,
         6 axioms of the form 𝐴4 ≡ 𝐴5 ⊓ ∃𝑟1 .(∃𝑟2 .¬𝐴6 ⊓ ∃𝑟3 .𝐴7 ⊓ ∃𝑟4 .𝐴8 ⊓ ∃𝑟5 .𝐴9 )
  The modelling scenario for the remaining versions is based on importing into SNOMED CT
a modified version of an Anatomy ontology to incorporate a disjunction based approach to
propagation of properties through part-whole relations (see [32] for an extensive discussion of
the modelling problem and different ways of solving it).
  The basic move is to replace axioms in Anatomy of the form:

                   𝑆 ≡ BodyStructure ⊓ ∃AllOrPartOf.𝐸 ⊓ ∃Laterality.QualifierValue             (1)
with 𝑆 ∈ StructureConcept, 𝐸 ∈ EntireConcept, with axioms of the form:
             𝑆 ≡ BodyStructure ⊓ (𝐸 ⊔ ∃ProperPartOf.𝐸) ⊓ ∃Laterality.QualifierValue.           (2)
One then deletes the corresponding atomic subsumptions between a StructureConcept and
EntireConcept in SNOMED CT and imports the modified ontology. The deleted subsumptions
are then entailed.
   This produces an extremely difficult ontology even if, as we did, you only convert a random
sample of 1000 style 1 axioms (SA21Jan1000 ). To make an intermediately difficult version
(S21Jan7905 ), we extracted all 7,905 axioms in the Anatomy ontology of the form in Equation 1,
modified them, and added them directly to SNOMED CT, discarding the rest of Anatomy. We
also left in the relevant atomic subsumptions to (hopefully) ease matters. Thus we have three
expressive versions of SNOMED CT of ascending difficulty.
   Table 3 shows the number of TBox axioms in the variants of the SNOMED CT ontologies
and the length of these ontologies. We notice these SNOMED CT ontologies all have axioms
with a generally complex structure since the length of these ontologies is more than 5 times
larger than their size.

5.3.1. Results
As baseline, we note that ELK classifies S20July and S21Jan in 9 seconds and 12s respectively.
In contrast to this, HermiT classifies these two ontologies with 5,199s (around 1.44 hours) and
5,929s (around 1.65 hours). All of the MORe descendents, including our own, show ELK like
performance on these as the ℰℒ++ module is the entire ontology.
   Table 4 shows the results of classifying our three extended versions of SNOMED CT by our
prior test systems (excluding non-parallel ReAD) with the addition of Konclude..
   11
        Note, for historical reasons, we did not do a full cross product.
Table 3
Different SNOMED CT extensions.
                                             Length        Size     DL
                             S20July        1,907,918   355,214     ℰℒ++
                             S20July9       1,907,924   355,217     𝒜ℒ𝒞
                             S21Jan         1,952,625   355,664     ℰℒ++
                             Anatomy          159,371    42,579     ℰℒ++
                             S21Jan7905     2,047,125   372,114     𝒜ℒ𝒞
                             SA21Jan1000    2,049,223   373,493     𝒜ℒ𝒞

Table 4
Different SNOMED CT extensions with their classification time checked by Crane, Konclude, HermiT,
MORe like approach. We use M for Memory. The time is shown in minutes.
                   CTH(𝒪)      CTM(𝒪)      CTRGP(𝒪)       CTC(𝒪)     CTK(𝒪)       C PrT        KM
  S20July9              97         5.43           1.9        0.02        1.77     46.15    11.52GB
  S21Jan7905       TimeOut        2,280           30          8.3          20    123.77     22.4GB
  SA21Jan1000      TimeOut     TimeOut       TimeOut        87.48      191.32    412.48     29.5GB


  The first observation is that only Crane and Konclude handle all three versions. We see a
variably steep slowdown for all reasoners across the samples. None of the reasoners approach
the 4 minute mark for the most realistic version, in spite of it being only a partial conversion.
  That said, Crane beats Konclude by at least an order of magnitude on pure classification time
and shows a competitive time when modularisation time is included. Moreover, Crane uses
much less memory, never exceeding 8GB even for the most difficult version while Konclude
needs 29GB. 12 Finally, we can, in principle, improve Crane’s performance by scaling up the cores
available. The dramatic improvement of granular ReAD over more for S21Jan7905 suggests
that its optimisations do distinguish it if not robustly.


6. Conclusion
These results suggest that light touch, glass box modularisation is a useful optimisation approach
even given the inconveniences of having to manage ontologies in a decomposed form. Further
tuning of the coarseness factor in Crane needs to be explored and there may be a value that is
effective across the board. Of course, once your ontology is modularised and classified, modular
incremental algorithms can be used which may bring the re-classification time of even the
most difficult versions of SNOMED CT to the 4 minute mark. Using Konclude as the delegate
reasoner for the remainder module might accomplish that directly.
   Understanding the causal story of the interaction of various granularities of modularisation
with other optimisations is critical. If we can identify these factors we may be able to dervive a
notion of module that is “right sized" for reasoning.


   12
     We run experiments for Konclude with another machine with maximal heap size of 30GB since Konclude
throw out of memory for 8GB heap size as we described in Section 4
Acknowledgments
Thanks to Dr Yongsheng Gao from IHTSDO for providing various SNOMED CT ontologies
including how to enrich them beyond ℰℒ in a realistic way.


References
 [1] B. Cuenca Grau, I. Horrocks, Y. Kazakov, U. Sattler, Modular reuse of ontologies: Theory
     and practice, Journal of Artificial Intelligence Research 31 (2008) 273–318.
 [2] U. Sattler, T. Schneider, M. Zakharyaschev, Which kind of module should I extract?, in:
     Proc. of DL-09, volume 477 of CEUR, 2009.
 [3] F. Baader, I. Horrocks, C. Lutz, U. Sattler (Eds.), An Introduction to Description Logic,
     Cambridge University Press, 2017.
 [4] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The Descrip-
     tion Logic Handbook: Theory, Implementation, and Applications, CUP, 2003.
 [5] A. A. Romero, B. Cuenca Grau, I. Horrocks, More: Modular combination of owl reasoners
     for ontology classification, in: International Semantic Web Conference, Springer, 2012, pp.
     1–16.
 [6] D. Tsarkov, I. Palmisano, Chainsaw: a metareasoner for large ontologies., in: Proc. of ORE
     2012, 2012.
 [7] B. Cuenca Grau, C. Halaschek-Wiener, Y. Kazakov, History matters: Incremental ontology
     reasoning using modules, in: Proc. of ISWC-07, volume 4825 of LNCS, 2007, pp. 183–196.
 [8] N. Matentzoglu, B. Parsia, U. Sattler, Owl reasoning: Subsumption test hardness and
     modularity, Journal of automated reasoning 60 (2018) 385–419.
 [9] H. Zhao, B. Parsia, U. Sattler, Avoiding subsumption tests during classification using the
     atomic decomposition, in: DL-19, volume 573, 2019.
[10] H. Zhao, B. Parsia, U. Sattler, Read: Ad-based modular ontology classification, in: European
     Conference on Logics in Artificial Intelligence, Springer, 2021, pp. 210–224.
[11] F. Baader, S. Brandt, C. Lutz, Pushing the ℰℒ envelope, in: Proc. of IJCAI-05, 2005.
[12] F. Baader, S. Brandt, C. Lutz, Pushing the ℰℒ envelope further, in: Proc. of OWLED 2008,
     2008.
[13] F. Baader, C. Lutz, B. Suntisrivaraporn, Cel—a polynomial-time reasoner for life science
     ontologies, in: International Joint Conference on Automated Reasoning, Springer, 2006,
     pp. 287–291.
[14] Y. Kazakov, M. Krötzsch, F. Simančík, The incredible ELK, Journal of automated reasoning
     53 (2014) 1–61.
[15] I. Horrocks, O. Kutz, U. Sattler, The even more irresistible 𝒮ℛ𝒪ℐ𝒬, in: KR-06, 2006, pp.
     57–67.
[16] D. Tsarkov, I. Horrocks, Fact++ description logic reasoner: System description, in:
     International joint conference on automated reasoning, Springer, 2006, pp. 292–297.
[17] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, Y. Katz, Pellet: A practical owl-dl reasoner,
     Journal of Web Semantics 5 (2007) 51–53.
[18] B. Glimm, I. Horrocks, B. Motik, G. Stoilos, Z. Wang, Hermit: an owl 2 reasoner, Journal
     of Automated Reasoning 53 (2014) 245–269.
[19] N. A. Matentzoglu, Module-based classification of OWL ontologies, Ph.D. thesis, University
     of Manchester, 2016.
[20] F. Baader, B. Hollunder, B. Nebel, H.-J. Profitlich, E. Franconi, An empirical analysis of
     optimization techniques for terminological representation systems: or:’making kris get a
     move on’, in: 3rd International Conference on Principles of Knowledge Representation
     and Reasoning (KR-92), 1992, pp. 270–281.
[21] B. Glimm, I. Horrocks, B. Motik, R. Shearer, G. Stoilos, A novel approach to ontology
     classification, Journal of Web Semantics 14 (2012) 84–101.
[22] C. Del Vescovo, M. Horridge, B. Parsia, U. Sattler, T. Schneider, H. Zhao, Modular structures
     and atomic decomposition in ontologies, Journal of Artificial Intelligence Research 69
     (2020) 963–1021.
[23] S. Ghilardi, C. Lutz, F. Wolter, Did I damage my ontology? A case for conservative
     extensions in description logics, in: KR, AAAI Press, 2006, pp. 187–197.
[24] B. Konev, C. Lutz, D. Walther, F. Wolter, Semantic modularity and module extraction in
     description logics, in: ECAI-08, 2008, pp. 55–59.
[25] C. Lutz, D. Walther, F. Wolter, Conservative extensions in expressive description logics,
     in: IJCAI, 2007, pp. 453–458.
[26] B. Cuenca Grau, I. Horrocks, Y. Kazakov, U. Sattler, Extracting modules from ontologies:
     A logic-based approach, in: H. Stuckenschmidt, C. Parent, S. Spaccapietra (Eds.), Modular
     Ontologies: Concepts, Theories and Techniques for Knowledge Modularization, volume
     5445 of LNCS, SV, 2009, pp. 159–186.
[27] C. Del Vescovo, B. Parsia, U. Sattler, T. Schneider, The modular structure of an ontology:
     Atomic decomposition, in: IJCAI, 2011, pp. 2232–2237.
[28] R. Shearer, I. Horrocks, Exploiting partial information in taxonomy construction, in:
     International Semantic Web Conference, Springer, 2009, pp. 569–584.
[29] A. Steigmiller, T. Liebig, B. Glimm, Konclude: system description, Journal of Web Semantics
     27 (2014) 78–85.
[30] M. Horridge, S. Bechhofer, The owl api: A java api for owl ontologies, Semantic web 2
     (2011) 11–21.
[31] S. Schulz, B. Suntisrivaraporn, F. Baader, M. Boeker, Snomed reaching its adolescence:
     Ontologists’ and logicians’ health check, International journal of medical informatics 78
     (2009) S86–S94.
[32] P. Seyed, A. L. Rector, U. Sattler, B. Parsia, R. Stevens, Representation of part-whole
     relationships in snomed ct, in: ICBO, Citeseer, 2012.

</pre>