<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improved Algorithms for Module Extraction and Atomic Decomposition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmitry Tsarkov</string-name>
          <email>tsarkov@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science The University of Manchester Manchester</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years modules have frequently been used for ontology development and understanding. This happens because a module captures all the knowledge an ontology contains in a given area, and often is much smaller than the whole ontology. One useful modularisation technique for expressive ontology languages is locality-based modularisation, which allows for fast (polynomial) extraction of modules. In order to better understand the modular structure of an ontology, a technique called Atomic Decomposition can be used. It e ciently builds a structure representing all possible modules for an ontology, regardless of the modularisation algorithm adopted and without the need to compute an exponential number of modules, as in a naive approach. This structure may be used e.g., for quick extraction of modules, or to investigate dependencies between modules, and so on. However, existing algorithms for both locality-based module extraction and atomic decomposition do not scale well. This happens mainly because of their global nature: each iteration always explores the whole ontology, even when it is not necessary. We propose algorithms for locality-based module extraction and atomic decomposition that work only on the relevant part of the ontology. This improves performance of algorithms by avoiding unnecessary checks. Empirical evaluation con rms a signi cant speed up on real-life ontologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Following the great success of the OWL 2 family of ontology languages, they
have become widespread as a knowledge representation formalism. This success,
in particular, is based on reasoning facilities available for these languages, that
are provided by a number of tools. However, the tools (usually) do not scale
well. The worst-case computational complexity of the most expressive decidable
fragment of OWL, OWL 2 DL, is N2EXPTIME. So there is a need to deal with
large ontologies in an e cient manner.</p>
      <p>One way of dealing with this issue during ontology development is to use
modules. A module is a subset of an ontology that captures all the knowledge
the ontology contains about a given set of terms. When reusing an existing
ontology, instead of importing all of it to use a few terms and axioms, one could
extract a module based on a given set of terms, thus limiting the growth of the
ontology that will be fed to the reasoner.</p>
      <p>
        To do so, a module extraction algorithm is necessary. The notion of
conservative extensions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] allows one to de ne a module w.r.t. a signature as a
minimal set that preserves all entailments over . However, deciding whether
a subset of an ontology is a module in this sense is a non-trivial task. Even for
simple DL languages, it is double exponential in time, whereas for expressive
languages, like OWL 2 DL, this problem is undecidable.
      </p>
      <p>Locality-based modules is an alternative solution for e cient module
extraction in expressive logics. Intuitively, an axiom is local w.r.t. a signature if it does
not a ect any entailment that uses only terms from that signature. So the
approach is to keep in the module only those axioms that are non-local to a given
signature (while extending the signature as more axioms are added to the
module). The traditional modularisation algorithm follows this idea by traversing all
the axioms in the ontology, checking their locality and adding them to a module
if non-local, updating the signature accordingly and then repeating the traversal
until no new entities are added to a signature.</p>
      <p>It is easy to see that this approach, while having polynomial (quadratic)
run-time, has some room for improvement. The addition of a single term to
a signature could lead to re-checking locality of every axiom in the ontology,
including those that have nothing to do with the change in the signature, i.e.,
are not touched by either the old or the new signature. In the approach proposed
in this paper only the axioms that might become non-local after a change of
signature are re-checked.</p>
      <p>
        There might be cases where one wants to extract more than just a single
module from an ontology. In order to explore the modular structure of the
ontology, the atomic decomposition approach has recently been investigated [
        <xref ref-type="bibr" rid="ref1 ref5">1,
5</xref>
        ]. Atomic decomposition can be viewed as a compact representation of all the
modules of an ontology. In this approach the notion of atom is introduced as a
subset of an ontology, whose axioms always co-occur in a module (i.e., for each
module, either all of the axioms are included in the module or none of them
occurs in it). A dependency between atoms, which mirrors the subset relation
between corresponding modules, is also described in the Atomic Decomposition
approach.
      </p>
      <p>The algorithm for building the atomic decomposition of an ontology is also
rather straightforward. First, the module for a signature of every axiom is built.
Then axioms with equivalent modules are combined into a single atom. After
the set of atoms is known, their modules are explored to derive dependencies
between atoms.</p>
      <p>As in the module extraction case, the atomic decomposition algorithm can be
improved. Taking account of the notion of a module in the atomic decomposition
structure, it is possible to signi cantly reduce the search space for modules, as
well as for dependency relation. Empirical evaluation on large real-life ontologies
shows an increase in performance of up to 50 times.</p>
      <p>The rest of this paper is organised as follows. In Section 2 some preliminary
notions are de ned. Section 3 contains the de nition of the original module
extraction algorithm, the analysis of its ine ciencies and the improved algorithm,
together with a proof of its correctness. Similarly, in Section 4 the original and
improved algorithms for the atomic decomposition are discussed. Results of the
evaluation of the algorithms involving several real-life ontologies are presented
in Section 5. Finally some conclusions are drawn in Section 6.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>We assume that the reader is familiar with the notion of OWL 2 axiom, ontology
and entailments. An entity is a named element of the signature of an ontology.
For an axiom , we denote by e the signature of that axiom, i.e. the set of all
entities in . The same notation is also used for a set of axioms.
De nition 1 (Module). Let O be an ontology and
M of the ontology is called a module of O w.r.t.
every axiom with e .</p>
      <p>be a signature. A subset
if M j= () O j= , for</p>
      <p>One of the ways to build modules is to use locality of axioms.</p>
      <p>De nition 2 (Semantic Locality). An axiom is called &gt;(?)-local w.r.t a
signature if replacing all named entities in e n with &gt;(resp. ?) makes that
axiom a tautology. An axiom is called a tautology if it is local w.r.t. e. An
axiom is called global if it is non-local w.r.t. ;.</p>
      <p>
        Note that checking the tautology of an axiom is done by checking the
entailment of by the empty ontology, i.e., ; j= . In order to avoid this check
(which involves reasoning and might be expensive) the notion of syntactic locality
was introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>We are not giving a formal de nition of syntactic locality here. This would be
an unnecessary complication, as the algorithms will use the locality checker as a
black box. The intuition behind the syntactic locality is that it tries to simulate
the entailment check by exploring the axiom structure and making decisions
about locality by propagating constant values through expressions.</p>
      <p>Syntactic locality is sound in the sense that every syntactically local axiom
is also semantically local. The converse is not true, however: some syntactically
non-local axiom are semantically local. We assume that the locality checker
provides a method isNonLocal( ) that returns true i the axiom is
nonlocal.</p>
      <sec id="sec-2-1">
        <title>De nition 3 (Atomic Decomposition). A set of axioms A is an atom of an</title>
        <p>ontology O, if for every module M of O, either A M or A \ M = ;. An atom
A is dependent on B (written B 4 A) if A M implies B M , for every
module M . An Atomic Decomposition of an ontology O is a graph G = hS; 4i,
where S is the set of all atoms of O.</p>
        <p>
          Algorithm 1 Original Modularity Algorithm [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
2 O do
2= M and isNonLocal( ; ) then
        </p>
        <p>M M [
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Module Extraction Algorithms</title>
      <p>The locality-based module extraction is based on the following theorem.</p>
      <sec id="sec-3-1">
        <title>Theorem 1 (Locality-based Module [2]). Let M O be two ontologies</title>
        <p>such that all axioms in O n M are local w.r.t. [ Mf. Then M is a module of O
w.r.t</p>
        <p>This claim holds for all types of modules, as well as the locality checking
approach. The original algorithm, based on this theorem, is described here as an
Algorithm 1, and is implemented in, e.g., OWL API.1
3.1</p>
      </sec>
      <sec id="sec-3-2">
        <title>Original Module Extraction Algorithm</title>
        <p>The algorithm starts from the empty module and then goes through all the
axioms to check their locality. If an axiom is non-local w.r.t. the current
signature, it is added to the module. In addition, the signature is extended with
the signature of . The whole process is repeated until the signature reaches a
xpoint (line 11).</p>
        <p>While having a simple structure and being easily understandable, the
traditional algorithm has some ine ciencies. The most obvious one comes from the
fact that it is necessary to check all the remaining axioms in the ontology if a
single entity is added to the signature. This leads to the worst-case complexity of
O(n2), where n is the number of axioms in the ontology. Indeed, if every run of
the loop in lines 3{11 adds a single axiom to a module, increasing the signature
on each step, the loop will be run n times and about n2=2 locality checks will
be made.
1 http://owlapi.sourceforge.org
. Initialize SA and Globals</p>
        <p>. global axiom
. Initialise working set
. Global axioms are always in the module
Algorithm 2 Improved Modularity Algorithm
an axiom such that is local w.r.t.</p>
        <p>[ 0 such that 0 \ e = ;.</p>
        <p>Proof. Let jc denote the axiom in which all entities not in are replaced
with c, where c is either &gt; or ? depending on the locality type. Then the claim
of the proposition follows from two simple observations:
2. jc ne =
Since is local w.r.t. , jc is a tautology. The, by the rst item above,
Soj, [ jc0 [=0 (cojicnc)ijdce0s =wit(h jc j)cjc, 0in.ee.,, iwshaictha,utboylogthy.e Tshecuosn,d item, equals to jc .
c
is local w.r.t. [ 0.</p>
        <p>tu</p>
        <p>Algorithm 2 implements the proposed approach. In lines 3{11, the auxiliary
structures for the algorithms are initialised. One of these structures is a map SA
that associates each entity with the set of axioms containing it in their signature.
Another is a set of global axioms Globals. Global axioms should be treated in a
special way as they are part of every module independently of their signature.</p>
        <p>After these structures are created, the algorithm initialises the working set
S with the initial signature . Then, all the global axioms from the set Globals
are added to the module using the addNonLocal procedure.</p>
        <p>In the main cycle (lines 16{21) an entity is taken from the set S, then the
set of a ected axioms is retrieved using the map SA. Each of these axioms is
checked for locality and, if non-local, is added to the module by addNonLocal.</p>
        <p>The addNonLocal procedure is de ned in the lines 24{30 of the
Algorithm 2. If an axiom is found non-local, it is added to the module M , and its
signature is added to . Moreover, every new entity is added to the working set
S (line 27) to allow further search for the axioms that are non-local w.r.t. the
extended signature.</p>
        <p>The correctness of the algorithm can be proved by induction on the size of
the signature . The basis of induction, for the empty signature: all that goes
to the module is the set of global axioms, which is done in the lines 13{15 of
the algorithm. Assume that for a given all the necessary locality checks have
been performed for axioms in the ontology O. Let us now check the case when
a new entity is added to . In this case all the axioms from O that contain
in their signature, are re-checked for locality w.r.t. new signature. All other
axioms, according to Proposition 1, will keep their locality status, so there is no
need to re-check them.</p>
        <p>Note that the computational complexity of the improved algorithm is di
erent from th eone of the original one. Now every axiom is checked an most jej
times, so the overall complexity is O(N s), where N is the size of the ontology
O and s = max 2O(jej).</p>
        <p>It is also worth noting that the initialisation of auxiliary structures (lines 3{
11) can be done only once for every ontology, and then reused for consequent
module extraction queries.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Atomic Decomposition Algorithms</title>
      <p>
        Now let us introduce the algorithms for the atomic decomposition of an ontology.
For the ease of explanation we assume that the ontology for the atomic
decomposition does not contain tautologies (i.e. axioms that are local w.r.t. their own
signature). They have no sense in the atomic decomposition, as they does not
Algorithm 3 Original Atomic Decomposition [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
1: procedure AtomicDecomp(O)
2: Gen ;; M odule ;; Atom ;; 4
3: for all 2 O do
4: M odule( ) getModule(e; O)
5: if isNewModule( ; Gen) then
6: Atom( ) f g
7: Gen Gen [
8: end if
9: end for
10: for all 2 Gen do
11: for all 2 Gen do
12: if 2 M odule( ) then
13: 4 4 [hAtom( ); Atom( )i
14: end if
15: end for
16: end for
17: end procedure
18: function isNewModule( ; Gen)
19: for all 2 Gen do
20: if M odule( ) = M odule( ) then
21: Atom( ) Atom( ) [ f g
22: return false
23: end if
24: return true
25: end for
26: end function
;
. build all atoms and modules
. build all dependencies
belong to any (locality-based) module of the ontology. In order to achieve this
one have to check all the axioms and remove the tautologies from the ontology.
4.1
      </p>
      <sec id="sec-4-1">
        <title>The Original Atomic Decomposition Algorithm</title>
        <p>
          The original atomic decomposition algorithm presented here was described by
Del Vescovo et al [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. It contains two independent parts. The rst part (lines 3{
9) builds all the atoms of the ontology. It is done by creating a module for a
signature of every axiom (line 4), and by comparing this module to already
created modules. If such a module already exists in the ontology (which is checked
in the auxiliary procedure isNewModule, line 20), then the axiom is added to
the atom, represented by the already checked axiom (line 21). If no module is
equivalent to the module for , then goes to a new atom (line 6).
        </p>
        <p>The second part of the algorithm, lines 10{16, builds the dependency relation
4. It goes through all atoms and sets the dependency between atoms A and B
if an axiom from A is contained in the module built for axioms from the atom
B.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Improved Atomic Decomposition Algorithm</title>
        <p>As in the case of the module extraction algorithm, the traditional atomic
decomposition algorithm su ers from some ine ciencies. The rst is independence
of module creation: all the modules are created, using the whole ontology as a
starting point. However, in many cases a module is included into another module
with a bigger signature. This is a consequence of the following observation.
24: function getAtomSeed( ; )
25: M odule( ) getModule(e; M odule( ))
26: if M odule( ) = M odule( ) then
27: return
28: else
29: return
30: end if
31: end function
. The atom for is already known
Proposition 2. Let ; be axioms in an ontology O. Let M odule( ; O) denote
the module of O w.r.t. e. Then M odule( ; O) M odule( ; O) whenever 2
M odule( ; O).</p>
        <p>In fact, this proposition follows from depleteness of the locality-based
modules [1, Proposition 2.2, claim iii]. So in order to build a module for it is enough
to explore only axioms in the module for .</p>
        <p>Another observation stems from the analysis of an dependency relation
structure. Lines 12{13 of the Algorithm 3 imply that all atoms on which Atom( )
depends are contained in M odule( ). But in this case there is no need to look
outside that module for the dependencies. At the same time, the dependency
relation could be build during the atom creation process.</p>
        <p>These two ideas lie at the foundation of the improved atomic decomposition
algorithm, presented as Algorithm 4. The main cycle (lines 2{6) ensures that
an atom is built for every axiom, using the whole ontology as a starting point.
After all atoms are created, the dependency relation is completed by using the
standard transitive closure algorithm (line 7).</p>
        <p>The main ingredient of the algorithm is implemented as a recursive function
buildAtomsInModule. It takes two parameters: an axiom , for which the
atom (and module) are going to be built; and an axiom , which is a \parent"
of an in the sense that M odule( ) M odule( ). For special case = null, as
in line 4 of the code, we assume that M odule( ) is the whole ontology O. The
function returns a representative of the Atom( ).</p>
        <p>First, it checks whether an atom for has been already created (lines 10{12).
In this case there is nothing to do and is returned as a representative.</p>
        <p>Then, using the module of a parent axiom as a starting point, the module
for is created (line 25). Then, like in the function isNewModule from
Algorithm 3, the algorithm checks whether such module already exists. However,
unlike in function isNewModule, only one check is required here (line 26),
namely, to compare it with the parent module. Then axiom is added to an
atom, obtained by function getAtomSeed (line 14) and, if the atom already
exists (i.e., is represented by the parent axiom ) then the parent is returned.</p>
        <p>If the atom is new, i.e. M odule( ) 6= M odule( ), the algorithm recursively
builds all atoms inside M odule( ) (lines 18{21): buildAtomsInModule is
called for all axioms in M odule( ) with as a parent. The dependency
relation is updated accordingly (line 20), as Atom( ) depends on every atom in
the M odule( ). In the end, as a representative of a new module, is returned.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Empirical Evaluation</title>
      <p>
        The improved algorithms described in Sections 3.2 and 4.2 were implemented in
the FaCT++ Description Logic reasoner [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We have done some experiments,
which show considerable performance improvement over the original versions of
the algorithms.
      </p>
      <p>The rst set of experiments shows the di erence between original and
improved module extraction algorithms. As a test we perform an atomic
decomposition (improved algorithm) over several well-known ontologies, because it
intensively uses the module extraction procedure. The results are presented in
Table 1. Here the ontology size is given in the number of axioms, size of the
atomic decomposition (AD size) is given in number of atoms.</p>
      <p>This table shows some general patterns of the performance improvement.
The rst two ontologies represent the case of a large number of small atoms,
where the improved algorithm behaves in the best way. The full version of the
Galen ontology is very hard for reasoning. It contains one large atom (about 950
axioms), while all other atoms are rather small. Still, the improved algorithm
requires only 10% of the locality checks in the original one. The Wine ontology
represents the other end of the spectrum: a few very large atoms. This leads to
the smallest improvement of the new algorithm against the original one; however,
even in this case it uses 50% operations of the standard algorithm.</p>
      <p>
        The second set of experiments compares two atomic decomposition
algorithms on a set of ontologies. We use the OWL API implementation as a
reference, and the FaCT++ implementation as an improved algorithm. The set of
test ontologies is a subset of BioPortal ontologies, described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The results of the tests are shown at Fig. 1. The graph shows the ratio
between the time needed to decompose ontologies by the original algorithm and
the improved algorithm. While the average ratio is about 7, in the extreme cases
the improved algorithm demonstrates 48 times better performance.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>We propose new improved algorithms of the locality-based module extraction
and atomic decomposition of the ontologies. We prove their correctness and
compare them with the original algorithms. Provided empirical evaluation results
con rm that the proposed algorithms outperformed original ones on a set of
reallife ontologies.</p>
      <p>
        We are planning to implement a semantic locality checker and compare
results on real-life ontologies. We also are planning to implement labelled atomic
decomposition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which could be useful for the fast module extraction.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Del</given-names>
            <surname>Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Parsia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Sattler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            ,
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>The modular structure of an ontology: atomic decomposition</article-title>
          .
          <source>In: Proc. of IJCAI</source>
          . pp.
          <volume>2232</volume>
          {
          <issue>2237</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Modular reuse of ontologies: Theory and practice</article-title>
          .
          <source>Journal of Arti cial Intelligence Research</source>
          <volume>31</volume>
          (
          <issue>1</issue>
          ),
          <volume>273</volume>
          {
          <fpage>318</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Formal properties of modularisation</article-title>
          . Modular Ontologies pp.
          <volume>25</volume>
          {
          <issue>66</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Tsarkov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
          </string-name>
          , I.:
          <article-title>FaCT++ description logic reasoner: System description</article-title>
          .
          <source>Automated Reasoning</source>
          pp.
          <volume>292</volume>
          {
          <issue>297</issue>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Vescovo</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gessler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klinov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winget</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Decomposition and modular structure of bioportal ontologies</article-title>
          .
          <source>In: International Semantic Web Conference (1)</source>
          . pp.
          <volume>130</volume>
          {
          <issue>145</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>