<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology Module Extraction via Datalog Reasoning?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ana Armas Romero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Kaminski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ian Horrocks</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a novel approach in which module extraction is reduced to a reasoning problem in datalog. Our approach generalises existing approaches and can be tailored to preserve only speci c kinds of entailments, which allows us to extract signi cantly smaller modules. An evaluation on widely-used ontologies shows very encouraging results. Module extraction is the task of computing, given an ontology T and a signature of interest , a (preferably small) subset M of T (a module) that preserves all relevant entailments in T over the set of symbols . Such an M is indistinguishable from T w.r.t. , and T can be safely replaced with M in applications of T that use only the symbols in . Module extraction has received a lot of attention in recent years [27, 6, 25, 17, 10, 9, 23], and modules have found a wide range of applications, including ontology reuse [6, 13], matching [12], debugging [29, 19] and classi cation [1, 30, 4]. Preservation of relevant entailments is formalised via inseparability relations. The strongest notion is model inseparability, which requires that it must be possible to turn any model of M into a model of T by (re-)interpreting only the symbols outside ; such an M preserves all second-order entailments of T w.r.t. [15]. A weaker and more exible notion is deductive inseparability, which requires only that T and M entail the same -formulas in a given query language. Unfortunately, the decision problems associated with module extraction are invariably of high complexity, and often undecidable. For model inseparability, checking whether M is a -module in T is undecidable even if T is restricted to be in the description logic (DL) E L, for which standard reasoning is tractable. For deductive inseparability, the problem is typically decidable for lightweight DLs and \reasonable" query languages, albeit of high worst-case complexity; e.g., the problem is already ExpTime-hard for E L if we consider concept inclusions as the query language [20]. Practical algorithms that ensure minimality of the extracted modules are known only for acyclic E LI [15] and DL-Lite [17]. Practical module extraction techniques are typically based on sound approximations: they ensure that the extracted fragment M is a module (i.e., inseparable</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        ? This paper summarises the results of our work published in [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
from T w.r.t. ), but they give no minimality guarantee. The most popular such
techniques are based on a family of polynomially checkable conditions called
syntactic locality [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6, 24</xref>
        ]; in particular, ?-locality and &gt;? -locality. Each
localitybased module M enjoys a number of desirable properties for applications: (i) it
is model inseparable from T ; (ii) it is depleting, i.e., T n M is inseparable from
the empty ontology w.r.t. ; (iii) it contains all justi cations (a.k.a.
explanations) in T of every -formula entailed by T ; and (iv) last but not least, it can
be computed e ciently, even for very expressive ontology languages.
      </p>
      <p>
        Locality-based techniques are easy to implement, and surprisingly e ective in
practice. However, the extracted modules can be rather large, which limits their
usefulness in some applications [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. One way to address this is to develop
techniques that produce smaller modules while still preserving properties (i){(iii).
E orts in this direction have con rmed that locality-based modules can be far
from optimal in practice [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; however, these techniques only apply to rather
restricted languages and utilise algorithms with high worst-case complexity.
      </p>
      <p>Another approach to computing smaller modules is to weaken properties
(i){(iii), which are stronger than required for many applications. In particular,
model inseparability (property (i)) is a very strong condition, and deductive
inseparability would usually su ce, with the query language determining which
kinds of consequence are preserved; in modular classi cation, for example, only
atomic concept inclusions need to be preserved. However, practical module
extraction techniques for expressive ontology languages yield modules that satisfy
all three properties, and hence are potentially much larger than they need to be.</p>
      <p>In this paper, we propose a technique that reduces module extraction to a
reasoning problem in datalog. The connection between module extraction and
datalog was rst observed in [28], where it was shown that locality ?-module
extraction for E L ontologies could be reduced to propositional datalog reasoning.
Our approach takes this connection much farther by generalising both
localitybased and reachability-based [23] modules for expressive ontology languages in
an elegant way. A key distinguishing feature of our technique is that it can
extract deductively inseparable modules, with the query language tailored to
the requirements of the application at hand, which allows us to relax Property
(i) and extract signi cantly smaller modules. In all cases our modules preserve
the nice features of locality: they are widely applicable (even beyond DLs), they
can be e ciently computed, they are depleting (Property (ii)) and they preserve
all justi cations of relevant entailments (Property (iii)). We have implemented
our approach using the RDFox datalog engine [22]. Our evaluation shows that
module size consistently decreases as we consider weaker inseparability relations,
which could signi cantly improve the usefulness of modules in applications.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>Ontologies and Queries We use standard rst-order logic and assume
familiarity with description logics, ontology languages and theorem proving. A
signature is a set of predicates and Sig(F ) denotes the signature of a set of
formulas F . To capture a wide range of KR languages, we formalise ontology
axioms as rules: function-free sentences of the form 8x:['(x) ! 9y:[Win=1 i(x; y)]],
where ', i are conjunctions of distinct atoms. Formula ' is the rule body and
9y:[Win=1 i(x; y)] is the head. Universal quanti cation is omitted for brevity.
Rules are required to be safe (all variables in the head occur in the body). A
TBox T is a nite set of rules; TBoxes mentioning equality ( ) are extended
with its standard axiomatisation. A fact is a function-free ground atom. An
ABox A is a nite set of facts. A positive existential query (PEQ) is a formula
q(x) = 9y:'(x; y), where ' is built from function-free atoms using only ^ and _.
Datalog A rule is datalog if its head has exactly one atom (which could be ?, the
nullary falsehood predicate) and all variables are universally quanti ed. A datalog
program P is a set of datalog rules. Given P and an ABox A, their materialisation
is the set of facts entailed by P [A, which can be computed by means of
forwardchaining. A fact is a consequence of a datalog rule r = Vin=1 i0 ! and facts
1; : : : ; n if = with a most-general uni er (MGU) of i; i0 for each
1 i n. A (forward-chaining) proof of in P [ A is a pair = (T; ) where T
is a tree, is a mapping from nodes in T to facts such that for each node v the
following holds: 1. (v) = if v is the root of T ; 2. if v is a leaf then (v) 2 A
or (! (v)) 2 P; and 3. if v has children w1; : : : ; wn then (v) is a consequence
of r and (w1); : : : ; (wn). Forward-chaining is sound (if has a proof in P [ A
then it is entailed by P [ A) and complete (if is entailed by P [ A then either
has a proof in P [ A or so does ?). Finally, the support of is the set of rules
occurring in some proof of in P [ A.</p>
      <p>Inseparability Relations &amp; Modules We next recapitulate the most common
inseparability relations studied in the literature. TBoxes T and T 0 are
{ -model inseparable (T m T 0), if for every model I of T there is a model</p>
      <p>J of T 0 with the same domain s.t. AI = AJ for each A 2 , and vice versa.
{ -query inseparable (T q T 0) if for every Boolean PEQ q and -ABox A
we have T [ A j= q i T 0 [ A j= q.
{ -fact inseparable (T f T 0) if for every fact and ABox A over we
have T [ A j= i T 0 [ A j= .
{ -implication inseparable (T i T 0) if for each ' of the form A(x) ! B(x)
with A; B 2 , T j= ' i T 0 j= '.</p>
      <p>These relations are naturally ordered from strongest to weakest: m ( q (
f ( i for each non-trivial .</p>
      <p>Given an inseparability relation for , a subset M T is a -module of
T if T M. Furthermore, M is minimal if no M0 ( M is a -module of T .
3</p>
    </sec>
    <sec id="sec-3">
      <title>Module Extraction via Datalog Reasoning</title>
      <p>
        In this section, we present our approach to module extraction by reduction into
a reasoning problem in datalog. Our approach builds on recent techniques that
exploit datalog engines for ontology reasoning [
        <xref ref-type="bibr" rid="ref16">16, 26, 31</xref>
        ]. In what follows, we
x an arbitrary TBox T and signature Sig(T ). Unless otherwise stated,
our de nitions and theorems are parameterised by such T and . We assume
w.l.o.g. that rules in T share no existentially quanti ed variables. For simplicity,
we also assume that T contains no constants (all results extend to constants [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
Our overall strategy to extract a module M of T for an inseparability relation
z , with z 2 fm; q; f; ig, can be summarised by the following steps:
1. Pick a substitution mapping all existentially quanti ed variables in T to
constants, and transform T into a datalog program P by (i)
Skolemising all rules in T using and (ii) turning disjunctions into conjunctions
while splitting them into di erent rules, thus replacing each function-free
disjunctive rule of the form '(x) ! Win=1 i(x) with datalog rules '(x) !
1(x); : : : ; '(x) ! n(x).
2. Pick a -ABox A0 and materialise P [ A0.
3. Pick a set Ar of \relevant facts" in the materialisation and compute the
supporting rules in P for each such fact.
4. The module M consists of all rules in T that yield a supporting rule in P.
      </p>
      <p>Thus, M is fully determined by the substitution and the ABoxes A0, Ar.
The main intuition behind our module extraction approach is that we can pick
, A0 and Ar (and hence M) such that each proof of a -consequence ' of T
to be preserved can be embedded in a forward chaining proof 0 in P [ A0 of a
relevant fact in Ar. Such an embedding satis es the key property that, for each
rule r involved in , at least one corresponding datalog rule in P is involved in
0. In this way we ensure that M contains the necessary rules to entail '. This
approach, however, does not ensure minimality of M: since P is a strengthening
of T there may be proofs of a relevant fact in P [ A0 that do not correspond to
a -consequence of T , which may lead to unnecessary rules in M.</p>
      <p>To illustrate how our strategy might work in practice, consider T ex in Fig. 1,
= fB; C; D; Gg, and that we want a module M that is -implication
inseparable from T ex. This is a simple case since ' = D(x) ! G(x) is the only non-trivial
-implication entailed by T ex; thus, for M to be a module we only need M j= '.
Note that proving T ex j= ' amounts to proving T ex [ fD(a)g j= G(a) (with a a
fresh constant). Figure 2(a) depicts a hyper-resolution tree showing how G(a)
can be derived from the clauses corresponding to r4{r6 and D(a), with rule r4</p>
      <p>r6
(a)
r0
4
D(a)
transformed into clauses r40 = D(x) ! S(x; f (x3)) and r400 = D(x) ! E(f (x3)).
Hence M = fr4{r6g is a -implication inseparable module of T ex, and as G(a)
cannot be derived from any subset of fr4{r6g, M is also minimal.</p>
      <p>We pick A0 to contain the initial fact D(a), Ar to contain the fact to be
proved G(a), and we make map variable y3 in r4 to a fresh constant c, in which
case rule r4 corresponds to the rules D(x) ! S(x; c) and D(x) ! E(c) in P.</p>
      <p>Figure 2(b) depicts a forward chaining proof 0 of G(a) in P [ fD(a)g. As
shown, can be embedded in 0 via by mapping functional terms over f to the
fresh constant c. This way, the rules involved in are mapped to the datalog rules
involved in 0 via . Hence, we will extract the (minimal) module M = fr4{r6g.
The substitution and the ABoxes A0 and Ar, which determine the extracted
module, can be chosen in di erent ways to ensure the preservation of di erent
kinds of -consequences. The following notion of a module setting captures in
a declarative way the main elements of our approach.</p>
      <p>De nition 3.1. A module setting for T and is a tuple = h ; A0; Ari with
a substitution from existentially quanti ed variables in T to constants, A0 a
-ABox, Ar a (Sig(T ) [ f?g)-ABox, and s.t. no constant in occurs in T .</p>
      <p>The program of is the smallest datalog program P containing, for each
r = '(x) ! 9y:[Win=1 i(x; y)] in T , the rule ' ! ? if n = 0 and all rules
' ! for each 1 i n and each atom in i. The support of is the set
of rules r 2 P that support a fact from Ar in P [ A0. The module M of is
the set of rules in T that have a corresponding datalog rule in the support of .
3.3</p>
      <p>Modules for each Inseparability Relation
We next consider each inseparability relation z , where z 2 fm; q; f; ig, and
formulate a speci c setting z which provably yields a z -module of T .
Implication Inseparability The example in Sect. 3.1 suggests a natural
setting i = h ; A0; Ari that guarantees implication inseparability. As in our
example, we pick to be as \general" as possible by Skolemising each existentially
quanti ed variable to a fresh constant. For A and B predicates of the same arity
n, proving that T entails a -implication ' = A(x1; : : : ; xn) ! B(x1; : : : ; xn),
amounts to showing that T [ fA(a1; : : : ; an)g j= B(a1; : : : ; an) for fresh constants
a1; : : : ; an. Thus, following the ideas of our example, we initialise A0 with a fact
A(c1A; : : : ; cnA) for each n-ary predicate A 2 , and Ar with a fact B(c1A; : : : ; cnA)
for each pair of n-ary predicates fB; Ag with B 6= A.</p>
      <p>De nition 3.2. For each existentially quanti ed variable yj in T , let cyj be a
fresh constant. Furthermore, for each A 2 of arity n, let c1A; : : : ; cnA be also
fresh constants. The setting i = h i; Ai0; Airi is de ned as follows:
{ i = f yj 7! cyj j yj existentially quanti ed in T g,
{ Ai0 = fA(c1A; : : : ; cnA) j A n-ary predicate in g, and
{ Air = fB(c1A; : : : ; cnA) j A 6= B n-ary predicates in g [ f?g.</p>
      <p>
        The setting i is reminiscent of the datalog encodings typically used to check
whether a concept A is subsumed by concept B w.r.t. a \lightweight" ontology
T [
        <xref ref-type="bibr" rid="ref18">18, 26</xref>
        ]. There, variables in rules are Skolemised as fresh constants to produce
a datalog program P and it is then checked whether P [ fA(a)g j= B(a).
Theorem 3.3. M i i T .
      </p>
      <p>Fact Inseparability The setting i in Def. 3.2 cannot be used to ensure fact
inseparability. Consider again T ex and = fB; C; D; Gg, for which M i =
fr4; r5; r6g. For A = fB(a); C(a)g we have T ex [ A j= G(a) but M i [ A 6j= G(a),
and hence M i is not fact inseparable from T ex.</p>
      <p>More generally, M i will only preserve -fact entailments T [ A j= where
A is a singleton. However, for a module to be fact inseparable from T it must
preserve all -facts when coupled with any -ABox. We achieve this by choosing
A0 to be the critical ABox for , which consists of all facts that can be
constructed using and a single fresh constant [21]. Since every -ABox can be
homomorphically mapped into the critical -ABox, we can show that all proofs
of a -fact in T [ A are embeddable into a proof of a relevant fact in P [ A0.
De nition 3.4. Let constants cyi be as in Def. 3.2, and let be a fresh constant.
The setting f = h f ; Af0; Afri is de ned as follows: (i) f = i, (ii) Af0 =
f A( ; : : : ; ) j A 2 g, and (iii) Afr = Af0 [ f?g.</p>
      <p>The datalog programs for i and f coincide; hence, the only di erence between
the two settings is in the de nition of their corresponding ABoxes. In our
example, both Af0 and Afr contain facts B( ), C( ), D( ), and G( ). Clearly, P f [A0 j=
G( ) and the proof additionally involves rule r3. Thus M f = fr3; r4; r5; r6g.
Theorem 3.5. M f f T .</p>
      <p>Query Inseparability Positive existential queries constitute a much richer
query language than facts as they allow for existentially quanti ed variables.
Thus, the query inseparability requirement invariably leads to larger modules.</p>
      <p>For instance, let T = T ex and = fA; Bg. Given the -ABox A = fA(a)g
and -query q = 9y:B(y) we have T ex [A j= q (due to rule r1). The module M f
is, however, empty. Indeed, the materialisation of P f [fA( )g consists of the
additional facts R( ; cy1 ) and B(cy1 ) and hence contains no relevant fact mentioning
only . Thus, M f [ A 6j= q and M f is not query inseparable from T ex.</p>
      <p>Our example suggests that, although the critical ABox is constrained enough
to embed every -ABox, we need to consider additional relevant facts to capture
all proofs of a -query. In particular, rule r1 implies that B contains an instance
whenever A does: a dependency that is then checked by q. This can be captured
by considering fact B(cy1 ) as relevant, in which case r1 would be in the module.</p>
      <p>More generally, we consider a module setting that di ers from f only in
that all -facts (and not just those over ) are considered as relevant.
De nition 3.6. Let constants qcy=i and be as in Def. 3.4. The setting q =
h q; Aq0; Arqi is as follows: (i) f , (ii) Aq0 = Af0, and (iii) Arq consists of ?
and all -facts A(a1; : : : ; an) with each aj either a constant cyi or .
Theorem 3.7. M q q T .</p>
      <p>Model Inseparability The modules generated by q may not be model
inseparable from T . To see this, let T = T ex and = fA; D; Rg, in which
case M q = fr1; r2g. The interpretation I where I = fa; bg, AI = fag,
BI = CI = fbg, DI = ; and RI = f(a; b)g is a model of M q . However, I cannot
be extended to a model of r3 (and hence of T ) without reinterpreting A, R or D.</p>
      <p>To achieve model inseparability, it su ces to ensure that each model of the
module can be extended to a model of T in a uniform way. Thus, M = fr1; r2; r3g
is a m -module of T ex since all its models can be extended by interpreting E, F
and G as the domain, H as empty, and S as the Cartesian product of the domain.
We can capture this idea in our framework by means of the following setting.
De nition 3.8. The setting m = h m; A0m; Armi is as follows: m maps each
existentially quanti ed yj to the fresh constant , A0m = A0, and Arm = A0 [f?g.
f m
In our example, P m [ A0m entails the relevant facts A( ), R( ; ) and D( ), and
hence M m = fr1; r2; r3g.</p>
      <sec id="sec-3-1">
        <title>Theorem 3.9. M m</title>
        <p>m T .
3.4</p>
        <p>
          Modules for Ontology Classi cation
Module extraction has been exploited for optimising ontology classi cation [
          <xref ref-type="bibr" rid="ref1 ref4">1, 30,
4</xref>
          ]. In this case, it is not only required that modules are implication inseparable
from T , but also that they preserve all implications A(x) ! B(x) with A 2
but B 2= . This requirement can be captured as given next.
        </p>
        <p>De nition 3.10. TBoxes T and T 0 are -classi cation inseparable (T
if for each ' of the form A(x) ! where A 2 and either = ? or
for B 2 Sig(T [ T 0), we have T j= ' i T 0 j= '.</p>
        <p>c T 0)
= B(x)
Classi cation inseparability is a stronger requirement than implication
inseparability. For T = fA(x) ! B(x)g and = fAg, M = ; is implication inseparable
from T , whereas classi cation inseparability requires that M = T .</p>
        <p>
          Modular reasoners such as MORe [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and Chainsaw [30] rely on locality
?modules, which satisfy this requirement. Each model of a ?-module M can be
extended to a model of T by interpreting all additional predicates as empty,
which is not possible if A 2 and T entails A(x) ! B(x) but M does not. We
can cast ?-modules in our framework with the following setting, which extends
m in Def. 3.8 by also considering as relevant facts involving predicates not in .
De nition 3.11. The setting b = h b; Ab0; Arbi is as follows: b =
A0m, and Ar consists of ? and all facts A( ; : : : ; ) where A 2 Sig(T ).
m, Ab0 =
The use of ?-modules is, however, stricter than is needed for ontology classi
cation. For instance, if we consider T = T ex and = fAg we have that M b
contains all rules r1{r6, but since A does not have any subsumers in T ex the
empty TBox is already classi cation inseparable from T ex.
        </p>
        <p>The following module setting extends i in Def. 3.2 to ensure classi cation
inseparability. As in the case of b in Def. 3.11, the only required modi cation
is to also consider as relevant facts involving predicates outside .
De nition 3.12. Setting c = ( c; Ac0; Acr) is as follows: c = i, Ac0 = Ai0,
and Acr consists of ? and all facts B(c1A; : : : ; cnA) s.t. A 6= B are n-ary predicates,
A 2 and B 2 Sig(T ).</p>
        <p>Indeed, given T = T ex and
Theorem 3.13. M c c T .</p>
        <p>= fAg, the module for c is empty, as desired.
3.5</p>
        <p>
          Additional Properties of Modules
Although the essential property of a module M is that it captures all relevant
-consequences of T , in some applications modules are expected to satisfy
additional requirements. For ontology reuse, it is sometimes desirable that a module
M does not \leave any relevant information behind" in the sense that T nM does
not entail any relevant -consequence|a property known as depletingness [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <sec id="sec-3-1-1">
          <title>De nition 3.14. Let</title>
          <p>is depleting if T n M
z be an inseparability relation. A
z ;.</p>
          <p>z -module M of T
Note that not all modules are depleting: for some relevant -entailment ' we
may have M j= ' (as required by the de nition of module), but also (T nM) j= ',
in which case M is not depleting. The following theorem establishes that all
modules de ned in Sect. 3.3 are depleting.</p>
          <p>Theorem 3.15. M z is depleting for each z 2 fm; q; f; i; cg.</p>
          <p>
            Another application of modules is to optimise the computation of justi
cations: minimal subsets of a TBox that su ce to entail a given formula [
            <xref ref-type="bibr" rid="ref14">14, 29</xref>
            ].
De nition 3.16. Let T j= '. A justi cation for ' in T is a minimal subset
T 0 T such that T 0 j= '.
          </p>
          <p>Justi cations are displayed in ontology development platforms as explanations
of why an entailment holds, and tools typically compute all of them. Extracting
justi cations is a computationally intensive task, and locality-based modules
have been used to reduce the size of the problem: if T 0 is a justi cation of ' in
T , then T 0 is contained in a locality module of T for = Sig('). Our modules
are also justi cation-preserving, and we can adjust our modules depending on
what kind of rst-order sentence ' is.</p>
          <p>Theorem 3.17. Let T 0 be a justi cation for a rst-order sentence ' in T and
let Sig(') . Then, T 0 M m . Additionally, the following properties hold:
(i) if ' is a rule, then T 0 M q ; (ii) if ' is datalog, then T 0 M f ; and
(iii) if ' is of the form A(x) ! B(x) or A(x) ! ?, then T 0 M i ; nally, if '
satis es A 2 ; B 2 Sig(T ), then T 0 M c .
3.6</p>
          <p>Complexity of Module Extraction
We conclude this section by showing that our modules can be e ciently
computed in most practically relevant cases.</p>
          <p>Theorem 3.18. Let m be a non-negative integer and L a class of TBoxes s.t.
each rule in a TBox from L has at most m distinct universally quanti ed
variables. The following problem is tractable: given z 2 fq; f; i; cg, T 2 L, and r 2 T ,
decide whether r 2 M z . The problem is tractable for arbitrary L if z = m.</p>
          <p>
            We now provide a proof sketch for this result. Checking whether a datalog
program P and an ABox A entail a fact is feasible in O(jPj nv), with n the
number of constants in P [ A and v the maximum number of variables in a rule
from P [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. Thus, although datalog reasoning is exponential in the size of v (and
hence of P), it is tractable if v is bounded by a constant.
          </p>
          <p>Given arbitrary T and , and for z 2 fm; q; f; i; cg, the datalog program P z
can be computed in linear time in the size of jT j. The number of constants n in z
(and hence in P z [Az0) is linearly bounded in jT j, whereas the maximum number
of variables v coincides with the maximum number of universally quanti ed
variables in a rule from T . As shown in [31], computing the support of a fact in a
datalog program is no harder than fact entailment, and thus module extraction in
our approach is feasible in O(jT j nv), i.e., tractable for ontology languages where
rules have a bounded number of variables (as is the case for most DLs). Finally,
for z = m the setting m involves a single constant and module extraction boils
down to propositional datalog reasoning (a tractable problem regardless of T ).
3.7</p>
          <p>Module Containment and Optimality
Intuitively, the more expressive the language for which preservation of
consequences is required the larger modules need to be. The following proposition
shows that our modules are consistent with this intuition.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Proposition 3.19. M i</title>
        <p>M f</p>
        <p>M q</p>
        <p>M m</p>
        <p>M b ; M i</p>
        <p>Finally, we ask ourselves whether each z with z 2 fq; f; i; m; cg is optimal for
its inseparability relation in the sense that there is no setting producing smaller
modules. To make such statements precise we need to consider families of module
settings, i.e., functions that assign a module setting to each pair of T and .
De nition 3.20. A setting family is a function that maps a TBox T and
signature to a module setting for T and . Family is uniform if (i) for
every and pair T ; T 0 with the same number of existentially quanti ed variables,
(T ; ) and (T 0; ) are isomorphic; (ii) for every T and 0, (T ; )
is a restriction of (T ; 0) to . Let z 2 fi; f; q; m; cg; then is z-admissible
if, for each T and , M (T ; ) is a z -module of T . Finally, is z-optimal if
M (T ; ) M 0(T ; ) for every T , and every uniform 0 that is z-admissible.
Uniformity ensures that settings do not depend on the speci c shape of rules in
T , but rather only on and the number of existentially quanti ed variables in T .
In turn, admissibility ensures that each setting yields a module. The (uniform
and admissible) family z for each setting z in Sects. 3.3 and 3.4 is de ned in
the obvious way: for each T and , z(T ; ) is the setting z for T and .</p>
        <sec id="sec-3-2-1">
          <title>Theorem 3.21.</title>
          <p>z is z-optimal for z 2 fi; m; cg.</p>
          <p>In contrast, q and f are not optimal. To see this, let T = fA(x) !
B(x); B(x) ! A(x)g and = fAg. The empty TBox is fact inseparable from
T since the only -consequence of T is the tautology A(x) ! A(x). However,
M f = T since fact A(a) is in Afr and its support is included in the module.
One can provide a family of settings that distinguishes tautological from
nontautological inferences; however, this family yields settings of exponential size in
jT j, which is undesirable in practice.
4</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        We have implemented a prototype system for module extraction, called PrisM,
that uses RDFox [22] for datalog materialisation and exploits some code from
PAGOdA [31] for computing the support of an entailed fact. We have evaluated
PrisM on a set of ontologies identi ed by Glimm et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] as non-trivial for
reasoning. We have normalised all ontologies to make axioms equivalent to rules.1
      </p>
      <p>
        We compared the size of our modules with the locality-based modules
computed using the OWL API. We have followed the experimental methodology
from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] where two kinds of signatures are considered: genuine signatures
corresponding to the signature of individual axioms, and random signatures. Unlike
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we extracted random signatures using a randomised graph sampling
algorithm provided by RDFox. The advantage of this approach is that the resulting
1 The ontologies used in our experiments are available for download at
https://krr-nas.cs.ox.ac.uk/2015/jair/PrisM/testOntologies.zip.
signatures are \semantically connected", which is likely to be the case in
practical applications. For each type of signature and ontology, we took a sample
of 400 runs and averaged module sizes. Random signatures were obtained from
samples of size 1=1000 the size of the original ontology. We present results for
the 7 largest ontologies from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] in terms of signature (this selection is
representative for the behaviour on the smaller ontologies; see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). All experiments
were performed on a server with 2 Intel Xeon E5-2670 processors and 90GB of
allocated RAM, running RDFox on 16 threads.
      </p>
      <p>Table 1 summarises our results. We compared ?-modules with the modules
for c (Sect. 3.4) and &gt;? -modules with those for m, q, f , and i (Sect. 3.3).
We can see that module size consistently decreases as we consider weaker
inseparability relations. In some cases, the modules for c are over 3 orders of
magnitude smaller than ?-modules. The di erence between &gt;? -modules and
i modules can also be considerable, reaching 2 orders of magnitude. In fact,
c, f , and i modules are sometimes empty, meaning that no two predicates
in are in an implication relationship (which can happen for large ontologies
and small ). Also note that our modules for model inseparability slightly
improve on &gt;? -modules. Finally, recall that our modules may not be minimal for
their inseparability relation. Since techniques for extracting minimal modules are
available only for model inseparability, and for restricted languages, we could not
assess how close our modules are to minimal ones.</p>
      <p>Computation times were generally higher for c than for the other settings
due to the larger signature involved in computing modules for c. For random
signatures, average times (over all settings) were 54s for GALEN, 2s for FLY, 5s
for FMA, 13s for Gazetteer, 7s for Molecule, 32s for SNOMED, and 44s for NCI,
which is considerably higher than for locality-based modules, but still acceptable
for some applications. For genuine signatures, the times were accordingly lower.</p>
      <p>It should be noted that PrisM currently relies on RDFox, which is not
optimised for module extraction and is used in a black-box fashion. We conjecture
that the performance of our approach can be considerably improved by
dedicated systems. Another interesting task would be integrating our techniques
in existing modular reasoners as well as in systems for justi cation extraction.
Finally, the issue of optimality of our approach requires further investigation.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work was supported by the Royal Society, the EPSRC projects MaSI3,
Score!, DBOnto, and ED3, and by the EU FP7 project OPTIQUE.
19. Ludwig, M.: Just: a tool for computing justi cations w.r.t. ELH ontologies. In:</p>
      <p>ORE. pp. 1{7 (2014)
20. Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the
description logic EL. J. Symb. Comput. 45(2), 194{228 (2010)
21. Marnette, B.: Generalized schema-mappings: From termination to tractability. In:</p>
      <p>PODS. pp. 13{22 (2009)
22. Motik, B., Nenov, Y., Piro, R., Horrocks, I., Olteanu, D.: Parallel materialisation
of datalog programs in centralised, main-memory RDF systems. In: AAAI. pp.
129{137 (2014)
23. Nortje, R., Britz, K., Meyer, T.: Reachability modules for the description logic</p>
      <p>SRIQ. In: LPAR. pp. 636{652 (2013)
24. Sattler, U., Schneider, T., Zakharyaschev, M.: Which kind of module should I
extract? In: DL (2009)
25. Seidenberg, J., Rector, A.L.: Web ontology segmentation: Analysis, classi cation
and use. In: WWW. pp. 13{22 (2006)
26. Stefanoni, G., Motik, B., Horrocks, I.: Introducing nominals to the combined query
answering approaches for EL. In: AAAI. pp. 1177{1183 (2013)
27. Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.): Modular Ontologies:
Concepts, Theories and Techniques for Knowledge Modularization, LNCS, vol. 5445.</p>
      <p>Springer (2009)
28. Suntisrivaraporn, B.: Module extraction and incremental classi cation: A
pragmatic approach for ontologies. In: ESWC. pp. 230{244 (2008)
29. Suntisrivaraporn, B., Qi, G., Ji, Q., Haase, P.: A modularization-based approach
to nding all justi cations for OWL DL entailments. In: ASWC. pp. 1{15 (2008)
30. Tsarkov, D., Palmisano, I.: Chainsaw: A metareasoner for large ontologies. In: ORE
(2012)
31. Zhou, Y., Nenov, Y., Cuenca Grau, B., Horrocks, I.: Pay-as-you-go OWL query
answering using a triple store. In: AAAI. pp. 1142{1148 (2014)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Armas</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Cuenca Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          , I.:
          <article-title>MORe: Modular combination of OWL reasoners for ontology classi cation</article-title>
          .
          <source>In: ISWC</source>
          . pp.
          <volume>1</volume>
          {
          <issue>16</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Armas</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kaminski</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>Ontology module extraction via datalog reasoning</article-title>
          .
          <source>In: AAAI</source>
          . pp.
          <volume>1410</volume>
          {
          <issue>1416</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Armas</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kaminski</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>Module extraction in expressive ontology languages via datalog reasoning</article-title>
          .
          <source>J. Artif. Intell. Res</source>
          . (
          <year>2016</year>
          ), to appear
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Halaschek-Wiener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Kazakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Suntisrivaraporn</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          :
          <article-title>Incremental classi cation of description logics ontologies</article-title>
          .
          <source>J. Autom. Reason</source>
          .
          <volume>44</volume>
          (
          <issue>4</issue>
          ),
          <volume>337</volume>
          {
          <fpage>369</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Kazakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Sattler</surname>
          </string-name>
          ,
          <string-name>
            <surname>U.</surname>
          </string-name>
          :
          <article-title>Just the right amount: Extracting modules from ontologies</article-title>
          .
          <source>In: WWW</source>
          . pp.
          <volume>717</volume>
          {
          <issue>726</issue>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Kazakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Sattler</surname>
          </string-name>
          ,
          <string-name>
            <surname>U.</surname>
          </string-name>
          :
          <article-title>Modular reuse of ontologies: Theory and practice</article-title>
          .
          <source>J. Artif. Intell. Res</source>
          .
          <volume>31</volume>
          ,
          <issue>273</issue>
          {
          <fpage>318</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dantsin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voronkov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Complexity and expressive power of logic programming</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>33</volume>
          (
          <issue>3</issue>
          ),
          <volume>374</volume>
          {
          <fpage>425</fpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Del</given-names>
            <surname>Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Klinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Parsia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Sattler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            ,
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Tsarkov</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          :
          <article-title>Empirical study of logic-based modules: Cheap is cheerful</article-title>
          . In: DL. pp.
          <volume>144</volume>
          {
          <issue>155</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Del</given-names>
            <surname>Vescovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Parsia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Sattler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            ,
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>The modular structure of an ontology: Atomic decomposition</article-title>
          .
          <source>In: IJCAI</source>
          . pp.
          <volume>2232</volume>
          {
          <issue>2237</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gatens</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Lower and upper approximations for depleting modules of description logic ontologies</article-title>
          .
          <source>In: ECAI</source>
          . pp.
          <volume>345</volume>
          {
          <issue>350</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>HermiT: An OWL 2 reasoner</article-title>
          . J.
          <string-name>
            <surname>Autom</surname>
          </string-name>
          . Reason.
          <volume>53</volume>
          (
          <issue>3</issue>
          ),
          <volume>245</volume>
          {
          <fpage>269</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>LogMap: Logic-based and scalable ontology matching</article-title>
          .
          <source>In: ISWC</source>
          . pp.
          <volume>273</volume>
          {
          <issue>288</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berlanga</surname>
            <given-names>Llavori</given-names>
          </string-name>
          , R.:
          <article-title>Safe and economic re-use of ontologies: A logic-based methodology and tool support</article-title>
          .
          <source>In: ESWC</source>
          . pp.
          <volume>185</volume>
          {
          <issue>199</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horridge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
          </string-name>
          , E.:
          <article-title>Finding all justi cations of OWL DL entailments</article-title>
          . In: ISWC. pp.
          <volume>267</volume>
          {
          <issue>280</issue>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Model-theoretic inseparability and modularity of description logic ontologies</article-title>
          .
          <source>Artif. Intell</source>
          .
          <volume>203</volume>
          ,
          <issue>66</issue>
          {
          <fpage>103</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The combined approach to ontology-based data access</article-title>
          .
          <source>In: IJCAI</source>
          . pp.
          <volume>2656</volume>
          {
          <issue>2661</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Logic-based ontology comparison and module extraction, with an application to DL-Lite</article-title>
          .
          <source>Artif. Intell</source>
          .
          <volume>174</volume>
          (
          <issue>15</issue>
          ),
          <volume>1093</volume>
          {
          <fpage>1141</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. Krotzsch,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Rudolph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          : ELP:
          <article-title>Tractable rules for OWL 2</article-title>
          . In: ISWC. pp.
          <volume>649</volume>
          {
          <issue>664</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>