Module Extraction via Query Inseparability in OWL 2 QL

    Boris Konev1 , Roman Kontchakov2 , Michel Ludwig1 , Thomas Schneider3 ,
                   Frank Wolter1 , and Michael Zakharyaschev2
1
    University of Liverpool, UK {konev,michel.ludwig,wolter}@liverpool.ac.uk
         2
           Birkbeck College London, UK {roman,michael}@dcs.bbk.ac.uk
     3
       University of Bremen, Germany tschneider@informatik.uni-bremen.de


       Abstract. We show that deciding conjunctive query inseparability for
       OWL 2 QL ontologies is PSpace-hard and in ExpTime. We give polyno-
       mial-time (incomplete) algorithms and demonstrate by experiments that
       they can be used for practical module extraction.


1     Introduction
Ontology-based data access (OBDA) has recently emerged as one of the most
interesting and challenging applications of description logic. The key idea is to
use ontologies for enriching data with background knowledge, and thereby en-
able query answering over incomplete and semistructured data via a high-level
conceptual interface. The W3C recognised the importance of OBDA by includ-
ing in the OWL 2 Web Ontology Language the profile OWL 2 QL, which was
designed for OBDA with relational database systems. OWL 2 QL is based on a
description logic that was originally introduced under the name DL-LiteR [5, 6]
and called DL-LiteH core in the more general classification [1]. It can be described
as an optimal sub-language of SROIQ, underlying OWL 2, which includes most
of the features of conceptual models, and for which query answering can be done
in AC0 for data complexity. Thus, DL-LiteH     core is becoming a major language
for developing ontologies, and a target language for translation and approxima-
tion of existing ontologies formulated in more expressive DLs [11, 4]. One of
the consequences of this development is that DL-LiteH    core ontologies turn out to
be larger and more complex than originally envisaged. As a result, reasoning
support for ontology engineering tasks such as composing, re-using, comparing,
and extracting ontologies—which so far has been only analysed for expressive
DLs [7, 12], EL [10] and DL-Lite dialects without role inclusions [9]—is becoming
increasingly important for DL-LiteH core as well.
    In the context of OBDA, the basic notion underlying many ontology engi-
neering tasks is Σ-query inseparability: for a signature (a set of concept and role
names) Σ, two ontologies are deemed to be inseparable if they give the same
answers to any conjunctive query over any data formulated in Σ. Thus, in ap-
plications using Σ-queries and data, one can safely replace any ontology by a
Σ-query inseparable one. Note that the relativisation to Σ is very important
here. For example, one cannot expect modules of an ontology to be query insep-
arable from the whole ontology for arbitrary queries and data sets, whereas this
should be the case if we restrict the query and data language to the module’s
signature or a specified subset thereof. Similarly, when comparing two versions
of one ontology, the subtle and potentially problematic differences are those that
concern queries over their common symbols, rather than all symbols occurring in
these versions. In applications where ontologies are built using imported parts, a
stronger notion of inseparability is required: two ontologies are strongly Σ-query
inseparable if they give the same answers to Σ-queries and data when imported
to an arbitrary context ontology formulated in Σ.
     The aim of this paper is to (i) investigate the computational complexity of
deciding (strong) Σ-query inseparability for DL-LiteH    core ontologies, (ii) develop
efficient (though incomplete) algorithms for practical inseparability checking,
and (iii) analyse the performance of the algorithms for the challenging task of
minimal module extraction.
     One of our surprising discoveries is that the analysis of Σ-query insepara-
bility for DL-LiteH core ontologies requires drastically different logical tools com-
pared with the previously considered DLs. It turns out that the new syntactic
ingredient—the interaction of role inclusions and inverse roles—makes deciding
(strong) query inseparability PSpace-hard, as opposed to the known coNP and
Π2p -completeness results for DL-Lite dialects without role inclusions [9]. On the
other hand, the obtained ExpTime upper bound is actually the first known
decidability result for strong inseparability, which goes beyond the ‘essentially’
Boolean logic and might additionally indicate a way of solving the open problem
of strong Σ-query inseparability for EL [10]. For DL-Litecore ontologies (without
role inclusions), strong Σ-query inseparability is shown to be only NLogSpace-
complete. We give (incomplete) polynomial-time algorithms checking (strong)
Σ-inseparability and demonstrate, by a set of minimal module extraction exper-
iments, that they are (i ) complete for many existing DL-LiteH    core ontologies and
signatures, and (ii ) sufficiently fast to be used in module extraction algorithms
that require thousands of Σ-query inseparability checks. All omitted proofs can
be found at www.dcs.bbk.ac.uk/~roman/owl2ql-modules.


2    Σ-Query Entailment and Inseparability
We begin by formally defining DL-LiteH   core , underlying OWL 2 QL, and the no-
tions of Σ-query inseparability and entailment. The language of DL-LiteH     core
contains countably infinite sets of individual names ai , concept names Ai , and
role names Pi . Roles R and concepts B of this language are defined by:

    R    ::=   Pi    |    Pi− ,       B    ::=    ⊥     |   >    |   Ai   |   ∃R.

A DL-LiteH
         core TBox, T , is a finite set of inclusions

        B1 v B2 ,        R1 v R2 ,      B1 u B2 v ⊥,            R1 u R2 v ⊥,

where B1 , B2 are concepts and R1 , R2 roles. An ABox, A, is a finite set of asser-
tions of the form B(ai ), R(ai , aj ) and ai 6= aj , where ai and aj are individual
names, B a concept and R a role. Ind(A) will stand for the set of individual names
occurring in A. Taken together, T and A constitute the DL-LiteH          core knowledge
base (KB, for short) K = (T , A). The sub-language of DL-LiteH        core without role
inclusions R1 v R2 is denoted by DL-Litecore [6]. The semantics of DL-LiteH         core
is defined as usual in DL [2]. We only note that, in interpretations I = (∆I , ·I ),
we do not have to comply with the UNA, that is, we can have aIi = aIj for
i 6= j. We write I |= α to say that an inclusion or assertion α is true in I. The
interpretation I is a model of a KB K = (T , A) if I |= α for all α ∈ T ∪ A. K is
consistent if it has a model. A concept B is said to be T -consistent if (T , {B(a)})
has a model. K |= α means that I |= α for all models I of K.
    A conjunctive query (CQ) q(x1 , . . . , xn ) is a first-order formula
                          ∃y1 . . . ∃ym ϕ(x1 , . . . , xn , y1 , . . . , ym ),
 where ϕ is constructed, using only ∧, from atoms of the form B(t) and R(t1 , t2 ),
 with B being a concept, R a role, and ti being an individual name or a variable
 from the list x1 , . . . , xn , y1 , . . . , ym . The variables in ~x = x1 , . . . , xn are called
 answer variables of q. We say that an n-tuple ~a ⊆ Ind(A) is an answer to q in
 an interpretation I if I |= q[~a] (here we regard I to be a first-order structure);
~a is a certain answer to q over a KB K = (T , A) if I |= q[~a] for all models I of
 K; in this case we write K |= q[~a].
     To define the main notions of this paper, consider two KBs K1 = (T1 , A) and
 K2 = (T2 , A). For example, the Ti are different versions of some ontology, or one
 of them is a refinement of the other by means of new axioms. The question we
 are interested in is whether they give the same answers to queries formulated in
 a certain signature, say, in the common vocabulary of the Ti or in a vocabulary
 relevant to an application. To be precise, by a signature, Σ, we understand
 any finite set of concept and role names. A concept (inclusion, TBox, etc.) all
 concept and role names of which are in Σ is called a Σ-concept (inclusion, etc.).
 We say that K1 Σ-query entails K2 if, for all Σ-queries q(~x) and all ~a ⊆ Ind(A),
 K2 |= q[~a] implies K1 |= q[~a]. In other words: any certain answer to a Σ-query
 given by K2 is also given by K1 . As the ABox is typically not fixed or known at
 the ontology design stage, we may have to compare the TBoxes over arbitrary
 Σ-ABoxes rather than a fixed one, which gives our central definition:
Definition 1. Let T1 and T2 be TBoxes and Σ a signature. T1 Σ-query entails
T2 if (T1 , A) Σ-query entails (T2 , A) for any Σ-ABox A. T1 and T2 are Σ-query
inseparable if they Σ-query entail each other, in which case we write T1 ≡Σ T2 .
In many applications, Σ-query inseparability is enough to ensure that T1 can be
safely replaced by T2 . However, if they are developed as part of a larger ontology
or are meant to be imported in other ontologies, a stronger notion is required:
Definition 2. T1 strongly Σ-query entails T2 if T1 ∪ T Σ-query entails T2 ∪ T ,
for all Σ-TBoxes T . T1 and T2 are strongly Σ-query inseparable if they strongly
Σ-query entail each other, in which case we write T1 ≡sΣ T2 .
The following example illustrates the difference between Σ-query and strong
Σ-query inseparability. For further discussion and examples, consult [7, 9].
Example 3. Let T1 = ∅, T2 = {> v ∃R, ∃R− v B, B u A v ⊥} and Σ = {A}.
T1 and T2 are Σ-query inseparable. However, they are not strongly Σ-query
inseparable. Indeed, for the Σ-TBox T = {> v A}, T1 ∪ T is consistent, while
T2 ∪ T is inconsistent, and so T1 ∪ T does not Σ-query entail T2 ∪ T , as witnessed
by the query q = ⊥.

3    Σ-Query Entailment and Σ-Homomorphisms
In this section, we characterise Σ-query entailment between DL-LiteH   core TBoxes
semantically in terms of (partial) Σ-homomorphisms between certain canonical
models. Then, in the next section, we use this characterisation to investigate the
complexity of deciding Σ-query entailment.
    The canonical model, MK , of a consistent KB K = (T , A) gives correct
answers to all CQs. In general, MK is infinite; however, it can be folded up into
a small generating model GK = (IK , K ) consisting of a finite interpretation
                                                                          ∗
IK and a generating relation       K that defines the unfolding. Let vT be the
reflexive and transitive closure of the role inclusion relation given by T , and let
[R] = {S | R v∗T S and S v∗T R}. We write [R] ≤T [S] if R v∗T S; thus, ≤T
is a partial order on the set {[R] | R a role in T }. For each [R], we introduce a
witness w[R] and define a generating relation K on the set of these witnesses
together with Ind(A) by taking:
 – a K w[R] if a ∈ Ind(A) and [R] is ≤T -minimal such that K |= ∃R(a) and
   K 6|= R(a, b) for all b ∈ Ind(A);
 – w[S] K w[R] if [R] is ≤T -minimal with T |= ∃S − v ∃R and [S − ] 6= [R].
A role R is generating in K if there are a ∈ Ind(A) and R1 , . . . , Rn = R such
that a K w[R1 ] K · · · K w[Rn ] . The interpretation IK is defined as follows:
    ∆IK = Ind(A) ∪ {w[R] | R is generating in K},
    aIK = a, for all a ∈ Ind(A),
    AIK = {a ∈ Ind(A) | K |= A(a)} ∪ {w[R] | T |= ∃R− v A},
    P IK = {(a, b) ∈ Ind(A) × Ind(A) | there is R(a, b) ∈ A s.t. [R] ≤T [P ]} ∪
            {(x, w[R] ) | x    K w[R] and [R] ≤T   [P ]} ∪
            {(w[R] , x) | x    K w[R] and [R] ≤T   [P − ]}.
GK can be constructed in polynomial time in |K|, and it is not hard to see that
IK |= K. To construct the canonical model MK giving the correct answers to
all CQs, we unfold the generating model GK = (IK , K ) along K . A path in
GK is a finite sequence aw[R1 ] · · · w[Rn ] , n ≥ 0, such that a ∈ Ind(A), a K w[R1 ]
and w[Ri ] K w[Ri+1 ] , for i < n. Denote by path(GK ) the set of all paths in GK
and by tail(σ) the last element in σ ∈ path(GK ). MK is defined by taking:
                              ∆MK = path(GK ),
                              aMK = a, for all a ∈ Ind(A),
                              AMK = {σ | tail(σ) ∈ AIK },
              P MK = {(a, b) ∈ Ind(A) × Ind(A) | (a, b) ∈ P IK } ∪
                         {(σ, σ · w[R] ) | tail(σ)          K w[R] ,   [R] ≤T [P ]} ∪
                         {(σ · w[R] , σ) | tail(σ)          K w[R] ,   [R] ≤T [P − ]}.

Example 4. For T1 = {A v ∃S, ∃S − v ∃T, ∃T − v ∃T, T v R} and
K1 = (T1 , {A(a)}), the models GK1 and MK1 look as follows ( K1 in GK1 is
shown as     ):
                                         R, T
         A      S            R, T
 GK1
         a          wS                    wT
         A      S            R, T               R, T
MK1                                                            ...
          a         awS             awS wT             awS wT wT

 Theorem 5. For all consistent DL-LiteH   core KBs K = (T , A), CQs q(~
                                                                      x) and
~a ⊆ Ind(A), we have K |= q[~a] iff MK |= q[~a].

Thus, to decide Σ-query entailment between KBs K1 and K2 , it suffices to check
whether MK2 |= q[~a] implies MK1 |= q[~a] for all Σ-queries q(~x) and tuples ~a.
This relationship between MK2 and MK1 can be characterised semantically in
terms of finite Σ-homomorphisms. For an interpretation I and a signature Σ,
the Σ-types tIΣ (x) and r IΣ (x, y), for x, y ∈ ∆I , are given by:

   tIΣ (x) = {Σ-concept B | x ∈ B I },            r IΣ (x, y) = {Σ-role R | (x, y) ∈ RI }.
                                                                                         0
A Σ-homomorphism from an I to I 0 is a function h : ∆I → ∆I such that
              0                                                         0
h(aI ) = aI , for all individual names a interpreted in I, tIΣ (x) ⊆ tIΣ (h(x)) and
                  0
r IΣ (x, y) ⊆ r IΣ (h(x), h(y)), for all x, y ∈ ∆I .
      It is well-known that answers to conjunctive Σ-queries are preserved under
Σ-homomorphisms. Thus, if there is a Σ-homomorphism from MK2 to MK1 ,
then K1 Σ-query entails K2 . However, the converse does not hold in general.

Example 6. Take T1 from Example 4, and let T2 result from replacing R in T1
with R− . Let Σ = {A, R} and Ki = (Ti , {A(a)}). Then the Σ-reduct of MK1
does not contain a Σ-homomorphic image of the Σ-reduct of MK2 , depicted be-
low. On the other hand, it is easily seen that T1 and T2 are Σ-query inseparable.
              A                     R−                 R−
    MK2        a                                                       ...

Note that the Σ-reduct of MK2 contains points that are not reachable from
the ABox by Σ-roles. In fact, using König’s Lemma, one can show that if every
point in MK2 is reachable from the ABox by a path of Σ-roles, then K1 Σ-query
entails K2 iff there exists a Σ-homomorphism from MK2 to MK1 .

We say that I is finitely Σ-homomorphically embeddable into I 0 if, for every
finite sub-interpretation I1 of I, there exists a Σ-homomorphism from I1 to I 0 .
Theorem 7. Let K1 and K2 be consistent DL-LiteH   core KBs. Then K1 Σ-query
entails K2 iff MK2 is finitely Σ-homomorphically embeddable into MK1 .
Theorem 7 does not yet give a satisfactory semantic characterisation of Σ-
query entailment between TBoxes, as one still has to consider infinitely many
Σ-ABoxes. However, using the fact that inclusions in DL-LiteH core , different from
disjointness axioms, involve only one concept or role in the left-hand side and
making sure that the TBoxes entail the same Σ-inclusions, one can show that it
is enough to consider singleton Σ-ABoxes of the form {B(a)}. Denote the mod-
els G(T ,{B(a)}) and M(T ,{B(a)}) by GTB and MBT , respectively. We thus obtain
the following characterisation of Σ-entailment between DL-LiteH  core TBoxes:

Theorem 8. T1 Σ-query entails T2 iff

(p) T2 |= α implies T1 |= α, for all Σ-inclusions α;
(h) MB                                                   B
       T2 is finitely Σ-homomorphically embeddable into MT1 , for all T1 -con-
   sistent Σ-concepts B.

By applying condition (p) to B v ⊥, we obtain that every T1 -consistent Σ-
concept B is also T2 -consistent.


4   Complexity of Σ-Query Entailment

We use Theorem 8 to show that deciding Σ-query entailment for DL-LiteH     core
TBoxes is PSpace-hard and in ExpTime. Recall that subsumption in DL-LiteH   core
is NLogSpace-complete [6, 1]; so condition (p) of Theorem 8 can be checked in
polynomial time. And, since there are at most 2·|Σ| singleton Σ-ABoxes, we can
concentrate on the complexity of checking finite Σ-homomorphic embeddability
of canonical models for singleton ABoxes.
    We begin by considering DL-Litecore , where the existence of Σ-homomorph-
isms between canonical models can be expressed in terms of the types of their
points; cf. [9]. Let T1 and T2 be DL-Litecore TBoxes and Σ a signature.

Theorem 9. T1 Σ-query entails T2 iff (p) holds and, for every T1 -consistent
                                B                   B        IB         IB
Σ-concept B and every x ∈ ∆IT2 , there is x0 ∈ ∆IT1 with tΣT2 (x) ⊆ tΣT1 (x0 ).

   The criterion of Theorem 9 can be checked in polynomial time, in NLog-
Space, to be more precise. Thus:
Theorem 10. Checking Σ-query entailment for TBoxes in DL-Litecore is com-
plete for NLogSpace.
However, if role inclusions become available, the picture changes dramatically:
not only do we have to compare the Σ-types of points in the canonical models,
but also the Σ-paths to these points. To illustrate, consider the generating models
G1 , G2 in Fig. 1, where the arrows represent the generating relations, and the
concept names A, Xi0 , Xi1 and the role names R and Tj are all symbols in Σ.
The model G2 contains 4 R-paths from a to w, which are further extended by
the infinite Tj -paths. The paths π from a to w can be homomorphically mapped
to distinct R-paths h(π) in G1 starting from a. But the extension of such a π
with the infinite Tj -chain can only be mapped first to a suffix of h(π) (backward,
along Tj− )—because we have to map paths in the unfolding M2 of G2 to paths
in M1 —and then to a Tj -loop in G1 . But to check whether this can be done, we
may have to ‘remember’ the whole path π.

                          X11                                    X31
             G2                        R                                    R
                    R                                     R
             a                                                                      w
         A
                   R                   R                  R                 R               T2
                          X10                                    X30        T1

                                T1                  T2
                                                                    T1             T2
                           T1          T1                 T2

                     −
                          X11        R,Tj−   X21         R,Tj−   X31      R,Tj−     X41
                  R,T j                         −                   −                   −
                                                Tj                  Tj                Tj
         A                                   R,
                                             R                   R,
                                                                 R                 R,
                                                                                   R
             a                                 ,T −                ,T −              ,T −
                  R,T −                         j                   j                 j

             G1      j
                          X10        R,Tj−   X20         R,Tj−   X30      R,Tj−     X40

                    Fig. 1. Σ-reducts of generating models G2 and G1 .


    To see thatVG1 and G2 can be given by DL-LiteH               core TBoxes, fix a QBF
                       m
Q1 X1 . . . Qn Xn j=1 Cj , where Qi ∈ {∀, ∃} and C1 , . . . , Cm are clauses over the
variables X1 , . . . , Xn . Let Σ = {A, Xi0 , Xi1 , R, Tj | i ≤ n, j ≤ m}, T1 contain the
inclusions

                  A v ∃S0− ,                   −
                                             ∃Si−1 v ∃Qki ,
         ∃(Qki )− v Xik ,                           Qki v Si ,                    Si v R,
              Xik v ∃Rj          if k = 0, ¬Xi ∈ Cj or k = 1, Xi ∈ Cj ,
             ∃Rj− v ∃Rj ,                           Rj v Tj ,                     Si v Tj− ,

and let T2 contain the inclusions
                                                                   (
                                                                        ∃Qki ,     if Qi = ∀,
                    A v ∃S0− ,                          −
                                                      ∃Si−1 v
                                                                        ∃Si ,      if Qi = ∃,
             ∃(Qki )− v Xik ,                             Qki v Si ,                        Si v R,
                  ∃Sn− v ∃Pj ,                           ∃Pj− v ∃Pj ,                       Pj v Tj ,

for all i ≤ n, j ≤ m, k = 1, 2. The generating models GTA1 and GTA2 , restricted
to Σ, look like G1 and G2 in Fig. 1, respectively. Moreover, one can show that
MA                                                         A
   T2 is (finitely) Σ-homomorphically embeddable into MT1 iff the QBF above
is satisfiable. As satisfiability of QBFs is PSpace-complete, we obtain:
Theorem 11. Σ-query entailment for DL-LiteH
                                          core TBoxes is PSpace-hard.
On the other hand, the problem whether MK2 is finitely Σ-homomorphically
embeddable into MK1 can be reduced to the emptiness problem for alternating
two-way automata, which belongs to ExpTime [13]. In a way similar to [13,
8], where these automata were employed to prove ExpTime-decidability of the
modal µ-calculus with converse and the guarded fixed point logic of finite width,
one can use their ability to ‘remember’ paths (in the sense illustrated in the
example above) to obtain the ExpTime upper bound:
Theorem 12. Σ-query entailment for DL-LiteH
                                          core TBoxes is in ExpTime.
The precise complexity of Σ-query entailment for DL-LiteH        core TBoxes is still
unknown. Recall that deciding Σ-query entailment for DL-LiteN         horn is coNP-
complete [9]. Compared to DL-LiteH                N
                                    core , DL-Litehorn allows (unqualified) number
restrictions and conjunctions in the left-hand side of concept inclusions, but does
not have role inclusions: DL-LiteN               H
                                  horn ∩ DL-Litecore = DL-Litecore . CQ answering
          0
is in AC for data complexity in all three languages under the UNA. However,
the computational properties of these logics become different as far as Σ-query
entailment is concerned: NLogSpace-complete for DL-Litecore , coNP-complete
for DL-LiteN                                                          H
             horn , and between PSpace and ExpTime for DL-Litecore . It may be
                                                         N
of interest to note that Σ-query entailment for DL-Litebool , allowing full Booleans
as concept constructs, is Π2p -complete.
    Let us consider strong Σ-query entailment. It is easy to construct an expo-
nential-time algorithm checking strong Σ-query entailment between DL-LiteH     core
TBoxes T1 and T2 : enumerate all Σ-TBoxes T and check whether T1 ∪T Σ-query
entails T2 ∪ T . As there are quadratically many Σ-inclusions, this algorithm calls
                                          2
the Σ-query entailment checker ≤ 2|Σ| times. We now show that one can do
much better than that. First, it turns out that instead of expensive Σ-query
entailment checks for the TBoxes Ti ∪ T , it is enough to check consistency (in
polynomial time). More precisely, suppose T1 Σ-query entails T2 . One can show
then that T1 does not strongly Σ-query entail T2 iff there exist a Σ-TBox T
and a Σ-concept B such that (T1 ∪ T , {B(a)}) is consistent but (T2 ∪ T , {B(a)})
is not (cf. Example 3). Moreover, checking consistency for all Σ-TBoxes T can
further be reduced—using the primitive form of DL-LiteH  core axioms—to checking
consistency for all singleton Σ-TBoxes T . Thus, we obtain the following:
Theorem 13. Suppose that T1 Σ-query entails T2 . Then T1 does not strongly Σ-
query entail T2 iff there is a Σ-concept B and a Σ-TBox T with a single inclusion
of the form B1 v B2 or R1 v R2 such that (T1 ∪ T , {B(a)}) is consistent but
(T2 ∪ T , {B(a)}) is inconsistent.
So, if we already know that T1 Σ-query entails T2 , then checking whether this
entailment is actually strong can be done in polynomial time (and NLogSpace).

5   Incomplete Algorithm for Σ-Query Entailment
The interplay between role inclusions and inverse roles, required in the proof of
PSpace-hardness, appears to be too artificial compared to how roles are used
in ‘real-world’ ontologies. Thus, in conceptual modelling, the number of roles is
comparable with the number of concepts, but the number of role inclusions is
much smaller. For this reason, instead of a complete (exponential) Σ-query en-
tailment checker, we have implemented a polynomial-time correct but incomplete
algorithm, which is based on testing simulations between transition systems.
    Let T1 and T2 be DL-LiteH core TBoxes, Σ a signature, B a Σ-concept. Denote
Ki = (Ti , {B(a)}) and Ii = IKi , i = 1, 2. A relation ρ ⊆ ∆I2 × ∆I1 is called a
Σ-simulation of GK2 in GK1 if the following conditions hold:
(s1) the domain of ρ is ∆I2 and (aI2 , aI1 ) ∈ ρ;
(s2) tIΣ2 (x) ⊆ tIΣ1 (x0 ), for all (x, x0 ) ∈ ρ;
(s3) if x K2 w[R] and (x, x0 ) ∈ ρ, then there is y 0 ∈ ∆I1 such that (w[R] , y 0 ) ∈ ρ
    and S ∈ r IΣ1 (x0 , y 0 ) for every Σ-role S with [R] ≤T2 [S].
We call ρ a forward Σ-simulation if it satisfies (s1), (s2) and the condition
(s30 ), which strengthens (s3) with the extra requirement: y 0 = w[T ] , for some
role T , with x0 K1 w[T ] and [T ] ≤T1 [S] for every Σ-role S with [R] ≤T2 [S].
Example 14. In Example 6, there is a Σ-simulation of GK2 in GK1 , but no forward
Σ-simulation. The same applies to G2 and G1 in the proof of the PSpace bound.
In contrast to finite Σ-homomorphic embeddability of MK2 in MK1 , the problem
of checking the existence of (forward) Σ-simulations of GK2 in GK1 is tractable
and well understood from the literature on program verification [3]. Consider
now the following conditions, which can be checked in polynomial time:
(y) condition (p) holds and there is a forward Σ-simulation of GTB2 in GTB1 , for
   every T1 -consistent Σ-concept B;
(n) condition (p) does not hold or there is no Σ-simulation of GTB2 in GTB1 , for
   any T1 -consistent Σ-concept B.
Theorem 15. Let T1 , T2 be DL-LiteH  core TBoxes and Σ a signature. If (y) holds,
then T1 Σ-query entails T2 . If (n) holds, then T1 does not Σ-query entail T2 .
    Thus, an algorithm checking conditions (y) and (n) can be used as a correct
but incomplete Σ-query entailment checker. It cannot be complete since neither
(y) nor (n) holds in Example 14. On the other hand, condition (n) proves to be
a criterion of Σ-query entailment in two important cases:
Theorem 16. Let (a) T1 , T2 be DL-Litecore TBoxes, or (b) T1 = ∅ and T2 a
DL-LiteH
       core TBox. Then condition (n) holds iff T1 does not Σ-query entail T2 .


6    Experiments
Checking (strong) Σ-query entailment has multiple applications in ontology ver-
sioning, re-use, and extraction. We have used the algorithms, suggested by The-
orems 15 and 13, for minimal module extraction to see how efficient they are
in practice and whether the incompleteness of the (y)–(n) conditions is prob-
lematic. Extracting minimal modules from medium-sized real-world ontologies
requires thousands of calls of the (strong) Σ-query entailment checker, and thus
provides a tough test for our approach.
   For a TBox T and a signature Σ, a subset M ⊆ T is

 – a Σ-query module of T if M ≡Σ T ;
 – a strong Σ-query module of T if M ≡sΣ T ;
 – a depleting Σ-query module of T if ∅ ≡sΣ∪sig(M) T \ M, where sig(M) is the
   signature of M.

We are concerned with computing a minimal (w.r.t. ⊆) Σ-query (MQM), a mini-
mal strong Σ-query (MSQM), and the (uniquely determined) minimal depleting
Σ-query (MDQM) module of T . The general extraction algorithms, which call
Σ-query entailment checkers, are taken from [9]. For MQMs and MSQMs, the
number of calls to the checker coincides with the number of inclusions in T . For
MDQMs (where one of the TBoxes given to the checker is empty, and so the
checker is complete, by Theorem 16), the number of checker calls is quadratic in
the number of inclusions in T .
    We extracted modules from OWL 2 QL approximations of 3 commercial soft-
ware applications called Core, Umbrella and Mimosa (the original ontologies use
a few axioms that are not expressible OWL 2 QL). Mimosa is a specialisation
of the MIMOSA OSA-EAI specification4 for container shipping. Core is based
on a supply-chain management system used by the bookstore chain Ottakar’s
(now merged with Waterstone’s), and Umbrella on a research data validation
and processing system used by the Intensive Care National Audit and Research
Centre.5 The original Core and Umbrella were used for the experiments in [9].
           ontology           Mimosa Core Umbrella IMDB LUBM
           concept inclusions    710 1214    1506     45  136
           role inclusions        53   19      13     21    9
           concept names         106   82      79     14   43
           role names            145   76      64     30   31
For comparison, we extracted modules from OWL 2 QL approximations of the
well-known IMDB and LUBM ontologies. For each of these ontologies, we ran-
domly generated 20 signatures Σ of 5 concept and 5 roles names. We extracted
Σ-MQMs, MSQMs, MDQMs as well as the >⊥-module [7] from the whole Mi-
mosa, IMBD and LUBM ontologies. For the larger Umbrella and Core on-
tologies, we first computed the >⊥-modules, and then employed them to fur-
ther extract MQMs, MSQMs, MDQMs, which are all contained in the >⊥-
modules. The average size of the resulting modules and its standard devia-
tion is shown below. Details of the experiments and ontologies are available
at www.dcs.bbk.ac.uk/~roman/owl2ql-modules. Here we briefly comment on
efficiency and incompleteness. Checking Σ-query inseparability turned out to be
very fast: a single call of the checker never took more than 1s for our ontologies.
For strong Σ-query inseparability, the maximal time was less than 1 min. For
4
    htpp://www.mimosa.org/?q=resources/specs/osa-eai-v321
5
    http://www.icnarc.org
comparisons with the empty TBox, the maximal time for strong Σ-query insep-
arability tests was less than 10s. In the hardest case, Mimosa, the average total
extraction times were 2.5 mins for MQMs, 140 mins for MSQMs, and 317 mins
for MDQMs. Finally, only in 9 out of about 75,000 calls, the Σ-query entail-
ment checker was not able to give a certain answer due to incompleteness of the
(y)–(n) condition, in which case the inclusions in question were added to the
module.
     Core (1233)                Mimosa (763)                Umbrella (1519)               IMDB (66)                LUBM (145)
                                                                                  391
                  375375                                                                                                        34 34
                                                     105
                                              90                          315                                30    31 32
                                                                                                      25
                                                                                         20 20
                                       56
                                 47
     83 87                                                   98 101
                   MDQM


                                              MDQM


                                                                           MDQM


                                                                                                      MDQM


                                                                                                                                MDQM
           MSQM


                                       MSQM


                                                                   MSQM


                                                                                               MSQM


                                                                                                                         MSQM
     MQM


                                 MQM


                                                             MQM


                                                                                         MQM


                                                                                                                   MQM
                          >⊥M


                                                      >⊥M


                                                                                   >⊥M


                                                                                                             >⊥M


                                                                                                                                       >⊥M
References
 [1] Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite fam-
     ily and relations. Journal of Artificial Intelligence Research 36, 1–69 (2009)
 [2] Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.
     (eds.): The Description Logic Handbook. Cambridge University Press (2003)
 [3] Baier, C., Katoen, J.-P.: Principles of Model Checking. MIT Press (2007)
 [4] Botoeva, E., Calvanese, D., Rodriguez-Muro, M.: Expressive approximations in
     DL-Lite ontologies. In: Proc. of AIMSA. pp. 21–31. Springer (2010).
 [5] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Data com-
     plexity of query answering in description logics. In: Proc. of KR. pp. 260–270
     (2006)
 [6] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
     reasoning and efficient query answering in description logics: The DL-Lite family.
     Journal of Automated Reasoning 39(3), 385–429 (2007)
 [7] Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontolo-
     gies: Theory and practice. Journal of Artificial Intelligence Research 31, 273–318
     (2008)
 [8] Grädel, E., Walukiewicz, I.: Guarded fixed point logic. In: Proc. of LICS. pp.
     45–54 (1999)
 [9] Kontchakov, R., Wolter, F., Zakharyaschev, M.: Logic-based ontology comparison
     and module extraction, with an application to DL-Lite. Artificial Intelligence 174,
     1093–1141 (2010)
[10] Lutz, C., Wolter, F.: Deciding inseparability and conservative extensions in the
     description logic EL. Journal of Symbolic Computation 45(2):194–228 (2010)
[11] Pan, J.Z., Thomas, E.: Approximating OWL-DL Ontologies. In: Proc. of AAAI.
     pp. 1434–1439 (2007)
[12] Stuckenschmidt, H., Parent, C., Spaccapietra, S. (eds.): Modular Ontologies,
     LNCS, vol. 5445. Springer (2009)
[13] Vardi, M.Y.: Reasoning about the past with two-way automata. In: Proc. of
     ICALP. LNCS, vol. 1443, pp. 628–641. Springer (1998)