=Paper= {{Paper |id=Vol-1193/paper_6 |storemode=property |title=Abstraction Refinement for Ontology Materialization |pdfUrl=https://ceur-ws.org/Vol-1193/paper_6.pdf |volume=Vol-1193 |dblpUrl=https://dblp.org/rec/conf/dlog/GlimmKLTV14 }} ==Abstraction Refinement for Ontology Materialization== https://ceur-ws.org/Vol-1193/paper_6.pdf
    Abstraction Refinement for Ontology Materialization

Birte Glimm1 , Yevgeny Kazakov1 , Thorsten Liebig2 , Trung-Kien Tran1 , and Vincent
                                    Vialard2
       1
           University of Ulm, Ulm, Germany, .@uni-ulm.de
                   2
                     derivo GmbH, Ulm, Germany, @derivo.de



       Abstract. We present a new procedure for ontology materialization (comput-
       ing all entailed instances of every atomic concept) in which reasoning over a
       large ABox is reduced to reasoning over a smaller “abstract” ABox. The abstract
       ABox is obtained as the result of a fixed-point computation involving two stages:
       1) abstraction: partition the individuals into equivalence classes based on told
       information and use one representative individual per equivalence class, and 2)
       refinement: iteratively split (refine) the equivalence classes, when new assertions
       are derived that distinguish individuals within the same class. We prove that the
       method is complete for Horn ALCHOI ontologies, that is, all entailed instances
       will be derived once the fixed-point is reached. We implement the procedure in
       a new database-backed reasoning system and evaluate it empirically on existing
       ontologies with large ABoxes. We demonstrate that the obtained abstract ABoxes
       are significantly smaller than the original ones and can be computed with few re-
       finement steps.


1    Introduction

Ontology based data access (OBDA) is an increasingly popular paradigm in the area
of knowledge representation and information systems. In ODBA, a TBox with back-
ground knowledge is used to enrich and integrate large, incomplete, and possibly semi-
structured data, which users can then access via queries. Different approaches have
been developed to compute the entailed answers to a query: 1) Query rewriting ap-
proaches answer a query by “compiling” the background knowledge of the TBox into
the query [2,11]. 2) Materialization techniques take the opposite approach by pre-
computing all entailed information upfront, independent of the queries [15]. After ex-
tending the ABox with all pre-computed facts, the unmodified queries can be evaluated
over the enriched data only (i.e., without considering the schema). 3) Recently, also
combined approaches have been proposed, which allow for smaller rewritten queries
and are applicable for more expressive languages [10,9,4,13].
    How queries can be answered also depends on what kinds of queries are allowed.
Classical conjunctive queries include existential variables, which can be satisfied also
by individuals that do not occur in the input ontology, but whose existence can be in-
ferred. Answering such queries often requires reasoning at query time even for fully
materialized ontologies. Many approaches including the entailment regimes of the Se-
mantic Web query language SPARQL [6] are based on the simpler form of conjunctive
instance queries. Such queries only involve named individuals from the input ontology
                             Table 1. The syntax of ALCHOI

         Concepts C ::= > | ⊥ | A | o | ¬C | C1 u C2 | C1 t C2 | ∃R.C | ∀R.C
         Roles    R ::= R | R−
         Axioms C1 v C2 , R1 v R2 , C(a), R(a, b)


and can be reduced to instance retrieval. Hence, answering such queries requires only
look-ups over a materialized ABox. This is also the setting considered in this paper, i.e.,
our goal is the materialization of entailed facts for large ABoxes stored in a database.
    Our work is based on an assumption that individuals with similar asserted types are
likely to have the same inferred types. We can group such individuals into equivalence
classes and compute the types just for one representative individual. This results in a so-
called abstraction that contains one representative individual per equivalence class plus
representative individuals for its direct role successors and predecessors. Derivations for
the latter individuals can be used to split/refine equivalence classes, in which individuals
no longer have the same assertions. The procedure is complete when the fixed point is
reached, i.e., no new derivations are made and no refinements happen. The number
of individuals in the abstraction is bounded exponentially in the number of different
concepts and roles and linearly in the size of the original ABox; hence the abstraction
is relatively small when the number of individuals is much larger than the number of
concepts and roles used in the ontology.
    We implement the technique in a database-backed system that interacts with a
highly optimized in-memory reasoner that materializes the abstract ABox. The database
engine needs to support only simple operations and does not require any knowledge of
the TBox. We show that the procedure is sound and it is complete for computing the
entailed types of individuals in Horn ALCHOI ontologies.
    After the preliminaries, we present the theoretical foundation for our approach (Sec-
tion 3) and prove its completeness (Section 4). In Section 5, we evaluate the procedure
on a range of real-world ontologies with large ABoxes. We then discuss related work
(Section 6) and conclude (Section 7). Full proofs and further details are available in a
technical report [5].

2   Preliminaries
We recall the syntax of the Description Logic (DL) ALCHOI in Table 1; its semantics
is defined as usual [1]. For an ontology O, we use con(O), rol(O), and ind(O) as the
sets of concept names, role names, and individual names occurring in O, respectively.
We say that O is concept materialized if A(a) ∈ O whenever O |= A(a), A ∈ con(O),
and a ∈ ind(O); O is role materialized if R(a, b) ∈ O whenever O |= R(a, b), R ∈
rol(O), a, b ∈ ind(O); O is materialized if it is both concept and role materialized.
Remark 1. We present nominals as primitive symbols, which are disjoint from individ-
uals. Some definitions use a special nominal constructor {a} with individual a (in this
          I
case, {a} = {aI }). We can convert such ontologies to our representation by renaming
every nominal {a} with the corresponding nominal symbol oa and adding a concept
assertion oa (a).
3     Computing ABox Materialization by Abstraction
The typical OBDA scenario is such that the ABox contains a large number of individu-
als. If we can identify individuals that yield the same consequences, we can compute the
materialization by computing entailed consequences only for representative individuals.

3.1   Bi-homomorphic Individuals and Individual Types
In order to (syntactically) characterize individuals that yield the same consequences, we
study structure-preserving transformations of ABoxes.
Definition 1. Let A and B be two ABoxes and h : ind(A) → ind(B) a mapping from
the individuals in A to individuals in B. We extend h to axioms in a straightforward
way: h(C(a)) = C(h(a)), h(R(a, b)) = R(h(a), h(b)), and h(α) = α for other
axioms α. We say that h is a homomorphism (from A to B) if h(A) ⊆ B. An individual
a in A is homomorphic to an individual b in B if there exists a homomorphism h from
A to B such that h(a) = b; in addition, if b is homomorphic to a, then a and b are
bi-homomorphic.
Example 1. Consider the ABox A = {R(a, a), R(a, b), R(b, b)}. Then the mappings
h1 = {a 7→ b, b 7→ b} and h2 = {a 7→ a, b 7→ a} are homomorphisms from A to A.
Thus, a and b are bi-homomorphic. Note that there is no isomorphism h from A to A
(a bijective homomorphism such that its inverse is also a homomorphism) such that
h(a) = b or h(b) = a.
  It is easy to show that entailed axioms are preserved under homomorphisms between
ABoxes. In particular, bi-homomorphic individuals are instances of the same concepts.
Lemma 1. Let h : ind(A) → ind(B) be a homomorphism between ABoxes A and B.
Then for every TBox T and every axiom α, A ∪ T |= α implies B ∪ T |= h(α).
Proof. We show that Lemma 1 even holds for SROIQ without unique name as-
sumption. Suppose that A ∪ T |= α. Then h(A ∪ T ) |= h(α). Since h(A ∪ T ) =
h(A) ∪ h(T ) = h(A) ∪ T ⊆ B ∪ T , by monotonicity we obtain B ∪ T |= h(α). t
                                                                           u
Corollary 1. If individuals a and b in an ABox A are bi-homomorphic, then for every
TBox T and every concept C, we have A ∪ T |= C(a) if and only if A ∪ T |= C(b).
    By Corollary 1, bi-homomorphic individuals entail the same assertions w.r.t. every
TBox. This property is too strong for our purpose as we need to deal with just one given
TBox. It can be that many (non bi-homomorphic) individuals are still materialized in a
same way. To take this into account, instead of partitioning the individuals according to
the bi-homomorphism relation we start with an approximation to this relation.
Definition 2. Let A be an ABox. The type of an individual a (w.r.t. A) is a triple
tp(a) = (tp↓ (a), tp→ (a), tp← (a)) where tp↓ (a) = {C | C(a) ∈ A}, tp→ (a) = {R |
∃b : R(a, b) ∈ A}, and tp← (a) = {S | ∃c : S(c, a) ∈ A}.
Intuitively, the type of an individual is defined as a triple consisting of its asserted con-
cepts, successor roles, and predecessor roles in the ABox. Note that bi-homomorphic
individuals have the same types, so the relation between individuals of the same types
is an approximation to the bi-homomorphism relation.
3.2   Abstraction of an ABox
If we compress the ABox by simply merging all individuals with the same type into one,
we might obtain unexpected entailments, even if all individuals are bi-homomorphic.

Example 2. Consider the following ABox A = {R(a, b), R(b, a)}. Clearly, a and b are
bi-homomorphic in A. Let B = {R(a, a)} be obtained from A by replacing individual b
with a, and let T = {> v B t C, ∃R.B v C}. It is easy to check that B ∪ T |= C(a),
but A ∪ T 6|= C(a) (and hence A ∪ T 6|= C(b)).

Instead of merging all individuals with the same type into one, we realize every indi-
vidual type in our abstract ABox.

Definition
S           3 (ABox Abstraction). The abstraction of an ABox A is an ABox B =
  a∈ind(A) tp(a) , where for each type tp = (tp↓ , tp→ , tp← ), we define Btp = {C(xtp ) |
           B
                      R                       S                                   R         S
C ∈ tp↓ } ∪ {R(xtp , ytp ) | R ∈ tp→ } ∪ {S(ztp , xtp ) | S ∈ tp← }, where xtp , ytp , and ztp
are fresh distinguished abstract individuals.

Example 3. Consider the ABox A = {A(a), A(d), R(a, b), R(a, e), R(b, c), R(b, e),
R(c, a), R(d, c), R(e, d)}. We have tp(b) = tp(c) = tp(e) = tp1 = (∅, {R}, {R}) and
tp(a) = tp(d) = tp2 = ({A}, {R}, {R}). The abstraction of A is B = Btp1 ∪ Btp2 with
                  R          R                                          R          R
Btp1 = {R(xtp1 , ytp 1
                       ), R(ztp1
                                 , xtp1 )}, Btp2 = {A(xtp2 ), R(xtp2 , ytp 2
                                                                             ), R(ztp 2
                                                                                        , xtp2 )}.

Intuitively, the abstraction of an ABox is a disjoint union of small ABoxes witnessing
each individual type realized in the ABox. The following lemma shows the soundness
of concept assertions derived from the abstraction.
Lemma 2. Let A be an ABox, B its abstraction, and T a TBox. Then for every type
tp = (tp↓ , tp→ , tp← ), every a ∈ ind(A) with tp(a) = tp w.r.t. A, and every concept C:
(1) B ∪ T |= C(xtp ) implies A ∪ T |= C(a),
                R
(2) B ∪ T |= C(ytp ) and R(a, b) ∈ A implies A ∪ T |= C(b), and
                S
(3) B ∪ T |= C(ztp ) and S(c, a) ∈ A implies A ∪ T |= C(c).

Proof. Consider all mappings h : ind(B) → ind(A) such that h(xtp ) ∈ {a ∈ ind(A) |
                  R                                       S
tp(a) = tp}, h(ytp   ) ∈ {b | R(h(xtp ), b) ∈ A}, and h(ztp ) ∈ {c | S(c, h(xtp )) ∈ A}.
Clearly, h(B) ⊆ A for every such mapping h. Furthermore, for every a ∈ ind(A),
                                                                                 R
every R(a, b) ∈ A and every S(c, a) ∈ A, there exists h with h(xtp ) = a, h(ytp    ) = b,
        S
and h(ztp ) = c for tp = tp(a). Hence, claims (1)–(3) follow by Lemma 1.               t
                                                                                       u

3.3   Abstraction Refinement
Individuals from an ABox A may correspond to several abstract individuals in the ABox
abstraction B: Each individual a corresponds to the abstract individual xtp for tp =
tp(a). In addition, if R(b, a) ∈ A or S(a, b) ∈ A for some individual b, then a also
                 R       S                                                          R
corresponds to ytp and ztp respectively for tp = tp(b). The additional individuals ytp and
 S
ztp were introduced intentionally to refine the initial abstraction when new assertions of
abstract individuals are derived, which in turn, can be used to derive new assertions of
individuals in A. Specifically, if we add the new assertion according to cases (2) and (3)
of Lemma 2, we may obtain different assertions for individuals that previously had the
same types. Hence, adding the newly derived assertions using Lemma 2 may refine the
types of the original individuals and, in turn, result in a new abstraction, for which new
assertions can be derived once again.
    The above suggests the following materialization procedure based on abstraction
refinement. Given an ontology O = A ∪ T we proceed as follows:

 1. Build an initial abstraction B of A according to Definition 3.
 2. Materialize B ∪ T using a reasoner.
 3. Extend A with the newly derived assertions according to Lemma 2.
 4. Update the types of the individuals in A and re-compute its abstraction B.
 5. Repeat from Step 2 until no new assertions can be added to A.

Example 4 (Example 3 continued). Let AI the ABox A from Example 3 and a TBox
T = {A v ∀R.B, B v ∀R− .A}. Let BI be the abstraction B of AI = A computed
                                                                                  R
in Example 3 (see Figure 1). By materializing BI w.r.t. T we get B(ytp             2
                                                                                     ), from which
we obtain AII = AI ∪ {B(b), B(e), B(c)} using Lemma 2. Recomputing the types
of individuals in AII yields tp(b) = tp(c) = tp(e) = tp3 = ({B}, {R}, {R}), while
the types of a and d remain unchanged. The abstraction of AII is thus BII = Btp2 ∪
                                           R        R
Btp3 , where Btp3 = {B(xtp3 ), R(xtp3 , ytp 3
                                              ), R(ztp 3
                                                         , xtp3 )}. By materializing BII , we get
     R
A(ztp3 ), from which we obtain AIII = AII ∪ {A(b)}. We again recompute the types of
individuals in AIII , which gives tp(b) = tp4 = ({A, B}, {R}, {R}), while the types of
the other individuals do not change. The abstraction of AIII is thus BIII = Btp2 ∪ Btp3 ∪
                                                      R            R
Btp4 , where Btp4 = {A(xtp4 ), B(xtp4 ), R(xtp4 , ytp    4
                                                           ), R(ztp 4
                                                                      , xtp4 )}. Materializing BIII
            R            R
yields B(ytp4 ) and A(ztp4 ), which correspond to B(c), B(e), and A(a). However, those
assertions already exist in AIII , so the procedure terminates.

    The abstraction refinement procedure terminates since after every iteration except
the last one, new atomic assertions must be added to A, and there is a bounded number
of such assertions. Specifically, the number of iterations is at most ||ind(O)||×||con(O)||.
The number of realized individual types in every ABox A, and hence the size of every
abstract ABox B, is at most exponential in the number of different concepts and roles
in O. In practice, it is likely to be much smaller since not every possible type is real-
ized. Note also that in practice, it is not necessary to add the newly derived assertions
explicitly to the original ABox—one can recompute the new types using some simple
operations on the sets of individuals (intersection and unions), and keep the derived
assertions only once for every new equivalence class.


4    Completeness

Lemma 2 guarantees the soundness of our procedure. However, in general our proce-
dure is not complete, as demonstrated by the following example.
Example 5. Consider the ABox A = {A(a), R(a, b), B(b)} and the TBox T = {B v
C t D, ∃R.C v C, A u C v ∀R.D}. Note that A ∪ T |= D(b). We have tp(a) =
                                                              Materialized                 Abstractions
                                                            abstract ABoxes        I             II          III
                                                                      ytpR1   {c, e, a, d}
      ABox: A =
                                                                      xtp1 {b, c, e}
                         A(I)                                         ztpR1 {a, b, d}
          (II)
                         a                   (II)   (III)
      B                                 B ,A
                                                                +B    ytpR2    {b, e, c}      {b, e, c} {b, e, c}
                 e                  b
                                                                  A   xtp2     {a, d}         {a, d}      {a, d}
                     d          c                                     ztpR2    {c, e}         {c, e}      {c, e}
            A(I)                    B (II)
                                                                      ytpR3                  {c, e, a, d} {a, d}
                                                                  B   xtp3                   {b, e, c}    {c, e}
      TBox: T =
                                                                +A    ztpR3                  {a, b, d}    {b, d, a}
       A v ∀R.B
       B v ∀R− .A                                               +B    ytpR4                                {c, e}
                                                               A, B   xtp4                                 {b}
                                                                +A    ztpR4                                {a}

Fig. 1. The abstractions I-III produced in Example 4. Each abstraction consists of the ABoxes
corresponding to the four individual types. The inferred assertions are indicated with the “+” sign
and are added to the corresponding original individuals shown in each column. The materialized
assertions in the original ABox are labeled with the first iteration in which they appear.


({A}, {R}, ∅), tp(b) = ({B}, ∅, {R}). Therefore the abstraction B = Btp(a) ∪ Btp(b) ,
                                        R                                  R
where Btp(a) = {A(xtp(a) ), R(xtp(a) , ytp(a) )}, Btp(b) = {B(xtp(b) ), R(ztp(b) , xtp(b) )}.
Since B∪T does not entail any new atomic concept assertions, our procedure terminates
without producing the entailment A ∪ T |= D(b).
The primary reason for incompleteness in this example is that our abstraction breaks
the ABox into disconnected parts, which cannot communicate the non-deterministic
choices, e.g., for the disjunction C t D. The only communication between ABoxes
happens through the entailment of new assertions. If the ontology language does not
allow such non-deterministic constructors, it is possible to obtain a complete procedure.

4.1    Horn ALCHOI
In this section, we restrict ontologies to a Horn fragment of ALCHOI:
Definition 4 (Horn ALCHOI). An ALCHOI ontology O is Horn if, for every con-
cept assertion D(a) and every axiom C v D, the concepts C and D satisfy, respec-
tively, the following grammar definitions:
                   C ::=> | ⊥ | A | o | C1 u C2 | C1 t C2 | ∃R.C,            (1)
                            D ::=> | ⊥ | A | o | D1 u D2 | ∃R.D | ∀R.D | ¬C.                                          (2)
Intuitively, negations and universal restrictions should not occur negatively, and dis-
junctions should not occur positively. We also allow axioms equivalent to Horn axioms.
   It is well-known that every consistent Horn ontology has a so-called canonical
model that entails exactly the consequences entailed by the ontology. For our purpose,
we require a weaker version of this property.
Theorem 1 (Weak Canonical Model Property for Horn ALCHOI). Every consis-
tent Horn ALCHOI ontology O has a model I such that I |= A(a) implies O |= A(a)
for every atomic concept assertion A(a) with a ∈ ind(O) and A ∈ con(O).
Theorem 1 can be proved using the property that Horn ALCHOI models are closed
under direct products. Then a canonical model is obtained from the direct product of
models refuting (finitely many) atomic non-types.
   Before formulating our completeness result, we need to solve one small technical
problem illustrated in the following example.
Example 6. Consider A = {A(a), B(b), R(a, b)}, T = {A u ∃R.B v C}. Clearly,
A ∪ T |= C(a). We have tp(a) = tp1 = ({A}, {R}, ∅), tp(b) = tp2 = ({B}, ∅, {R}),
                                                                            R
so our abstraction B = Btp1 ∪ Btp2 with Btp1 = {A(xtp1 ), R(xtp1 , ytp        1
                                                                                }, and Btp2 =
                R
{B(xtp2 ), R(ztp2 , xtp2 )}. Since no new atomic concept assertions are entailed by B ∪T ,
our procedure terminates without deriving C(a).
                                          R
    Note that Btp2 ∪ T |= (∃R.B)(ztp       2
                                             ), so there is an entailed assertion, just not an
atomic one. To capture this inference, we introduce a new concept that “defines” ∃R.B.
Specifically, let T 0 = {∃R.B v X, A u X v C} where X is a fresh concept name.
Clearly, T 0 is a conservative extension of T , and from T 0 we can derive a new as-
sertion Btp2 ∪ T 0 |= X(ztp   R
                               2
                                 ). If we now add the corresponding assertion X(a) to A
and recompute the abstraction for the updated type tp(a) = tp3 = ({A, X}, {R}, ∅)
                                                                             R
(tp(b) does not change), we have Btp3 = {A(xtp3 ), X(xtp3 ), R(xtp3 , ytp     3
                                                                                )}, and obtain
         0
Btp3 ∪ T |= C(xtp3 ), which gives us the intended result.
Example 6 suggests that to achieve completeness, we need to represent existential re-
strictions on the left hand side of the axioms using new atomic concepts. Note that
∃R.B v X is equivalent to B v ∀R− .X. Thus we can just require that there are
no existential restrictions on the left hand side of concept inclusions, and all universal
restrictions on the right have only atomic concepts as fillers.
Definition 5 (Normal Form for Horn ALCHOI). Horn ALCHOI axioms D(a) and
C v D are in normal form if they satisfy the following grammar definitions:
                  C ::=> | ⊥ | A | o | C1 u C2 | C1 t C2                                  (3)
                  D ::=> | ⊥ | A | o | D1 u D2 | ∃R.D | ∀R.A | ¬C                         (4)
Intuitively, in addition to the constraints for Horn ALCHOI ontologies in Definition 4,
negative occurrence of existential restrictions are not allowed, and (positive) occur-
rences of universal restrictions can only have concept names as fillers. It is easy to
convert axioms to a normal form using a well-known structural transformation.


4.2   Completeness Proof

We are now ready to show the following completeness result:
Theorem 2. Let O = A ∪ T be a normalized Horn ALCHOI ontology and B the
abstraction of A. A is concept materialized if, for every type tp = (tp↓ , tp→ , tp← ),
every individual a ∈ ind(A) with tp(a) = tp, and every atomic concept A, we have:
(1) B ∪ T |= A(xtp ) implies A(a) ∈ A,
                R
(2) B ∪ T |= A(ytp ) and R(a, b) ∈ A implies A(b) ∈ A, and
                S
(3) B ∪ T |= A(ztp ) and S(c, a) ∈ A implies A(c) ∈ A.
Proof. To prove Theorem 2, we extend the abstraction B of A with new role assertions
R(xtp(a) , xtp(b) ) for every R(a, b) ∈ A. Let us denote this extended abstract ABox by
B 0 . Since, for every C(a) ∈ A, we also have C ∈ tp↓ (a) and, thus, C(xtp(a) ) ∈ B ⊆ B 0 ,
the mapping h : ind(A) → ind(B 0 ) defined by h(a) = xtp(a) is a homomorphism from
A to B 0 . Therefore, by Lemma 1, if A ∪ T |= A(a), then B 0 ∪ T |= A(xtp(a) ). The key
part of the proof is to demonstrate that in this case we also have B ∪ T |= A(xtp(a) ).
That is, the extended abstract ABox B 0 does not entail new atomic concept assertions
compared to B. It follows then that A(a) ∈ A by condition (1) of the theorem. This
implies that A is concept materialized.
     To prove that B 0 entails the same atomic concept assertions as B, we use the re-
maining conditions (2) and (3) of Theorem 2 and the canonical model property formu-
lated in Theorem 1. Note that since new role assertions of the form R(xtp(a) , xtp(b) )
are added to B 0 only if R(a, b) ∈ A, we have R ∈ tp→ (a) and R ∈ tp← (b) by Def-
                                                                  R
inition 2. Therefore, we already had role assertions R(xtp(a) , ytp(a) ) ∈ B and likewise
     R
R(ztp(b) , xtp(b) ) ∈ B for the same role R. Furthermore, by condition (2) of Theorem 2,
                      R
if B ∪ T |= A(ytp(a)     ), then since R(a, b) ∈ A, we also have A(b) ∈ A, and thus
                                                               R
A(xtp(b) ) ∈ B. Likewise, by condition (3), if B ∪ T |= A(ztp(b)   ), then A(xtp(a) ) ∈ B.
The following lemma shows that with these properties for B, after adding the new role
assertion R(xtp(a) , xtp(b) ) to B, no new atomic concept assertions can be entailed.
Lemma 3 (Four-Individual Lemma). Let O be a normalized Horn ALCHOI on-
tology such that {R(x1 , y1 ), R(z2 , x2 )} ⊆ O for some x1 , y1 , z2 , x2 , and R. Further,
assume that for every concept name A we have:
(1) O |= A(y1 ) implies O |= A(x2 ), and
(2) O |= A(z2 ) implies O |= A(x1 ).
Finally, let O0 = O∪{R(x1 , x2 )}. Then for every concept name A and every individual
a, we have O0 |= A(a) implies O |= A(a).
Proof (Sketch). Suppose that O0 |= A(a), we will show O |= A(a). If O is inconsistent
then this holds trivially. Otherwise, there exists a canonical model I of O satisfying
Theorem 1. From I we construct an interpretation I 0 that coincides with I apart from
                                            0
the interpretation of roles S such that: S I = S I ∪ {(xI1 , xI2 )} if O |= R v S; and
  I0
S = S ∪ {(x2 , x1 )} if O |= R v S − . We prove that I 0 |= O0 , which implies
          I        I   I

I 0 |= A(a) since O0 |= A(a), and thus I |= A(a) by definition of I 0 , from which
O |= A(a) follows since I satisfies Theorem 1.                                       t
                                                                                     u
                                                                     R
    By repeatedly applying Lemma 3 for each x1 = xtp(a) , y1 = ytp(a)   , x2 = xtp(b) ,
        R                                                   0
z2 = ztp(b) and R such that R(a, b) ∈ A, we obtain that B entails only those atomic
assertions that are entailed by B, which completes the proof of Theorem 2.          t
                                                                                    u
Table 2. Test suite statistics with the number of atomic concepts in the ontology (#A and #AN
for the normalized ontology), roles (#R), individuals (#I), role (#R(a, b)) and concept (#A(a))
assertions, and the number of entailed concept assertions

     Ontology      #A    #AN     #R           #I     #R(a, b)      #A(a)    #A(a) entailed
     Gazetteer    710   711  15   516 150    604 164               11 112          538 799
     Robert        61   135  80       405      1 089                    1            4 255
     Coburn       719 1 161 109   123 515    237 620              297 002          535 124
     UOBM 1        69    90  35    24 925    153 571               44 680          142 747
     UOBM 10       69    90  35   242 549  1 500 628              434 115        1 381 722
     UOBM 50       69    90  35 1 227 445  7 594 996            2 197 035        6 991 583
     UOBM 100      69    90  35 2 462 012 15 241 909            4 409 891       14 027 772




5     Implementation and Evaluation

To evaluate the feasibility of our approach, we implemented the procedure sketched in
Section 3.3 prototypically in Java. The system relies on a database for storing the ABox
and use external reasoners for materializing the abstractions. We focus on features,
which indicate a potential gain of our approach: the size of abstract ABoxes and the
number of refinements steps. In addition, to see how the abstraction changes, we present
statistics for the first and last abstractions, as well as statistics of individual types.
    We selected non-trivial ontologies, for which we present some metrics in Table 2.3
Gazetteer and Corburn are bio ontologies from the NCBO BioPortal and the Phenoscape
project, respectively. Robert is sophisticated and deliberately serves as a worst case ex-
ample for our approach. UOBM ontologies are generated from the University Ontol-
ogy Benchmark,4 where UOBM n denotes a data set for n universities. Note that the
increase of normalized concepts (AN ) in comparison to the original concepts (A) in Ta-
ble 2 is a rough indicator of TBox complexity, which adds extra workload to reasoners.
    Table 3 shows the results of computing and iteratively refining the abstract ABoxes
until the fixpoint is reached. In general, there are only few refinement steps, and the
abstraction reduces the size of the ABox significantly. This can be observed in particular
for the UOBM with growing ABox sizes. Note that for ontologies such as UOBM 1 or
Roberts family ontology the abstraction can be larger than the original ABox. The latter
ontology also requires more refinement steps. This is a result of the heterogeneity of the
ABox (concept and role assertions) in combination with the complexity of the TBox.
    Our qualitative performance evaluations reveals that the compression degree of an
ABox has a correspondence to the gain in time for materializing the abstract ABoxes.
We compared the respective reasoning times of the original ABox with the sum of
reasoning times for all abstractions using the OWL DL reasoning systems HermiT and
Konclude.5 For ontologies with high compression rate such as Gazetteer and Coburn
the sum of the reasoning times for all abstractions is less than a tenth of the reasoning
 3
   Download and references at http://www.derivo.de/dl14-ontologies/
 4
   http://www.cs.ox.ac.uk/isg/tools/UOBMGenerator/
 5
   See http://www.hermit-reasoner.com and http://www.konclude.com
Table 3. Number of refinement steps, relative sizes of the abstractions, and statistics about indi-
vidual types (averaged over all types) for the first and the last abstraction of ontologies

                  # of  Abstract ABox size (%)                Individual types statistics
    Ontology
                 steps % indiv. %A(·) %R(·, ·)          # indiv. / type #AN / type #R / type
    Gazetteer     2      1 1      12 1 < 1 1 279.76 279.76 0.73 2.56 2.05 2.05
    Robert        7     89 142 2 000 68 26 41  5.00 3.12 0.24 22.16 3.50 3.44
    Coburn        3      3 3 <1 1        1 1 116.96 115.22 2.49 5.27 2.79 2.79
    UOBM 1        3    102 158    21 47 15 23  8.03 5.25 3.02 14.18 7.18 7.35
    UOBM 10       3     41 70      9 21  6 10 21.18 13.98 3.34 15.48 7.58 7.86
    UOBM 50       3     18 32      4 10  3 5 47.88 28.36 3.52 16.29 7.82 8.14
    UOBM 100      3     13 22      3 7   2 3 70.36 41.60 3.59 16.62 7.91 8.24



time for the original ABox and requires at most a quarter of RAM. Roberts, as expected
from the abstraction statistics, shows a degradation in runtime (4 to 8 times) and RAM
usage (up to 2 times) over the abstractions. The sequence of growing UOBM ABoxes
(1, 10, 50, 100 universities) effectively shows the benefits of our abstraction procedure
with large ABoxes. Whereas the runtimes for the abstractions of UOBM 1 are still 2 to
4 times that of the original ABox, the runtimes for UOBM 50 are already down by 50%
after abstraction. The original UOBM 100 ontology could neither be processed within
eight hours by HermiT nor by Konclude with a 32 GB RAM limit, but its abstraction
can easily be materialized, e.g., within 84 seconds and 8 GB RAM by Konclude.
    The purpose of the presented evaluation was to estimate the potential gains of using
abstraction for ontology materialization, and for this reason we did not focus yet on
efficient computation of abstractions and of their refinements. As of now we just re-
compute the abstraction from scratch after each refinement step. This takes, e.g., about
200 seconds for each abstraction of UOBM 100, but there is certainly plenty of room
for optimizations.


6    Related Work

Many approaches have been proposed to handle large ABoxes, e.g., by rule-based ma-
terialization in a (relational) database [14,8]. Since we focus on the abstraction of the
ABox, we discuss work that is closely related to our aims.
    The SHER approach [3] merges similar individuals to obtain a compressed, so-
called summary ABox, which is then used for (refutation-based) consistency checking.
The technique (as well as ours) is based on the observation that individuals with the
same asserted types are likely to have the same entailed types. Since merging in SHER
is only based on asserted concepts, the resulting summary ABox might be inconsistent
even if the original ABox is consistent w.r.t. the TBox. To remedy this, justifications [7]
are used to decide which merges caused the inconsistency and to refine the summary
accordingly. Justification-based refinements are also needed for query answering since
SHER does reasoning at query time. We avoid justification computation by partition-
ing individuals of the same type into equivalence classes. Such partitioning guarantees
the soundness of derived atomic concept assertions. We also have to perform refine-
ment steps, but the refinement is to incrementally derive more consequences. What is
computed before remains sound.
    Wandelt and Möller propose a technique for instance retrieval over SHI ontologies
based on modularization [16]. With modularization they refer to the process of parti-
tioning the ABox such that an OWL reasoner can perform (refutation-based) instance
checking over the smaller partitions. To achieve such partitions, the authors allow for
breaking up role assertions when it can be guaranteed (syntactically) that no concepts
are propagated over them. While this does not reduce the overall size of the data, they
also propose an optimization that uses a notion similar to our types, but extended to also
consider the concept assertions for successors and predecessors. As in our case, conse-
quences derived from the ABoxes for these extended types are sound. Completeness is
only guaranteed if the ABox for the extended type completely covers the partition for
the individual. If that is not the case, the whole partition for the individual has to be
processed by the reasoner. While this optimization is similar to our initial abstraction,
Wandelt and Möller resolve to reasoning over the (possibly large) ABox partition for
an individual if necessary for completeness, whereas we propose to refine the abstrac-
tion instead. The former approach syntactically identifies individual partitions, while
our approach semantically, on-demand propagates concept assertions via refinements.


7   Conclusions and Future Work
We have presented an approach for ontology materialization based on abstraction refine-
ment. The main idea is to partition ABox individuals into equivalent classes such that
information derived for their abstract representatives can be used to refine the abstrac-
tion. Our experiment demonstrates that the approach particularly pays off for ontologies
with large ABoxes and simple TBoxes.
    Currently, our approach is complete for Horn ALCHOI due to the property that
only (deterministically) derived assertions are used for abstraction refinement. We could
potentially extend our approach to non-Horn ontology languages by exploiting the addi-
tional information about non-deterministic “possible” instances such as those provided
by the HermiT reasoner. Directly supporting other Horn features, e.g. cardinality and
role chains, is not possible. However, some could be supported without a considerable
modification of the procedure. For example, it is easy to support transitive roles and role
chains by using the well-known encoding of these axioms via concept inclusions [12].
    In this paper we mainly focus on concept materialization since role materialization
for ALCHOI can be essentially computed by expanding role hierarchies (special care
needs to be taken about nominals though). When ontologies contain role chains and
functional roles, however, materialization of role assertions becomes less trivial (e.g.,
the encoding of role chains is not enough). It is left for future work to investigate how
these features can be fully supported.
    The abstract ABoxes could serve not only as a generic interface for communica-
tion with the reasoner, but also as a compact representation of the materialization. This
can be useful for other reasoning tasks. Specifically, when answering instance and con-
junctive queries over the materialized ABoxes, the abstraction can be used to prune the
search space.
References
 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The De-
    scription Logic Handbook: Theory, Implementation, and Applications. Cambridge Univer-
    sity Press, second edn. (2007)
 2. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. J. of Automated
    Reasoning 39(3), 385–429 (2007)
 3. Dolby, J., Fokoue, A., Kalyanpur, A., Schonberg, E., Srinivas, K.: Scalable highly expressive
    reasoner (SHER). J. of Web Semantics 7(4), 357–361 (2009)
 4. Eiter, T., Ortiz, M., Simkus, M., Tran, T.K., Xiao, G.: Query rewriting for Horn-SHIQ plus
    rules. In: Proc. of the 26th National Conf. on Artificial Intelligence (AAAI 2012). AAAI
    Press (2012)
 5. Glimm, B., Kazakov, Y., Liebig, T., Tran, T.K., Vialard, V.: Abstraction re-
    finement for ontology materialization. Tech. rep., University of Ulm and derivo
    GmbH (2014), https://www.uni-ulm.de/fileadmin/website_uni_ulm/
    iui.inst.090/Publikationen/2014/abstractionrefinementTR.pdf
 6. Glimm, B., Krötzsch, M.: SPARQL beyond subgraph matching. In: Proc. of the 9th Int.
    Semantic Web Conf. (ISWC 2010). LNCS, vol. 6496, pp. 241–256. Springer (2010)
 7. Kalyanpur, A., Parsia, B., Horridge, M., Sirin, E.: Finding all justifications of OWL DL
    entailments. In: Proc. of the 6th Int. Semantic Web Conf. (ISWC 2007). LNCS, vol. 4825,
    pp. 267–280. Springer (2007)
 8. Kolovski, V., Wu, Z., Eadon, G.: Optimizing enterprise-scale OWL 2 RL reasoning in a
    relational database system. In: Proc. of the Int. Semantic Web Conf. (ISWC 2010). LNCS,
    vol. 6496, pp. 436–452. Springer (2010)
 9. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach
    to query answering in DL-Lite. In: Proc. of the 12th Int. Conf. on the Principles of Knowl-
    edge Representation and Reasoning (KR 2010). AAAI Press (2010)
10. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL us-
    ing a relational database system. In: Proc. of the 21st Int. Joint Conf. on Artificial Intelligence
    (IJCAI 2009). pp. 2070–2075 (2009)
11. Rodriguez-Muro, M., Calvanese, D.: High performance query answering over DL-Lite on-
    tologies. In: Proc. of the 13th Int. Conf. on the Principles of Knowledge Representation and
    Reasoning (KR 2012) (2012)
12. Simancik, F.: Elimination of complex RIAs without automata. In: Proc. of the 2012 Int.
    Workshop on Description Logics (DL 2012). CEUR Workshop Proceedings, vol. 846.
    CEUR-WS.org (2012)
13. Stefanoni, G., Motik, B., Horrocks, I.: Introducing nominals to the combined query answer-
    ing approaches for EL. In: Proc. of the 27th National Conf. on Artificial Intelligence (AAAI
    2013). AAAI Press (2013)
14. Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F., Bal, H.E.: WebPIE: A web-scale
    parallel inference engine using MapReduce. J. Web Semantics 10, 59–75 (2012)
15. Volz, R., Staab, S., Motik, B.: Incrementally maintaining materializations of ontologies
    stored in logic databases. J. Data Semantics 2, 1–34 (2005)
16. Wandelt, S., Möller, R.: Towards ABox modularization of semi-expressive description log-
    ics. J. of Applied Ontology 7(2), 133–167 (2012)