Abstraction Refinement for Ontology Materialization Birte Glimm1 , Yevgeny Kazakov1 , Thorsten Liebig2 , Trung-Kien Tran1 , and Vincent Vialard2 1 University of Ulm, Ulm, Germany, .@uni-ulm.de 2 derivo GmbH, Ulm, Germany, @derivo.de Abstract. We present a new procedure for ontology materialization (comput- ing all entailed instances of every atomic concept) in which reasoning over a large ABox is reduced to reasoning over a smaller “abstract” ABox. The abstract ABox is obtained as the result of a fixed-point computation involving two stages: 1) abstraction: partition the individuals into equivalence classes based on told information and use one representative individual per equivalence class, and 2) refinement: iteratively split (refine) the equivalence classes, when new assertions are derived that distinguish individuals within the same class. We prove that the method is complete for Horn ALCHOI ontologies, that is, all entailed instances will be derived once the fixed-point is reached. We implement the procedure in a new database-backed reasoning system and evaluate it empirically on existing ontologies with large ABoxes. We demonstrate that the obtained abstract ABoxes are significantly smaller than the original ones and can be computed with few re- finement steps. 1 Introduction Ontology based data access (OBDA) is an increasingly popular paradigm in the area of knowledge representation and information systems. In ODBA, a TBox with back- ground knowledge is used to enrich and integrate large, incomplete, and possibly semi- structured data, which users can then access via queries. Different approaches have been developed to compute the entailed answers to a query: 1) Query rewriting ap- proaches answer a query by “compiling” the background knowledge of the TBox into the query [2,11]. 2) Materialization techniques take the opposite approach by pre- computing all entailed information upfront, independent of the queries [15]. After ex- tending the ABox with all pre-computed facts, the unmodified queries can be evaluated over the enriched data only (i.e., without considering the schema). 3) Recently, also combined approaches have been proposed, which allow for smaller rewritten queries and are applicable for more expressive languages [10,9,4,13]. How queries can be answered also depends on what kinds of queries are allowed. Classical conjunctive queries include existential variables, which can be satisfied also by individuals that do not occur in the input ontology, but whose existence can be in- ferred. Answering such queries often requires reasoning at query time even for fully materialized ontologies. Many approaches including the entailment regimes of the Se- mantic Web query language SPARQL [6] are based on the simpler form of conjunctive instance queries. Such queries only involve named individuals from the input ontology Table 1. The syntax of ALCHOI Concepts C ::= > | ⊥ | A | o | ¬C | C1 u C2 | C1 t C2 | ∃R.C | ∀R.C Roles R ::= R | R− Axioms C1 v C2 , R1 v R2 , C(a), R(a, b) and can be reduced to instance retrieval. Hence, answering such queries requires only look-ups over a materialized ABox. This is also the setting considered in this paper, i.e., our goal is the materialization of entailed facts for large ABoxes stored in a database. Our work is based on an assumption that individuals with similar asserted types are likely to have the same inferred types. We can group such individuals into equivalence classes and compute the types just for one representative individual. This results in a so- called abstraction that contains one representative individual per equivalence class plus representative individuals for its direct role successors and predecessors. Derivations for the latter individuals can be used to split/refine equivalence classes, in which individuals no longer have the same assertions. The procedure is complete when the fixed point is reached, i.e., no new derivations are made and no refinements happen. The number of individuals in the abstraction is bounded exponentially in the number of different concepts and roles and linearly in the size of the original ABox; hence the abstraction is relatively small when the number of individuals is much larger than the number of concepts and roles used in the ontology. We implement the technique in a database-backed system that interacts with a highly optimized in-memory reasoner that materializes the abstract ABox. The database engine needs to support only simple operations and does not require any knowledge of the TBox. We show that the procedure is sound and it is complete for computing the entailed types of individuals in Horn ALCHOI ontologies. After the preliminaries, we present the theoretical foundation for our approach (Sec- tion 3) and prove its completeness (Section 4). In Section 5, we evaluate the procedure on a range of real-world ontologies with large ABoxes. We then discuss related work (Section 6) and conclude (Section 7). Full proofs and further details are available in a technical report [5]. 2 Preliminaries We recall the syntax of the Description Logic (DL) ALCHOI in Table 1; its semantics is defined as usual [1]. For an ontology O, we use con(O), rol(O), and ind(O) as the sets of concept names, role names, and individual names occurring in O, respectively. We say that O is concept materialized if A(a) ∈ O whenever O |= A(a), A ∈ con(O), and a ∈ ind(O); O is role materialized if R(a, b) ∈ O whenever O |= R(a, b), R ∈ rol(O), a, b ∈ ind(O); O is materialized if it is both concept and role materialized. Remark 1. We present nominals as primitive symbols, which are disjoint from individ- uals. Some definitions use a special nominal constructor {a} with individual a (in this I case, {a} = {aI }). We can convert such ontologies to our representation by renaming every nominal {a} with the corresponding nominal symbol oa and adding a concept assertion oa (a). 3 Computing ABox Materialization by Abstraction The typical OBDA scenario is such that the ABox contains a large number of individu- als. If we can identify individuals that yield the same consequences, we can compute the materialization by computing entailed consequences only for representative individuals. 3.1 Bi-homomorphic Individuals and Individual Types In order to (syntactically) characterize individuals that yield the same consequences, we study structure-preserving transformations of ABoxes. Definition 1. Let A and B be two ABoxes and h : ind(A) → ind(B) a mapping from the individuals in A to individuals in B. We extend h to axioms in a straightforward way: h(C(a)) = C(h(a)), h(R(a, b)) = R(h(a), h(b)), and h(α) = α for other axioms α. We say that h is a homomorphism (from A to B) if h(A) ⊆ B. An individual a in A is homomorphic to an individual b in B if there exists a homomorphism h from A to B such that h(a) = b; in addition, if b is homomorphic to a, then a and b are bi-homomorphic. Example 1. Consider the ABox A = {R(a, a), R(a, b), R(b, b)}. Then the mappings h1 = {a 7→ b, b 7→ b} and h2 = {a 7→ a, b 7→ a} are homomorphisms from A to A. Thus, a and b are bi-homomorphic. Note that there is no isomorphism h from A to A (a bijective homomorphism such that its inverse is also a homomorphism) such that h(a) = b or h(b) = a. It is easy to show that entailed axioms are preserved under homomorphisms between ABoxes. In particular, bi-homomorphic individuals are instances of the same concepts. Lemma 1. Let h : ind(A) → ind(B) be a homomorphism between ABoxes A and B. Then for every TBox T and every axiom α, A ∪ T |= α implies B ∪ T |= h(α). Proof. We show that Lemma 1 even holds for SROIQ without unique name as- sumption. Suppose that A ∪ T |= α. Then h(A ∪ T ) |= h(α). Since h(A ∪ T ) = h(A) ∪ h(T ) = h(A) ∪ T ⊆ B ∪ T , by monotonicity we obtain B ∪ T |= h(α). t u Corollary 1. If individuals a and b in an ABox A are bi-homomorphic, then for every TBox T and every concept C, we have A ∪ T |= C(a) if and only if A ∪ T |= C(b). By Corollary 1, bi-homomorphic individuals entail the same assertions w.r.t. every TBox. This property is too strong for our purpose as we need to deal with just one given TBox. It can be that many (non bi-homomorphic) individuals are still materialized in a same way. To take this into account, instead of partitioning the individuals according to the bi-homomorphism relation we start with an approximation to this relation. Definition 2. Let A be an ABox. The type of an individual a (w.r.t. A) is a triple tp(a) = (tp↓ (a), tp→ (a), tp← (a)) where tp↓ (a) = {C | C(a) ∈ A}, tp→ (a) = {R | ∃b : R(a, b) ∈ A}, and tp← (a) = {S | ∃c : S(c, a) ∈ A}. Intuitively, the type of an individual is defined as a triple consisting of its asserted con- cepts, successor roles, and predecessor roles in the ABox. Note that bi-homomorphic individuals have the same types, so the relation between individuals of the same types is an approximation to the bi-homomorphism relation. 3.2 Abstraction of an ABox If we compress the ABox by simply merging all individuals with the same type into one, we might obtain unexpected entailments, even if all individuals are bi-homomorphic. Example 2. Consider the following ABox A = {R(a, b), R(b, a)}. Clearly, a and b are bi-homomorphic in A. Let B = {R(a, a)} be obtained from A by replacing individual b with a, and let T = {> v B t C, ∃R.B v C}. It is easy to check that B ∪ T |= C(a), but A ∪ T 6|= C(a) (and hence A ∪ T 6|= C(b)). Instead of merging all individuals with the same type into one, we realize every indi- vidual type in our abstract ABox. Definition S 3 (ABox Abstraction). The abstraction of an ABox A is an ABox B = a∈ind(A) tp(a) , where for each type tp = (tp↓ , tp→ , tp← ), we define Btp = {C(xtp ) | B R S R S C ∈ tp↓ } ∪ {R(xtp , ytp ) | R ∈ tp→ } ∪ {S(ztp , xtp ) | S ∈ tp← }, where xtp , ytp , and ztp are fresh distinguished abstract individuals. Example 3. Consider the ABox A = {A(a), A(d), R(a, b), R(a, e), R(b, c), R(b, e), R(c, a), R(d, c), R(e, d)}. We have tp(b) = tp(c) = tp(e) = tp1 = (∅, {R}, {R}) and tp(a) = tp(d) = tp2 = ({A}, {R}, {R}). The abstraction of A is B = Btp1 ∪ Btp2 with R R R R Btp1 = {R(xtp1 , ytp 1 ), R(ztp1 , xtp1 )}, Btp2 = {A(xtp2 ), R(xtp2 , ytp 2 ), R(ztp 2 , xtp2 )}. Intuitively, the abstraction of an ABox is a disjoint union of small ABoxes witnessing each individual type realized in the ABox. The following lemma shows the soundness of concept assertions derived from the abstraction. Lemma 2. Let A be an ABox, B its abstraction, and T a TBox. Then for every type tp = (tp↓ , tp→ , tp← ), every a ∈ ind(A) with tp(a) = tp w.r.t. A, and every concept C: (1) B ∪ T |= C(xtp ) implies A ∪ T |= C(a), R (2) B ∪ T |= C(ytp ) and R(a, b) ∈ A implies A ∪ T |= C(b), and S (3) B ∪ T |= C(ztp ) and S(c, a) ∈ A implies A ∪ T |= C(c). Proof. Consider all mappings h : ind(B) → ind(A) such that h(xtp ) ∈ {a ∈ ind(A) | R S tp(a) = tp}, h(ytp ) ∈ {b | R(h(xtp ), b) ∈ A}, and h(ztp ) ∈ {c | S(c, h(xtp )) ∈ A}. Clearly, h(B) ⊆ A for every such mapping h. Furthermore, for every a ∈ ind(A), R every R(a, b) ∈ A and every S(c, a) ∈ A, there exists h with h(xtp ) = a, h(ytp ) = b, S and h(ztp ) = c for tp = tp(a). Hence, claims (1)–(3) follow by Lemma 1. t u 3.3 Abstraction Refinement Individuals from an ABox A may correspond to several abstract individuals in the ABox abstraction B: Each individual a corresponds to the abstract individual xtp for tp = tp(a). In addition, if R(b, a) ∈ A or S(a, b) ∈ A for some individual b, then a also R S R corresponds to ytp and ztp respectively for tp = tp(b). The additional individuals ytp and S ztp were introduced intentionally to refine the initial abstraction when new assertions of abstract individuals are derived, which in turn, can be used to derive new assertions of individuals in A. Specifically, if we add the new assertion according to cases (2) and (3) of Lemma 2, we may obtain different assertions for individuals that previously had the same types. Hence, adding the newly derived assertions using Lemma 2 may refine the types of the original individuals and, in turn, result in a new abstraction, for which new assertions can be derived once again. The above suggests the following materialization procedure based on abstraction refinement. Given an ontology O = A ∪ T we proceed as follows: 1. Build an initial abstraction B of A according to Definition 3. 2. Materialize B ∪ T using a reasoner. 3. Extend A with the newly derived assertions according to Lemma 2. 4. Update the types of the individuals in A and re-compute its abstraction B. 5. Repeat from Step 2 until no new assertions can be added to A. Example 4 (Example 3 continued). Let AI the ABox A from Example 3 and a TBox T = {A v ∀R.B, B v ∀R− .A}. Let BI be the abstraction B of AI = A computed R in Example 3 (see Figure 1). By materializing BI w.r.t. T we get B(ytp 2 ), from which we obtain AII = AI ∪ {B(b), B(e), B(c)} using Lemma 2. Recomputing the types of individuals in AII yields tp(b) = tp(c) = tp(e) = tp3 = ({B}, {R}, {R}), while the types of a and d remain unchanged. The abstraction of AII is thus BII = Btp2 ∪ R R Btp3 , where Btp3 = {B(xtp3 ), R(xtp3 , ytp 3 ), R(ztp 3 , xtp3 )}. By materializing BII , we get R A(ztp3 ), from which we obtain AIII = AII ∪ {A(b)}. We again recompute the types of individuals in AIII , which gives tp(b) = tp4 = ({A, B}, {R}, {R}), while the types of the other individuals do not change. The abstraction of AIII is thus BIII = Btp2 ∪ Btp3 ∪ R R Btp4 , where Btp4 = {A(xtp4 ), B(xtp4 ), R(xtp4 , ytp 4 ), R(ztp 4 , xtp4 )}. Materializing BIII R R yields B(ytp4 ) and A(ztp4 ), which correspond to B(c), B(e), and A(a). However, those assertions already exist in AIII , so the procedure terminates. The abstraction refinement procedure terminates since after every iteration except the last one, new atomic assertions must be added to A, and there is a bounded number of such assertions. Specifically, the number of iterations is at most ||ind(O)||×||con(O)||. The number of realized individual types in every ABox A, and hence the size of every abstract ABox B, is at most exponential in the number of different concepts and roles in O. In practice, it is likely to be much smaller since not every possible type is real- ized. Note also that in practice, it is not necessary to add the newly derived assertions explicitly to the original ABox—one can recompute the new types using some simple operations on the sets of individuals (intersection and unions), and keep the derived assertions only once for every new equivalence class. 4 Completeness Lemma 2 guarantees the soundness of our procedure. However, in general our proce- dure is not complete, as demonstrated by the following example. Example 5. Consider the ABox A = {A(a), R(a, b), B(b)} and the TBox T = {B v C t D, ∃R.C v C, A u C v ∀R.D}. Note that A ∪ T |= D(b). We have tp(a) = Materialized Abstractions abstract ABoxes I II III ytpR1 {c, e, a, d} ABox: A = xtp1 {b, c, e} A(I) ztpR1 {a, b, d} (II) a (II) (III) B B ,A +B ytpR2 {b, e, c} {b, e, c} {b, e, c} e b A xtp2 {a, d} {a, d} {a, d} d c ztpR2 {c, e} {c, e} {c, e} A(I) B (II) ytpR3 {c, e, a, d} {a, d} B xtp3 {b, e, c} {c, e} TBox: T = +A ztpR3 {a, b, d} {b, d, a} A v ∀R.B B v ∀R− .A +B ytpR4 {c, e} A, B xtp4 {b} +A ztpR4 {a} Fig. 1. The abstractions I-III produced in Example 4. Each abstraction consists of the ABoxes corresponding to the four individual types. The inferred assertions are indicated with the “+” sign and are added to the corresponding original individuals shown in each column. The materialized assertions in the original ABox are labeled with the first iteration in which they appear. ({A}, {R}, ∅), tp(b) = ({B}, ∅, {R}). Therefore the abstraction B = Btp(a) ∪ Btp(b) , R R where Btp(a) = {A(xtp(a) ), R(xtp(a) , ytp(a) )}, Btp(b) = {B(xtp(b) ), R(ztp(b) , xtp(b) )}. Since B∪T does not entail any new atomic concept assertions, our procedure terminates without producing the entailment A ∪ T |= D(b). The primary reason for incompleteness in this example is that our abstraction breaks the ABox into disconnected parts, which cannot communicate the non-deterministic choices, e.g., for the disjunction C t D. The only communication between ABoxes happens through the entailment of new assertions. If the ontology language does not allow such non-deterministic constructors, it is possible to obtain a complete procedure. 4.1 Horn ALCHOI In this section, we restrict ontologies to a Horn fragment of ALCHOI: Definition 4 (Horn ALCHOI). An ALCHOI ontology O is Horn if, for every con- cept assertion D(a) and every axiom C v D, the concepts C and D satisfy, respec- tively, the following grammar definitions: C ::=> | ⊥ | A | o | C1 u C2 | C1 t C2 | ∃R.C, (1) D ::=> | ⊥ | A | o | D1 u D2 | ∃R.D | ∀R.D | ¬C. (2) Intuitively, negations and universal restrictions should not occur negatively, and dis- junctions should not occur positively. We also allow axioms equivalent to Horn axioms. It is well-known that every consistent Horn ontology has a so-called canonical model that entails exactly the consequences entailed by the ontology. For our purpose, we require a weaker version of this property. Theorem 1 (Weak Canonical Model Property for Horn ALCHOI). Every consis- tent Horn ALCHOI ontology O has a model I such that I |= A(a) implies O |= A(a) for every atomic concept assertion A(a) with a ∈ ind(O) and A ∈ con(O). Theorem 1 can be proved using the property that Horn ALCHOI models are closed under direct products. Then a canonical model is obtained from the direct product of models refuting (finitely many) atomic non-types. Before formulating our completeness result, we need to solve one small technical problem illustrated in the following example. Example 6. Consider A = {A(a), B(b), R(a, b)}, T = {A u ∃R.B v C}. Clearly, A ∪ T |= C(a). We have tp(a) = tp1 = ({A}, {R}, ∅), tp(b) = tp2 = ({B}, ∅, {R}), R so our abstraction B = Btp1 ∪ Btp2 with Btp1 = {A(xtp1 ), R(xtp1 , ytp 1 }, and Btp2 = R {B(xtp2 ), R(ztp2 , xtp2 )}. Since no new atomic concept assertions are entailed by B ∪T , our procedure terminates without deriving C(a). R Note that Btp2 ∪ T |= (∃R.B)(ztp 2 ), so there is an entailed assertion, just not an atomic one. To capture this inference, we introduce a new concept that “defines” ∃R.B. Specifically, let T 0 = {∃R.B v X, A u X v C} where X is a fresh concept name. Clearly, T 0 is a conservative extension of T , and from T 0 we can derive a new as- sertion Btp2 ∪ T 0 |= X(ztp R 2 ). If we now add the corresponding assertion X(a) to A and recompute the abstraction for the updated type tp(a) = tp3 = ({A, X}, {R}, ∅) R (tp(b) does not change), we have Btp3 = {A(xtp3 ), X(xtp3 ), R(xtp3 , ytp 3 )}, and obtain 0 Btp3 ∪ T |= C(xtp3 ), which gives us the intended result. Example 6 suggests that to achieve completeness, we need to represent existential re- strictions on the left hand side of the axioms using new atomic concepts. Note that ∃R.B v X is equivalent to B v ∀R− .X. Thus we can just require that there are no existential restrictions on the left hand side of concept inclusions, and all universal restrictions on the right have only atomic concepts as fillers. Definition 5 (Normal Form for Horn ALCHOI). Horn ALCHOI axioms D(a) and C v D are in normal form if they satisfy the following grammar definitions: C ::=> | ⊥ | A | o | C1 u C2 | C1 t C2 (3) D ::=> | ⊥ | A | o | D1 u D2 | ∃R.D | ∀R.A | ¬C (4) Intuitively, in addition to the constraints for Horn ALCHOI ontologies in Definition 4, negative occurrence of existential restrictions are not allowed, and (positive) occur- rences of universal restrictions can only have concept names as fillers. It is easy to convert axioms to a normal form using a well-known structural transformation. 4.2 Completeness Proof We are now ready to show the following completeness result: Theorem 2. Let O = A ∪ T be a normalized Horn ALCHOI ontology and B the abstraction of A. A is concept materialized if, for every type tp = (tp↓ , tp→ , tp← ), every individual a ∈ ind(A) with tp(a) = tp, and every atomic concept A, we have: (1) B ∪ T |= A(xtp ) implies A(a) ∈ A, R (2) B ∪ T |= A(ytp ) and R(a, b) ∈ A implies A(b) ∈ A, and S (3) B ∪ T |= A(ztp ) and S(c, a) ∈ A implies A(c) ∈ A. Proof. To prove Theorem 2, we extend the abstraction B of A with new role assertions R(xtp(a) , xtp(b) ) for every R(a, b) ∈ A. Let us denote this extended abstract ABox by B 0 . Since, for every C(a) ∈ A, we also have C ∈ tp↓ (a) and, thus, C(xtp(a) ) ∈ B ⊆ B 0 , the mapping h : ind(A) → ind(B 0 ) defined by h(a) = xtp(a) is a homomorphism from A to B 0 . Therefore, by Lemma 1, if A ∪ T |= A(a), then B 0 ∪ T |= A(xtp(a) ). The key part of the proof is to demonstrate that in this case we also have B ∪ T |= A(xtp(a) ). That is, the extended abstract ABox B 0 does not entail new atomic concept assertions compared to B. It follows then that A(a) ∈ A by condition (1) of the theorem. This implies that A is concept materialized. To prove that B 0 entails the same atomic concept assertions as B, we use the re- maining conditions (2) and (3) of Theorem 2 and the canonical model property formu- lated in Theorem 1. Note that since new role assertions of the form R(xtp(a) , xtp(b) ) are added to B 0 only if R(a, b) ∈ A, we have R ∈ tp→ (a) and R ∈ tp← (b) by Def- R inition 2. Therefore, we already had role assertions R(xtp(a) , ytp(a) ) ∈ B and likewise R R(ztp(b) , xtp(b) ) ∈ B for the same role R. Furthermore, by condition (2) of Theorem 2, R if B ∪ T |= A(ytp(a) ), then since R(a, b) ∈ A, we also have A(b) ∈ A, and thus R A(xtp(b) ) ∈ B. Likewise, by condition (3), if B ∪ T |= A(ztp(b) ), then A(xtp(a) ) ∈ B. The following lemma shows that with these properties for B, after adding the new role assertion R(xtp(a) , xtp(b) ) to B, no new atomic concept assertions can be entailed. Lemma 3 (Four-Individual Lemma). Let O be a normalized Horn ALCHOI on- tology such that {R(x1 , y1 ), R(z2 , x2 )} ⊆ O for some x1 , y1 , z2 , x2 , and R. Further, assume that for every concept name A we have: (1) O |= A(y1 ) implies O |= A(x2 ), and (2) O |= A(z2 ) implies O |= A(x1 ). Finally, let O0 = O∪{R(x1 , x2 )}. Then for every concept name A and every individual a, we have O0 |= A(a) implies O |= A(a). Proof (Sketch). Suppose that O0 |= A(a), we will show O |= A(a). If O is inconsistent then this holds trivially. Otherwise, there exists a canonical model I of O satisfying Theorem 1. From I we construct an interpretation I 0 that coincides with I apart from 0 the interpretation of roles S such that: S I = S I ∪ {(xI1 , xI2 )} if O |= R v S; and I0 S = S ∪ {(x2 , x1 )} if O |= R v S − . We prove that I 0 |= O0 , which implies I I I I 0 |= A(a) since O0 |= A(a), and thus I |= A(a) by definition of I 0 , from which O |= A(a) follows since I satisfies Theorem 1. t u R By repeatedly applying Lemma 3 for each x1 = xtp(a) , y1 = ytp(a) , x2 = xtp(b) , R 0 z2 = ztp(b) and R such that R(a, b) ∈ A, we obtain that B entails only those atomic assertions that are entailed by B, which completes the proof of Theorem 2. t u Table 2. Test suite statistics with the number of atomic concepts in the ontology (#A and #AN for the normalized ontology), roles (#R), individuals (#I), role (#R(a, b)) and concept (#A(a)) assertions, and the number of entailed concept assertions Ontology #A #AN #R #I #R(a, b) #A(a) #A(a) entailed Gazetteer 710 711 15 516 150 604 164 11 112 538 799 Robert 61 135 80 405 1 089 1 4 255 Coburn 719 1 161 109 123 515 237 620 297 002 535 124 UOBM 1 69 90 35 24 925 153 571 44 680 142 747 UOBM 10 69 90 35 242 549 1 500 628 434 115 1 381 722 UOBM 50 69 90 35 1 227 445 7 594 996 2 197 035 6 991 583 UOBM 100 69 90 35 2 462 012 15 241 909 4 409 891 14 027 772 5 Implementation and Evaluation To evaluate the feasibility of our approach, we implemented the procedure sketched in Section 3.3 prototypically in Java. The system relies on a database for storing the ABox and use external reasoners for materializing the abstractions. We focus on features, which indicate a potential gain of our approach: the size of abstract ABoxes and the number of refinements steps. In addition, to see how the abstraction changes, we present statistics for the first and last abstractions, as well as statistics of individual types. We selected non-trivial ontologies, for which we present some metrics in Table 2.3 Gazetteer and Corburn are bio ontologies from the NCBO BioPortal and the Phenoscape project, respectively. Robert is sophisticated and deliberately serves as a worst case ex- ample for our approach. UOBM ontologies are generated from the University Ontol- ogy Benchmark,4 where UOBM n denotes a data set for n universities. Note that the increase of normalized concepts (AN ) in comparison to the original concepts (A) in Ta- ble 2 is a rough indicator of TBox complexity, which adds extra workload to reasoners. Table 3 shows the results of computing and iteratively refining the abstract ABoxes until the fixpoint is reached. In general, there are only few refinement steps, and the abstraction reduces the size of the ABox significantly. This can be observed in particular for the UOBM with growing ABox sizes. Note that for ontologies such as UOBM 1 or Roberts family ontology the abstraction can be larger than the original ABox. The latter ontology also requires more refinement steps. This is a result of the heterogeneity of the ABox (concept and role assertions) in combination with the complexity of the TBox. Our qualitative performance evaluations reveals that the compression degree of an ABox has a correspondence to the gain in time for materializing the abstract ABoxes. We compared the respective reasoning times of the original ABox with the sum of reasoning times for all abstractions using the OWL DL reasoning systems HermiT and Konclude.5 For ontologies with high compression rate such as Gazetteer and Coburn the sum of the reasoning times for all abstractions is less than a tenth of the reasoning 3 Download and references at http://www.derivo.de/dl14-ontologies/ 4 http://www.cs.ox.ac.uk/isg/tools/UOBMGenerator/ 5 See http://www.hermit-reasoner.com and http://www.konclude.com Table 3. Number of refinement steps, relative sizes of the abstractions, and statistics about indi- vidual types (averaged over all types) for the first and the last abstraction of ontologies # of Abstract ABox size (%) Individual types statistics Ontology steps % indiv. %A(·) %R(·, ·) # indiv. / type #AN / type #R / type Gazetteer 2 1 1 12 1 < 1 1 279.76 279.76 0.73 2.56 2.05 2.05 Robert 7 89 142 2 000 68 26 41 5.00 3.12 0.24 22.16 3.50 3.44 Coburn 3 3 3 <1 1 1 1 116.96 115.22 2.49 5.27 2.79 2.79 UOBM 1 3 102 158 21 47 15 23 8.03 5.25 3.02 14.18 7.18 7.35 UOBM 10 3 41 70 9 21 6 10 21.18 13.98 3.34 15.48 7.58 7.86 UOBM 50 3 18 32 4 10 3 5 47.88 28.36 3.52 16.29 7.82 8.14 UOBM 100 3 13 22 3 7 2 3 70.36 41.60 3.59 16.62 7.91 8.24 time for the original ABox and requires at most a quarter of RAM. Roberts, as expected from the abstraction statistics, shows a degradation in runtime (4 to 8 times) and RAM usage (up to 2 times) over the abstractions. The sequence of growing UOBM ABoxes (1, 10, 50, 100 universities) effectively shows the benefits of our abstraction procedure with large ABoxes. Whereas the runtimes for the abstractions of UOBM 1 are still 2 to 4 times that of the original ABox, the runtimes for UOBM 50 are already down by 50% after abstraction. The original UOBM 100 ontology could neither be processed within eight hours by HermiT nor by Konclude with a 32 GB RAM limit, but its abstraction can easily be materialized, e.g., within 84 seconds and 8 GB RAM by Konclude. The purpose of the presented evaluation was to estimate the potential gains of using abstraction for ontology materialization, and for this reason we did not focus yet on efficient computation of abstractions and of their refinements. As of now we just re- compute the abstraction from scratch after each refinement step. This takes, e.g., about 200 seconds for each abstraction of UOBM 100, but there is certainly plenty of room for optimizations. 6 Related Work Many approaches have been proposed to handle large ABoxes, e.g., by rule-based ma- terialization in a (relational) database [14,8]. Since we focus on the abstraction of the ABox, we discuss work that is closely related to our aims. The SHER approach [3] merges similar individuals to obtain a compressed, so- called summary ABox, which is then used for (refutation-based) consistency checking. The technique (as well as ours) is based on the observation that individuals with the same asserted types are likely to have the same entailed types. Since merging in SHER is only based on asserted concepts, the resulting summary ABox might be inconsistent even if the original ABox is consistent w.r.t. the TBox. To remedy this, justifications [7] are used to decide which merges caused the inconsistency and to refine the summary accordingly. Justification-based refinements are also needed for query answering since SHER does reasoning at query time. We avoid justification computation by partition- ing individuals of the same type into equivalence classes. Such partitioning guarantees the soundness of derived atomic concept assertions. We also have to perform refine- ment steps, but the refinement is to incrementally derive more consequences. What is computed before remains sound. Wandelt and Möller propose a technique for instance retrieval over SHI ontologies based on modularization [16]. With modularization they refer to the process of parti- tioning the ABox such that an OWL reasoner can perform (refutation-based) instance checking over the smaller partitions. To achieve such partitions, the authors allow for breaking up role assertions when it can be guaranteed (syntactically) that no concepts are propagated over them. While this does not reduce the overall size of the data, they also propose an optimization that uses a notion similar to our types, but extended to also consider the concept assertions for successors and predecessors. As in our case, conse- quences derived from the ABoxes for these extended types are sound. Completeness is only guaranteed if the ABox for the extended type completely covers the partition for the individual. If that is not the case, the whole partition for the individual has to be processed by the reasoner. While this optimization is similar to our initial abstraction, Wandelt and Möller resolve to reasoning over the (possibly large) ABox partition for an individual if necessary for completeness, whereas we propose to refine the abstrac- tion instead. The former approach syntactically identifies individual partitions, while our approach semantically, on-demand propagates concept assertions via refinements. 7 Conclusions and Future Work We have presented an approach for ontology materialization based on abstraction refine- ment. The main idea is to partition ABox individuals into equivalent classes such that information derived for their abstract representatives can be used to refine the abstrac- tion. Our experiment demonstrates that the approach particularly pays off for ontologies with large ABoxes and simple TBoxes. Currently, our approach is complete for Horn ALCHOI due to the property that only (deterministically) derived assertions are used for abstraction refinement. We could potentially extend our approach to non-Horn ontology languages by exploiting the addi- tional information about non-deterministic “possible” instances such as those provided by the HermiT reasoner. Directly supporting other Horn features, e.g. cardinality and role chains, is not possible. However, some could be supported without a considerable modification of the procedure. For example, it is easy to support transitive roles and role chains by using the well-known encoding of these axioms via concept inclusions [12]. In this paper we mainly focus on concept materialization since role materialization for ALCHOI can be essentially computed by expanding role hierarchies (special care needs to be taken about nominals though). When ontologies contain role chains and functional roles, however, materialization of role assertions becomes less trivial (e.g., the encoding of role chains is not enough). It is left for future work to investigate how these features can be fully supported. The abstract ABoxes could serve not only as a generic interface for communica- tion with the reasoner, but also as a compact representation of the materialization. This can be useful for other reasoning tasks. Specifically, when answering instance and con- junctive queries over the materialized ABoxes, the abstraction can be used to prune the search space. References 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The De- scription Logic Handbook: Theory, Implementation, and Applications. Cambridge Univer- sity Press, second edn. (2007) 2. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3), 385–429 (2007) 3. Dolby, J., Fokoue, A., Kalyanpur, A., Schonberg, E., Srinivas, K.: Scalable highly expressive reasoner (SHER). J. of Web Semantics 7(4), 357–361 (2009) 4. Eiter, T., Ortiz, M., Simkus, M., Tran, T.K., Xiao, G.: Query rewriting for Horn-SHIQ plus rules. In: Proc. of the 26th National Conf. on Artificial Intelligence (AAAI 2012). AAAI Press (2012) 5. Glimm, B., Kazakov, Y., Liebig, T., Tran, T.K., Vialard, V.: Abstraction re- finement for ontology materialization. Tech. rep., University of Ulm and derivo GmbH (2014), https://www.uni-ulm.de/fileadmin/website_uni_ulm/ iui.inst.090/Publikationen/2014/abstractionrefinementTR.pdf 6. Glimm, B., Krötzsch, M.: SPARQL beyond subgraph matching. In: Proc. of the 9th Int. Semantic Web Conf. (ISWC 2010). LNCS, vol. 6496, pp. 241–256. Springer (2010) 7. Kalyanpur, A., Parsia, B., Horridge, M., Sirin, E.: Finding all justifications of OWL DL entailments. In: Proc. of the 6th Int. Semantic Web Conf. (ISWC 2007). LNCS, vol. 4825, pp. 267–280. Springer (2007) 8. Kolovski, V., Wu, Z., Eadon, G.: Optimizing enterprise-scale OWL 2 RL reasoning in a relational database system. In: Proc. of the Int. Semantic Web Conf. (ISWC 2010). LNCS, vol. 6496, pp. 436–452. Springer (2010) 9. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach to query answering in DL-Lite. In: Proc. of the 12th Int. Conf. on the Principles of Knowl- edge Representation and Reasoning (KR 2010). AAAI Press (2010) 10. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL us- ing a relational database system. In: Proc. of the 21st Int. Joint Conf. on Artificial Intelligence (IJCAI 2009). pp. 2070–2075 (2009) 11. Rodriguez-Muro, M., Calvanese, D.: High performance query answering over DL-Lite on- tologies. In: Proc. of the 13th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2012) (2012) 12. Simancik, F.: Elimination of complex RIAs without automata. In: Proc. of the 2012 Int. Workshop on Description Logics (DL 2012). CEUR Workshop Proceedings, vol. 846. CEUR-WS.org (2012) 13. Stefanoni, G., Motik, B., Horrocks, I.: Introducing nominals to the combined query answer- ing approaches for EL. In: Proc. of the 27th National Conf. on Artificial Intelligence (AAAI 2013). AAAI Press (2013) 14. Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F., Bal, H.E.: WebPIE: A web-scale parallel inference engine using MapReduce. J. Web Semantics 10, 59–75 (2012) 15. Volz, R., Staab, S., Motik, B.: Incrementally maintaining materializations of ontologies stored in logic databases. J. Data Semantics 2, 1–34 (2005) 16. Wandelt, S., Möller, R.: Towards ABox modularization of semi-expressive description log- ics. J. of Applied Ontology 7(2), 133–167 (2012)