<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Advances in Accessing Big Data with Expressive Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ralf Moller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Neuenstadt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O zgur L. O zcep</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Wandelt</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hamburg University of Technology</institution>
          ,
          <addr-line>21073 Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Humboldt-Universitat zu Berlin</institution>
          ,
          <addr-line>10099 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology-based query answering has to be supported w.r.t. secondary memory and very expressive ontologies to meet practical requirements in some applications. Recently, advances for the expressive DL SHI have been made in the dissertation of S. Wandelt for conceptbased instance retrieval on Big Data descriptions stored in secondary memory. In this paper we extend this approach by investigating optimization algorithms for answering grounded conjunctive queries.3</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Triplestores, originally designed to store Big Data in RDF format on secondary
memory with SPARQL as a query language, are currently more and more used in
settings where query answering (QA) w.r.t. ontologies is bene cial. However,
reasoning w.r.t. ontologies in secondary memory is provided for weakly expressive
languages only (e.g., RDFS), if at all, and in some cases, query answering
algorithms are known to be incomplete. For weakly expressive DL languages, such
as DL-Lite, good results for sound and complete query answering w.r.t. large
(virtual) Aboxes have already been achieved with OBDA based query rewriting
techniques and schema speci c mapping rules [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, for expressive, more
powerful DLs such as ALC and beyond only rst steps have been made. Solving
the problem of Accessing Big Data with Expressive Ontologies (ABDEO) is an
important research goal.
      </p>
      <p>
        A strategy to solve the ABDEO problem is to \summarize" Big Aboxes by
melting individuals such that Aboxes t into main memory [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In some
situations inconsistencies occur, and summarization individuals must be \re ned"
(or unfolded) at query answering time in order to guarantee soundness and
completeness, a rather expensive operation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Other approaches make use of Abox
modularization techniques and try to extract independent modules such that
query answering is sound and complete. A rst investigation of Abox
modularization for answering instance queries w.r.t. the DL SHIF is presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
3 This work has been partially supported by the European Commission as part of the
      </p>
      <p>
        FP7 project Optique (http://www.optique-project.eu/).
However, modularization with iterative instance checks over all individuals and
modules of an Abox is not su cient to ensure fast performance [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].4
      </p>
      <p>
        The ABDEO approach presented here is based on a modularization approach
developed by Wandelt [
        <xref ref-type="bibr" rid="ref12 ref14 ref15">15,12,14</xref>
        ] for really large Aboxes containing data
descriptions for &gt; 1000 universities in terms of LUBM scale measures [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], i.e., datasets in
the range of billions of triples. Modules (islands) derived by Wandelt's techniques
are usually small in practical applications and can be loaded into main memory
such that a standard tableau prover can be used for instance checks. Iteration
over all individuals gives sound and complete answers, in principle. Compared
to [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Wandelt (i) proposed extended modularization rules, (ii) implemented
incremental ways of computing Abox modularizations, and (iii) investigated new
ways to optimize sound and complete concept-based query answering (instance
queries) with tableau-based reasoning systems for the logic SHI. In particular,
\similarities" between modules are detected such that a single instance query on
a representative data structure (a so-called one-step node) yields multiple results
at a time, and thus, instance checks are saved (the approach is reminiscent of but
di erent from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], see below or cf. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for details). Due to modularization rules,
one-step node query answering is sound [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and in many (well-de ned) cases
complete for eliminating candidates for a successive iterative instance
checking process. In addition, to eliminate candidates, Wandelt and colleagues also
investigate complete approximation techniques (see [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for details).
      </p>
      <p>
        In this paper we extend Wandelt's modularization based approach for query
answering by investigating optimization techniques for answering grounded
conjunctive queries w.r.t. SHI ontologies. Grounded conjunctive queries are more
expressive from a user's point of view than instance queries. We argue that
grounded conjunctive queries substantially narrow down the set of instance
checking candidates if selective role atoms mentioned in queries are exploited for
generating candidates for concept atoms, such that approximation techniques
(to, e.g., DL-Lite) are not required in many cases. We demonstrate our ndings
using the LUBM benchmark as done, e.g., in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. As an additional
extension to Wandelt's work, which uses speci c storage layouts for storing Abox data
descriptions and internal information in SQL databases, we investigate
ontologybased access to existing data stores, namely triplestores, while providing query
answering w.r.t. expressive ontologies.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        We assume the reader is familiar with description logic languages, ontologies
(knowledge bases), inference problems, and optimized tableau-based reasoning
algorithms (see, e.g., [
        <xref ref-type="bibr" rid="ref11 ref7">11,7</xref>
        ]). For the reader's convenience we de ne conjunctive
queries in general, and grounded conjunctive queries in particular (adapted from
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). In the following we use AtCon; Con; and Rol for the sets of atomic
4 Note that Abox modularization is di erent from Tbox modularization, as for instance
investigated in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
concept descriptions, concept descriptions, and role descriptions, respectively, in
the ontology.
      </p>
      <p>A conjunctive query (CQ) is a rst-order query q of the form 9u: (u; v)
where is a conjunction of concept atoms A(t) and role atoms R(t; t0), with
A and R being concept and role names, respectively. The parameters t; t0 are
variables from u or v or constants (individual names). The variables in u are
the existentially quanti ed variables of q and v are the free variables, also called
distinguished variables or answer variables of q. The query q is called a k-ary
query i jvj = k. In a grounded conjunctive query (GCQ), u is empty. We only
consider grounded conjunctive queries in this paper. We de ne an operator skel
that can be applied to a CQ to compute a new CQ in which all concept atoms
are dropped.</p>
      <p>
        The query answering problem is de ned w.r.t. an ontology O = (T ; R; A).
Let Inds(A) denote the individuals in A. For I an interpretation, q = (v)
a k-ary grounded conjunctive query, and a1; : : : ; ak 2 Inds(A), we write I j=
q[a1; : : : ; ak] if I satis es q (i.e., all atoms of q) with variables vi replaced by
ai; 1 i k. A certain answer for a k-ary conjunctive query q and a ontology
O is a tuple (a1; : : : ; ak) such that I j= q[a1; : : : ; ak] for each model I of O.
We use cert(q; O) to denote the set of all certain answers for q and O. This
de nes the query answering problem. Given a SHI ontology O and a GCQ q,
compute cert(q; O). It should be noted that \tree-shaped" conjunctive queries
can be transformed into grounded conjunctive queries, possibly with additional
axioms in the Tbox [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The restriction to grounded conjunctive queries is not
too severe in many practical applications.
      </p>
      <p>Grounded conjunctive query answering can be implemented in a naive way
by computing the certain answers for each atom and doing a join afterwards.
Certain answers for concept atoms can be computed by iterating over Inds(A)
with separate instance checks for each individual. Rather than performing an
instance check on the whole Abox, which is too large to t in main memory in
many application scenarios, the goal is to do an instance check on a module such
that results are sound and complete. More formally, given an input individual
a, the proposal is to compute a set of Abox assertions Aisl (a subset of the
source Abox A), such that for all atomic (!) concept descriptions A, we have
hT; R; Ai A(a) i hT; R; Aisli A(a).
3</p>
    </sec>
    <sec id="sec-3">
      <title>Speeding Up Instance Retrieval</title>
      <p>In order to de ne subsets of an Abox relevant for reasoning over an individual
a, we de ne an operation which splits up role assertions in such a way that we
can apply graph component-based modularization techniques over the outcome
of the split.</p>
      <p>De nition 1 (Abox Split). Given
{ a role description R,
{ two distinct named individuals a and b,
{ Else
{ two distinct fresh individuals c and d, and,
{ an Abox A,</p>
      <p>R(a;b): SA ! SA, de ned as follows ( SA is the
an Abox split is a function #c;d
set of Aboxes and A 2 SA):
{ If R(a; b) 2 A and fc; dg * Ind(A), then</p>
      <p>R(a;b) (A) =A n fR(a; b)g [ fR(a; d); R(c; b)g [
#c;d
fC(c) j C(a) 2 Ag [
fC(d) j C(b) 2 Ag</p>
      <p>R(a;b) (A) = A:
#c;d</p>
      <p>
        In the following we assume that the Tbox is transformed into a normal form
such that all axioms are \internalized" (i.e., on the lefthand side of a GCI there
is only &gt; mentioned. For a formal de nition of the normal form of a Tbox, see
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Here we use an example to illustrate the idea.
      </p>
      <p>Example 1 (Example for an Extended 8-info Structure). Let
then the TBox in normal form is</p>
      <p>TEx1 = fChair v 8headOf:Department;
9memberOf:&gt; v P erson;</p>
      <p>GraduateStudent v Studentg;</p>
      <p>REx1 = fheadOf v memberOfg;
TEx1norm = f&gt; v :Chair t 8headOf:Department;
&gt; v 8memberOf:? t P erson;
&gt; v :GraduateStudent t Studentg
and the extended 8-info structure for TEx1norm and REx1 is:
extinfo 8T;R (R) =</p>
      <p>f?g
&gt;
:;
8&gt;fDepartment; ?g if R = headOf;
&lt; if R = memberOf;
otherwise:</p>
      <sec id="sec-3-1">
        <title>De nition 2 (Extended 8-info Structure). Given a TBox T in normal form</title>
        <p>and an Rbox R, an extended 8-info structure for T and R is a function extinfo 8T;R :
Rol ! }(Con), such that we have C 2 extinfo 8T;R (R) if and only if there exists
a role R2 2 Rol, such that R R v R2 and 8R2:C 2 clos (T), where clos (T)
denotes the set of all concept descriptions mentioned in T.</p>
        <p>Now we are ready to de ne a data structure that allows us to check which
concept descriptions are (worst-case) \propagated" over role assertions in
SHIontologies. If nothing is \propagated" that is not already stated in corresponding
Abox assertions, a role assertion is called splittable. This is formalized in the
following de nition.</p>
      </sec>
      <sec id="sec-3-2">
        <title>De nition 3 (SHI-splittability of Role Assertions). Given a SHI-ontology</title>
        <p>O = hT; R; Ai and a role assertion R(a; b), we say that R(a; b) is SHI-splittable
with respect to O i
1. there exists no transitive role R2 with respect to R, such that R
2. for each C 2 extinfo 8T;R (R)
{ C = ? or
{ there is a C2(b) 2 A and T C2 v C or
{ there is a C2(b) 2 A and T C u C2 v ?
and
3. for each C 2 extinfo 8T;R (R )
{ C = ? or
{ there is a C2(a) 2 A and T C2 v C or
{ there is a C2(a) 2 A and T C u C2 v ?.</p>
        <p>R v R2,</p>
        <p>So far, we have introduced approaches to modularization of the assertional
part of an ontology. In the following, we use these modularization techniques to
de ne structures for e cient reasoning over ontologies.</p>
        <p>We formally de ne a subset of assertions, called an individual island, which is
worst-case necessary, i.e. possibly contains more assertions than really necessary
for sound and complete instance checking. Informally speaking, we take the graph
view of an Abox and, starting from a given individual, follow all role assertions
in the graph until we reach a SHI-splittable role assertion. We show that this
strategy is su cient for entailment of atomic concepts. The formal foundations
for these subsets of assertions have been set up before, where we show that, under
some conditions, role assertions can be broken up while preserving soundness and
completeness of instance checking algorithms. First, in De nition 4, we formally
de ne an individual island candidate with an arbitrary subset of the original
Abox. The concrete computation of the subset is then further de ned below.</p>
        <sec id="sec-3-2-1">
          <title>De nition 4 (Individual Island Candidate).</title>
          <p>Given an ontology O = hT; R; Ai and a named individual a 2 Ind(A), an
individual island candidate, is a tuple ISLa = hT; R; Aisl; ai, such that Aisl A.
Given an individual island candidate ISLa = hT; R; Aisl; ai and an
interpretation I, we say that I is a model of ISLa , denoted I ISLa , if I hT; R; Aisli.
Given an individual island candidate ISLa = hT; R; Aisl; ai, we say that ISLa
entails a concept assertion C(a), denoted hT; R; Aisl; ai C(a), if for all
interpretations I, we have I ISLa =) I C(a). We say that ISLa entails a role
assertion R(a1; a2), denoted hT; R; Aisl; ai R(a1; a2), if for all interpretations
I, we have I ISLa =) I R(a1; a2).</p>
          <p>Please note that entailment of concept and role assertions can be directly
reformulated as a decision problem over ontologies, i.e., we have hT; R; Aisl; ai
C(a) () hT; R; Aisli C(a). In order to evaluate the quality of an individual
island candidate, we de ne soundness and completeness criteria for individual
island candidates.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>De nition 5 (Soundness and Completeness for Island Candidates).</title>
          <p>Given an ontology O = hT; R; Ai and an individual island candidate ISLa =
hT; R; Aisl; ai, we say that ISLa is sound for instance checking in ontology
O if for all atomic concept descriptions C 2 AtCon, ISLa C(a) =)
hT; R; Ai C(a). ISLa is complete for instance checking in ontology O if for all
atomic concept descriptions C 2 AtCon, hT; R; Ai C(a) =) ISLa C(a).</p>
          <p>We say that ISLa is sound for relation checking in ontology O if for all role
descriptions R 2 Rol and all individuals a2 2 Inds(A)
{ ISLa
{ ISLa</p>
          <p>R(a; a2) =) hT; R; Ai
R(a2; a) =) hT; R; Ai</p>
          <p>R(a; a2) and</p>
          <p>R(a2; a).</p>
          <p>ISLa is complete for relation checking in ontology O if for all role descriptions
R 2 Rol and all individuals a2 2 Inds(A)
{ hT; R; Ai
{ hT; R; Ai</p>
          <p>R(a; a2) =) ISLa
R(a2; a) =) ISLa</p>
          <p>R(a; a2) and</p>
          <p>R(a2; a).</p>
          <p>We say that ISLa is sound for reasoning in ontology O if ISLa is sound for
instance and relation checking in O. We say that ISLa is complete for reasoning
in O if ISLa is complete for instance and relation checking in O.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>De nition 6 (Individual Island).</title>
          <p>Given an individual island candidate ISLa = hT; R; Aisl; ai for an ontology
O = hT; R; Ai, ISLa is called individual island for O if ISLa is sound and
complete for reasoning in O.</p>
          <p>An individual island candidate becomes an individual island if it can be used
for sound and complete reasoning. It is easy to see that each individual island
candidate is sound for reasoning since it contains a subset of the original Abox
assertions.</p>
          <p>In Fig. 1, we de ne an algorithm which computes an individual island starting
from a given named individual a. The set agenda manages the individuals which
have to be visited. The set seen collects already visited individuals. Individuals
are visited if they are connected by a chain of SHI-unsplittable role assertions to
a. We add the role assertions of all visited individuals and all concept assertions
for visited individuals and their direct neighbors.</p>
          <p>Theorem 1 shows that the computed set of assertions is indeed su cient for
complete reasoning.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>Theorem 1 (Island Computation yields Individual Islands for Ontolo</title>
          <p>gies). Given an ontology O = hT; R; Ai and an individual a 2 Inds(A), the
algorithm in Fig. 1 computes an individual island ISLa = hT; R; Aisl; ai for a.
Input: Ontology O = hT; R; Ai, individual a 2 Inds(A)
Output: Individual island ISLa = hT; R; Aisl; ai
Algorithm:</p>
          <p>Let agenda = fag
Let seen = ;
Let Aisl = ;
While agenda 6= ; do</p>
          <p>Remove a1 from agenda
Add a1 to seen
Let Aisl = Aisl [ fC(a1) j C(a1) 2 Ag
For each R(a1; a2) 2 A</p>
          <p>Aisl = Aisl [ fR(a1; a2) 2 Ag
If R(a1; a2) 2 A is SHI-splittable with respect to O then</p>
          <p>Aisl = Aisl [ fC(a2) j C(a2) 2 Ag
else agenda = agenda [ (fa2g n seen)
For each R(a2; a1) 2 A</p>
          <p>Aisl = Aisl [ fR(a2; a1) 2 Ag
If R(a2; a1) 2 A is SHI-splittable with respect to O then</p>
          <p>Aisl = Aisl [ fC(a2) j C(a2) 2 Ag
else agenda = agenda [ (fa2g n seen)</p>
          <p>
            The proof is given in [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
          </p>
          <p>For each individual there is an associated individual island, and Abox
consistency can be checked by considering each island in turn (islands can be loaded
into main memory on the y). Individual islands can be used for sound and
complete instance checks, and iterating over all individuals gives a sound and
complete (albeit still ine cient) instance retrieval procedure for very large Aboxes.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>De nition 7 (Pseudo Node Successor). Given an Abox A, a pseudo node</title>
        <p>successor of an individual a 2 Inds(A) is a pair pnsa;A = hrs; csi, such that
there is an a2 2 Ind(A) with
1. 8R 2 rs:(R(a; a2) 2 A _ R (a2; a) 2 A),
2. 8C 2 cs:C(a2) 2 A, and
3. rs and cs are maximal.</p>
        <sec id="sec-3-3-1">
          <title>De nition 8 (One-Step Node).</title>
          <p>Given O = hT; R; Ai and an individual a 2 Inds(A), the one-step node of a
for A, denoted osna;A , is a tuple osna;A = hrootconset; re set; pnsseti, such
that rootconset = fCjC(a) 2 Ag, re set = fRjR(a; a) 2 A _ R (a; a) 2 Ag,
and pnsset is the set of all pseudo node successors of individual a.</p>
          <p>
            It should be obvious that for realistic datasets, multiple individuals in an
Abox will be mapped to a single one-step node data structure. We associate the
corresponding individuals with their one-step node. In addition, it is clear that
one-step nodes can be mapped back to Aboxes. The obvious mapping function is
called Abox. If Abox(osna;A ) j= C(a) for a query concept C (a named concept),
all associated individuals of osna;A are instances of C. It is also clear that not
every one-step node is complete for determining whether a is not an instance of
C. This is the case only if one-step nodes \coincide" with the islands derived
for the associated individuals (splittable one-step nodes). Wandelt found that
for LUBM in many cases islands are very small, and one-step nodes are indeed
complete in the sense that if Abox(osna;A ) 6j= C(a) then A 6j= C(a) (for details
see [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]). In the following we assume that for instance retrieval, it is possible to
specify a subset of Abox individuals as a set of possible candidates. If the set of
candidates is small, with some candidates possibly eliminated by one-step nodes,
then iterative instance checks give us a feasible instance retrieval algorithm in
practice.
4
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Answering Grounded Conjunctive Queries</title>
      <p>
        In this section we will shortly describe an implementation of the introduced
techniques with a triplestore database. As other groups we use the Lehigh
University Benchmark or LUBM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for evaluating algorithms and data structures.
This benchmark is an ontology system designed to test large ontologies with
respect to OWL applications. With the LUBM generator, the user can generate
n universities each consisting of a random number of departments and
individuals. As the number of individuals and the number of assertions increases
nearly linear with the number of universities, LUBM is an instrument to test
the performance for query answering machines, especially for grounded
conjunctive queries in a scalable Abox environment. If a system cannot handle
LUBM with, say, a billion triples, it cannot deal with more complex
scenarios occurring in future applications, either. The code we used in this paper for
evaluating the optimization techiques is written in Java and can be downloaded
at http://www.sts.tu-harburg.de/people/c.neuenstadt/. We store data in
the triplestore AllegroGraph, which provides access to role instances (triples)
w.r.t. RDFS plus transitive roles, i.e., role hierarchies and transitive roles are
handled by AllegroGraph. Alternatively one could use materialization or query
expansion in the OBDA style for role hierarchies. SPARQL is used as a query
language for specifying queries to be executed on a particular triplestore database.
4.1
      </p>
      <sec id="sec-4-1">
        <title>Setting up an AllegroGraph Triplestore</title>
        <p>AllegroGraph is run as a server program. In our setting, data is loaded directly
into the server, whereas islands as well as one-step nodes are computed by a
remote program run on a client computer (we cannot extend the server program
easily). In a rst step the whole data system has to be set up before we can
start query answering. During the setup process, the communication between
client and server system consists basically of sending SPARQL queries for data
access required in the algorithm shown in Fig. 1 as well as sending AllegroGraph
statements for adding additional triples (or changing existing ones) for storing
islands and one-step nodes. Islands are indicated using the subgraph components
of AllegroGraph triples (actually, quintuples).</p>
        <p>The \similarity" of one-step nodes is deifned using hashvalues with a su
cient bitlength. We rst compute a set from each island, compute a hashvalue
for it, and store it together with the speci c island in the triplestore. Identical
hash values allow one to refer to \similar" one-step nodes (with additional checks
applied to the collision list as usual for hashing).</p>
        <p>Given the concept description C from the query and the named individual a
from the tuple, we load the speci c one-step node for a from the database and
determine whether osna entails C(a). Depending on the outcome, three states
are possible:
{ Osna entails C(a), then a is actually an instance of C.
{ Osna entails :C(a) or does not entail C(a) and is splittable, then a is
actually not an instance of C.
{ Osna is not splittable, then the client has to load and check the entire island
associated with a to nd out whether a actually is an instance of C.</p>
        <p>Candidates for concept atoms are determined in our experiments by rst
doing a retrieval for a query q by executing skel(q) (see above for a de nition).
Bindings for variables in skel(q) de ne the candidates for retrieval with concept
atoms. By eliminating all skeleton query result tuples that include individuals
which do not belong to corresponding concept assertions used in the query, nally
all remaining tuples are correct answers to the original conjunctive query.</p>
        <p>
          Wandelt has already investigated the e ciency of Abox modularization
techniques for an SQL database server. Here, instead, we work directly on an
existing AllegroGraph triplesotore, convert the large ontology step by step into small
chunks and compare the generated modules with the local modularization of
Sebastian Wandelt on the SQL server [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The processing time for one university
is about 5000 seconds on AllegroGraph server, where it is like one minute for
Wandelt's approach. The modularization of one university takes nearly 200,000
queries. The decrease in performance is based on the huge number of SPARQL
mini queries between server and the remote modularization client in the
prototype implementation. Thus, only 5 universities are investigated for query
answering experiments.
4.2
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Evaluating Conjunctive Queries</title>
        <p>For evaluating grounded conjunctive queries, LUBM provides 14 prede ned test
queries, which check several database criteria. We run the tests with an
ontology, which uses the description logic language SHI. LUBM queries di er in
the amount of input, selectivity and reasoning behaviour for example by relying
on inverse roles, role hierarchy, or transitivity inferences.5 Selectivity basically
5 In the LUBM dataset, explict assertions about subrelations of degreeFrom are made
(e.g., doctoralDegreeFrom). The relation degreeFrom is declared as an inverse to
means that the grounded conjunctive queries being considered have role atoms
with large result sets to be reduced by atom queries, which we call less selective
(proportion of the instances involved in the skeleton are high compared to those
that actually satisfy the query criteria), or automatically excludes a lot of
individuals, what we call a highly selective query. Rather than doing a join on the
result sets of all atoms in a grounded conjunctive query, role atoms de ne
candidates for concept atoms. Thus for selective queries, candidate sets for concept
atoms are smaller. This reduces the number of instance checks that remain if,
e.g., one-step node optimizations are not applicable (see above).</p>
        <p>The result indicates that the less selective a query is w.r.t. role atoms, the
more instance checks we need afterwards, and the more time consuming retrieval
is (see Figure 2). Nevertheless, most of the LUBM queries are handled fast,
even with the simple implementation for concept atoms with repetitive instance
checks. Performance for Query 8 will be increased with an implementation of
full one-step node retrieval (with multiple individuals returned at a time, see
above). Queries 6 and 14 contain only concept atoms and are not tested here.</p>
        <p>To demonstrate that our skeleton query candidate generator is able to
significantly improve the results for queries with even low selectivity, we compare the
hasAlumnus. Thus, although, e.g., Query 13 contains a reference to University0
(asking for llers of hasAlumnus), adding new universities with degreeFrom tuples
with University0 on the righthand side causes the cardinality of the set of llers for
hasAlumnus w.r.t. University0 to increase, i.e., having a constant in the query does
not mean the result set to be independent of the number of universities.
approach of skeleton queries with the naive approach without skeleton queries in
Figure 3. One can directly see the huge performance gain of the skeleton query
even for less selective queries. We avoid lots of instance checks and can therefore
decrease the answering time by orders of magnitude in many cases.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        In this work we extended the Abox modularization strategies of Wandelt and
colleagues to the e cient use of grounded conjunctive queries on triplestore
servers. Results obtained with the techniques discussed in this paper are sound
and complete. Note that query engines investigated in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are incomplete.
      </p>
      <p>Our prototype needs linear time to add information to the triplestore in a
setup phase. Therefore we were not able to run queries on billions of triples. We
conclude that island computation needs to be built into the triplestore software
iteself and cannot be done from a remote client.</p>
      <p>In the average case, the size of the individual island (with respect to the
number of assertion in its Abox) is considerably smaller than the original Abox. In
our experiments the size is usually orders of magnitudes smaller. Please note that
these modularization techniques allow traditional description logic reasoning
systems to deal with ontologies which they cannot handle without modularizations
(because the data or the computed model abstraction does not t into main
memory).</p>
      <p>
        In addition, the evaluation of the prototype showed how grounded conjuctive
queries on triplestore servers w.r.t. expressive ontologies (SHI) can be
implemented using only a small size of main memory. The main strategy is to use a
skeleton query and try to keep the necessary amount of instance checks in the
second step as small as possible. If the number of results for less selective
skeleton queries gets larger, the number of instance checks increases rapidly. In some
cases it would obviously have been better to reduce the set of possible tuples by
considering concept atoms rst. This observation has also been made in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and,
much earlier, in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] where more elaborate query plan generation techniques are
investigated, albeit for main memory systems.
      </p>
      <p>
        We would like to emphasize that the proposed optimizations can be used for
parallel reasoning over ontologies [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This will be further investigated in future
work such that ABDEO will become possible for practically relevant datasets
and ontologies that are more demanding than LUBM.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>RodriguezMuro, and</article-title>
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Ontologies and databases: The DL-Lite approach</article-title>
          .
          <source>Reasoning Web. Semantic Technologies for Information Systems</source>
          , pages
          <fpage>255</fpage>
          {
          <fpage>356</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Dolby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyanpur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kershenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schonberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <article-title>Scalable semantic retrieval through summarization and re nement</article-title>
          .
          <source>In Proceedings of the National Conference on Arti cial Intelligence</source>
          , page 299. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press;
          <year>1999</year>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Julian</given-names>
            <surname>Dolby</surname>
          </string-name>
          , Achille Fokoue, Aditya Kalyanpur, Edith Schonberg, and
          <string-name>
            <given-names>Kavitha</given-names>
            <surname>Srinivas</surname>
          </string-name>
          .
          <article-title>Scalable highly expressive reasoner (SHER)</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <volume>357</volume>
          {
          <fpage>361</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>B.C.</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , E. Sirin,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyanpur</surname>
          </string-name>
          .
          <article-title>Modularity and web ontologies</article-title>
          .
          <source>proc. KR</source>
          ,
          <year>2006</year>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          and
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>He in. A scalable approach for partitioning owl knowledge bases</article-title>
          .
          <source>In Proc. of the 2nd International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2006)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          , and
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>He in. LUBM: A benchmark for OWL knowledge base systems</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <volume>158</volume>
          {
          <fpage>182</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>V.</given-names>
            <surname>Haarslev</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mo</surname>
          </string-name>
          <article-title>ller. On the scalability of description logic instance retrieval</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ):
          <volume>99</volume>
          {
          <fpage>142</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Ian</given-names>
            <surname>Horrocks</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sergio</given-names>
            <surname>Tessaris</surname>
          </string-name>
          .
          <article-title>A conjunctive query language for description logic ABoxes</article-title>
          .
          <source>In Proc. of the 17th Nat. Conf. on Arti cial Intelligence (AAAI</source>
          <year>2000</year>
          ), pages
          <fpage>399</fpage>
          {
          <fpage>404</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Ilianna</given-names>
            <surname>Kollia</surname>
          </string-name>
          , Birte Glimm, and
          <string-name>
            <given-names>Ian</given-names>
            <surname>Horrocks</surname>
          </string-name>
          .
          <article-title>SPARQL query answering over OWL ontologies</article-title>
          .
          <source>In Proc. of the 8th European Semantic Web Conf. (ESWC</source>
          <year>2011</year>
          ), Lecture Notes in Computer Science, pages
          <volume>382</volume>
          {
          <fpage>396</fpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Carsten</surname>
            <given-names>Lutz</given-names>
          </string-name>
          , David Toman,
          <string-name>
            <given-names>and Frank</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>Conjunctive query answering in the description logic EL using a relational database system</article-title>
          . In Craig Boutilier, editor,
          <source>IJCAI</source>
          , pages
          <year>2070</year>
          {
          <year>2075</year>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. R. Moller and
          <string-name>
            <given-names>V.</given-names>
            <surname>Haarslev</surname>
          </string-name>
          .
          <article-title>Tableaux-based reasoning</article-title>
          . In S. Staab and R. Studer, editors,
          <source>Handbook of Ontologies</source>
          , pages
          <volume>509</volume>
          {
          <fpage>528</fpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>S.</given-names>
            <surname>Wandelt</surname>
          </string-name>
          .
          <article-title>E cient instance retrieval over semi-expressive ontologies</article-title>
          .
          <source>PhD thesis</source>
          , Hamburg University of Technology,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Wandelt</surname>
          </string-name>
          and
          <article-title>Ralf Moller. Distributed island-based query answering for expressive ontologies</article-title>
          . In Paolo Bellavista,
          <string-name>
            <surname>Ruay-Shiung Chang</surname>
          </string-name>
          ,
          <string-name>
            <surname>Han-Chieh</surname>
            <given-names>Chao</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shin-Feng Lin</surname>
          </string-name>
          , and
          <string-name>
            <surname>Peter M.</surname>
          </string-name>
          <article-title>A</article-title>
          . Sloot, editors,
          <source>Advances in Grid and Pervasive Computing</source>
          , 5th International Conference, GPC 2010, Hualien, Taiwan, May
          <volume>10</volume>
          - 13,
          <year>2010</year>
          . Proceedings, volume
          <volume>6104</volume>
          of Lecture Notes in Computer Science, pages
          <volume>461</volume>
          {
          <fpage>470</fpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Wandelt</surname>
          </string-name>
          and
          <article-title>Ralf Moller. Towards Abox modularization of semiexpressive description logics</article-title>
          .
          <source>Journal of Applied Ontology</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ):
          <volume>133</volume>
          {
          <fpage>167</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sebastian</surname>
            <given-names>Wandelt</given-names>
          </string-name>
          , Ralf Moller, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wessel</surname>
          </string-name>
          .
          <article-title>Towards scalable instance retrieval over ontologies</article-title>
          .
          <source>Journal of Software and Informatics</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wessel</surname>
          </string-name>
          .
          <article-title>Flexible und kon gurierbare Software-Architekturen fr datenintensive ontologiebasierte Informationssysteme</article-title>
          .
          <source>PhD thesis</source>
          , Technische Universitat Hamburg-Harburg, Hamburg, Germany,
          <year>2009</year>
          . ISBN 978-3-
          <fpage>8325</fpage>
          -2312-1.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>