<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of Query Rewriting Approaches for OWL 2</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hector Perez-Urbina</string-name>
          <email>hector@clarkparsia.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edgar Rodr guez-D az</string-name>
          <email>edgar@clarkparsia.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Grove</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Konstantinidis</string-name>
          <email>george@clarkparsia.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evren Sirin</string-name>
          <email>evren@clarkparsia.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Clark &amp; Parsia, LLC</institution>
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <fpage>32</fpage>
      <lpage>44</lpage>
      <abstract>
        <p>Query answering over ontologies is a crucial feature in contexts such as ontology-based data access and semantic information integration. There is considerable research interest in using query rewriting for e cient and scalable query answering: instead of evaluating a given query over the ontology with the (potentially very large) data directly, one rewrites the query with respect to the relevant knowledge in the ontology, and delegates the evaluation of the computed rewriting to a (possibly deductive) database system where the data resides. In this paper we examine the performance and scalability of producing unions of conjunctive queries versus datalog queries as rewritings. We present an empirical comparison between two representative approaches that consider very expressive ontology languages.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The use of ontologies for query answering allows for the extraction of both
explicit and implicit knowledge from the underlying data. Query answering over
ontologies is a crucial problem in contexts such as ontology-based data access
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and semantic information integration [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        The main query language considered in the literature is that of conjunctive
queries (which captures the core of SPARQL queries). In contrast, several
ontology languages of various levels of expressivity have been considered. Query
answering for very expressive languages such as OWL 2 DL is known to be
intractable [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Fortunately, three pro les of OWL 2 with good computational
properties have been identi ed: QL, RL, and EL.1 In fact, query answering over
QL ontologies is known to be only as hard as evaluating SQL queries over a
relational database [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].2
      </p>
      <p>Query answering in QL can be performed via query rewriting in two steps:
rst, the query and the terminological part of the ontology (i.e., the schema
or TBox) are transformed into a so-called rewriting ; and then the rewriting is
evaluated over the assertional part of the ontology (i.e., the data or ABox) only.</p>
      <sec id="sec-1-1">
        <title>1 http://www.w3.org/TR/owl2-profiles/</title>
        <p>2 With respect to data complexity.
In this case, the rewriting is an expanded version of the original query in the
form of a union of conjunctive queries (UCQ). Therefore, reasoners implementing
query rewriting not only avoid keeping potentially very large ABoxes in memory,
but may delegate evaluation of the rewriting to o -the-shelf, highly optimized
RDBMSs.</p>
        <p>
          Various UCQ-producing rewriting algorithms have been devised for (variants
of) QL [
          <xref ref-type="bibr" rid="ref11 ref15 ref3 ref5 ref8">3, 11, 15, 5, 8</xref>
          ]; alas, the size of the rewritings has been shown to be
exponential with respect to the size of the original query and the TBox [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In
practice, this means that the computed rewriting might contain hundreds or
thousands of queries, rendering it too big to evaluate e ciently (or at all) over
existing technology. In order to address this problem, alternative approaches have
been devised [
          <xref ref-type="bibr" rid="ref16 ref6">16, 6</xref>
          ] in which, instead of producing a potentially large UCQ, the
original query and the TBox are rewritten into a more succinct datalog query
(DQ). Datalog queries, however, are harder to evaluate than UCQs [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], which
suggests there is a trade-o between the size of the rewriting and the complexity
of its evaluation.
        </p>
        <p>
          In this paper, we consider the advantages and disadvantages of producing
DQs over UCQs in terms of scalability of query answering. We begin by
presenting the query rewriting approach in more detail in Section 2. We then present
a general overview of existing approaches (of the two kinds) in Section 3. The
main contribution of the paper is an empirical evaluation in which we compare
the (DQ-producing) approach of Eiter et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] against Blackout|a highly
optimized version of the (UCQ-producing) approach of Perez-Urbina et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. We
present Blackout in more detail in Section 4. The results of our evaluation are
presented in Section 5. We present our conclusions in Section 6, and discuss our
plans for future work in Section 7.
2
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Query Answering via Rewriting</title>
      <p>In this section, we introduce the notion of query rewriting informally by means of
an example; we then discuss the advantages and disadvantages of the approach,
and we nish with relevant formal de nitions.</p>
      <p>Query rewriting is a technique that can be used to solve the problem of query
answering over ontologies|that is, given a conjunctive query and an ontology,
composed of a TBox and an ABox, compute the set of certain answers of the
query with respect to the ontology. The main idea behind query rewriting is to
transform the given query and TBox into an expanded query that can be later
evaluated over the ABox only. Intuitively, the expanded query contains all the
relevant information captured in the TBox, making the latter unnecessary for
query evaluation.</p>
      <p>Example 1. Suppose we have an ontology O = hT ; Ai that talks about
universities, students, professors, and so on. The TBox T contains the following axioms
(shown in Manchester syntax3):</p>
      <sec id="sec-2-1">
        <title>Class: Teacher SubClassOf: teaches some Thing</title>
      </sec>
      <sec id="sec-2-2">
        <title>Class: Professor SubClassOf: Teacher</title>
      </sec>
      <sec id="sec-2-3">
        <title>ObjectProperty: hasTutor Range: Professor</title>
        <p>where axiom (1) states that teachers teach at least someone, axiom (2) states
that professors are teachers, and axiom (3) states that all tutors are professors.</p>
        <p>
          Suppose that we want to retrieve the list of individuals who teach according
to O using the query Q (shown in datalog syntax [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]):
        </p>
        <p>Q(x)</p>
        <p>teaches(x; y)</p>
        <p>Before considering the data in A, we can rewrite the query with respect to
the TBox|that is, expand Q with the relevant knowledge in T . According to
the meaning of axioms (1){(3), we conclude that all teachers, professors, and
tutors teach; therefore, we expand Q with:</p>
        <p>Q(x)
Q(x)
Q(y)</p>
        <p>Teacher(x)
Professor(x)
hasTutor(x; y)</p>
        <p>We can now evaluate the resulting union of queries (4){(7) over A without
further consideration of T .</p>
        <p>The query rewriting approach has important advantages over the `direct'
query answering approach implemented in reasoners like Pellet,4 HermiT,5 or
FaCT++.6 Since the query and the TBox only are considered, reasoners
implementing query rewriting need not maintain potentially large ABoxes in memory,
a crucial feature for some applications in terms of scalability. Once the query
has been rewritten, its evaluation can be delegated to existing highly optimized
(deductive) database systems. Moreover, as the rewriting is independent from
the ABox, one does not need to recompute it every time the data changes, but
only when the TBox does. This is important in many application domains where
data tends to change much more often than the schema.</p>
        <p>
          The speci cation of OWL 2 includes the de nition of various fragments or
pro les that have been tailored with speci c use cases in mind. In particular, the
QL pro le was designed so as to bene t from the advantages of query rewriting,
both in terms of scalability and performance. It has been shown that queries
posed over OWL 2 QL TBoxes can be rewritten into unions of conjunctive queries
(UCQs) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Producing UCQs is particularly desirable as their evaluation can be
delegated to RDBMSs [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
3 http://www.w3.org/TR/owl2-manchester-syntax/
4 http://clarkparsia.com/pellet/
5 http://hermit-reasoner.com/
6 http://owl.man.ac.uk/factplusplus/
(1)
(2)
(3)
(4)
(5)
(6)
(7)
        </p>
        <p>
          Query rewriting is, alas, not a silver bullet. Depending of the nature of the
expanded query, it might turn out to be too big and/or complex to evaluate e
ciently (or at all). In particular, for instance, the size of a UCQ computed from
a query and an OWL 2 QL TBox has been shown to be worst-case exponential
(with respect to the size of the inputs) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This means that we might compute
a UCQ containing hundreds or thousands of queries, which would compromise
the feasibility of its evaluation. Regarding complexity, once we consider more
expressive fragments of OWL than QL, the resulting expanded query may not
longer be a UCQ. Depending on how far we go with respect to ontology
expressivity, we might need to produce recursive or even disjunctive datalog queries in
order to ensure the soundness and completeness of the results. As one might
expect, these types of query are harder to evaluate than UCQs. In such cases, one
needs a more sophisticated machinery, such as that implemented in a deductive
database system, for query evaluation.
        </p>
        <p>
          An alternative approach to query rewriting is a technique based on forward
chaining [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], known as materialization. This approach consists of expanding the
ABox, instead of the query, with respect to the TBox, to e ectively make all the
implicit knowledge explicit. This approach might be preferable to query
rewriting in cases where there are no changes to the ontology; queries may be executed
as they are, without the need to rewrite them into potentially larger, more
difcult to answer ones. Materialization, however, has signi cant drawbacks when
changes to the data are frequent. This is due to the fact that materialized
inferences need to be maintained in order for query answering to remain sound
and complete. Thus, materialization may not be e cient in domains where data
changes frequently. In contrast, query rewritings are independent of the ABox;
therefore, one does not need to recompute them every time the data changes,
but only when the TBox does. Materialization may also not be feasible as the
expanded ABox might be prohibitively large. In contrast, query rewriting requires
no modi cations to the ABox.
        </p>
        <p>Other alternatives include hybrid approaches where both the query and the
ABox are expanded with respect to the TBox. The objective of these approaches
is to avoid the potential exponential explosion in the size of the expanded query
by using certain types of axiom to expand the ABox, while still retaining a
manageable size. Unfortunately, similarly to materialization, these approaches
might not be very e cient in scenarios where the data changes frequently.</p>
        <p>
          We nish this section by giving a formal de nition of the various notions
described thus far. We use the well-known notions of constants, variables, function
symbols, terms, and atoms of rst-order logic [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>De nition 1. A Horn clause is an expression of the form H B1 ^ ^ Bm,
where H is a possibly empty atom and fBig is a set of atoms. The atom H is
called the head and the set fBig is called the body. A Horn clause C is safe if
all variables occurring in the head also occur in the body.</p>
        <p>A datalog program P is a set of function-free, safe Horn clauses. The
extensional database (EDB) predicates of P are those that do not occur in the
head atom of any Horn clause in P ; all other predicates are called intensional
database (IDB) predicates. A datalog query (DQ) Q is a tuple hQP ; P i, where
QP is a query predicate and P is a datalog program. Q = hQP ; P i is a union of
conjunctive queries (UCQ) if QP is the only IDB predicate in P and the body of
each clause in P does not contain QP , and Q is a conjunctive query (CQ) if it
is a union of conjunctive queries and P contains exactly one Horn clause.</p>
        <p>
          An ontology O is a tuple hT ; Ai, where T is the terminological box or TBox,
and A is the assertional box or ABox [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. A tuple of constants ~a is an answer
of a datalog query Q = hQP ; P i on an ontology O = hT ; Ai if and only if
O [ P j= QP (~a), where P is considered to be a set of universally quanti ed
implications with the usual rst-order semantics; the set of all answers of Q on
O is denoted by ans(Q; O).
        </p>
        <p>Given a conjunctive query Q and a TBox T , a datalog query QT is said to
be a rewriting of Q w.r.t. T if and only if ans(Q; T [ A) = ans(QT ; A) for every
A.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>State of the Art</title>
      <p>The success of query rewriting depends on algorithms that produce
manageable rewritings, both in terms of size and complexity. Since the seminal work
of Calvanese et al. on DL-Lite|the Description Logic that provides the
logical underpinning for OWL 2 QL|many rewriting algorithms aimed at e cient
query answering via query rewriting have been proposed, most of which have
been implemented in prototypes or commercial systems.</p>
      <p>
        We limit ourselves to approaches implementing query rewriting as de ned
in De nition 1; that is, approaches where only the query gets expanded and
the ABox is considered to be independent. Materialization-based and hybrid
approaches|such as those by Kontchakov et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and Rodr guez-Muro and
Calvanese [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]|are out of the scope of this paper.
      </p>
      <p>
        Notable approaches include that of Calvanese et al. (PerfectRef) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
PerezUrbina et al. (Requiem) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Chortaras et al. (Rapid) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Rosati and Almatelli
(Presto) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Rosati (Prexto) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], Gottlob et al. (Nyaya) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and Eiter et al.
(Clipper) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In Table 1 these approaches are classi ed by the type of
rewritings they produce (either UCQ or DQ) and by the Description Logic (DL) they
support. Unsurprisingly, most approaches have been proposed for DL-Lite;
however, note that there are a few algorithms that support DL-Lite as well as more
expressive logics (shown in parentheses). Among the approaches that go beyond
DL-Lite, Requiem is the only one that produces UCQs for DL-Lite and DQs for
more expressive logics. In fact, Requiem will only produce DQs when the
rewriting has to be recursive in order to ensure correct results; therefore, in many cases
Requiem will produce UCQs even for logics more expressive than DL-Lite.
      </p>
      <p>Most of the papers cited previously include an empirical evaluation of the
approach with respect to others. We have summarized these results in Table 2,
which shows the comparison of the approaches with respect to the rewritings size
(number of clauses), their structural complexity, the time it takes to compute
them, and the time it takes to evaluate them over some ABox.
[Clipper Presto], Prexto &lt;
[PerfectRef = Requiem = Rapid = Nyaya]
Complexity [PerfectRef
[Clipper</p>
      <p>Requiem
Presto]</p>
      <p>Prexto</p>
      <p>Rapid</p>
      <p>Nyaya] &lt;
Time</p>
      <p>Rapid, Nyaya, [Clipper</p>
      <p>Presto] &lt; Requiem &lt; PerfectRef
Eval time
[Requiem</p>
      <p>Clipper</p>
      <p>Presto] &lt; PerfectRef
Note 1. All these comparison were made over DL-Lite ontologies. The relationship
between approaches separated with commas is not discussed in the literature.</p>
      <p>The approaches that produce DQs (see Table 1) produce smaller rewritings
than their UCQ-producing counterparts, with the exception of Prexto. This is
due to the fact that UCQs are larger than semantically equivalent (non UCQ)
DQs (conjunctive normal form versus disjunctive normal form). Prexto stands
apart because, unlike other UCQ-producing approaches, it implements an
optimization that considers the ABox to reduce the size of the rewritings.
Regarding complexity, we see that UCQ-producing approaches do better than
DQproducing ones. With respect to time, as it is related to size, it is not surprising
that producing DQs is faster than producing UCQs; it is important to mention,
however, that among those algorithms that produce UCQs, some approaches are
much more e cient than others (hours versus seconds). Finally, with respect to
evaluation time, we see that Presto outperforms PerfectRef, and, interestingly,
that Requiem, Clipper, and Presto perform similarly, in spite of the fact that
Requiem's rewritings are larger.</p>
      <p>The comparison between Requiem, Clipper, and Presto regarding evaluation
times was carried out by evaluating the computed rewritings of each system using
DLV.7 Using such a system was necessary as both Presto and Clipper require a
deductive database of this type for query evaluation even for DL-Lite ontologies;
note, however, that Requiem produces UCQs in this scenario. Therefore, these
results suggest that evaluating semantically equivalent UCQs and DQs in DLV
takes approximately the same time, so there is no substantial gain in evaluation
time by producing (smaller but more complex) DQs versus UCQs. It would be
interesting to see, however, whether UCQs can be evaluated more e ciently in
an RDBMS or an RDF database.</p>
      <p>Another important aspect to consider is the time it takes to compute the
rewritings. According to Table 2, both Clipper and Presto outperform Requiem
with respect to this metric (in fact, Requiem is outperformed by every algorithm
except PerfectRef). It would be interesting to see whether Requiem can be made
faster by enhancing it with the various optimization techniques used in the other,
more e cient approaches.</p>
      <p>In order to address these two questions, we present an empirical evaluation
of Clipper and Blackout|an optimized version of Requiem|in Section 5. We
chose these two approaches as they are the ones that support the most
expressive logic within their respective rewriting type (UCQ vs DQ) categories. The
optimizations implemented in Blackout are described in Section 4.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Blackout Optimizations</title>
      <p>In this section we describe Blackout, a highly optimized version of Requiem.
Blackout is part of the state-of-the-art triplestore Stardog.8 Besides careful
software engineering for e ciency, Blackout improves Requiem with two core
optimizations.</p>
      <p>First, Blackout implements an eager query containment optimization, as
opposed to Requiem's lazy approach that computes query containment as the last
step. As observed in most of the papers referenced in Section 3, this is one of
Requiem's major drawbacks as the nal containment step could take a very
long time. Eager containment prunes redundant queries earlier in the rewriting
process; thus, it prevents additional rewritings from being generated from these
redundant queries, which themselves would be redundant. Even though this does
not ultimately change the number of rewritings, it does signi cantly minimize
the time spent on query containment checks. Thus, Blackout does not waste time
generating redundant queries that will eventually be pruned, and it reduces the
total number of containment checks performed.</p>
      <p>
        Second, Blackout implements an optimization technique known as data
oracle. This optimization is related to the so-called extensional constraints technique
presented in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. If a derived query contains an atom which is empty with respect
to a given ABox, the query is discarded as it would obviously produce empty
7 http://www.dlvsystem.com/
8 http://stardog.com/
results when evaluated. For instance, consider the rewriting obtained in Example
1, if we knew via the data oracle that the class Professor had no instances in A,
then there would be no need to include query (6) in the nal rewriting.
      </p>
      <p>The e ectiveness of this optimization lies in the fact that even when a TBox
might contain a large number of classes and properties, the assertions in the
ABox typically use a much smaller number of classes and properties. This is
frequently the case for deep class hierarchies where asserted types use leaf classes
of the hierarchy, instead of more general classes higher in the hierarchy. For this
reason, querying for a generic class might produce many rewritings because of
the class hierarchy, but we might not need to execute all of those rewritings
depending on the speci c ABox at hand.</p>
      <p>In Section 2 we pointed out that one advantage of the query rewriting
approach is that it is independent of the ABox, whereas clearly the data oracle
optimization introduces a dependency. This dependency, however, is a very weak
one and requires the data oracle to only check the existence of an atom, a class
or a property, in the data. Stardog, like other RDF databases, maintains special
index structures that make this very e cient. Therefore, the query rewriting
component can still be loosely-coupled from the storage system and does not
need to maintain special in-memory data structures for this optimization.
Moreover, as will be discussed further, the data oracle implementation can be crucial
to the success of query rewriting in practice.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>In this section we present an empirical evaluation of Clipper and Blackout. Our
evaluation is based on the LUBM benchmark|a well-known standard that
provides customizable data generation capabilities.9 We rst examined the
performance of the two approaches with respect to size/complexity of the rewritings,
including the time it took to compute them; and then, we looked into how these
rewritings perform when evaluated.</p>
      <p>All experiments were performed on Ubuntu 3.0.0 with a 3.2Ghz AMD
Phenom processor, 8GB of RAM running Java 1:6:033 with 8GB allocated to the
JVM for each run. We recorded the average of 10 runs after 5 warmups for each
experiment.
5.1</p>
      <p>Computing Rewritings
The rst part of the evaluation consisted of rewriting the 14 LUBM queries with
respect to three TBoxes: TQL, TRL, and TEL, which correspond to QL, RL, and
EL versions of the LUBM TBox, respectively. Table 3 summarizes our results.
On the left-hand-side, we show the time in milliseconds that each system took
to produce the rewritings for the 14 queries, whereas on the right-hand-side we
show the number of clauses that each system produced overall.</p>
      <sec id="sec-5-1">
        <title>9 http://swat.cse.lehigh.edu/projects/lubm/</title>
        <p>As can be seen, Blackout is generally slower than Clipper, but it produces
smaller rewritings. We believe the rewritings with few clauses produced by
Blackout are the result of the data oracle optimization. In order to verify the bene ts
of this optimization, we ran Blackout without it and obtained 66 clauses for TQL,
65 clauses for TRL, and 253 clauses for TEL. The most signi cant gain was in
query 5 over TEL, in which Blackout produced a rewriting containing 51 clauses,
whereas Blackout with no data oracle produced 117. These results suggest that
the data oracle optimization, when applicable, may have a big impact.</p>
        <p>As can be seen in Table 4, Clipper produced DQs exclusively for all the
queries and pro les, whereas Blackout produced UCQs most of the time, even
for RL and EL. These results suggest that it is not always necessary to produce
DQs as rewritings even for RL or EL ontologies. The type of the rewriting
impacts evaluation time as UCQs are structurally less complex than DQs, and
might be easier to evaluate, depending on their size.
The second part of our evaluation consisted of evaluating the rewritings produced
by the two approaches. We used DLV|a state-of-the-art deductive database
system (datalog engine) maintained by DLVSYSTEM s.r.l.|to evaluate Clipper
and Blackout rewritings, and we used Stardog to evaluate Blackout rewritings
only.10 We considered four ABoxes of increasing size: A1, A10, A100, and A1000,
which contain approximately 138K, 1.38M, 13.8M, and 138M triples,
respectively.11
10 Note that Stardog cannot presently evaluate DQs.
11 Since the current implementation of Clipper does not support data property
assertions, we ignored this type of assertion in our tests.</p>
        <p>As can be seen, DLV was able to execute Blackout rewritings faster than those
of Clipper, which is not surprising as the former are smaller and less complex than
the latter. Unfortunately, DLV was unable to execute the rewritings over A100
and A1000 due to lack of memory since it maintains all the clauses in memory.
Clearly, in order to be able to evaluate rewritings and queries in general over
large ABoxes, we would need a system that makes use of secondary storage.
In this section we summarize the results from previous sections with respect
to Blackout and Stardog. Table 7 shows the overall performance of Stardog in
milliseconds counting the time it took Blackout to compute the rewritings and
the time it took Stardog to evaluate them. It also shows the percentage of time
that was spent on evaluating the produced rewritings.</p>
        <p>Our results show that the larger the ABox, the longer time is spent on
evaluating the rewritings. Therefore, we believe it is important to produce the simplest
and smallest rewritings possible, even if this means spending a bit more time on
the rewriting phase.
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this section we present a summary of our results.</p>
      <p>DQ-producing approaches, such as Clipper, do not necessarily produce smaller
rewritings than UCQ-producing approaches as formerly thought. Taking into
consideration the underlying data might result in optimizations, such as the
data oracle optimization, that can signi cantly reduce the size of the produced
UCQs. This type of optimization should apply to DQs as well; it would be
interesting to see to what extent it does and the impact it has.</p>
      <p>It is not necessary to produce DQs for RL and EL in many cases. This is
important as users can bene t from several of the languages features that are
not included in QL without needing a datalog engine for query evaluation. As
shown in this paper, the size of UCQs can be reduced and, importantly, UCQs
are amenable to straightforward parallel evaluation. Therefore, we believe that
one should only have to deal with DQs, and datalog engines, when necessary.</p>
      <p>Evaluating the rewritings dominates the overall query answering time at large
scales. Therefore, even though it may not be bene cial at small scales, it is worth
investing time on optimizing the computation of rewritings in order to produce
smaller and simpler rewritings that can be evaluated more e ciently.
7</p>
    </sec>
    <sec id="sec-7">
      <title>Future Work</title>
      <p>
        We are currently working on adding datalog evaluation to Stardog so that it can
correctly evaluate Blackout rewritings even when they are DQs. Our ongoing
implementation is based on the well-known algorithm Query-SubQuery [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Once
it is ready, it will be interesting to compare its performance with that of DLV
and other state-of-the-art datalog engines (e.g., IRIS12).
      </p>
      <p>Additionally, we plan to work on the parallelization of UCQ evaluation within
Stardog. Currently, Stardog executes the results of the query rewriting as a single
query; it takes the UCQ produced by Blackout and creates a single query by
unioning each of the queries. However, the queries composing the UCQ tend
to be relatively small, simple, and easy to evaluate. Crucially, they are also
independent: the results of one conjunct are not needed to produce the results
of another. This lends itself very nicely to evaluation of each query in parallel,
which can be done by taking advantage of existing architecture within Stardog.</p>
      <p>We are also working on the implementation of SPARQL 1.1 with Stardog.
There are new features in SPARQL 1.1 such as sub-queries and property paths
that have an interesting overlap with features provided, or planned, within
Stardog and Blackout. First, we will explore what the performance implications are
for rewriting sub-queries in SPARQL 1.1, and if there are advantages the QSQ
approach can provide during evaluation, or even if rewriting sub-queries is a
feasible design. Additionally, with some OWL language features, such as
transitivity, now available in SPARQL, we will determine whether Blackout can be
used to handle queries utilizing these features.
12 http://www.iris-reasoner.org/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hull</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Vianu</surname>
          </string-name>
          .
          <source>Foundations of Databases. Addison-Wesley</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Nutt</surname>
          </string-name>
          . Basic Description Logics, chapter
          <volume>2</volume>
          , pages
          <fpage>47</fpage>
          {
          <fpage>100</fpage>
          . Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Tractable Reasoning and E cient Query Answering in Description Logics: The DL-Lite Family</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          ,
          <volume>9</volume>
          :
          <fpage>385</fpage>
          {
          <fpage>429</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>C.-L. Chang and R. C</surname>
          </string-name>
          .-T. Lee.
          <article-title>Symbolic Logic and Mechanical Theorem Proving</article-title>
          . Academic Press, Inc., Orlando, FL, USA,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Chortaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Trivela</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Stamou</surname>
          </string-name>
          .
          <article-title>Optimized Query Rewriting for OWL 2 QL</article-title>
          . In N. Bj rner and V. Sofronie-Stokkermans, editors,
          <source>CADE</source>
          , volume
          <volume>6803</volume>
          of Lecture Notes in Computer Science, pages
          <volume>192</volume>
          {
          <fpage>206</fpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.-K. Tran</surname>
            , and
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>Towards Practical Query Answering for Horn-SHIQ</article-title>
          . In Y. Kazakov,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          , and F. Wolter, editors,
          <source>Description Logics</source>
          , volume
          <volume>846</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Glimm</surname>
          </string-name>
          , I. Horrocks,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Conjunctive Query Answering for the Description Logic SHIQ</article-title>
          . CoRR, abs/1111.0049,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Orsi, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Pieris</surname>
          </string-name>
          . Ontological Queries:
          <article-title>Rewriting and Optimization</article-title>
          . In S. Abiteboul,
          <string-name>
            <surname>K.</surname>
          </string-name>
          <article-title>Bohm, C. Koch, and</article-title>
          K.-L. Tan, editors,
          <source>ICDE</source>
          , pages
          <volume>2</volume>
          {
          <fpage>13</fpage>
          . IEEE Computer Society,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Toman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>The combined approach to query answering in dl-lite</article-title>
          . In F. Lin and
          <string-name>
            <surname>U</surname>
          </string-name>
          . Sattler, editors,
          <source>Proceedings of the 12th International Conference on Principles of Knowledge Representation and Reasoning (KR2010)</source>
          . AAAI Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          .
          <article-title>Data Integration: a theoretical perspective</article-title>
          .
          <source>In PODS '02: Proceedings of the twenty- rst ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems</source>
          , pages
          <volume>233</volume>
          {
          <fpage>246</fpage>
          , New York, NY, USA,
          <year>2002</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. H.
          <string-name>
            <surname>Perez-Urbina</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>and I.</given-names>
          </string-name>
          <string-name>
            <surname>Horrocks</surname>
          </string-name>
          .
          <article-title>Tractable Query Answering and Rewriting under Description Logic Constraints</article-title>
          .
          <source>J. Applied Logic</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <volume>186</volume>
          {
          <fpage>209</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Linking Data to Ontologies</article-title>
          .
          <source>J. on Data Semantics</source>
          , X:
          <volume>133</volume>
          {
          <fpage>173</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. M.
          <string-name>
            <surname>Rodriguez-Muro</surname>
            and
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Calvanese</surname>
          </string-name>
          . Dependencies:
          <article-title>Making Ontology Based Data Access Work</article-title>
          . In P. Barcelo and V. Tannen, editors,
          <source>AMW</source>
          , volume
          <volume>749</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. M.
          <string-name>
            <surname>Rodriguez-Muro</surname>
            and
            <given-names>D. Calvanese.</given-names>
          </string-name>
          <article-title>High performance query answering over dl-lite ontologies</article-title>
          . In G. Brewka,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , and
          <string-name>
            <surname>S. A</surname>
          </string-name>
          . McIlraith, editors,
          <source>KR</source>
          . AAAI Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          . Prexto:
          <article-title>Query Rewriting under Extensional Constraints in DL-Lite</article-title>
          . In E. Simperl,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          , and V. Presutti, editors,
          <source>ESWC</source>
          , volume
          <volume>7295</volume>
          of Lecture Notes in Computer Science, pages
          <volume>360</volume>
          {
          <fpage>374</fpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Almatelli</surname>
          </string-name>
          .
          <article-title>Improving Query Answering over DL-Lite Ontologies</article-title>
          . In F. Lin,
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          , and M. Truszczynski, editors,
          <source>KR</source>
          . AAAI Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>