<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>cient Query Answering in DL-Lite through FOL Reformulation (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Damian Bursztyn</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francois Goasdoue</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ioana Manolescu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>INRIA</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>U. Paris-Sud</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>France</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>U. Rennes 1 &amp; INRIA</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a general query optimization framework for formalisms enjoying FOL reducibility of query answering, for which it reduces to the evaluation of a FOL query against facts. This framework allows searching within a set of alternative equivalent FOL queries, i.e., FOL reformulations, one with minimal evaluation cost when evaluated through a relational database management system. We provide two algorithms, an exhaustive and a greedy, for exploring the optimization space. This framework is applied to the lightweight description logic DL-LiteR underpinning the W3C's OWL2 QL pro le, for which an experimental evaluation validates the interest and applicability of our technique.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Cover-based query answering optimization</title>
      <p>
        RDBMS query optimizers consider a set of evaluation alternatives (a.k.a. logical
and physical plans), and select the one minimizing a cost estimation function.
Since the number of alternatives is in O(2n n!) for a conjunctive query (CQ)
of n atoms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], modern optimizers rely on heuristics to explore only a few
alternatives; this works (very) well for small-to-moderate size CQs. However, FOL
reformulations go beyond CQs in general, and may be extremely large, leading the
RDBMS to perform poorly.
      </p>
      <p>To work around this limitation, we introduce the cover-based query
answering technique to de ne a space of equivalent FOL reformulations of a CQ. A cover
de nes how the query is split into subqueries, that may overlap, called
fragment queries, such that substituting each subquery with its FOL reformulation
(obtained from any state-of-the-art technique) and joining the corresponding
(reformulated) subqueries, may yield a FOL reformulation for the query to
answer. Not every cover of a query leads to a FOL reformulation; but every cover
which does, yields an alternative cover-based FOL reformulation of the original
query. Crucially for our problem, a smart cover choice may lead to a cover-based
reformulation whose evaluation is more e cient. Thus, the cover-based
technique amounts to circumventing the di culty of modern RDBMSs to e ciently
evaluate FOL reformulations in general.</p>
      <p>Problem 1 (Optimization problem). Given a CQ q and a description logic KB K,
the cost-driven cover-based query answering problem consists of nding a
coverbased reformulation of q based on K with lowest (estimated) evaluation cost.</p>
      <p>We solve this problem for DL-LiteR in two steps. First, we provide a su cient
condition for a cover to be safe for query answering, i.e., to lead to a cover-based
FOL reformulation. The main idea for this condition is to have a cautious
approximation of the query atoms which are interdependent w.r.t. reformulation,
i.e., which (directly or after specialization) unify through state-of-the-art
reformulation techniques, and keep them in the same cover fragment. The space of all
covers of a query q satisfying this condition is denoted Lq; all Lq covers turn out
to correspond to some fusion of fragments from a certain root cover we denote
Croot. We also re ne our su cient condition to identify an extended space of
covers Eq, which includes Lq and also leads to FOL reformulations of q.</p>
      <p>
        Second, based on a function estimating the evaluation cost of a given FOL
query through an RDBMS, we devise two cover search algorithms. The rst
one, termed EC-DL (Exhaustive Covers), starts from Croot and explores all
Eq covers in the case of DL-LiteR. The second one, named GC-DL (Greedy
Covers), also starts from Croot but explores Eq partially, in greedy fashion. It
uses an explored cover set initialized with fCrootg, from which it picks a cover C
inducing a qFOL reformulation with minimum cost (C), and attempts to build
from C a cover C0, by fusing two fragments, or adding (copying) an atom to a
fragment. GC-DL only adds C0 to the explored set if (C0) &lt; (C), thus it only
explores a small part of the search space. Both algorithms return a cover-based
reformulation with the minimum estimated cost w.r.t. the explored space. When
fusing two fragments into one, or adding an atom to a fragment, (C) decreases
if the new fragment is more selective than the fragment(s) it replaces. Therefore,
the RDBMS may nd a more e cient way to evaluate the query of this new
fragment, and/or its result may be smaller, making the evaluation of qFOL based
on the new cover C0 faster.
We implemented our cover-based approach in Java 7, on top of PostgreSQL
v9.3.2. We used the LUBM290 DL-LiteR TBox and associated EUDG data
generator [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: LUBM290 consists of 34 roles, 128 concepts and 212 constraints; the
generated ABox comprises 15 million facts. We chose RAPID [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for CQ-to-UCQ
(unions of CQs) reformulation. For , we used Postgres' own estimation, obtained
using the explain directive. We devised a set of 13 CQs, ranging from 2 to 10
atoms (5:77 on average); their UCQ reformulations have 35 to 667 CQs (290:2 on
average).
      </p>
      <p>
        Figure 1(a) depicts the evaluation time through Postgres, of four FOL
reformulations: (i) the UCQ produced by RAPID [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; (ii) the JUCQ (joins of UCQs)
reformulation based on Croot; (iii) the JUCQ reformulation corresponding to the
best-performing cover found by our algorithm EC-DL, and (iv) the JUCQ
reformulation based on the best-performing cover found by GC-DL. First, the gure
shows that xed FOL reformulations are not e ciently evaluated, e.g., UCQ for
Q1, Q5 and Q9-Q11, and the one based on Croot for Q6-Q8 and Q13. This poor
performance correlates with the large size of the UCQ reformulations: such very
large unions of CQs are very poorly handled by current RDBMS optimizers, which
are designed and tuned for small CQs. Second, the reformulation based on the
cover returned by EC-DL is always more e cient than UCQ reformulation (more
than one order of magnitude for Q5), respectively, Croot-based reformulation (up
to a factor of 230 for Q6). Third, in our experiments, the GC-DL-chosen cover
leads to a JUCQ reformulation as e cient as the EC-DL one, demonstrating that
even a partial, greedy cover search leads to good performance (this cannot be
guaranteed in general). For Q7 and Q9-Q13, the best cover we found is safe; for
all the others, this is not the case, con rming the interest of the larger space Eq.
      </p>
      <p>Figure 1(b) depicts the running time of the EC-DL and GC-DL algorithms,
which can be seen as the overhead of our cover-based technique. The time is very
small, between 2 ms (Q11-Q13, with just 2 atoms) and 221 ms (EC-DL on Q10, of
10 atoms). The time is higher for more complex queries, but these are precisely
the cases where our techniques are most bene cial, e.g., for Q10, EC-DL runs in
less than 2% of the time to evaluate the UCQ reformulation, while the cover we
recommend is more than 4 times faster than UCQ. As expected, GC-DL is faster
than EC-DL due to the exploration of less covers. Together, Figure 1(a) and
1(b) con rm the bene ts and practical interest of our cost-based cover search.
Acknowledgements This work has been partially funded by the Programme
Investissement d'Avenir Datalyse project and the ANR PAGODA project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacomo</surname>
            ,
            <given-names>G.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Tractable reasoning and e cient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>JAR</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <volume>385</volume>
          {
          <fpage>429</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chortaras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trivela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamou</surname>
            ,
            <given-names>G.B.</given-names>
          </string-name>
          :
          <article-title>Optimized query rewriting for OWL 2 QL</article-title>
          . In: CADE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seylan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The combined approach to OBDA: taming role hierarchies using lters</article-title>
          .
          <source>In: ISWC</source>
          . pp.
          <volume>314</volume>
          {
          <issue>330</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ono</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lohman</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          :
          <article-title>Measuring the complexity of join enumeration in query optimization</article-title>
          .
          <source>In: VLDB</source>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Perez-Urbina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>E cient query answering for OWL 2</article-title>
          . In: ISWC (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almatelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Improving query answering over DL-Lite ontologies</article-title>
          .
          <source>In: KR</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Venetis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamou</surname>
            ,
            <given-names>G.B.</given-names>
          </string-name>
          :
          <article-title>Incremental query rewriting for OWL 2 QL</article-title>
          . In: Description Logics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>