<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology-Mediated Query Answering: Performance Challenges and Advances</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Univ Rennes</institution>
          ,
          <addr-line>Lannion</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology-mediated query answering (OMQA) is a recent data management trend in the Artificial Intelligence, Database and Semantic Web areas, which aims at answering database queries on knowledge bases. Because it is an intricate combination of automated reasoning and database query evaluation, it raises major performance challenges. In this demonstration, we showcase a decade of OMQA optimization to understand “Where do we stand now and how did we get there?” and we highlight a promising new OMQA optimization that brings further significant performance improvement to discuss “What's next?”.</p>
      </abstract>
      <kwd-group>
        <kwd>query answering</kwd>
        <kwd>knowledge base</kwd>
        <kwd>summary</kwd>
        <kwd>optimization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivations</title>
      <p>
        Ontology-mediated query answering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (OMQA) amounts to answering database
queries on a knowledge base (KB), i.e., data described by an ontology modelling
the background knowledge of the application domain. It goes beyond database
query answering that produces answers just from the data, by identifying
additional answers through reasoning about the data with the help of the ontology.
OMQA has become recently a hot topic in the Artificial Intelligence, Database
and Semantic Web communities, notably for conjunctive queries (a.k.a. the core
select-project-join database queries) on KBs expressed with Description Logics,
on which builds the W3C’s Web Ontology Language (OWL2), as well as with
the closely related Datalog± and Existential Rules.
      </p>
      <p>
        Three OMQA techniques have been devised in the literature. They amount to
reducing OMQA to standard relational database query evaluation by compiling
the ontology knowledge in the query via reformulation or rewriting (e.g., [
        <xref ref-type="bibr" rid="ref13 ref3 ref5">5,13,3</xref>
        ]),
in the data via saturation or materialization (e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]), or in both via combined
or hybrid approaches (e.g., [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). The OMQA technique based on query
reformulation, on which we focus in this demonstration, is the by far the most studied
and utilized one. It was introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and consists in reformulating every
incoming conjunctive query (CQ) q into a union of conjunctive queries (UCQ)
q′ using the KB’s ontology, so that the standard database evaluation of (the
SQLized) q′ on the KB’s data stored in a relational database yields the correct
answers to q. Crucially, such a reformulated query q′ enumerates within its union
all the (worst case exponentially many) specializations of q w.r.t. the ontology
knowledge. In practice, reformulations may be large, in which case relational
database management systems (RDBMSs), even modern ones, cannot answer
them eficiently (all the CQs in the UCQ are evaluated). To mitigate this
performance issue, investigations have focused on alternative reformulation languages
that may be easier to evaluate than large UCQs, by limiting redundancies across
CQs within UCQs: minimal UCQs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and unions of semi-CQs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] (USCQs).
However, such OMQA approaches produce a single reformulation for every query,
thus are doomed to fail or to poor performance if the produced reformulation
is too costly to evaluate by the RDBMS at hand. Based on this observation,
the language of joins of UCQs (JUCQs) was considered to explore a space of
semantically equivalent but syntactically diferent reformulations, among which
one with lowest estimated evaluation cost on the RDBMS is chosen [
        <xref ref-type="bibr" rid="ref2 ref3">3,2</xref>
        ].
      </p>
      <p>
        With this demonstration, our ambition is twofold. First, we want to
showcase a decade of OMQA optimization to understand “Where do we stand now
and how did we get there?” For that, we compare and analyze the performance
improvement that the literature has brought over the years with the
state-ofthe-art UCQ, then USCQ and finally JUCQ reformulations. Second, we want to
discuss “What’s next?” by highlighting a novel advance on OMQA optimization
that we propose, and which allows improving significantly the performance of
UCQ, USCQ and JUCQ reformulations using a summary of the queried data [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
up to several orders of magnitude in some cases! To substantiate the discussion,
we focus on OMQA for CQs on KBs expressed with the DL-liteR description
logic [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; this setting is particularly relevant from both the technical and
practical viewpoints: (i) the UCQ, USCQ and JUCQ reformulations were defined for
it and (ii) DL-liteR underpins the QL profile of OWL2, i.e., the well-established
W3C standard for OMQA on large data volume.
      </p>
      <p>In Sec. 2, we recall the reformulation-based OMQA technique, point out why
the evaluation performance of state-of-the-art query reformulations is
intrinsically limited, and discuss our new optimization based on data summaries to
circumvent this performance issue. Then, in Sec. 3, we present the demo
scenarios we prepared, and sketch the demo experience we planned for the attendees.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Reformulation-based OMQA and optimization thereof</title>
      <p>A DL-liteR KB K consists of a database called an ABox, noted A, described by
an ontology called a TBox, noted T ; we note such a KB K = (T , A).</p>
      <p>Reformulation-based OMQA amounts to reformulating a CQ q using a TBox
T into a reformulation qT so that for all ABox A′, the relational evaluation
of qT on A′ stored in an RDBMS as a relational database computes the exact
answer set of q on the KB K = (T , A′). We argue that the universality of a qT
reformulation needlessly limits its evaluation performance: when
reformulationbased OMQA is used to answer q on a particular KB K = (T , A), the ABox A
at hand is just one of all the possible ABoxes qT accommodates to. Because
of that, some sub-queries in qT may be useless when evaluating qT on A: they
have no answer on it, hence do not participate in producing q’s answers from it.
Their evaluation, which may take significant time, should thus be avoided to the
extent possible; this is the goal of the new optimization we devise below.</p>
      <p>
        As the names of the state-of-the-art query reformulation languages
suggest, reformulations encode semantically equivalent but syntactically diferent
algebraic combinations of sub-CQ queries: UCQ, USCQ and JUCQ. Our
optimization aims at pruning rapidly away from query reformulations such sub-CQs
with no answer on the queried KB’s ABox. It relies on the notion of ABox
summary [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which was proposed for fast consistency checking and instance retrieval
with the SHIN DL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] , and which we straightforwardly adapt to DL-liteR:
ABox facts are restricted to instances of atomic concepts and roles, i.e., just
concept and role predicates instead of SHIN formulae. A summary S of an
ABox A is an ABox homomorphic to A defined by a so-called canonical
function f from the constants in A to the constants in S: S = Af , where Af is
obtained from A by replacing every constant c by its image f (c) through f . A
central property of such a summary, which directly follows from the existence of
the A-to-S homomorphism defined by the function f , is that if a CQ q has some
answer on A, then its translation qf has some answer on S, where qf is obtained
from q by replacing every constant c by its image f (c) through f [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The crux
of our optimization is to exploit the contraposition of this property, i.e., if qf
has no answer on S, then q has no answer on A. We therefore optimize a UCQ,
USCQ or JUCQ reformulation by pruning away from it each of its sub-CQs with
no answer on the KB’s ABox summary. Finally, and crucially for our summary
usage, ABox summaries can be rapidly computed and maintained upon ABox
updates [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Demonstration outline</title>
      <p>
        For the demo 1, we selected a set of well-established DL-liteR benchmarks,
notably LUBM∃ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and NPD [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] that are respectively based on synthetic
and real data. For each, we stored ABoxes of increasing sizes, from
small-tolarge, in the popular DB2 (v11.5), MySQL (v8.0.25) and PostgreSQL (v12.7)
RDBMSs, on top of which we deployed the of-the-shelf Rapid (v0.93) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
Compact (v1.0b6) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and GDL (v1.0) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] query reformulation tools2 for
(minimal) UCQ, USCQ and JUCQ reformulations respectively. We also developed
a PHP/JSP/jQuery-based GUI (see Fig. 1 and Fig. 2) that ofers a nice
1 A one-minute teaser video of our demo is available at:
http://people.irisa.fr/Francois.Goasdoue/ISWC21/demo-iswc21-teaser.mp4
2 Query reformulation time is typically negligible w.r.t. reformation evaluation time,
thus the particular choice of tools we made does not afect our conclusions.
visual attendee experience in order to examine the performance challenges and
advances in OMQA optimization.
      </p>
      <p>Scenario 1“Where do we stand now and how did we get there?”:
We answer this question
by carrying out
performance analyses through
our GUI. For each, we
select: (i) ABoxes and
queries from a benchmark
based on a collection of
statistics (e.g., ABox and
query sizes, number of
answers per ABox/query
pair, etc) but also on the
graphical inspection of the Fig. 1. Performance analysis
queries, (ii) query
reformulation languages, and (iii) RDBMS(s). For a given analysis, query answering
times are reported by RDBMSs, then by ABoxes, then by queries; they are
displayed using stacked bar charts to show, in each case, the time spent in
reformulating the query w.r.t. the time spent in evaluating the query reformulation
(Fig. 1). Further, query reformulations can be graphically inspected to understand
the challenges raised by their evaluation through RDBMSs. Based on such
analysis, we first show that the UCQ and USCQ reformulation approaches are not
robust OMQA solutions because the single reformulation they explore for a given
query may lead to poor performance or just fail in being evaluated on
moderateto-large ABoxes, on all RDBMSs. We also point out that the performance of the
JUCQ reformulation approach strongly depends on the quality of the cost model
used to select the reformulation to send to the RDBMS. Because of that, though
in principle the JUCQ reformulation approach should be competitive w.r.t. UCQ
and USCQ reformulation ones, this is not always the case.</p>
      <p>Scenario 2“What’s next?”:
We answer this question by
advertising the potential of our
summarybased optimization for further
significant improvement of OMQA
performance (up to several order of
magnitude) for the UCQ, USCQ and JUCQ
reformulation approaches. When
enabled, (i) statistics are also reported
for ABox summaries (size and
compression ratio, time to be computed,
and average update time upon
update) when setting up a performance
analysis and (ii) stacked bar chars
also display for every optimized
re</p>
      <p>Fig. 2. UCQ reformulation inspection
formulation the time spent in pruning its empty sub-CQs based on the ABox
summary (Fig. 1). Further, to understand the performance gain, the graphical
inspection of an optimized UCQ, USCQ or JUCQ query reformulation also
highlights its empty sub-CQs on the ABox, and which of them are pruned out based
on the summary.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work was partially supported by the ANR project CQFD
(ANR-18-CE230003) and the ARED project Gedeon (Reg´ion Bretagne &amp; Lannion Terg´or
Communauet)´.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bienvenu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ontology-mediated query answering: Harnessing knowledge to get more from data</article-title>
          .
          <source>In: IJCAI</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bursztyn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Goasdoue,´ F.,
          <string-name>
            <surname>Manolescu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Optimizing FOL reducible query answering: Understanding performance challenges</article-title>
          .
          <source>In: ISWC</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bursztyn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Goasdoue,´ F.,
          <string-name>
            <surname>Manolescu</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Teaching an RDBMS about ontological constraints</article-title>
          .
          <source>PVLDB</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Tractable reasoning and eficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>J. Autom. Reasoning</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chortaras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trivela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamou</surname>
          </string-name>
          , G.:
          <article-title>Optimized query rewriting for OWL2QL</article-title>
          . In: CADE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dolby</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schonberg</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Scalable highly expressive reasoner (SHER)</article-title>
          .
          <source>J. Web Semant</source>
          .
          <volume>7</volume>
          (
          <issue>4</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>El</given-names>
            <surname>Vaigh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.B.</given-names>
            ,
            <surname>Goasdoue</surname>
          </string-name>
          ,´ F.:
          <article-title>A well-founded graph-based summarization framework for description logics</article-title>
          . In: International workshop on Description Logics (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kershenbaum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schonberg</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The summary abox: Cutting ontologies down to size</article-title>
          . In: ISWC (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The combined approach to query answering in dl-lite</article-title>
          .
          <source>In: KR</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lanti</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rezk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The NPD benchmark: Reality check for OBDA systems</article-title>
          . In: EDBT (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Leone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manna</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terracina</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veltri</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Fast query answering over existential rules</article-title>
          .
          <source>ACM Trans. Comput. Log</source>
          .
          <volume>20</volume>
          (
          <issue>2</issue>
          ) (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seylan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The combined approach to OBDA: taming role hierarchies using filters</article-title>
          .
          <source>In: ISWC</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Thomazo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Compact rewritings for existential rules</article-title>
          .
          <source>In: IJCAI</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>