<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>First-Order Rewritability of Frontier-Guarded Ontology-Mediated Queries⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pablo Barcelo´</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerald Berger</string-name>
          <email>gberger@dbai.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Lutz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Pieris</string-name>
          <email>apieris@inf.ed.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bremen</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Chile</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Edinburgh</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vienna University of Technology</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology-based data access (OBDA) is a successful application of knowledge representation and reasoning technologies in information management systems. One premier goal is to facilitate access to data that is heterogeneous and incomplete. This is achieved via an ontology that enriches the user query, typically a union of conjunctive queries, with domain knowledge. It turned out that the ontology and the user query can be seen as two components of one composite query, called ontology-mediated query (OMQ). The problem of answering OMQs is thus central to OBDA. There is a consensus that the required level of scalability in OMQ answering can be achieved by using standard database management systems. To this end, a standard approach used nowadays is query rewriting: the ontology O and the database query q are combined into a new query qO, the so-called rewriting, which gives the same answer as the OMQ consisting of O and q over all input databases. It is of course essential that the rewriting qO is expressed in a language that can be handled by standard database systems. The typical language that is considered is the class of first-order (FO) queries. In this work, we focus on two central OMQ languages based on guarded and frontierguarded tuple-generating dependencies (TGDs), and we study the problem whether an OMQ is FO-rewritable, i.e, it can be equivalently expressed as a first-order query. Recall that a guarded (resp., frontier-guarded) TGD is a sentence of the form 8x; y (φ(x; y) ! 9z (x; z)), where φ and are conjunctions of relational atoms, and φ has an atom that contains all the variables (x [ y) (resp., x) [1, 8]. Our goal is to develop specially tailored techniques that allow us to understand the above non-trivial problem, and also to pinpoint its computational complexity. To this end, as we discuss below, we follow two different approaches. Our results can be summarized as follows: I We first focus on the simpler OMQ language based on guarded TGDs and atomic queries, and, in Section 2, we provide a characterization of FO-rewritability that forms the basis for applying tree automata techniques. I We then exploit, in Section 3, standard two-way alternating parity tree automata. In particular, we reduce our problem to the problem of checking the finiteness of the language of an automaton. The reduction relies on a refined version of the characterization of FO-rewritability established in Section 2. This provides a transparent solution to our problem based on standard tools, but it does not lead to an optimal result.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        ⋆ This is a short version of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
      </p>
      <p>
        I Towards an optimal result, we use, in Section 4, a more sophisticated automata
model, known as cost automata. In particular, we reduce our problem to the problem
of checking the boundedness of a cost automaton. This allows us to show that
FOrewritability for OMQs based on guarded TGDs and atomic queries is in 2EXPTIME,
and in EXPTIME for predicates of bounded arity. The complexity analysis relies on an
intricate result on the boundedness problem for a certain class of cost automata [
        <xref ref-type="bibr" rid="ref5 ref9">5, 9</xref>
        ].
      </p>
      <p>I Finally, in Section 5, by using the results of Section 4, we provide a complete
picture for the complexity of our problem, i.e., deciding whether an OMQ based on
(frontier-)guarded TGDs and arbitrary (unions of) conjunctive queries is FO-rewritable.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Semantic Characterization</title>
      <p>We first need to recall the basics on ontology-mediated querying. An ontology-mediated
query (OMQ) is a triple Q = (S; O; q), where S is a (non-empty) schema (the data
schema), O is a set of TGDs (the ontology), and q is a UCQ over S [ sig(O), where
sig(O) is the set of predicates occurring in O. The semantics of Q is given in terms of
certain answers. More precisely, the evaluation of Q over an S-database D, that is, a
finite set of atoms of the form R(t), where R 2 S and t is a tuple of constants, denoted
Q(D), is defined as the certain answers to q w.r.t. D and O. We write (C; Q) for the
class of OMQs (S; O; q), where O falls in the class of TGDs C, and q in the query
language Q. For example, (G; AQ0) is the class of OMQs where the ontology falls in
the class of guarded TGDs (G), and the query falls in the class of propositional atomic
queries (AQ0). Analogously, we define the OMQ languages based on frontier-guarded
TGDs (FG), conjunctive queries (CQ), and unions thereof (UCQ).</p>
      <p>
        We proceed to give a characterization of FO-rewritability of OMQs from (G; AQ0)
in terms of the existence of certain tree-like databases. Our characterization is related
to, but different from characterizations used for OMQs based on DLs such as E L and
E LI [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. In what follows, we write wd(S) for the width of S, i.e., the maximum arity
among all predicates of S, and TS for the integer maxf0; wd(S) 1g. Given a database
D and an OMQ Q 2 (G; AQ0), we write D j= Q for the fact that Q(D) is non-empty.
Theorem 1. Consider an OMQ Q 2 (G; AQ0) with data schema S. Q is FO-rewritable
iff there is a k 0 such that, for every S-database D of tree-width at most TS, if
D j= Q, then there is a D′ D with at most k facts such that D′ j= Q.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Alternating Tree Automata Approach</title>
      <p>
        In this section, we exploit the well-known algorithmic tool of two-way alternating
parity tree automata (2ATA) over finite trees of bounded degree (see, e.g., [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]) and prove
that the problem of deciding whether a query from (G; AQ0) is FO-rewritable can be
solved in triple exponential time. Although this result is not optimal, our construction
provides a transparent solution based on standard tools.
      </p>
      <p>
        The idea behind our solution is, given a query Q 2 (G; AQ0), to devise a 2ATA
BQ such that Q is FO-rewritable iff the language accepted by BQ is finite. This is a
standard idea with roots in the study of the boundedness problem for monadic Datalog
(see e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). In particular, we show that for a query Q 2 (G; AQ0) with data schema
S, there is a 2ATA BQ on trees of degree at most 2wd(S) such that Q is FO-rewritable
iff the language of BQ is finite. The state set of BQ is of size double exponential in
wd(S), and of size exponential in jS [ sig(O)j. Moreover, BQ can be constructed in
time double exponential in the size of Q. This result relies on a refined version of the
semantic characterization provided by Theorem 1. The key idea is to construct a 2ATA
BQ whose language corresponds to suitable encodings of databases D of bounded
treewidth that “minimally” entail Q, i.e., D j= Q, but if we remove any atom from D, then
Q is no longer entailed. Having the above result in place, we can then exploit standard
techniques on 2ATAs [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] in order to establish the following result:
Theorem 2. The problem of deciding whether a query Q 2 (G; AQ0) is FO-rewritable
is in 3EXPTIME, and in 2EXPTIME for predicates of bounded arity.
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Cost Automata Approach</title>
      <p>We proceed to study FO-rewritability for (G; AQ0) using the more sophisticated model
of cost automata. Cost automata extend traditional automata (on words, trees, etc.) by
providing counters that can be manipulated at each transition. Instead of assigning a
Boolean value to each input structure (indicating whether the input is accepted or not),
these automata assign a value from N [ f1g to each input. Here, we focus on cost
automata that work on finite trees of unbounded degree, and allow for two-way
movements. This allows us to improve the complexity obtained in Theorem 2:
Theorem 3. The problem of deciding whether a query Q 2 (G; AQ0) is FO-rewritable
is in 2EXPTIME, and in EXPTIME for predicates of bounded arity.</p>
      <p>
        As above, we develop a semantic characterization, by refining the semantic
characterization of Theorem 1, that relies on a minimality criterion for trees accepted by cost
automata. The extra features provided by cost automata allow us to deal with such a
minimality criterion in a more efficient way than standard 2ATAs. While our
application of cost automata is transparent, the complexity analysis relies on an intricate result
on the boundedness problem for a certain class of cost automata from [
        <xref ref-type="bibr" rid="ref5 ref9">5, 9</xref>
        ].
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>The Complete Picture</title>
      <p>We show the following result:
Theorem 4. The problem of deciding whether a query Q 2 (C; Q) is FO-rewritable is
– 2EXPTIME-complete, for C = FG and Q 2 fUCQ; CQ; AQ0g, even for predicates
of arity at most two.
– 2EXPTIME-complete, for C = G and Q 2 fUCQ; CQg, even for predicates of arity
at most two.
– 2EXPTIME-complete, and EXPTIME-complete for predicates of bounded arity (even
if the predicates have arity at most two), for C = G and Q = AQ0.</p>
      <p>
        Lower bounds. The 2EXPTIME-hardness in the first and the second items is inherited
from [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which focuses on OMQs based on E LI and CQs. For the 2EXPTIME-hardness
in the third item, we exploit the fact that containment for OMQs from (G; AQ0) is
2EXPTIME-hard, even if the right-hand side query is FO-rewritable; this is implicit
in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The EXPTIME-hardness is inherited from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], where it is shown that deciding
FO-rewritability for OMQs based on E L and atomic queries is EXPTIME-hard.
Upper bounds. The fact that for predicates of bounded arity FO-rewritability for OMQs
from (G; AQ0) is in EXPTIME is obtained from Theorem 3. It remains to show that the
problem for (FG; UCQ) is in 2EXPTIME. We first reduce FO-rewritability for (FG; UCQ)
to FO-rewritability for (FG; AQ0) via an easy polynomial time reduction, and then
show that the latter is in 2EXPTIME. This is achieved by reducing the problem to
FO-rewritability for (G; AQ0), and then apply Theorem 3. This reduction relies on a
technique known as treeification, and is inspired by a construction from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Acknowledgements. Barcel o´ is funded by the Millennium Institute for Foundational
Research on Data and Fondecyt grant 1170109. Berger is funded by the Austrian
Science Fund (FWF), project number W1255-N23 and DOC fellowship of the Austrian
Academy of Sciences. Lutz is funded by the ERC grant 647289 “CODA”. Pieris is
funded by the EPSRC programme grant EP/M025268/ “VADA”.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Jean-Franc¸ois Baget, Michel Lecle`re,
          <string-name>
            <surname>Marie-Laure Mugnier</surname>
            , and
            <given-names>Eric</given-names>
          </string-name>
          <string-name>
            <surname>Salvat</surname>
          </string-name>
          .
          <article-title>On rules with existential variables: Walking the decidability line</article-title>
          .
          <source>Artif</source>
          . Intell.,
          <volume>175</volume>
          (
          <fpage>9</fpage>
          -10):
          <fpage>1620</fpage>
          -
          <lpage>1654</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Vince</given-names>
            <surname>Ba</surname>
          </string-name>
          <article-title>´ra´ny, Balder ten Cate, and Luc Segoufin. Guarded negation</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>62</volume>
          (
          <issue>3</issue>
          ):
          <volume>22</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          :
          <fpage>26</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Pablo Barcelo´,
          <string-name>
            <surname>Gerald</surname>
            <given-names>Berger</given-names>
          </string-name>
          , Carsten Lutz, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>First-order rewritability of frontier-guarded ontology-mediated queries</article-title>
          .
          <source>In IJCAI</source>
          ,
          <year>2018</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Pablo Barcelo´,
          <string-name>
            <given-names>Miguel</given-names>
            <surname>Romero</surname>
          </string-name>
          , and
          <string-name>
            <surname>Moshe</surname>
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Vardi</surname>
          </string-name>
          .
          <article-title>Does query evaluation tractability help query containment? In PODS</article-title>
          , pages
          <fpage>188</fpage>
          -
          <lpage>199</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Benedikt</surname>
          </string-name>
          , Balder ten Cate, Thomas Colcombet, and Michael Vanden Boom.
          <article-title>The complexity of boundedness for guarded logics</article-title>
          .
          <source>In LICS</source>
          , pages
          <fpage>293</fpage>
          -
          <lpage>304</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Meghyn</given-names>
            <surname>Bienvenu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Hansen</surname>
          </string-name>
          , Carsten Lutz, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>First order-rewritability and containment of conjunctive queries in horn description logics</article-title>
          .
          <source>In IJCAI</source>
          , pages
          <fpage>965</fpage>
          -
          <lpage>971</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Meghyn</given-names>
            <surname>Bienvenu</surname>
          </string-name>
          , Carsten Lutz, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Wolter</surname>
          </string-name>
          .
          <article-title>First-order rewritability of atomic queries in horn description logics</article-title>
          .
          <source>In IJCAI</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Cal</surname>
          </string-name>
          <article-title>`ı, Georg Gottlob, and Michael Kifer. Taming the infinite chase: Query answering under expressive relational constraints</article-title>
          .
          <source>J. Artif. Intell. Res.</source>
          ,
          <volume>48</volume>
          :
          <fpage>115</fpage>
          -
          <lpage>174</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Colcombet</surname>
          </string-name>
          and Nathanae¨l Fijalkow.
          <article-title>The bridge between regular cost functions and omega-regular languages</article-title>
          .
          <source>In ICALP</source>
          , pages
          <volume>126</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>126</lpage>
          :
          <fpage>13</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Stavros</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Cosmadakis</surname>
          </string-name>
          , Haim Gaifman, Paris C. Kanellakis, and
          <string-name>
            <surname>Moshe</surname>
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Vardi</surname>
          </string-name>
          .
          <article-title>Decidable optimization problems for database logic programs (preliminary report)</article-title>
          .
          <source>In STOC</source>
          , pages
          <fpage>477</fpage>
          -
          <lpage>490</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Moshe</surname>
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Vardi</surname>
          </string-name>
          .
          <article-title>Automata theory for database theoreticans</article-title>
          .
          <source>In Theoretical Studies in Computer Science</source>
          , pages
          <fpage>153</fpage>
          -
          <lpage>180</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Moshe</surname>
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Vardi</surname>
          </string-name>
          .
          <article-title>Reasoning about the past with two-way automata</article-title>
          .
          <source>In ICALP</source>
          , pages
          <fpage>628</fpage>
          -
          <lpage>641</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>