<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Approximating Incomplete Queries over Partially Complete Databases (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ognjen Savkovic</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeny Kharlamov</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Werner Nutt</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Senellart</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ecole Normale Superieure</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Oxford</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Motivation. Building reliable systems over partially complete data poses significant challenges because queries they send to the available data retrieve answers that may signi cantly di er from the real answers. This may lead to a wrong understanding of the data and the events and processes it describes. This problem is especially critical for analytical systems that aggregate retrieved data since missing answers may signi cantly change results of analytical computations, e.g., computation of minimal or average values is sensitive to missing values [2,7]. One way to ensure reliability of (analytical) systems over partially complete data is to guarantee that whatever data they touch is complete w.r.t. to the real data. A possible way to model partial data completeness is with tuple generating dependencies (TGDs) [1] that specify what parts of a relation are complete [8{ 10]: where Ra is the available (or complete) part of the ideal Ri (or real) part of R, t is a term of size ary(R), i is a conjunction of atoms over i-annotated predicates. Let be a nite set of completeness statements as in Equation (1). With Da (resp. D ) we denote a database instance of a-annotated (resp. i-annotated) i predicates. The semantics of is de ned using pairs (Di; Da) where Da Di, i.e., where the available (incomplete) data is a subset of the real data, as follows: (Di; Da) j= i Da [ Di j= . The setting can be extended by considering constraints on the ideal part of relations and modelled with TGDs: 9X i(X; Y )</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Ra(t)</p>
      <p>Ri(t); i;
(1)
'i(Y ; Z);
(2)
where and ' are conjunctive queries over i-annotated predicates. If is a nite
set of constraints as in Equation (2), then the semantics of can be extended to
account for constraints , that is, (Di; Da) j= ( ; ), by considering only those
Di's that satisfy .</p>
      <p>Given and , a query Q is ( ; )-complete if for any pair (Di; Da) such that
(Di; Da) j= ( ; ) we have Qi(Di) = Qa(Da), where Qi (resp. Qa) is obtained from
Q by annotating each predicate with i (resp. a). While query completeness is a
desirable property, in practice many queries may be incomplete. In these cases we
would like to be able to approximate the original query with alternative queries
that are as close as possible to the original one, but whose answers can be
veri ed to be complete. A natural kind of approximations are those from below,
called query specialisations , and above, called query generalisation. Formally,
given queries Q and Q0 and a setting ( ; ), Q0 is a ( ; )-specialisation of Q if
Q0i v Qi, that is, if Q0 is contained in Q over any ideal database that satis es the
constraints . We are interested in complete specialisations, and among them in
maximal ones. Formally, a query Q0 is a ( ; )-maximal complete specialisation
(MCS) of a query Q, or just MCS when ( ; ) is clear, if (i) Q0i v Qi, (ii) Q0
is ( ; )-complete, and (iii) Q0 is maximal in the sense that there is no other
( ; )-complete Q00 such that Q0i @ Q00i v Qi. Generalisations and maximal
complete generalisations can be de ned analogously.</p>
      <p>
        The problem of completeness has been studied in [8{10] for settings with
a weaker form of constraints. In particular, it is known that the complexity of
checking query completeness ranges from NP for the setting without constraints
to 2P for the setting with nite domains. Moreover, approximation has not been
studied in the context of partially complete data. In this work we investigate the
problem of completeness for conjunctive queries for expressive constraints and
the problem of approximation. We now give an overview of our results.
Query Completeness. We prove characterisations of query completeness in
terms of the well-studied problem of query containment over TGDs. That is, Q
is ( ; )-complete i [ j= (Qi Qa). Interestingly, also the converse holds:
the containment under TGDs can be represented as completeness. Query
containment under TGDs is known to be undecidable, thus checking completeness
is undecidable. Since the undecidability comes from the constraints, we turn our
attention to settings with practically motivated types of constraints: (cyclic)
foreign keys, acyclic TGDs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] sticky TGDs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], guarded TGDs [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For all these
constraints we show that the combined complexity of completeness is high, at
least PSpace-complete.
      </p>
      <p>Query Specialisation. Intuitively, one can specialise a conjunctive query by
instantiating the query variables or by joining new atoms. One can do it by
following the TGDs of and backwards, and thus instantiate and add atoms
as little as needed.</p>
      <p>
        More formally, one can nd specialisations by a procedure that is similar
to the resolution proof-scheme [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or backward chaining [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. More precisely, for
each query atom one has to nd in [ a TGD that can transfer the atom or
an instantiation. In this way, the atom may need to be instantiated according
to the TGD, but it may also mean that one needs to add new atoms from the
body of the TGD. For newly introduced atoms now again one has to nd their
TGD, etc. The di erence with the backward chase is that the query that is
specialised is the database and the query at the same time. Thus, with each
backward application of the TGD one may instantiate the atom but also change
the database. This produces a (potentially in nite) set of (potentially in nite)
queries that includes all MCSs but also non-maximal specialisations. However,
it contain all MCSs. This is because some combination of rules may lead to more
general specialisations than others. We also observe that a query may have more
than one MCS both among in nite and nite conjunctive queries. Checking if a
query Q0 is a ( ; )-specialisation of a query Q is undecidable for unrestricted
and this corresponds to the case when the procedure above does not terminate.
Weak acyclicity of inverted TGDs from (that is, where the direction of the
arrow is reversed) yields termination and for such settings each conjunctive query
has a nite number of nite size MCSs. It remains an open question if for sticky
or guarded inverted we can have a terminating procedure.
      </p>
      <p>We are still working on the problem of generalisation for incomplete queries.
Acknowledgements This was was partially supported by the EPSRC projects
MaSI3, DBOnto, and ED3, and by the projects MAGIC and PARCIS, funded
by the Free University of Bozen-Bolzano.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Serge</surname>
            <given-names>Abiteboul</given-names>
          </string-name>
          , Richard Hull, and
          <string-name>
            <given-names>Victor</given-names>
            <surname>Vianu</surname>
          </string-name>
          .
          <source>Foundations of databases. Addison-Wesley</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          , Leopoldo Bertossi, Jan Chomicki, Xin He,
          <string-name>
            <surname>Vijay Raghavan</surname>
            , and
            <given-names>Jeremy</given-names>
          </string-name>
          <string-name>
            <surname>Spinrad</surname>
          </string-name>
          .
          <article-title>Scalar aggregation in inconsistent databases</article-title>
          .
          <source>Theor. Comput. Sci.</source>
          ,
          <volume>296</volume>
          (
          <issue>3</issue>
          ),
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jean-Francois</surname>
            <given-names>Baget</given-names>
          </string-name>
          , Michel Leclere, and
          <string-name>
            <surname>Marie-Laure Mugnier</surname>
          </string-name>
          .
          <article-title>Walking the decidability line for rules with existential variables</article-title>
          .
          <source>KR'10</source>
          , pages
          <fpage>466</fpage>
          {
          <fpage>476</fpage>
          . AAAI Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Cal</surname>
          </string-name>
          , Georg Gottlob, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          .
          <article-title>A general datalog-based framework for tractable query answering over ontologies</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>14</volume>
          :
          <fpage>57</fpage>
          {
          <fpage>83</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Cal</surname>
          </string-name>
          , Georg Gottlob, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>Towards more expressive ontology languages: The query answering problem</article-title>
          .
          <source>Artif</source>
          . Intell.,
          <volume>193</volume>
          :
          <fpage>87</fpage>
          {
          <fpage>128</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Ronald</given-names>
            <surname>Fagin</surname>
          </string-name>
          , Phokion G. Kolaitis,
          <string-name>
            <given-names>Renee J</given-names>
            .
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Lucian</given-names>
            <surname>Popa</surname>
          </string-name>
          .
          <article-title>Data exchange: Semantics and query answering</article-title>
          .
          <source>In Proc. ICDT</source>
          , pages
          <volume>207</volume>
          {
          <fpage>224</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Guagliardo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Leonid</given-names>
            <surname>Libkin</surname>
          </string-name>
          .
          <article-title>Making sql queries correct on incomplete databases: A feasibility study</article-title>
          .
          <source>PODS '16</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>A.Y. Levy.</surname>
          </string-name>
          <article-title>Obtaining complete answers from incomplete databases</article-title>
          .
          <source>In Proc. VLDB</source>
          , pages
          <volume>402</volume>
          {
          <fpage>412</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Werner</given-names>
            <surname>Nutt</surname>
          </string-name>
          , Sergey Paramonov, and
          <string-name>
            <given-names>Ognjen</given-names>
            <surname>Savkovic</surname>
          </string-name>
          .
          <article-title>Implementing query completeness reasoning</article-title>
          .
          <source>In CIKM</source>
          , pages
          <volume>733</volume>
          {
          <fpage>742</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Simon</given-names>
            <surname>Razniewski</surname>
          </string-name>
          and
          <string-name>
            <given-names>Werner</given-names>
            <surname>Nutt</surname>
          </string-name>
          .
          <article-title>Completeness of queries over incomplete databases</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>11</issue>
          ):
          <volume>749</volume>
          {
          <fpage>760</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>