<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>DL</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Accessing Answers to Conjunctive Queries with Ideal Time Guarantees</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nofar Carmeli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Inria, LIRMM, University of Montpellier</institution>
          ,
          <addr-line>CNRS</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>36</volume>
      <fpage>2</fpage>
      <lpage>4</lpage>
      <abstract>
        <p>When can we answer conjunctive queries with ideal time guarantees? We will start this talk by examining diferent kinds of query-answering tasks and the connections between them. These tasks include enumerating all answers, sampling answers without repetitions, and simulating a sorted array of the answers. From a data complexity point of view, the ideal time guarantees for these tasks are constant time per answer following a linear preprocessing (required to read the database input). Our goal is to avoid the polynomial preprocessing required to produce all answers. We will then have an examplebased discussion of the complexity landscape for these tasks. In particular, we will see how self-joins, constraints and unions can play a crucial role in determining the complexity and designing eficient algorithms.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;conjunctive queries</kwd>
        <kwd>fine-grained complexity</kwd>
        <kwd>constant-delay enumeration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>2(actor, year) ←</p>
      <p>
        Cast(movie, actor), Release(movie, year)
3(p1, p2, room, grade) ← Seating(p1, room), Seating(p2, room), Grade(p1, grade), Grade(p2, grade)
4(post, p1, p2) ← Posts(post, p1), Follows(p1, p2)
5(post, p1, p2) ← Posts(post, p0), Follows(p0, p1), Friends(p1, p2)
require that the free variables are connected in a non-standard way. The hardness side of this
dichotomy assumes the hardness of Boolean matrix multiplication and hyperclique detection in
hypergraphs. For more details about this dichotomy, see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Consider for example the first two
queries defined in Figure 1. These two queries are classified as hard according to Theorem 1 as
1 is cyclic and 2 is acyclic but not free-connex. If the variable year was removed from the
head of 2, this query would become acyclic free-connex and classified as easy.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Additional Tractable Cases</title>
      <p>Can we find CQs that can be answered with ideal guarantees even though they are not
freeconnex acyclic? As a first impression, we might think that the answer is ’no’ due to the negative
side of Theroem 1. However, we will see that such cases exist.</p>
      <p>
        As a first example, consider 3. This query is cyclic, but it can be answered with linear
preprocessing and constant delay due to the self-joins it contains. Theroem 1 only applies to CQs
in which every atom refers to a diferent relation. If this is not the case, we say that the query
contains self-joins, and we can sometimes use these to design more eficient algorithms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
complete classification of CQs with self-joins is not yet known.
      </p>
      <p>
        Next, let us consider 2 again. This query asks for a list of actors and the years in which they
were active. If we just consider the structure of the query and assume the input database can be
general, this query is classified as hard according to Theroem 1 as it is not free-connex. However,
considering the semantics of this query, we can assume that every movie has one release year.
This information can be modeled using the functional dependency Release : movie → year.
If we can assume that the input database conforms to this constraint, the problem becomes
easier, and it can now be solved with ideal guarantees [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The complete classification of CQs
with general functional dependencies is not yet known. We do have a classification for unary
functional dependencies, where one variable implies another, for CQs without self-joins. This
classification is done by extending the query according to the dependencies (in the case of this
example, adding the variable year in all places that contain the variable movie), and checking
whether the extended query is acyclic free-connex.
      </p>
      <p>
        Finally, assume the CQ is used as part of a larger query, and consider the union of CQs
4 ∪ 5. Even though 5 is not free-connex, the entire union can be answered with ideal
guarantees since we can intertwine the computation of 5 with that of 4. In fact, even a union
that contains only non-free-connex CQs can become tractable in a similar way. The complete
classification of unions of CQs is not yet known, but we have a classification that applies for
some subclasses [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Doing Even Better in the Tractable Cases</title>
      <p>Sometimes the final goal of the user is not just to produce all answers to a CQ, but they can be
interested in diferent query answering tasks such as computing the best answer or a median
answer according to some ranking function. In other cases, the user may want to compute some
statistics over the answers and be willing to compromise for the accuracy of these statistics
in favor of speed. In such a case, a random sample of the answers can be useful. Figure 2
shows a summary of some query-answering tasks and the relation of the implication of eficient
algorithms between tasks.</p>
      <p>
        The previous section discusses enumeration without order requirements. Clearly, ranked
enumeration and random-ordered enumeration are stricter requirements. However, ranked
enumeration allows eficiently computing the best answer or in general the top k answers for
any k. Random-ordered enumeration is in other words sampling without repetitions, and so this
is stronger than the task of sampling with repetitions. Direct access is the task of simulating an
array containing the query answers in a way that allows random access to answers in arbitrary
indices. It also requires outputting an error message if the index requested is too large, and
this can be used together with binary search to determine the number of answers (a task called
counting) with only a logarithmic number of access calls. Direct access can also be used to
achieve random-ordered enumeration by generating a random permutation of the indices and
accessing the corresponding answers on the fly [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. If the simulated array is required to be
ordered, we call this task ranked access, and it can be used to directly access the middle answer
for computing the median or to similarly compute any other quantile.
      </p>
      <p>
        Assume we are interested in ordering the query answers in a lexicographic order. That is,
we order the answers by the assignment to one variable, then break ties according to another
variable, and so on. The free-connex acyclic CQs, which we know have extremely eficient
enumeration algorithms, also have eficient algorithms for quantile computation, direct access
and ranked enumeration [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], with linear preprocessing and logarithmic time per outputted
answer. Regarding ranked enumeration, this depends on the specific lexicographic order [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brault-Baron</surname>
          </string-name>
          , De la pertinence de l'
          <article-title>énumération: complexité en logiques propositionnelle et du premier ordre</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Université de Caen,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Durand</surname>
          </string-name>
          , E. Grandjean,
          <article-title>On acyclic conjunctive queries and constant delay enumeration</article-title>
          , in: International Workshop on Computer Science Logic, Springer,
          <year>2007</year>
          , pp.
          <fpage>208</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Berkholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gerhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schweikardt</surname>
          </string-name>
          ,
          <article-title>Constant delay enumeration for conjunctive queries: a tutorial</article-title>
          ,
          <source>ACM SIGLOG News</source>
          <volume>7</volume>
          (
          <year>2020</year>
          )
          <fpage>4</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          , L. Segoufin,
          <article-title>Conjunctive queries with self-joins, towards a fine-grained complexity analysis</article-title>
          ,
          <source>in: PODS'23</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kröll</surname>
          </string-name>
          ,
          <article-title>Enumeration complexity of conjunctive queries with functional dependencies</article-title>
          ,
          <source>Theory of Computing Systems</source>
          <volume>64</volume>
          (
          <year>2020</year>
          )
          <fpage>828</fpage>
          -
          <lpage>860</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kröll</surname>
          </string-name>
          ,
          <article-title>On the enumeration complexity of unions of conjunctive queries</article-title>
          ,
          <source>ACM Transactions on Database Systems (TODS) 46</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bringmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <article-title>Unbalanced triangle detection and enumeration hardness for unions of conjunctive queries</article-title>
          ,
          <source>arXiv preprint arXiv:2210.11996</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zeevi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berkholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Conte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schweikardt</surname>
          </string-name>
          ,
          <article-title>Answering (unions of) conjunctive queries using random access and random-order enumeration</article-title>
          ,
          <source>ACM Transactions on Database Systems (TODS) 47</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tziavelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gatterbauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedewald</surname>
          </string-name>
          ,
          <article-title>Tractable orders for direct access to ranked answers of conjunctive queries</article-title>
          ,
          <source>ACM Transactions on Database Systems</source>
          <volume>48</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tziavelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gatterbauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedewald</surname>
          </string-name>
          ,
          <article-title>Beyond equi-joins: Ranking, enumeration and factorization</article-title>
          ,
          <source>in: Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases</source>
          , volume
          <volume>14</volume>
          , NIH Public Access,
          <year>2021</year>
          , p.
          <fpage>2599</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>