<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Answering Aggregate Queries with Ordered Direct Access</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Idan Elda r</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>NofarCarmel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benny Kimelfeld</string-name>
          <email>bennyk@technion.ac.il</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop Proceedings</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Santiago, Chile</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Inria, LIRMM, Univ Montpellier</institution>
          ,
          <addr-line>CNRS</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technion - Israel Institute of Technology</institution>
          ,
          <addr-line>Haifa 3200023</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper presents recent findings on the fine-grained complexity of conjunctive queries with aggregation. For common aggregate functions (e.g., min, max, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. We investigate the ability to evaluate queries by constructing in log-linear time a data structure that provides logarithmictime direct access to the answers, ordered by a lexicographic order of choice. Importantly, these complexity guarantees hold even if the result of the query is asymptotically larger than log-linear in the size of the input.</p>
      </abstract>
      <kwd-group>
        <kwd>Teams(player</kwd>
        <kwd>country)</kwd>
        <kwd>Sponsors(org</kwd>
        <kwd>country)</kwd>
        <kwd>and Goals(game</kwd>
        <kwd>player</kwd>
        <kwd>time)</kwd>
        <kwd>The following</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CQ finds times where sponsors got exposure due to goals of supported teams:
 1(, , , ) ∶−</p>
      <p>Teams(, ), Sponsors(, ), Goals(, , )
Suppose also that we would like the answers to be ordered lexicographically by their order in the
head: first by country, then by organization, and so on. Note tha,t ,  and arefree variables
and is anexistential variable. Carmeli et al4.][studied the ability to evaluate such ordered
queries with direct access. In the case of1, they show that there is an eficient direct-access
evaluation, since the query ifsree-connex and there is nodisruptive trio. We explain these terms
next. (For more precise definition, we refer the reader to Carmeli et a4]l.)[</p>
      <p>In general, a CQ has the for(m)⃗∶−  1(,⃗)⃗, … ,  ℓ(,⃗)⃗ where ⃗ and ⃗ are disjoint sequences
of variables, and eac h (,⃗)⃗ is anatomic query. A CQ is free-connex if it is acyclic and remains
acyclic even if the head is viewed as an atom. Also recall thdaitsrauptive trio of a CQ()⃗ is a
set of three distinct free variable1s,  2, and 3 such that  1 and 2 neighbor 3 but not each
other, and 3 appears after both  1 and 2 in  ⃗. Two variables are considered neighbors if they
appear together in some query atom.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Ordered CQs over Annotated Databases</title>
      <p>In this work, we continue with the line of work of Carmeli et a4]l.a[nd investigate the ability
to support query evaluation via direct access for queries over annotated databases. To illustrate,
suppose that we would like to count the goals per sponsorship and player. We can phrase this
query as follows.</p>
      <p>2(, Count(), , ) ∶−</p>
      <p>Teams(, ), Sponsors(, ), Goals(, , )
Here, we order the answers first b y, then by Count(), and then by and . Semantically, ,  ,
and are treated as thegrouping variables (rather than free variables), where each combination
of values defines a group of tuples over(, , , , ) andCount() counts the tuples in the group.</p>
      <p>
        CQs with some common aggregate functions can be translated into ordinary CQs over
annotated databases5[
        <xref ref-type="bibr" rid="ref6">, 6</xref>
        ]. We adopt the well-known framework porfovenance semiring [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
phrase the query as an ordinary CQ with the annotation carrying the aggregate value (number
of goals in our example). To design direct-access evaluation, we found it more elegant, general,
and insightful to support annotation rather than SQL-like aggregate functions. For illustration,
we can phrase 2 as the following query.
      </p>
      <p>3(, ⋆, , ) ∶−</p>
      <p>Teams(, ), Sponsors(, ), Goals(, , )
Ignoring the⋆ symbol, this is an ordinary CQ applied when the database is annotated by
elements from the domain of a commutative semiring(, ⊕, ⊗, 0̄, 1̄). In a nutshell, the idea is
that each tuple is annotated with an element of the semiring, the annotation of each tuple in
the group is the product of the participating tuple annotations, and the annotation of the whole
group is the sum of all tuple annotations in the group’s tuples. We can use diferent semirings
and annotations to compute diferent aggregate functions like sum, min, and max. In the case
of  3, for instance, the semiring is(ℤ, +, ⋅, 0, 1) and each tuple is annotated simply with the
number 1. Moreover, we order by, then by the annotation, and then b yand  . Notationally,
we specify the annotation position by the symbo⋆l and refer to the query as a C⋆Q,which
is defined similarly to a CQ but with⋆ added to the head. More precisely, we write a C⋆ Qas
(,⃗⋆, )⃗ to denote that⋆ is between the (possibly empty) sequence s⃗ and ⃗ of head variables.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Complexity Results</title>
      <p>We now describe our results on direct access for⋆Cs.QAs conventionally done in fine-grained
analysis of queries, we use the RAM model8][. We assume logarithmic space for representing
each element of the semiring, and constant time for the operatio⊕nsand ⊗. We also assume
that the semiring domain is ordered, and comparison is in constant time.</p>
      <p>We begin with the case where the ordedroes not involve the annotation. Equivalently, we
discuss CQ⋆s of the form(,⃗⋆) . The following theorem states that, in this case, the results of
Carmeli et al.4[] continue to hold for annotated databases. The theorem refers
toHtYhPeERCLIQUE and SparseBMM hypotheses, which are common assumptions of lower bounds in
ifne-grained complexity. (We refer the reader to Carmeli et a4l]. f[or details.)
Theorem 1. Let (, ⊕, ⊗, 0̄, 1̄) be a commutative semiring and (,⃗⋆)
a CQ⋆.
1. If  is free-connex and with no disruptive trio, then direct access for  is in ⟨loglinear, log⟩
on  -databases.
2. Otherwise, if  is also self-join-free, then direct access for  is not in ⟨loglinear, log⟩, assuming
the HYPERCLIQUE hypothesis (in case  is cyclic) and the SparseBMM hypothesis (in case
 is acyclic).</p>
      <p>
        The following theorem states a lower bound in an extremely simple query (Cartesian product)
for the semirings used to compute the aggregates count, sum, min, and max, assuming the
3SUM conjecture [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>Theorem 2. Let (, ⊕, ⊗, 0̄, 1̄) be one of the counting, numerical, max tropical, or min tropical
semirings. Direct access for the CQ⋆ (⋆, ,  ) ∶− (), ( ) is not in ⟨loglinear, log⟩ over 
databases, assuming the 3SUM conjecture.</p>
      <p>Theorem2 implies that, to obtain evaluation algorithms⟨ilnoglinear, log⟩ while incorporating
the annotation in the order, we need to restrict the class o⋆fsCoQr make assumptions on the
annotated database. In the following theorem, we restrict the structure of th⋆e. CWQe also
make the mild assumption that the semiring is⊗-monotone, that is, the function  is monotone
for every ∈  , where  ∶  →  is defined by   ( ) =  ⊗  .</p>
      <p>Theorem 3. Let (, ⊕, ⊗, 0̄, 1̄) be a ⊗-monotone commutative semiring, and (,⃗⋆, )⃗ a
freeconnex CQ⋆ with no disruptive trio. If every atom of  contains either all variables of ⃗ or none of
them, then direct access for  is in ⟨loglinear, log⟩.</p>
      <p>As an example, the CQ⋆ ( , , ⋆,  , ) ∶− ( , ), (,  , ),  ( , )
databases annotated with the numerical semiring.
is in ⟨loglinear, log⟩ over</p>
      <p>The final result that we explain in this section holds under two assumptions that we explain
next. For the first assumption, consider the aggregate query</p>
      <p>( Max( ), ,  ) ∶− (,  ), ( )
When translating into a CQ⋆, we obtain(⋆, ,  ) ∶− (), ( ) overℚ-databases annotated by
the max tropical semiring(ℚ∪{−∞}, max, +, −∞, 0). Hence, according to Theorem2, we translate
the problem into an intractable one. Nevertheless, direct access fboyr(⋆, ,  ) is, in fact, in
⟨loglinear, log⟩. This discrepancy stems from the fact that the hardness in Theore2mrelies on
the annotation of tuples from bot h and . Yet, in our translation, al-lfacts are annotated b1y.
The resulting -database is such that every fact is annotated b1̄y(the multiplicative identity),
with the exception of one relation. We call such a-database -annotated.</p>
      <p>The second assumption is that the operatio⊕n is idempotent, that is, for ever y in the domain
 we have that ⊕  =  . This is the case when we start with the aggregate functions min and
max, for example.</p>
      <p>The following theorem states that, under the above assumptions, we can efectively classify
every CQ⋆ without self-joins into two categories:</p>
      <sec id="sec-3-1">
        <title>1. CQ⋆s with direct access in⟨loglinear, log⟩;</title>
        <p>2. CQ⋆s where direct access is not in⟨loglinear, log⟩ under standard complexity assumptions
and under the assumption that the domai n contains the domain of natural numbers.
(In fact, it sufices that we can generate an infinite increasing sequence of elements in .)
This is done by converting the C Q⋆  into an ordinary C Q0, so that direct access for the two
is computationally equivalent. We can then use previous resu4l]tsto[ determine the feasibility
of eficient direct access.</p>
        <p>Theorem 4. Let (, ⊕, ⊗, 0̄, 1̄) be a ⊕-idempotent commutative semiring. There exists a
polynomialtime algorithm that takes as input a free-connex CQ⋆  without self-joins and a relation symbol 
of  , and produces a full acyclic CQ  0 without self-joins, so that the following hold.
1. If  0 has no disruptive trio, then direct access for  over  -annotated  -databases is in
⟨loglinear, log⟩.
2. Otherwise, in the case where ℕ ⊆  , direct access for  over  -annotated  -databases is not
in ⟨loglinear, log⟩, assuming the SparseBMM hypothesis.</p>
      </sec>
      <sec id="sec-3-2">
        <title>As an example, consider the following C⋆:Q</title>
        <p>(⋆,  1,  2,  3) ∶− ( 1,  3,  3), ( 2,  3),  ( 3,  1),  ( 1,  2)
The translation of theore4mreduces direct access for on  -annotated -databases to direct
access for the following CQ:</p>
        <p>0( ,  1,  2,  3) ∶−  ′( 1,  3,  ),  ′( 2,  3,  ),  ′( 3,  )
Since  0 does not have a disruptive trio, direct access for0 is in ⟨loglinear, log⟩, and therefore,
so is direct access fo r on  -annotated -databases.</p>
        <p>See the full version of this paper11[] for more details about the results described here.
The work of Idan Eldar and Benny Kimelfeld was supported by the Israel Science
Foundation (ISF), Grant 768/19, and the German Research Foundation (DFG) Project 412400621 (DIP
program).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brault-Baron</surname>
          </string-name>
          , De la pertinence de l'
          <article-title>énumération : complexité en logiques propositionnelle et du premier ordre</article-title>
          , Theses, Université de Caen,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zeevi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berkholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Conte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schweikardt</surname>
          </string-name>
          ,
          <article-title>Answering (unions of) conjunctive queries using random access and random-order enumeration</article-title>
          ,
          <source>ACM Trans. Database Syst</source>
          .
          <volume>47</volume>
          (
          <year>2022</year>
          ) 9:
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          :
          <fpage>49</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bringmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mengel</surname>
          </string-name>
          ,
          <article-title>Tight fine-grained bounds for direct access on join queries</article-title>
          , in: PODS, ACM,
          <year>2022</year>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>436</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tziavelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gatterbauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedewald</surname>
          </string-name>
          ,
          <article-title>Tractable orders for direct access to ranked answers of conjunctive queries</article-title>
          , in: PODS, ACM,
          <year>2021</year>
          , pp.
          <fpage>325</fpage>
          -
          <lpage>341</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          ,
          <article-title>The trichotomy of HAVING queries on a probabilistic database</article-title>
          ,
          <source>VLDB J</source>
          .
          <volume>18</volume>
          (
          <year>2009</year>
          )
          <fpage>1091</fpage>
          -
          <lpage>1116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Khamis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Curtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Moseley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Q.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Olteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schleich</surname>
          </string-name>
          ,
          <article-title>Functional aggregate queries with additive inequalities</article-title>
          ,
          <source>ACM Trans. Database Syst</source>
          .
          <volume>45</volume>
          (
          <year>2020</year>
          )
          <volume>17</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          :
          <fpage>41</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karvounarakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tannen</surname>
          </string-name>
          ,
          <article-title>Provenance semirings</article-title>
          , in: PODS, PODS '07,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2007</year>
          , p.
          <fpage>31</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Étienne</given-names>
            <surname>Grandjean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jachiet</surname>
          </string-name>
          ,
          <article-title>Which arithmetic operations can be performed in constant time in the ram model with addition</article-title>
          ?,
          <year>2023</year>
          .arXiv:
          <volume>2206</volume>
          .
          <fpage>13851</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gajentaan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Overmars</surname>
          </string-name>
          ,
          <article-title>On a class of o(n2) problems in computational geometry</article-title>
          ,
          <source>Computational Geometry</source>
          <volume>5</volume>
          (
          <year>1995</year>
          )
          <fpage>165</fpage>
          -
          <lpage>185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Patrascu</surname>
          </string-name>
          ,
          <article-title>Towards polynomial lower bounds for dynamic problems</article-title>
          ,
          <source>in: Proceedings of the Forty-Second ACM Symposium on Theory of Computing</source>
          , STOC '10,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2010</year>
          , p.
          <fpage>603</fpage>
          -
          <lpage>610</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>I.</given-names>
            <surname>Eldar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carmeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          ,
          <article-title>Direct access for answers to conjunctive queries with aggregation, 2023a</article-title>
          .rXiv:
          <volume>2303</volume>
          .
          <fpage>05327</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>