<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Declarative BigData Algorithms via Aggregates and Relational Database Dependencies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carlo Zaniolo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohan Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Interlandi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ariyam Das</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Shkapsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tyson Condie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>zaniolo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ariyam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>shkapsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>tcondie}@cs.ucla.edu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>matteo.interlandi@microsoft.com</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California at Los Angeles</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The ability of using aggregates in recursive Datalog queries is making possible a simple declarative expression for important algorithms and it is conducive to their parallel implementations with scalability and performance that often surpass those of their formulations in GraphX and Scala. These recent advances were made possible by the notion of Pre-Mappability (PreM) that, along with a highly optimized seminaive-based operational semantics, guarantees their formal non-monotonic semantics for the programs expressing these declarative algorithms. However, proving that these programs have the PreM property can be too difficult for everyday programmers. Therefore in this paper, we introduce basic templates that facilitate and automate this formal task. These templates are based on simple extensions of Functional and Multivalued Dependencies (FDs and MVDs) whereby properties such as the mixed transitivity of MVDs and FDs are used to prove the validity of these powerful declarative algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The surge of interest in BigData has brought a renaissance of interest in Datalog and
its ability to specify, declaratively, advanced data-intensive applications that execute
efficiently over different systems and parallel architectures [
        <xref ref-type="bibr" rid="ref1 ref10 ref6 ref7 ref9">1, 6, 7, 9, 10</xref>
        ]. While
efficient and scalable support for non-recursive SQL queries was realized as far back
as the 1980s [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the much more powerful queries that use negation or aggregates
in recursion proved much more difficult, as the challenge of providing formal
nonmonotonic semantics and efficient implementations remained unanswered for many
years. Recently however, significant progress on both the implementation and semantic
fronts has been achieved by the UCLA BigDatalog project that, using the notion
of Pre-Mappability (PreM), can support a wide spectrum of declarative algorithms,
with scalability and performance levels that often surpass those of other Datalog
implementations, and those of Apache Spark application packages such as GraphX
[
        <xref ref-type="bibr" rid="ref10 ref7">7, 10</xref>
        ]. The notion of PreM was introduced in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] it is shown that
PreM enables the formulation of a wide range of applications under stable model
semantics. In this paper, we summarize those findings and delve deeper on how to verify
PreM by using simple templates based on extensions of functional and multivalued
dependencies.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Pre-Mappable Constraints</title>
      <p>
        This section contains a brief summary of results from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In Example 1,
below, rule r3 computes the least distance from node a to the remaining nodes in
a directed graph by applying the constraint is_min((Y); D) to the pth(Y; D) atoms
produced by rules r1 and r2.
      </p>
      <p>Example 1 (Finding the minimal distance of nodes from a).</p>
      <p>r1 : pth(Y; D)
r2 : pth(Y; D)
r3 : qpth(Y; D)
arc(a; Y; D):
pth(X; Dx); arc(X; Y; Dxy); D = Dx + Dxy;
pth(Y; D); is_min((Y); D):
In our goal is_min((Y); D) , we will refer to (Y) as the group-by argument (group-by
arguments can consist of zero or more variables), and to D as the cost argument (cost
argument consists of a single variable). This goal states the constraint that for a pair
(Y; D) no other pair exists having the same Y-value and a smaller D-value. Thus the
formal meaning of our constraint is defined by replacing r3 with r4:
r4 : qpth(Y; D)</p>
      <p>pth(Y; D); :smlr_pth(Y; D):
where the goal is_min((Y); D) has been replaced by the negated goal :smlr_pth(Y; D),
where smlr_pth(Y; D) is defined as:
r5 : smlr_pth(Y; D)</p>
      <p>
        pth(Y; D); pth(Y; D1); D1 &lt; D:
The program consisting of rules r1, r2, r4, and r5 is stratified w.r.t. negation, with
pth occupying the lower stratum and qpth occupying the higher stratum. Thus, the
program has a perfect model semantics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, the iterated fixpoint procedure
proposed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is transfinite and very inefficient and even unsafe in practice—for
the example at hand it does not terminate when the graph has cycles. To solve these
problems, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] introduces the PreM condition, under which qpth can be computed
safely and efficiently by pre-mapping the min goal into the rules defining pth, whereby
the following program is obtained:
Example 2 (The endo-min version of Example 1).
      </p>
      <p>
        r01 : pth(Y; D)
r02 : pth(Y; D)
r03 : qpth(Y; D)
arc(a; Y; D); is_min((Y); D):
pth(X; Dx); arc(X; Y; Dxy); D = Dx + Dxy; is_min((Y); D):
pth(Y; D):
Thus we have seen two formulations for this algorithm: the program in Example 2,
with min in recursion will be called the endo-min version, and the original program
of Example 1 that will be called its exo-min version. The PreM condition defined
next establishes a clear semantics relationship between the two versions: the exo-min
version defines its abstract perfect-model semantics, whereas the endo-min version
defines its optimized concrete semantics that assures more efficient computation and
termination in situations where the iterated fixpoint of the exo-min version would be
very inefficient or even non-terminating, as it is in fact the case when the graph defined
by arc contains directed cycles. PreM is indeed a very powerful condition that allows
us to express declaratively a large number of basic algorithms, while assuring that (i)
they have a rigorous non-monotonic semantics [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], and (ii) they are amenable to
very efficient implementations of superior scalability [
        <xref ref-type="bibr" rid="ref10 ref7">7, 10</xref>
        ]. In the rest of this section,
we define PreM and its formal semantics, following [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. Then in the rest of the
paper we focus on defining simple templates that greatly simplify the task of proving
that the Datalog program at hand has the PreM property—as needed to allow everyday
programmers to take full advantage of this powerful new formal tool.
      </p>
      <p>
        Fixpoint and PreM Constraints. We will consider stratified programs, such as those of
Example 1, having a perfect model semantics computed by strata. At the lower stratum
we find the minimal model of the rules defined by (i) interpreted predicates, such as
comparison and arithmetic predicates, and (ii) positive rules such as r1 and r2 defining
the pth predicate. If T is the Immediate Consequence Operator (ICO) for the program
defined by these positive rules, then its unique minimal model is defined by the
leastfixpoint of T , which can be computed as T "w (0/ ) (a.k.a. the naive fixpoint computation).
The subset of this minimal model obtained by removing from T "w (0/ ) all the pth(Y; D)
that do not satisfy the constraint g = is_min((Y); D) will be called the extreme subset
of T "w (0/ ) defined by g. Then, the perfect model of the program in our example, can
be obtained by adding to T "w (0/ ) the pth atoms obtained by simply copying pth atoms
under the name qpth. We next formally define the notion of PreM [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] which allows
us to transform programs such as those of Example 1 into that of Example 2 where
the min or max constraints have been pushed (or more precisely transferred) into the
recursive rules.
      </p>
      <sec id="sec-2-1">
        <title>Definition 1 (The PreM Property). In a given Datalog program, let P be the rules</title>
        <p>defining a (set of mutually) recursive predicate(s). Also let T be the ICO defined by</p>
      </sec>
      <sec id="sec-2-2">
        <title>P. Then, the constraint g will be said to be PreM to T (and to P) when, for every</title>
        <p>interpretation I of P, we have that: g(T (I)) = g(T (g(I))).</p>
        <p>
          The importance of this property follows from the fact that if I = T (I) is a fixpoint for T ,
then we also have that g(I) = g(T (I)), and when g is PreM to T then: g(I) = g(T (I)) =
g(T (g(I))). Now, let Tg denote the application of T followed by g, i.e., Tg (I) = g(T (I)).
If I is a fixpoint for T and I0 = g(I), then the above equality can be rewritten as: I0 =
g(I) = g(T (g(I))) = Tg (I0). Thus, when g is PreM, the fact that I is a fixpoint for T
implies that I0 = g(I) is a fixpoint for Tg (I). In many programs of practical interest, the
transfer of constraints under PreM produces optimized programs for the naive fixpoint
computation that are safe and terminating even when the original programs were not.
Thus we focus on programs where, for some integer n, Tg"n(0/ ) = T "n+1(0/ ), i.e., the
g
fixpoint iteration converges after a finite number of steps n. As proven in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], the
fixpoint Tg"n(0/ ) so obtained is in fact a minimal fixpoint for Tg , where g denotes a min
or max constraint:
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Theorem 1. If g is PreM to a positive program P with ICO T and, for some integer</title>
        <p>n, Tg"n(0/ ) = Tg"n+1(0/ ), then:
(i) Tg"n(0/ ) = Tg"n+1(0/ ) is a minimal fixpoint for Tg , and
(ii) Tg"n(0/ ) = g(T "w(0/ )):
Therefore, when the PreM property holds, declarative exo-min (or exo-max) programs
are transformed into endo-min (or endo-max) programs having highly optimized
seminaive-fixpoint based operational semantics. For instance, consider Example 1 on
the following facts:</p>
      </sec>
      <sec id="sec-2-4">
        <title>Example 3 (arc facts for Example 1).</title>
        <p>arc(a; b; 6): arc(a; c; 10):
arc(b; c; 2): arc(c; d; 3):
arc(d; c; 1):</p>
        <p>In this example, while the computation of T "w (0/ ) will never terminate, the
computation of T "n(0/ ) produces the following pth atoms at each step of the
g
computation. We refer to these as cost-atoms. In the example below, we have aligned
in columns the cost atoms sharing the same group-by value, and used a bar to separate
cost atoms from their successors that, because of their lower costs, replaced them at the
next step.</p>
        <sec id="sec-2-4-1">
          <title>Example 4 (Computing Tg"n(0/ ) for Example 1 on facts in Example 3).</title>
          <p>Step 1: pth(b; 6), pth(c; 10).</p>
          <p>Step 2: pth(b; 6), pth(c; 8), pth(d; 13).</p>
          <p>Step 3: pth(b; 6), pth(c; 8), pth(d; 11).</p>
          <p>Step 4: pth(b; 6), pth(c; 8), pth(d; 11).</p>
          <p>
            Thus, we have derived the pth atoms pth(b; 6), pth(c; 8), pth(d; 11) which constitute
the extreme subset of the cost atoms in T "w (0/ ). Since the PreM property holds for the
recursive rules in Example 1, these extreme atoms, renamed qpth, along with the atoms
in T "w (0/ ) constitute the perfect model for the original program. Moreover, in [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] we
have shown that the program of Example 2, and most of the endo-min or endo-max
programs that satisfy PreM have a stable model semantics [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ].1 Besides its obvious
theoretical interest, the existence of a stable model dovetails with our experience that
programmers rather than writing exo-min and exo-max versions often write the
endomin or endo-max versions of their algorithms directly, inasmuch as these preserve the
spirit and intuition of the procedural formulations of the same algorithms.
          </p>
          <p>The obvious theoretical interest of PreM notwithstanding, the concept would
remain of limited practical interest, until we can provide programmers with simple tools
that they can use to verify that their declarative algorithms satisfy PreM. The rest of
the paper focuses on providing formal tools that answer this very practical requirement.
3</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Declarative Algorithms and the PreM Property</title>
      <p>
        A wide spectrum of declarative algorithms of practical interest can be expressed by
programs that satisfy PreM. For some programs (e.g., those disussed in Section 3.4),
the proof that PreM holds is relatively simple, but for many others, including our
1 In general it can be shown that endo-min or endo-max programs have a total stable model
whenever there is no cyclic derivation in their cost atoms. This is in fact the case for Example 2,
when the graph defined by arc has no directed cycle, or its arcs have positive length [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
running example, i.e., Example 2, a direct proof can be quite challenging. To handle
these programs we propose simple tools that can be used by the programmer, or the
compiler, to prove PreM by drawing from the classical theory of Functional and
Multivalued Dependencies (FDs and MVDs) used in relational DB schema design.
      </p>
      <p>A first observation to be made is that PreM always holds for exit rules (since these
are used at the first step of T "w (0/ ), i.e., they are applied to the empty set). For recursive
rules, the validity of PreM can be illustrated by adding an additional goal to the rule
to express the pre-application of the extrema constraint g onto Tg (I), whereby g is first
applied to I. For instance, the recursive rule of our Example 2 should be re-written with
the insertion into the rule of the additional goalnis_min((X);Dx)=, which we will call the
drop-in goal, producing:
r020 : pth(Y; D)</p>
      <p>pth(X; Dx);nis_min((X);Dx)=; arc(X; Y; Dxy); D=Dx+Dxy; is_min((Y); D):
Observe that adding the drop-in goal corresponds to pre-applying the g constraint to
the argument I in (Tg (I)). For PreM to hold, we must have Tg (g(I)) = Tg (I) which
states that the drop-in goal has not changed the ICO mapping specified by the original
recursive endo-min rule. For Example 2, consider the g constraint applied to the pair
(Y; D) that produces the head of the rule, pth(Y; D): this constraint is is_min((Y); D),
which specifies that the second argument of pth is minimized for each value of the
first argument of pth. Therefore, the drop-in goal inserted after the goal is pth(X; Dx)
is nis_min((X);Dx)=. Then PreM is proven once we prove that the addition of this goal
does not change the ICO mapping defined by the rule. Providing such a proof can be
difficult in many cases, including the one of our running example. Therefore, we will
next identify common patterns that guarantee the validity of PreM for such complex
examples.
3.1 Inferring PreM from Relational DB Dependencies and Extrema
In many cases PreM can be inferred from the properties of extrema goals and the
multivalued dependencies that hold in the equivalent relational views of the
(functionfree) predicates in the rule body. Let I be an interpretation of our program P. Then,
Rq = f(x1; : : : ; xn) j q(x1; : : : ; xn) 2 Ig will be said to be the relational view of the
predicate q for the given I.</p>
      <sec id="sec-3-1">
        <title>Definition 2 (Functional Dependencies on tuples). Let R(W ) be a relation, and X W</title>
        <p>and A 2 W X . Then, we say that a given tuple t 2 R satisfies the FD X ! A if R does
not contain any tuple having the same X -value but a different A-value.
Now if the domain of A is totally ordered, X ! A holds for a tuple t if R contains no
tuple having the same X -value and an A-value that is either larger or smaller than the
A-value of t. We next define the concept of min-constraint and max-constraint that only
excludes the presence of smaller tuples and larger tuples, respectively.</p>
        <sec id="sec-3-1-1">
          <title>Definition 3. We will say that a tuple t 2 R satisfies the min-constraint is_min((X); A)</title>
          <p>and write X min* A when R contains no tuple having the same X -value and a
smaller A-value. Symmetrically, we say that the tuple t 2 R satisfies the max-constraint
is_max((X); A) and write X max+ A when R contains no tuple with the same X -value
and a larger A-value.
Informally X min* A and X max+ A can be viewed as "half-FDs", since both must
hold before we can conclude that X ! A. Moreover, while min-constraints and
maxconstraints on single tuples are much weaker than regular FDs, they preserve some of
their important formal properties including the ones discussed next that also involve
MVDs on single tuples, which we define next.</p>
          <p>Definition 4. Let t 2 R(W ), and X , Y , and Z be subsets of W where Z = W (X [ Y ).</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Then a tuple t 2 R satisfies the MVD X !! Y when the following property holds: if t0 is</title>
          <p>a tuple in R that is equal to t in its X -values, then R also contains (i) some tuple that is
identical to t in the X [ Y values and identical to t0 in the X [ Z values, and (ii) some
tuple that is identical to t0 in the X [ Y values and identical to t in the X [ Z values.
Observe that these definitions of tuple-based FDs and MVDs are consistent with the
standard ones used for whole relations since a relation satisfies a given set of MVDs
and FDs (with one attribute on their right side) iff these MVDs and FDs are satisfied by
each of its tuples. Therefore, the following properties hold for tuple constraints (i.e., for
min-constraints, max-constraints, and tuple MVDs) and also illustrate the appeal of the
arrow-based notation:</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Min/Max Augmentation: If X mi*n A and Z W , then X [ Z mi*n A.</title>
        </sec>
        <sec id="sec-3-1-4">
          <title>If X ma+x A and Z W , then X [ Z ma+x A.</title>
        </sec>
        <sec id="sec-3-1-5">
          <title>MVD Augmentation: If X !! Y , Z W and Z W , then X [ W !! Y [ Z.</title>
          <p>Mixed Transitivity: If Y !! Z and Z mi*n A , with A 2= Z, then Y mi*n A.</p>
          <p>If Y !! Z and Z ma+x A, with A 2= Z, then Y ma+x A.</p>
          <p>
            The augmentation property for min and max constraints, follows directly from the
definition, while the mixed-transitivity property is proven in [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
3.2
          </p>
          <p>Declarative Algorithms
We will now show how advanced declarative algorithms can be expressed using
recursive programs with aggregates via the PreM condition that can be easily proven
using the properties of min/max constraints displayed above.</p>
          <p>Bill of Materials. A classical recursive application for traditional databases is Bill of
Materials (BOM), where we have a Directed Acyclic Graph (DAG) of parts-subparts,
assbl(Part; Subpart; Qty) describing how a given part is assembled using various
subparts, each in a given quantity. Not all subparts are assembled, since basic parts
are instead supplied by external suppliers in a given number of days, as per the facts
basic(Part; Days). Simple assemblies, such as bicycles, can be put together the very
same day in which the last basic part arrives. Thus, the time needed to produce the
assembly is the maximum number of days required by the basic parts it uses.
deliv(Part; Days)
deliv(Part; Days)
basic(Part; Days); is_max((Part); Days):
deliv(Sub; Days); assbl(Part; Sub); is_max((Part); Days):
Now, to determine if PreM holds, we must study the mapping of our rule transformed
by the drop-in goal as follows:
deliv(Part; Days)
deliv(Sub; Days);nis_max((Sub);Days)=;
assbl(Part; Sub); is_max((Part); Days):
Thus we want to prove that the drop-in goal does not change the mapping defined by
this rule. In our proof we will refer to is_max((Sub); Days) as the drop-in constraint
and to is_max((Part); Days) as the original constraint. Let R(Sub; Days; Part) be the
natural join of the relational views of deliv(Sub; Days) and assbl(Part; Sub). Then
the following MVDs hold for all tuples in R: Sub !! Days and Sub !! Part. Now for
any tuple in R that satisfies the constraint Part max+ Days, it also satisfies Sub max+
Days by Mixed Transitivity. Thus if R0 is the set of tuples in R that have the property
Part ma+x Days, R0 also satisfies Sub ma+x Days. Thus the drop-in constraint does not
change R0 or the mapping defined by it, since applying a constraint to a relation that
already satisfied it does not change the relation.</p>
          <p>Connected Components in a Graph. In this application we have an undirected graph,
where an edge connecting, say, a and b is represented by the pairs edge(a; b) and
edge(b; a). Then, if we represent the nodes by integers, we can select the node with the
lowest integer to serve as the representative for its clique.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Example 5 (Connected Components in an undirected graph.).</title>
        <p>cc(X; X)
cc(X; Z)
edge(X; _):
cc(X; Y); edge(Z; Y); is_min((Z); X):</p>
        <p>Here we can observe that in R(X; Y; Z) obtained as the natural join of cc(X; Y) and
edge(Z; Y) the following MVD holds: Y !! Z. Thus for tuples that satisfy the constraint
Z min* X, Y min* X also holds. This is the drop-in constraint, which therefore does not
change the mapping defined by our rule.</p>
        <p>Minimal Distances in Directed Graph. Let us now return to our Example 2, and let us
re-write its recursive rule with drop-in goal into the following equivalent one:
pth(Y; Dx+Dxy)</p>
        <p>pth(X; Dx);nis_min((X);Dx)=; arc(X; Y; Dxy); is_min((Y); Dx+Dxy):
Given a relation R(X ;Y1; : : :Yn; Z), the fact that Y1; : : : ;Yn !! X and Y1; : : : ;Yn !! Z
hold guarantees that the tuples that satisfy the condition is_min((Y1 : : : ; Yn); X + Z)
are exactly those that satisfy both conditions Y1 : : : ;Yn min* X and Y1 : : : ;Yn min* Z.
Therefore, for the example at hand, let R be the natural join of the relational views of
path(X; Dx) and arc(X; Y; Dxy). Then we can reason as follows:
Step 1 [Find MVDs in R]: X !! Dx and X !! Y; Dxy.</p>
        <p>Step 2 [Augment MVDs as needed to distribute distribute min]</p>
        <p>X; Y !! Dx and X; Y !! Y; Dxy hold in R. Then tuples that satisfy X; Y mi*n Dx+Dxy
also satisfy X; Y min* Dx and X; Y min* Dxy.</p>
        <p>Step 3 [Augment left sides for mixed transitivity and apply it]: From X !! X; Y; Dxy and
X; Y min* Dx augmented into X; Y; Dxy min* Dx, we infer our drop-in constraint:
X min* Dx.</p>
        <p>As illustrated by the next example, these patterns can be applied mechanically, whereby
PreM can be verified by compilers and users who do not know DB theory.
Non-linear Shortest paths. Shortest paths can also be computed with non-linear rules:</p>
      </sec>
      <sec id="sec-3-3">
        <title>Example 6 (Shortest paths à la Floyd).</title>
        <p>r0 : qsp(X; Y; Vxy)
r1 : qsp(X; Z; V)
arc(X; Y; Dxy); is_min((X; Y); Vxy):
qsp(X; Y; Vxy); qsp(Y; Z; Vyz);</p>
        <p>V = Vxy+Vyz; is_min((X; Z); V):
Here we must check PreM for two drop-in constraints, as follows:
qsp(X; Z; V)
qsp(X; Y; Vxy);nis_min((X;Y);Vxy)=; qsp(Y; Z; Vyz);nis_min((Y;Z);Vyz)=;</p>
        <p>V = Vxy + Vyz; is_min((X; Z); V):
So let R(X; Y; Vxy; Z; Vyz) be the natural join on the column Y of qsp(X; Y; Vxy) and
qsp(Y; Z; Vyz), we have:
Step 1 [Find MVDs in R] Y !! X; Vxy and Y !! Z; Vyz.</p>
        <p>Step 2 [Augment MVDs and min-constraints to match and distribute]</p>
        <p>X; Y; Z !! Vxy and X; Y; Z !! Vyz hold in R. Also augment X; Z min* Vxy + Vyz to
X; Y; Z mi*n Vxy + Vyz, from which we derive X; Y; Z mi*n Vxy and X; Y; Z mi*n Vyz.
Step 3 [Augment min constraint and apply mixed transitivity] From the second MVD in
Step 1 and the first min-constraint in Step 2, we infer: X; Y !! X; Y; Z; Vyz and
X; Y; Z; Vyz mi*n Vxy. From these two we infer: X; Y mi*n Vxy. Symmetrically, from
Y; Z !! X; Y; Z; Vxy and X; Y; Z; Vxy min* Vyz, we infer Y; Z min* Vyz. tu
3.3</p>
        <p>
          The Aggregates Count and Sum in Recursive Rules
The traditional count and sum aggregates can be viewed as the maximized versions
of their cumulative (a.k.a. progressive) versions, which we will, respectively, denote by
mcount and msum [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. For instance, the following example shows how the progressive
count of items can be defined using Horn clauses.
        </p>
        <p>Example 7 (Progressive, monotonic count of elements item).</p>
        <p>mcnt(1; [IT])
mcnt(J1; [ITjAllpr])
notin(IT1; [IT2 j Rest])
notin(IT; [ ]):
item(IT):
item(IT); mcnt(J; Allpr);
notin(IT; Allpr); J1 = J+1:</p>
        <p>
          IT1 6= IT2; notin(IT2; Rest):
The above program progressively enumerates and counts all the items in every possible
order; the program is clearly monotonic in the lattice of set-containment, whereby the
standard least-fixpoint semantics applies. The mcount aggregate is supported in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] as
an efficient built-in aggregate that only visits and progressively counts the items in the
order in which they are stored. Moreover, the standard count aggregate is the maximum
value returned by the progressive count, whereby count can be used instead mcount in
programs where PreM is satisfied for max.
        </p>
        <p>For instance, in the following example, in addition to the organizers, we include
people who see that three or more of their friends have joined the event, using the
mcount built-in, which takes three arguments: the first is the group-by argument, the
second is the item being counted, and the third is the result, i.e., the progressive count
being returned.</p>
        <p>Example 8 (Joining the event).</p>
        <p>jnd(X; 0)
jnd(Y; Ycnt)
result(Y; Ycnt)
organizer(X):
jnd(X; Cx); Cx
jnd(Y; Ycnt):</p>
        <p>3; friend(Y; X); mcount((Y); X; Ycnt):
Thus, if tom has joined the event along with five of his/her friends, the use of
mcount produces the following progressive result: result(tom; 1), result(tom; 2),
result(tom; 3), result(tom; 4), result(tom; 5). If we are only interested in the
actual count, i.e., to get back result(tom; 5), then we can add the additional goal
is_max((Y); Ycnt) to the final rule. Then a natural question to arise is whether
is_max((Y); Ycnt) can actually be pre-mapped into the recursive rule, which would
become the one below:
jnd(Y; Ycnt)
jnd(X; Cx); Cx 3; friend(Y; X);
mcount((Y); X; Ycnt); is_max((Y); Ycnt):
where mcount((Y); X; Ycnt); is_max((Y); Ycnt) can then be evaluated by the regular
count count((Y); X; Ycnt).</p>
        <p>For that, we will have to show that a drop-in goal is_max((X); Cx) does not change
the mapping defined by this rule. Indeed, for a given X-value, there exists some Cx value
such that Cx 3 if this inequality is satisfied by the largest of those Cx-values. Thus,
PreM holds and mcount can be optimized into the regular count. Observe that unlike
the examples in Section 3.2, PreM is established without exploiting the relationships
between the cost arguments of the original is_max goal and the drop-in goal. Also
observe that PreM no longer holds if the condition Cx 3 is replaced by Cx = 3 or
Cx &lt; 3.</p>
        <p>
          From Monotonic Sum to Regular Sum. The notion of monotonic sum (msum) for
positive numbers was introduced in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] using the fact that the semantics of msum and sum
for positive numbers can be easily reduced to that of mcount and count, respectively
by the fact that, given a set of integers S, their sum is equal to the count of the pairs
f(N; j)jN 2 S and 1 j Ng. Thus the rule patterns for which PreM holds and
thus the maximized msum can be evaluated as a regular sum are basically those where
mcount can be evaluated as a regular count. These patterns actually apply to a large
number of examples discussed in the literature, including, e.g., counting the paths in
a directed graph, Viterbi algorithm, Company Control, and other algorithms that were
covered in [
          <xref ref-type="bibr" rid="ref3 ref4 ref7">3, 4, 7</xref>
          ]. Furthermore, as discussed in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], besides positive integers, arbitrary
floating-point numbers can also be used in the sum.
Recent work on Datalog programs using aggregates in recursion has delivered formal
semantics [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and efficient implementations [
          <xref ref-type="bibr" rid="ref10 ref7">7,10</xref>
          ] for programs that satisfy the PreM
condition, which can be used to specify concisely and declaratively a wide spectrum of
declarative algorithms. In this paper, we have addressed the problem of making PreM
easier to use and thereby attractive to programmers. For that, we have proposed simple
arrow-based patterns that will allow the user, or the compiler, to infer that PreM holds
for their program. Simple extensions of the dependency theory of relational DBs have
been used to prove the correctness of these patterns.
        </p>
        <p>Besides exploring how to apply these advances to other query languages, including
SQL, our current research seeks to understand and characterize the power and
limitations of this approach in supporting declarative algorithms. In particular we would
like to (i) characterize which algorithms can be expressed by PreM programs using
aggregates in recursion (including the four discussed here and other aggregates as
well), and (ii) how the pattern-based tools we have presented here can be generalized to
simplify the life of programmers and the task of compilers in, respectively, specifying
and supporting powerful declarative algorithms.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Aref</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ten
            <surname>Cate</surname>
          </string-name>
          , T. J.
          <string-name>
            <surname>Green</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Kimelfeld</surname>
          </string-name>
          , et al.
          <article-title>Design and implementation of the LogicBlox system</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <fpage>1371</fpage>
          -
          <lpage>1382</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Gelfond</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lifschitz</surname>
          </string-name>
          .
          <article-title>The stable model semantics for logic programming</article-title>
          .
          <source>In ICLP</source>
          , pages
          <fpage>1070</fpage>
          -
          <lpage>1080</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M.</given-names>
            <surname>Mazuran</surname>
          </string-name>
          , E. Serra, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>A declarative extension of Horn clauses, and its significance for Datalog and its applications</article-title>
          .
          <source>TPLP</source>
          ,
          <volume>13</volume>
          (
          <issue>4-5</issue>
          ):
          <fpage>609</fpage>
          -
          <lpage>623</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Mazuran</surname>
          </string-name>
          , E. Serra, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>Extending the power of Datalog recursion</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>22</volume>
          (
          <issue>4</issue>
          ):
          <fpage>471</fpage>
          -
          <lpage>493</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Przymusinski</surname>
          </string-name>
          .
          <article-title>Perfect model semantics</article-title>
          .
          <source>In ICLP/SLP</source>
          , pages
          <fpage>1081</fpage>
          -
          <lpage>1096</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lam</surname>
          </string-name>
          .
          <article-title>Distributed SociaLite: a Datalog-based language for large-scale graph analysis</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>6</volume>
          (
          <issue>14</issue>
          ):
          <fpage>1906</fpage>
          -
          <lpage>1917</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Shkapsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Interlandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Condie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>Big data analytics with Datalog queries on Spark</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <fpage>1135</fpage>
          -
          <lpage>1149</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Teradata</surname>
          </string-name>
          .
          <article-title>Data Bases Computer Concepts and Facilities</article-title>
          .
          <source>Teradata: Document Number CO2- 0001-00</source>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balazinska</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Halperin</surname>
          </string-name>
          .
          <article-title>Asynchronous and fault-tolerant recursive Datalog evaluation in shared-nothing engines</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>8</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1542</fpage>
          -
          <lpage>1553</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>M. Yang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Shkapsky</surname>
            , and
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>Scaling up the performance of more powerful Datalog systems on multicore machines</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>26</volume>
          (
          <issue>2</issue>
          ):
          <fpage>229</fpage>
          -
          <lpage>248</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>C. Zaniolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Shkapsky</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Condie</surname>
            , and
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Interlandi</surname>
          </string-name>
          .
          <article-title>Fixpoint semantics and optimization of recursive Datalog programs with aggregates</article-title>
          .
          <source>TPLP</source>
          ,
          <volume>17</volume>
          (
          <issue>5-6</issue>
          ):
          <fpage>1048</fpage>
          -
          <lpage>1065</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>C. Zaniolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Shkapsky</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Condie</surname>
            , and
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Interlandi</surname>
          </string-name>
          .
          <article-title>Declarative algorithms in Datalog with aggregates: their formal semantics simplified. submitted for publication</article-title>
          , pages
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>C. Zaniolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Interlandi</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Das</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Shkapsky</surname>
            , and
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Condie</surname>
          </string-name>
          .
          <article-title>Declarative algorithms by aggregates in recursive queries: their formal semantics simplified</article-title>
          .
          <source>Report no. 180001</source>
          , Computer Science Department, UCLA, April,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>