<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the CALM Principle for BSP Computation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matteo Interlandi</string-name>
          <email>minterlandi@cs.ucla.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Letizia Tanca</string-name>
          <email>letizia.tanca@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Milano</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California</institution>
          ,
          <addr-line>Los Angeles</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent times, considerable emphasis has been given to two apparently disjoint research topics: data-parallel and eventually consistent, distributed systems. In this paper we propose a study on an eventually consistent, data-parallel computational model, the keystone of which is provided by the recent finding that a class of programs exists that can be computed in an eventually consistent, coordination-free way: monotonic programs. This principle is called CALM and has been proven by Ameloot et al. for distributed, asynchronous settings. We advocate that CALM should be employed as a basic theoretical tool also for data-parallel systems, wherein computation usually proceeds synchronously in rounds and where communication is assumed to be reliable. We deem this problem relevant and interesting, especially regarding parallel workflow optimization, and make the case that CALM does not hold in general for data-parallel systems if the techniques developed by Ameloot et al. are directly used. In this paper we sketch how, using novel techniques, the satisfiability of the if direction of the CALM principle can still be obtained, although just for a subclass of monotonic queries.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recent research has explored ways to exploit different levels of consistency in order to
improve the performance of distributed systems w.r.t. specific tasks and network
configurations, while maintaining correctness [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. A topic strictly related to consistency
is coordination, usually informally interpreted as a mechanism to accomplish a
distributed agreement on some system property [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Indeed, coordination can be used to
enforce consistency when, in the natural execution of a system, this is not guaranteed
in general. In this paper we sketch some theoretical problems springing from the use of
eventually consistent, coordination-free computation over synchronous systems with
reliable communication (rsync). Informally, such systems have the following properties:
(i) a global clock is defined and accessible by every node; (ii) the relative difference
between the clock values of any two nodes is bounded; and (iii) the results
emitted by a node arrive at their destination within a certain bounded physical time (the
so-called bounded delay guarantee).
      </p>
      <p>
        Rsync is a common setting in modern data-parallel frameworks - such as
MapReduce - in which computation is usually performed in rounds, where each task is blocked
and cannot start the new round until a synchronization barrier is reached, i.e., every
other task has completed its local computation. In this work we consider
synchronization (barrier) and coordination as two different, although related entities: the former is
a mechanism enforcing the rsync model, the latter a property of executions.
Identifying under what circumstances eventually consistent, coordination-free computation can
be employed over rsync systems would enable us to “stretch” the declarativeness of
parallel programs, freeing execution plans of the restriction to follow predefined
(synchronous) patterns. In fact, all recent high-level data-parallel languages suffer from this
limitation: for instance, both Hive [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and Pig [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] sacrifice pipelining in order to fit
query plans into MapReduce workflows. Our aim is then to understand when a
synchronous “blocking” computation is actually required by the program semantics – and
therefore must be strictly enforced by the system – and when, instead, a pipelined
execution can be performed as optimization. For batch parallel processing, the benefits
of understanding where the former can be replaced by the latter are considerable [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]:
thanks to the fact that data is processed as soon as it is produced, online computation is
possible, i.e., the final result can be refined during the execution; as a consequence, new
data can be incrementally added to the input, making continuous computation possible.
Overall, pipelining is highly desirable in the Big Data context, where full
materialization is often problematic.
      </p>
      <p>
        Recently, a class of programs that can be computed in an eventually consistent,
coordination-free way has been identified: monotonic programs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]; this principle is
called CALM (Consistency and Logical Monotonicity) and has been proven in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
While CALM was originally proposed to simplify the specification of distributed
(asynchronous) data management systems, in this paper we advocate that CALM should be
employed as a basic theoretical tool also for the declarative specification of data-parallel
(synchronous) systems. As a matter of fact, CALM permits us to link a property of the
execution (coordination-freedom) with a class of programs (monotonic queries). But to
what extent can CALM be applied to data-parallel systems? Surprisingly enough,
the demonstration of the CALM principle in rsync systems is not trivial and, with the
communication model and the notion of coordination as defined in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the CALM
principle does not hold in general in rsync settings (cf. Example 3). Thus, in order
to extend CALM over data-parallel synchronous computation, in this paper we sketch
a new generic parallel computation model leveraging previous works on synchronous
Datalog [
        <xref ref-type="bibr" rid="ref10 ref12">10, 12</xref>
        ] and transducer networks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and grounding rsync computation on the
well-known Bulk Synchronous Parallel (BSP) model [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] equipped with content-based
addressing. With BSP, computation proceeds as a series of global rounds, each
composed of three phases: (i) a computation phase, in which nodes perform local
computations in parallel; (ii) a communication phase, where data are exchanged among the nodes;
and (iii) the synchronization barrier. Exploiting this new type of transducer network,
we will then show that the CALM principle is satisfied for synchronous and reliable
systems under a new definition of coordination-freedom, although, surprisingly enough,
just for a subclass of monotonic queries, i.e., the chained monotonic queries (cf.
Definition 5.7). When defining coordination-freedom we will take advantage of recent results
describing how knowledge can be acquired in synchronous systems [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
Organization: The paper is organized as follows: Section 2 introduces some
preliminary notation. Section 3 defines our model of a synchronous and reliable parallel system,
and shows that the CALM principle is not satisfied for systems of this type. Section
3.2 proposes a new computational model based on hashing, while Section 4 introduces
the new definition of coordination. Finally, Section 5 discusses CALM under the new
setting. The paper ends with some concluding remarks. We refer the reader to [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for
proofs and more detailed discussions.
      </p>
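      <p>To make the three-phase structure concrete, the following minimal Python sketch (ours, not taken from the paper; names such as bsp_run and local_step are our own) runs a network of nodes in BSP style: in every round each node first computes on its local state and on the messages received in the previous round, then the emitted messages are reliably delivered, and the barrier is simply the end of the loop iteration.</p>
      <preformat>
# Minimal BSP-style round loop (illustrative sketch, not the paper's formal model).
def bsp_run(nodes, local_step, rounds):
    """nodes: dict node_id -> local state; local_step(state, inbox) -> (state, outbox),
    where outbox is a list of (destination_node_id, message) pairs."""
    inboxes = {i: [] for i in nodes}
    for _ in range(rounds):
        outboxes = {}
        # (i) computation phase: every node runs on its state and received messages
        for i, state in nodes.items():
            nodes[i], outboxes[i] = local_step(state, inboxes[i])
        # (ii) communication phase: reliable delivery of every emitted message
        inboxes = {i: [] for i in nodes}
        for i, out in outboxes.items():
            for dest, msg in out:
                inboxes[dest].append(msg)
        # (iii) synchronization barrier: no node starts the next round before this point
    return nodes
      </preformat>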
    </sec>
    <sec id="sec-2">
      <title>Relational Transducers</title>
      <p>
        In this paper we expect the reader to be familiar with the basic notions of database
theory and relational transducers (and transducer networks). In this section we use some examples to set
forth our notation, which is close to that of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>We employ a transducer (resp. a transducer network) as an abstraction modeling the
behavior of a single computing node (resp. a network of computing nodes): this abstract
computational model permits us to make our results as general as possible without
having to rely on a particular framework, since transducers and transducer networks
can be easily imposed over any modern data-parallel system. We consider each node
to be equipped with an immutable database and a memory used to store useful data
between any two consecutive computation steps. In addition, a node can produce an
output for the user and can also communicate some data to other nodes (the concept of
data communication in a transducer network will become clearer in Section 3). Finally,
an internal time and some system data are kept, mainly for configuration purposes. Every
node executes a program that operates on (input) instances of the database, the memory
and the communication channel, and produces new instances that are either saved in
memory, or directly output to the user, or addressed to other nodes.</p>
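      <p>The following toy Python sketch (ours, not a formal definition from the paper) conveys the shape of one such computation step: the node reads its immutable database, its memory and the received facts, and produces a new memory instance, user output and facts addressed to other nodes.</p>
      <preformat>
# Shape of one transducer transition (illustrative sketch of the abstraction).
def transducer_step(db, mem, rcv):
    new_mem = set(mem) | set(rcv)   # e.g., remember every fact received so far
    out = set(db)                   # e.g., an identity query over the local database
    snd = set()                     # facts addressed to other nodes (none in this toy)
    return new_mem, out, snd
      </preformat>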
      <p>Example 1. A first example of relational transducer is the following UCQ-transducer
T, with the schema given below, which computes the ternary relation Q as the join of two binary
relations R and T:</p>
      <p>Schema: db = {R(2), T(2)}; mem = ∅; com = ∅; out = {Q(3)}</p>
      <p>Program: Qout(u, v, w) ← R(u, v), T(v, w).
Let I be an initial instance over which we want to compute the join. Then, let us define
Idb = I as an instance over the database schema db. A transition I → J for T is
such that I = Idb ∪ Isys, Ircv and Jsnd are empty (no communication query exists), and
J = I ∪ Iout ∪ Isys, where Iout is the result of the query Qout, i.e., the join between
R and T. Note that the subscript in Qout means that this is an output query, that is, it
specifies the final result of the whole computation.</p>
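      <p>A direct reading of this output query in Python (our own sketch; relations are represented as sets of tuples) is the following join evaluation:</p>
      <preformat>
# Evaluate Qout(u, v, w) :- R(u, v), T(v, w) over one local instance.
def q_out(R, T):
    return {(u, v, w) for (u, v) in R for (v2, w) in T if v2 == v}

# Example: q_out({(1, 2)}, {(2, 3)}) yields {(1, 2, 3)}.
      </preformat>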
    </sec>
    <sec id="sec-3">
      <title>Computation in rsync</title>
      <p>
        In order to allow query evaluation in parallel settings, we will sketch a novel transducer
network [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where computation is synchronous, and communication is reliable. This
permits us to define how a set of relational transducers can be assembled to obtain an
abstract computational model for distributed data-parallel systems. To be consistent
with [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we will assume broadcasting as the addressing model.
      </p>
      <p>Example 2. Assume we want to compute a distributed version of the join of Example 1.
We can implement it using a broadcasting synchronous transducer network which emits
one of the two relations, say T , and then joins R with the received facts over T . Note
that the sent facts will be used only starting from the next round, and the program
will therefore employ two rounds to compute the distributed join. UCQ is again expressive
enough. The transducer network can be written as follows, where Ssnd denotes a
communication query and this time the schema com is non-empty because communication
is needed:</p>
      <sec id="sec-3-1">
        <title>Program: Ssnd(u; v)</title>
        <p>Schema: db = fR(2); T (2)g; com = fS(2)g; out = fQ(3)g</p>
        <p>T (u; v):
Qout(u; v; w)</p>
        <p>R(u; v); S(u; w):</p>
      </sec>
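      <p>A minimal simulation of this two-round broadcasting behaviour (our own sketch, assuming an arbitrary horizontal partitioning of R and T across the nodes) is the following: in round one every node broadcasts its T facts over S, and in round two it joins its local R partition with the received S facts.</p>
      <preformat>
# Two-round broadcast join (illustrative sketch of the behaviour of Example 2).
def distributed_join(partitions):
    """partitions: list of (R_part, T_part) pairs, one per node."""
    # Round 1: every node broadcasts its local T facts over the relation S.
    received_S = [set() for _ in partitions]
    for _, T_part in partitions:
        for inbox in received_S:          # broadcasting: every node receives every fact
            inbox |= set(T_part)
    # Round 2: every node joins its local R partition with the received S facts.
    result = set()
    for n, (R_part, _) in enumerate(partitions):
        result |= {(u, v, w) for (u, v) in R_part
                   for (u2, w) in received_S[n] if u2 == u}
    return result
      </preformat>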
      <sec id="sec-3-2">
        <title>Synchronous specifications have the required expressive power</title>
        <p>Lemma 1 Let L be a language containing UCQ and contained in DATALOG¬. Every
query expressible in L can be distributively computed in 2 rounds by a broadcasting
L-transducer network.</p>
        <p>
          The above lemma permits us to draw the following conclusion: under the rsync
semantics, monotonic and non-monotonic queries behave in the same way: two rounds
are needed in both cases. This is due to the fact that, contrary to what happens in the
asynchronous case of [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], starting from the second round we are guaranteed – by the
reliability of the communication and the synchronous assumption – that every node will
compute the query over every emitted instance. Conversely, in the asynchronous case,
as a result of the non-determinism of the communication, we are never guaranteed,
without coordination, that every sent fact will actually be received.
        </p>
        <sec id="sec-3-2-1">
          <title>The CALM Conjecture</title>
          <p>
            The CALM conjecture [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] specifies that a well-defined class of programs can be
distributively computed in an eventually consistent, coordination-free way: monotonic
programs. CALM has been proven in this (revisited) form for asynchronous systems [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]:
Conjecture 1 A query can be distributively computed by a coordination-free
transducer network if and only if it is monotonic.
          </p>
          <p>
            The concept of coordination suggests that all the nodes in a network must exchange
information and wait until an agreement is reached about a common property of interest.
Following this intuition, Ameloot et al. established that a specification is
coordination-free if communication is not strictly necessary to obtain a consistent final result.
Surprisingly enough, under this definition of coordination-freedom, CALM does not hold
in rsync settings under the broadcasting communication model:
Example 3. Let Qout be the “emptiness” query of [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]: given a nullary database relation
R(0) and a nullary output relation T (0), Qout outputs true (i.e., a nullary fact over T )
iff IR is empty. The query is non-monotonic: if IR is initially empty, then T is
produced, but if just one fact is added to R, T is not derived, i.e., IT must be empty. A
FO-transducer network N can be easily generated to distributively compute Qout: first
every node emits R if its local partition is not empty, and then each node locally
evaluates the emptiness of R. Since the whole initial instance is installed on every node when
R is checked for emptiness, T is true only if R is actually empty on the initial instance.
The complete specification follows.
          </p>
          <p>Schema: db = {R(0)}; mem = {Ready(0)}; com = {S(0)}; out = {T(0)}
Program: Ssnd() ← R().
         Readyins() ← ¬Ready().
         Tout() ← ¬S(), Ready().</p>
          <p>
One can show [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] that, if communication is switched off, the above transducer is still
able to obtain the correct result if, for example, I is installed on every node. That is,
a partitioning exists, making communication not strictly necessary to reach the proper
result. Note that the same query requires coordination in asynchronous settings: since
emitted facts are non-deterministically received, the only way to compute the correct
result is that nodes coordinate to understand if the input instance is globally empty.
          </p>
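          <p>The following sketch (ours, purely illustrative) simulates this two-round behaviour on a broadcasting rsync network: in round one every node broadcasts S() if its local partition of R is non-empty, and in round two it derives T() only if no S fact has been received.</p>
          <preformat>
# Two-round emptiness check of Example 3 (illustrative sketch).
def emptiness_query(r_partitions):
    """r_partitions[i] is True iff node i holds at least one R fact."""
    # Round 1: broadcast S() whenever the local partition of R is non-empty.
    s_received = any(r_partitions)   # reliable broadcast: all nodes see S(), or none does
    # Round 2: every node is Ready and derives T() iff no S() was received.
    return [not s_received for _ in r_partitions]

# With every partition empty, all nodes output T(); with any R fact present, none does.
          </preformat>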
          <p>
            The result we have is indeed interesting although expected: when we move from the
general asynchronous model to the more restrictive rsync setting, we no longer have a
complete understanding of which queries can be computed without coordination, and
which ones, instead, do require coordination. It turns out that both the communication
model and the definition of coordination proposed in [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] are not strong enough to work
in general for synchronous systems. As the reader may have realized, this is due to the
fact that, in broadcasting synchronous systems, coordination – as defined by Ameloot
et al. – is already “baked” into the model. In the next sections we will see that our
definition of coordination-freedom guarantees eventually consistent computation for those
queries that do not rely on broadcasting in order to progress. That is, the
discriminating condition for eventual consistency is not monotonicity, but the fact that it is not
necessary to send a fact to all the nodes composing a network.
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Hashing Transducer Networks</title>
          <p>
Broadcasting specifications are not really convenient from a practical perspective.
Following other parallel programming models such as MapReduce, in this section we
introduce hashing transducer networks, i.e., synchronous networks of
relational transducers equipped with a content-based communication model founded on
hashing. Under this new model, the node to which an emitted fact must be addressed is
derived using a hash function applied to a subset of its terms called keys.
Example 4. Consider the hashed version of Example 2, where every tuple
emitted over S and U is hashed on the first term (this is specified by the schema definitions
S(1,2) and U(1,2), where the pair (1, 2) means that the related relation has arity 2 and
the first term is the key-term). In this way we are assured that, for each pair of joining
tuples, at least one node exists containing the pair. This is because S and U are joined over
their key-terms, and hence the joining tuples are addressed to the same node.</p>
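            <p>A sketch of this content-based addressing (our own illustration, with hash-based routing on the key term; the assignment of R facts to U and of T facts to S is our own assumption) is the following: each emitted fact is sent only to the node identified by hashing its key term, instead of being broadcast.</p>
            <preformat>
# Hash-partitioned join (illustrative sketch of content-based addressing).
def hashed_join(partitions, num_nodes):
    """partitions: list of (R_part, T_part) pairs, one per node."""
    inbox_U = [set() for _ in range(num_nodes)]   # facts received over U
    inbox_S = [set() for _ in range(num_nodes)]   # facts received over S
    for R_part, T_part in partitions:
        for (u, v) in R_part:
            inbox_U[hash(u) % num_nodes].add((u, v))   # route on the key term
        for (u, w) in T_part:
            inbox_S[hash(u) % num_nodes].add((u, w))
    # Joining tuples share their key term, so they always meet on the same node.
    result = set()
    for n in range(num_nodes):
        result |= {(u, v, w) for (u, v) in inbox_U[n]
                   for (u2, w) in inbox_S[n] if u2 == u}
    return result
            </preformat>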
            <p>
We have seen in Section 3.1 that, for rsync systems, a particular notion of
coordination-freedom is needed. In fact we have shown that, under such a model, certain
non-monotonic queries – Example 3 – requiring coordination under the asynchronous model
can be computed in a coordination-free way. The key point is that, as observed in [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ],
in asynchronous systems coordination-freedom is directly related to
communication-freedom under ideal partitioning. That is, if the partitioning is correct, no
communication is required to correctly compute a coordination-free query because (i) no data
must be sent (the partition is correct), and (ii) no “control message” is required to
obtain a consistent result (the query is coordination-free). However, due to its synchronous
nature, in rsync settings non-monotonic queries can in general be computed without
resorting to coordination because coordination is already “baked” into the rsync model:
each node is synchronized with every other one, hence “control messages” are somehow
implicitly assumed. In this section we introduce a novel knowledge-oriented
perspective linking coordination with the way in which explicit and implicit information flows
in the network. Under this perspective, we will see that coordination is needed if, to
maintain consistency, a node must have some form of information exchange with all
the other nodes.</p>
            <p>
Achieving coordination in asynchronous systems is a costly task. A necessary condition
for coordination in such systems is the existence of primitives that enforce some
control over the ordering of events. In a seminal paper [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], Lamport proposed a
synchronization algorithm based on the relation of potential causality (→) over asynchronous
events. According to Lamport, given two events e, e′, we have that e → e′ if e
happens before e′ and e might have caused e′. From a high-level perspective, the
potential causality relation models how information flows among processes, and therefore
can be employed as a tool to reason on the patterns which cause coordination in
asynchronous systems. A question now arises: what is the counterpart of the potential
causality relation for synchronous systems? Synchronous potential causality (syncausality
for short) has been recently proposed [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] to generalize Lamport's potential causality to
synchronous systems. Using syncausality we are able to model how information flows
among nodes with the passing of time. Consider a parallel execution trace – called a
run – and two points in this execution (i, t), (j, t′) for (possibly not distinct) nodes
i, j, identifying the local state of i and j at time t and t′ respectively. We say that (j, t′)
causally depends on (i, t) if either i = j and t ≤ t′ – i.e., a local state depends on the
previous one – or a tuple has been emitted by node i at time t, addressed to node j, with
t &lt; t′. We refer to these two types of dependencies as direct. (Note that a point in a
synchronous system is what Lamport defines as an event in an asynchronous system.)
</p>
          <p>Definition 4.1. Given a run, we say that two points (i, t), (j, t′) are related by a
direct potential causality relation →, if one of the following is true:
1. t′ = t + 1 and i = j;
2. t′ ≥ t + 1 and node i sent a tuple at time t addressed to j;
3. there is a point (k, t″) s.t. (i, t) → (k, t″) and (k, t″) → (j, t′).
Note that direct dependencies define precisely Lamport's happen-before relation – and
hence we maintain the same signature →.</p>
          <p>
Differently from asynchronous systems, however, a point on node j
can occasionally indirectly depend on another point on node i even if no fact addressed
to j is actually sent by i. This is because j can still draw some conclusions simply
as a consequence of the bounded delay guarantee of synchronous systems. That is,
each node can use the common knowledge that every sent tuple is received at most
after a certain bounded delay to reason about the state of the system. The bounded
delay guarantee can be modelled as an imaginary NULL fact, like in [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. Under this
perspective, indirect dependencies appear the same as the direct ones, although, instead
of a flow generated by “informative” facts, with the indirect relationship we model the
flow of “non-informative” NULL facts.
          </p>
          <p>Definition 4.2. Given a run, we say that two points (i, t), (j, t′) are related by
an indirect potential causality relation ⇢, if i ≠ j, t′ ≥ t + 1 and a NULL^i_R fact
addressed to node j has been (virtually) sent by node i at round t.</p>
          <p>An interesting fact about the bounded delay guarantee is that it can be employed to
specify when negation can be safely applied to a predicate. In general, negation can be
applied to a literal R(u) when the content of R is sealed for what concerns the current
round. In local settings, such a condition holds for a predicate at round t′ if
its content has been completely generated at round t, with t′ &gt; t. In distributed settings,
if R is a communication relation, being in a new round t′ is not enough,
in general, for establishing that its content is sealed. This is because tuples can still
be floating, and therefore, until we are assured that every tuple has been delivered, the
above condition does not hold. The result is that negation cannot be applied safely. We
can reason in the same way also for every other negative literal depending on R. We will
then model the fact that the content of a communication relation R is stable because
of the bounded delay guarantee by having every node i emit a fact NULL^i_R at round t,
for every communication relation R, which will be delivered at node j exactly by the
next round. We then have that the content of R is stable once j has received a NULL^i_R
fact from every node i contained in the set N of nodes composing the network. The
sealing of a communication relation at a certain round is then ascertained only when
|N| NULL_R facts have been counted. Recall that the NULL^i_R facts need not necessarily
be physically sent. This in particular is true under our rsync model, where the
strike of a new round automatically seals all the communication relations. Example 5
shows one situation in which this applies.</p>
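          <p>As a small operational reading of this counting argument (our own sketch, not the paper's formalization), node j may apply negation on a communication relation R at the current round once it has counted a NULL_R fact, possibly virtual, from every node of the network:</p>
          <preformat>
# Sealing test for a communication relation R (illustrative sketch): R is sealed at
# node j once a NULL_R fact has been received (or implied) from every node in N.
def is_sealed(null_senders, all_nodes):
    """null_senders: ids of the nodes from which a NULL_R fact was counted."""
    return set(all_nodes).issubset(null_senders)   # |N| NULL_R facts counted
          </preformat>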
          <p>Example 5. Consider the hashing version of the program of Example 3. Let I be an
initial instance. At round t + 1 we have that the relation S is stable, and hence negation
can be applied. Note that if R is empty in the initial instance, no fact is sent. Despite
this, every node can still conclude at round t + 1 that the content of S is stable. In this
situation we clearly have an indirect potential causality relation.</p>
          <p>We are now able to introduce the definition of syncausality: a generalization of
Lamport’s happen-before relation which considers not only the direct information flow,
but also the flow generated by indirect dependencies.</p>
          <p>Definition 4.3. Given a run, the syncausality relation ⇝ is the smallest relation s.t.:
1. if (i, t) → (j, t′), then (i, t) ⇝ (j, t′);
2. if (i, t) ⇢ (j, t′), then (i, t) ⇝ (j, t′); and
3. if (i, t) ⇝ (j, t′) and (j, t′) ⇝ (k, t″), then (i, t) ⇝ (k, t″).</p>
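          <p>Operationally (our own sketch), the syncausality relation can be computed as the transitive closure of the direct and indirect edges between points (node, time) of a run:</p>
          <preformat>
# Syncausality as reachability over direct and indirect edges between points of a run
# (illustrative sketch only; a point is a (node, time) pair).
def syncausality(direct_edges, indirect_edges):
    reach = set(direct_edges) | set(indirect_edges)   # rules 1 and 2
    changed = True
    while changed:                                    # rule 3: transitive closure
        changed = False
        for (a, b) in list(reach):
            for (c, d) in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach
          </preformat>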
        </sec>
        <sec id="sec-3-2-3">
          <title>From Syncausality to Coordination</title>
          <p>We next propose the predicate-level syncausality relationship, modeling causal relations
at the predicate level. That is, instead of considering how (direct and indirect)
information flows between nodes, we introduce a more fine-grained relationship modelling the
flows between predicates and nodes.</p>
          <p>Definition 4.4. Given a run, we say that two points (i, t), (j, t′) are linked by a
relation of predicate-level syncausality ⇝R, if any of the following holds:
1. i = j, t′ = t + 1 and a tuple over R ∈ mem ∪ out has been derived by a query
in Qins ∪ Qout at time t′;
2. R ∈ com and node i sends a tuple over R at time t addressed to node j, with
t′ ≥ t + 1;
3. R ∈ com and node i (virtually) sends a NULL^i_R fact at time t addressed to node
j, with t′ ≥ t + 1;
4. there is a point (k, t″) s.t. (i, t) ⇝R (k, t″) and (k, t″) ⇝R (j, t′).
We are now able to specify a condition for achieving coordination. Informally, we have
that coordination exists when all the nodes of a network reach a common agreement
that some event happened. But the only way to reach such an agreement is that a (direct
or indirect) information flow exists between the node in which the event actually
occurs and every other node. This is a sufficient and necessary condition because of the
reliability and bounded-delay guarantee of rsync systems. Formalizing this intuition by
means of the (predicate-level) syncausality relationship, we obtain the following definition.</p>
          <p>
Definition 4.5. Let N be a set of nodes. We say that a synchronous relational
transducer network manifests the coordination pattern if, for all possible initial instances
I ∈ inst(db), whichever run we select, a point (i, t) and a communication
relation R exist so that ∀j ∈ N there is a predicate-level syncausality relation such that
(i, t) ⇝R (j, t′).
We call node i the coordination master. A pattern with a similar role has been named
broom in [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ].
          </p>
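          <p>Combining this with the previous sketch, checking whether a run manifests the coordination pattern amounts to asking whether some point reaches, through predicate-level syncausality over some communication relation, a point on every node of the network (again an illustration of ours, with hypothetical input structures):</p>
          <preformat>
# Coordination-pattern test (illustrative sketch): does some point (i, t) reach, via
# predicate-level syncausality edges of some communication relation R, a point on
# every node of the network?
def manifests_coordination_pattern(edges_by_relation, nodes):
    """edges_by_relation: dict mapping R to a set of ((i, t), (j, t2)) pairs."""
    for R, edges in edges_by_relation.items():
        closure = syncausality(edges, set())         # transitive closure, as above
        points = {p for edge in closure for p in edge}
        for (i, t) in points:
            reached = {j for (src, (j, tj)) in closure if src == (i, t)}
            reached.add(i)                           # a node trivially reaches itself
            if set(nodes).issubset(reached):
                return True                          # (i, t) is a coordination master
    return False
          </preformat>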
          <p>
Remark: The reader can now appreciate to what extent coordination was already
“baked” inside the broadcasting synchronous specifications of Section 3. Note that
broadcasting, in rsync, brings coordination. This is not true in asynchronous systems.
Intuitively, the coordination master is where the event occurs. If a broadcast of an
(informative or non-informative) fact occurs, then such an event will become common
knowledge [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] among the nodes. On the contrary, if broadcasting does not occur, common
knowledge cannot be obtained and therefore, if the correct final outcome is still reached,
this is obtained without coordination. That is, if at least one non-trivial configuration
exists s.t. the coordination pattern does not manifest itself, we have coordination-freedom.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>CALM in rsync Systems</title>
      <p>The original version of the CALM principle is not satisfiable in rsync systems because
a class of monotonic queries exists (the unchained queries, introduced next) which
is not coordination-free. Informally, a query is chained if every relation is connected
through a join-path with every other relation composing the same query.
Definition 5.6. Let body(qR) be a conjunction of literals defining the body of a query
qR. We say that two different positive literal occurrences Ri(ui), Rj(uj) ∈ body(qR)
are chained in qR if either:
– ui ∩ uj ≠ ∅; or
– a third relation Rk ∈ qR different from Ri, Rj exists such that Ri is chained with
Rk, and Rk is chained with Rj.</p>
      <p>Definition 5.7. A query Qout is said to be chained if, for every rule qR ∈ Qout, each
relation occurrence Ri ∈ body(qR) is chained with every other relation occurrence
Rj ∈ body(qR).</p>
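      <p>As an operational reading of Definitions 5.6 and 5.7 (our own sketch), a rule body is chained exactly when the graph whose vertices are its positive literal occurrences, with an edge between two occurrences sharing a variable, is connected; nullary occurrences are never chained.</p>
      <preformat>
# Chained-ness test for one rule body (illustrative sketch). Each literal occurrence
# is given as the tuple of its variables; nullary relations are never chained.
def is_chained(body):
    if not body:
        return True
    if any(len(vars_) == 0 for vars_ in body):
        return False                     # nullary relations are not chained (Remark)
    # Traverse the graph whose edges connect occurrences that share a variable.
    todo, seen = [0], {0}
    while todo:
        i = todo.pop()
        for j, vars_j in enumerate(body):
            if j not in seen and set(body[i]).intersection(vars_j):
                seen.add(j)
                todo.append(j)
    return len(seen) == len(body)

# Example 6: is_chained([("u", "v"), ("x",)]) is False, since T's variable is not shared.
      </preformat>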
      <sec id="sec-4-1">
        <title>Remark: Nullary relations are not chained by definition.</title>
        <p>Example 6. Assume two relations R(2) and T (1), and the following query Qout
returning the full R-instance if T is nonempty.</p>
        <p>Q(u; v)</p>
        <p>R(u; v); T ( ):
The query is clearly monotonic. Let T be the following broadcasting UCQ-transducer
program computing Qout.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Program: Ssnd(u; v)</title>
      <p>Assume now we want to make the above transducer a hashing one. We have that,
whichever key we choose, the resulting specification might no longer be consistent. Indeed,
consider an initial instance I and a set of keys spanning all the terms of S and U.
Assume I is such that adom(IR) ⊈ adom(IT), and a network composed of a large number
of nodes. In this situation, it may happen that a non-empty set of facts over R is hashed
to a certain node i, while no fact over T is hashed to i. This is because a constant may
exist in adom(IR) that is not in adom(IT) and for which the hashing function returns a
node i not returned by hashing any constant in adom(IT). Hence no tuple emitted to i
will ever appear in the output, although such tuples do appear in Qout(I). Thus this transducer
is not eventually consistent.</p>
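      <p>A tiny runnable illustration of the problem (ours, with made-up routing): hashing R facts on their first term and T facts on their only term, a node may hold R facts while receiving no evidence that T is non-empty, and those facts silently disappear from the result.</p>
      <preformat>
# Illustration of the inconsistency of Example 6 under hashing (our sketch).
def hashed_example6(R, T, num_nodes):
    r_at = {n: set() for n in range(num_nodes)}
    t_seen = {n: False for n in range(num_nodes)}
    for (u, v) in R:
        r_at[hash(u) % num_nodes].add((u, v))       # route R on its first term
    for (x,) in T:
        t_seen[hash(x) % num_nodes] = True          # route T on its only term
    # Each node outputs its local R facts only if it received some T fact: R facts
    # hashed to a node with no T evidence are dropped, although they belong to Qout(I).
    return {fact for n in range(num_nodes) for fact in r_at[n] if t_seen[n]}
      </preformat>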
        <p>
          From the above example we can intuitively see that, for rsync, a final consistent
result can be obtained without coordination only for queries that are chained and
monotonic. That is, the following restricted version of the CALM conjecture holds for rsync
systems:
Theorem 1 A query can be computed in parallel by a coordination-free transducer
network if it is chained and monotonic [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      <p>We leave for future work the investigation of whether every monotone and chained
query is also coordination-free.</p>
        <p>
          Remark: For the readers familiar with the works [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
], our result states that, under the
rsync model, a query is computable in a coordination-free way if it is monotonic and
distributes over components.
        </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper the CALM principle is analyzed under synchronous and reliable settings.
By exploiting CALM, in fact, we would be able to break the synchronous cage of
modern parallel computation models and provide pipelined, coordination-free executions
when the program logic allows it. In order to reach our goal, we have introduced a
new abstract model emulating BSP computation, and a novel interpretation of
coordination with sound logical foundations in distributed knowledge reasoning. By exploiting
such techniques, we have shown that the if direction of the CALM principle indeed
holds also in rsync settings, but just for the subclass of monotonic queries defined as
chained.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hull</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Vianu</surname>
          </string-name>
          .
          <source>Foundations of Databases. Addison-Wesley</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Ameloot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ketsman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Neven</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Zinn</surname>
          </string-name>
          .
          <article-title>Weaker forms of monotonicity for declarative networking: a more fine-grained answer to the calm-conjecture</article-title>
          .
          <source>In PODS</source>
          , pages
          <fpage>64</fpage>
          -
          <lpage>75</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Ameloot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ketsman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Neven</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Zinn</surname>
          </string-name>
          .
          <article-title>Datalog queries distributing over components</article-title>
          .
          <source>In ICDT. ACM</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Ameloot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Neven</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Van Den Bussche.</surname>
          </string-name>
          <article-title>Relational transducers for declarative networking</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>60</volume>
          (
          <issue>2</issue>
          ):
          <volume>15</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          :
          <fpage>38</fpage>
          , May
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ben-Zvi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moses</surname>
          </string-name>
          .
          <article-title>Beyond lamport's happened-before: On the role of time bounds in synchronous systems</article-title>
          .
          <source>In N. A. Lynch and A. A</source>
          . Shvartsman, editors,
          <source>DISC</source>
          , volume
          <volume>6343</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>421</fpage>
          -
          <lpage>436</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ben-Zvi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moses</surname>
          </string-name>
          .
          <article-title>On interactive knowledge with bounded communication</article-title>
          .
          <source>Journal of Applied Non-Classical Logics</source>
          ,
          <volume>21</volume>
          (
          <issue>3-4</issue>
          ):
          <fpage>323</fpage>
          -
          <lpage>354</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Condie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Conway</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Alvaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hellerstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Elmeleegy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sears</surname>
          </string-name>
          .
          <article-title>Mapreduce online</article-title>
          .
          <source>In Proceedings of the 7th USENIX conference on Networked systems design and implementation</source>
          ,
          <source>NSDI'10</source>
          , pages
          <fpage>21</fpage>
          -
          <lpage>21</lpage>
          , Berkeley, CA, USA,
          <year>2010</year>
          . USENIX Association.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Halpern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moses</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Vardi</surname>
          </string-name>
          .
          <article-title>Reasoning About Knowledge</article-title>
          . MIT Press, Cambridge, MA, USA,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hellerstein</surname>
          </string-name>
          .
          <article-title>The declarative imperative: experiences and conjectures in distributed logic</article-title>
          .
          <source>SIGMOD Rec</source>
          .,
          <volume>39</volume>
          :
          <fpage>5</fpage>
          -
          <lpage>19</lpage>
          ,
          <year>September 2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Interlandi</surname>
          </string-name>
          .
          <article-title>Reasoning about knowledge in distributed systems using datalog</article-title>
          .
          <source>In Datalog</source>
          , pages
          <fpage>99</fpage>
          -
          <lpage>110</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Interlandi</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Tanca</surname>
          </string-name>
          .
          <article-title>On the calm principle for bulk synchronous parallel computation</article-title>
          .
          <source>arXiv:1405</source>
          .
          <fpage>7264</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Interlandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tanca</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          .
          <article-title>Datalog in time and space, synchronously</article-title>
          . In L. Bravo and M. Lenzerini, editors,
          <source>AMW</source>
          , volume
          <volume>1087</volume>
          <source>of CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lamport</surname>
          </string-name>
          . Time, clocks, and
          <article-title>the ordering of events in a distributed system</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>21</volume>
          (
          <issue>7</issue>
          ):
          <fpage>558</fpage>
          -
          <lpage>565</lpage>
          ,
          <year>July 1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lamport</surname>
          </string-name>
          .
          <article-title>Using time instead of timeout for fault-tolerant distributed systems</article-title>
          .
          <source>ACM Trans. Program. Lang. Syst.</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>254</fpage>
          -
          <lpage>280</lpage>
          , Apr.
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Olston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Reed</surname>
          </string-name>
          , U. Srivastava,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomkins</surname>
          </string-name>
          .
          <article-title>Pig latin: a not-so-foreign language for data processing</article-title>
          .
          <source>In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD '08</source>
          , pages
          <fpage>1099</fpage>
          -
          <lpage>1110</lpage>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Thusoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Sarma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chakka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anthony</surname>
          </string-name>
          , H. Liu,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wyckoff</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Murthy</surname>
          </string-name>
          .
          <article-title>Hive: A warehousing solution over a map-reduce framework</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1626</fpage>
          -
          <lpage>1629</lpage>
          , Aug.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L. G.</given-names>
            <surname>Valiant</surname>
          </string-name>
          .
          <article-title>A bridging model for parallel computation</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>33</volume>
          (
          <issue>8</issue>
          ):
          <fpage>103</fpage>
          -
          <lpage>111</lpage>
          , Aug.
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.</given-names>
            <surname>Vogels</surname>
          </string-name>
          .
          <article-title>Eventually consistent</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>52</volume>
          (
          <issue>1</issue>
          ):
          <fpage>40</fpage>
          -
          <lpage>44</lpage>
          , Jan.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>