Differential Logical Relations, Part II: Increments and Derivatives

Ugo Dal Lago¹,² and Francesco Gavazzo¹,²

¹ University of Bologna, Italy
² INRIA Sophia Antipolis, France

Abstract. We study the deep relations existing between differential logical relations and incremental computing, by showing how self-differences in the former precisely correspond to derivatives in the latter. We also show how differential logical relations can be seen as a powerful meta-theoretical tool in the analysis of incremental computations, enabling an easy proof of soundness of differentiation.

1 Introduction

One of the major challenges programming language theory is facing these days is the development of adequate abstractions to deal with the (highly) increasing complexity and heterogeneity of modern software systems. Indeed, since the very birth of the discipline, researchers have focused on the design of compositional techniques for software analysis, whereby one can study the overall behavior of a system by inspecting its constituent parts in isolation. A prime example of the success of such an analysis is given by the theory of program equivalence. Notwithstanding its successful history, program equivalence reveals some weaknesses when applied to today's software, where exact reasoning about components is oftentimes not feasible, either because of the very nature of the software involved or because of the cost of such an analysis. Examples witnessing such weaknesses are given by the fields of probabilistic computing, where small perturbations in probabilities break equivalence; numerical computing, where programs implementing numerical functions can be optimized at the price of introducing an acceptable error in the output; and, more generally, approximate computing [24], where accuracy of the result is partially sacrificed to gain efficiency.
The common pattern behind all the aforementioned examples is the shift from an exact analysis of software, whereby equivalent pieces of software are interchangeable within any system, to an approximate analysis of software, whereby non-equivalent pieces of software are replaced within a system at the price of producing an acceptable error, and thus an approximately correct result. Moving from an exact to an approximate analysis of programs poses several challenges to programming language semantics, the main one arguably concerning compositionality. In fact, once we replace a program P with a non-equivalent one Q in a system C[−], then C[−] may amplify the error introduced by the replacement of P with Q, this way invalidating compositionality of the analysis. This point becomes evident when studying (higher-order) program metrics or program distances [27, 9, 14, 10]: if the distance between P and Q is upper bounded by a number ε > 0, then it may not be so for C[P] and C[Q].

Motivated by these general observations, researchers are showing an increasing interest in quantitative analyses of programs, with a special focus on differential properties of programs. Although the expression differential has no precise meaning in this context, we may identify it with the collection of properties relating local and global changes in software. Thus, for instance, we can think of the (error produced by the) replacement of P with Q as a local change, and investigate its relationship with the global change we observe between C[P] and C[Q].

The study of such differential properties has led, oftentimes abusing terminology, to the development of several notions of a program derivative. Among those, some of the main ones one encounters when looking at the relevant literature are the following.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
• Those coming from the field of automatic differentiation [4], which aim to extend the notion of a derivative one finds in mathematical analysis [30] to arbitrary programs. Examples are given by [6, 1, 29] (as well as by references therein).
• Those coming from the differential λ-calculus [13], whose original motivations rely on quantitative semantics [19] and linear logic [18].
• Those coming from incremental computing [26, 25], which aim to find ways to incrementally compute outputs as inputs change.
• Those coming from differential logical relations [21] via the notion of a self-difference, which aim to provide context-sensitive compositional distances between programs.

It is thus natural to ask whether there are connections between such notions. Although for some of the aforementioned cases the answer seems to be negative (for instance, the derivatives one finds in incremental computing are generalizations of finite differences [28], whereas the ones found in calculi for automatic differentiation are actual derivatives), others have conceptual similarities. This is the case for differential logical relations and incremental computing, as both study the relationship between input and output changes.

In this paper we study such similarities and establish a formal connection between differential logical relations and incremental computing, by showing how self-differences in the former precisely correspond to derivatives in the latter. In fact, as we will see, the derivative of a program P can be seen as a way to quantify how much changes in the input of P influence changes in its output, this way acting as a self-difference for P. Besides its conceptual implications, the advantages of such a correspondence are twofold. On the one hand, differential logical relations qualify as a lightweight operational technique for incremental computing: we witness that by giving a proof of soundness of differentiation for the incremental λ-calculus of Cai et al. [7].
On the other hand, we can use results coming from incremental computing to go beyond the current theory of differential logical relations. For instance, it is possible to read the work by Giarrusso, Régis-Gianas, and Schuster [17], where step-indexed logical relations are introduced for proving correctness of untyped program derivatives, as defining a form of step-indexed differential logical relations, this way giving differential semantics to calculi with full recursion (something not possible in the original formulation of differential logical relations [21]).

Structure of the Paper. Section 2 introduces the target calculus of this work, as well as differential logical relations and the incremental λ-calculus. In Section 3, we establish a formal connection between differential logical relations and the incremental λ-calculus by showing that program derivatives (in the sense of incremental computing) are self-distances (in the sense of differential logical relations). Additionally, we give a differential logical relation-based proof of soundness of differentiation, the main result in the theory of incremental computing.

2 Preliminaries: Differential Logical Relations and the Incremental λ-calculus

In this section we shortly review the main ideas behind differential logical relations (DLRs) and the incremental λ-calculus. In order to do so, we introduce the vehicle calculus of (the first part of) this work, namely a call-by-value simply typed λ-calculus with a primitive type for real numbers, which we denote by ST_R. The calculus is standard and is essentially the same calculus used in [21]. We summarize the syntax and static semantics of ST_R in Figure 1, where we assume to have constants ϕ for any function³ ϕ : Rⁿ → R and primitives r for any real number r. We use letters t, s, … to range over terms, and v, w, … to range over values. Additionally, we follow standard syntactic conventions as in, e.g., [3].
In particular, we work with terms modulo renaming of bound variables, and denote by t[v/x] the capture-avoiding substitution of v for x in t. Finally, we refer to terms in Λ• as closed terms or programs. Similarly, we refer to values in V• as closed values, and we use notations such as Λ•σ and V•σ with their natural meanings.

t, s ::= x | r | ϕ | ⟨t, s⟩ | λx.t | out₁ t | out₂ t | ts
σ, τ ::= R | σ × τ | σ → τ
v, w ::= x | r | ϕ | ⟨v, w⟩ | λx.t

Γ, x : σ ⊢ x : σ        Γ ⊢ r : R        Γ ⊢ ϕ : R → ··· → R → R   (n occurrences of R on the left)
Γ ⊢ t₁ : σ₁ and Γ ⊢ t₂ : σ₂  ⟹  Γ ⊢ ⟨t₁, t₂⟩ : σ₁ × σ₂
Γ ⊢ t : σ₁ × σ₂  ⟹  Γ ⊢ outᵢ t : σᵢ   (i ∈ {1, 2})
Γ, x : σ ⊢ t : τ  ⟹  Γ ⊢ λx.t : σ → τ
Γ ⊢ t : σ → τ and Γ ⊢ s : σ  ⟹  Γ ⊢ ts : τ

Fig. 1. Syntax and Statics of ST_R

The dynamics of ST_R is given by a standard call-by-value operational semantics, defined in Figure 2, where for a function ϕ : Rⁿ → R and a number r ∈ R we write ϕ_r : Rⁿ⁻¹ → R for the mapping (r₁, …, r_{n−1}) ↦ ϕ(r, r₁, …, r_{n−1}). Notice, in particular, that ϕ r₁ ⋯ r_n eventually reduces to ϕ(r₁, …, r_n). Since ST_R is simply-typed, standard reducibility [20] suffices to show that ST_R is strongly normalizing. In particular, any program t evaluates to a (unique) closed value v—which we indicate as nf(t)—in a finite number of →-steps (notation t ⇓ v). We write t ⇓_n v, for n ∈ N, to state that t evaluates to v in n →-steps, and →* for the reflexive and transitive closure of →.

³ When dealing with standard arithmetic operators, such as +, we will overload the notation and write + in place of the corresponding constant.

(λx.t)v → t[v/x]        ϕ r → ϕ_r        outᵢ ⟨v₁, v₂⟩ → vᵢ
t → t′  ⟹  ts → t′s        s → s′  ⟹  vs → vs′
t → t′  ⟹  ⟨t, s⟩ → ⟨t′, s⟩        s → s′  ⟹  ⟨v, s⟩ → ⟨v, s′⟩
t → t′  ⟹  outᵢ t → outᵢ t′

Fig. 2.
Dynamics of ST_R

2.1 Program Equivalence and Program Distance: an Overview

Despite its simplicity, ST_R allows us to justify the shift from program equivalence—that is, the family of notions concerning equality between programs—to program distance, in general, and to differential logical relations, in particular. First, let us define a suitable notion of program equivalence for ST_R programs. Due to its simple nature, ST_R allows us to choose among a large family of notions of program equivalence, ranging from denotationally-based equivalences to operationally-based ones. Here we choose extensional or applicative equivalence.⁴

⁴ This is nothing but a simplification of Abramsky's applicative bisimilarity [2] relying on strong normalization of ST_R and its simple type system.

Definition 1. Define the type-indexed family of relations (≅^Λ_σ ⊆ Λ•σ × Λ•σ, ≅^V_σ ⊆ V•σ × V•σ)_σ as follows (where i ∈ {1, 2}):

t ≅^Λ_σ t′  ⟺  nf(t) ≅^V_σ nf(t′)
r ≅^V_R r′  ⟺  r = r′
⟨v₁, v₂⟩ ≅^V_{σ₁×σ₂} ⟨w₁, w₂⟩  ⟺  ∀i. vᵢ ≅^V_{σᵢ} wᵢ
v ≅^V_{σ→τ} v′  ⟺  ∀w ∈ V•σ. vw ≅^Λ_τ v′w

It is well-known that extensional equivalence is a congruence relation, this way enabling compositional reasoning about program behaviors: s ≅_σ s′ entails t[s/x] ≅_τ t[s′/x] for any term x : σ ⊢ t : τ. Unfortunately, one soon realizes that, due to the presence of (constants for) real numbers, program equivalence is too coarse a notion for reasoning about ST_R programs. For it is desirable to regard two programs of type R whose outputs are ε apart to be themselves ε apart, rather than just state that the two are not equivalent.⁵ The natural way to overcome this problem is to refine ≅_σ into a map δ_σ : Λ•σ × Λ•σ → R following Lawvere's correspondence between ordered sets and (generalized) metric spaces [22].
Accordingly, we obtain the following maps:

δ^V_R(r, r′) ≜ r′ − r
δ^V_{σ₁×σ₂}(⟨v₁, v₂⟩, ⟨w₁, w₂⟩) ≜ max_{i∈{1,2}} δ^V_{σᵢ}(vᵢ, wᵢ)
δ^Λ_σ(t, t′) ≜ δ^V_σ(nf(t), nf(t′))
δ^V_{σ→τ}(v, v′) ≜ sup_{w∈V•σ} δ^Λ_τ(vw, v′w)

which can be easily proved to be generalized metrics⁶ [22]. Of course, in order for δ to serve its purpose, we also need it to support the (quantitative refinement of the) compositionality principle ensured by ≅. As compositionality of ≅ took the form of a congruence property, it is easy to realize that compositionality of δ takes the form of non-expansiveness: for all terms s, s′ ∈ Λ•σ and x : σ ⊢ t : τ we must have δ_σ(s, s′) ≥ δ_τ(t[s/x], t[s′/x]). That is, contexts cannot amplify distances. Unfortunately, we immediately see that δ fails to be non-expansive, and thus compositional. Even worse, any reasonable non-expansive metric-like map is bound to trivialize, meaning that it collapses to a congruence relation. Roughly, given two terms s, s′ which are ε ≠ 0 apart, for any positive real number c it is always possible to find an open term x : σ ⊢ t : τ such that t[s/x] and t[s′/x] are c apart. For it is sufficient to take a term t using its input x enough times: once the terms t[s/x] and t[s′/x] are evaluated, any time t uses x the distance between s and s′ is detected and added to the one previously measured.

⁵ A similar argument can be formulated for any language/calculus exhibiting, either in its syntax or in its semantics, some quantitative behavior. Typical examples of such behaviors are given by types for quantitative objects, such as numeric types, but also by probabilistic primitives, the latter making relevant semantic notions, such as termination, quantitative [8, 10–12, 15, 27, 5].
⁶ Additionally, by replacing r′ − r and sup_{r₁,…,r_n} ψ(r₁, …, r_n) − ϕ(r₁, …, r_n) with |r′ − r| and sup_{r₁,…,r_n} |ψ(r₁, …, r_n) − ϕ(r₁, …, r_n)|, respectively, δ_σ becomes a pseudometric [31].
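The trivialization argument can be seen concretely. The following is a minimal numeric sketch of ours (not from the paper), in which a context using its hole n times amplifies an initial distance ε by a factor of n:

```python
# Two closed "programs" of type R that are eps = 0.1 apart.
s, s_prime = 0.0, 0.1

# A context C[-] = [-] + [-] + ... + [-] that uses its hole n times.
def context(x, n):
    return sum(x for _ in range(n))

# Every use of the hole adds another eps to the observed distance,
# so the distance between C[s] and C[s'] is n * eps: no finite bound
# on delta can be preserved by all contexts.
eps = abs(s_prime - s)
for n in (1, 10, 100):
    amplified = abs(context(s_prime, n) - context(s, n))
    assert abs(amplified - n * eps) < 1e-9
```

Note that addition is non-expansive in each argument, yet the context as a whole amplifies the distance arbitrarily.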
Remarkably, this holds even if every ϕ is non-expansive (i.e. 1-Lipschitz).

2.2 Differential Logical Relations

The failure of non-expansiveness of quantitative refinements of notions of program equivalence has led researchers to propose several notions of program distance [9, 27, 14] in recent years, all aimed at restoring compositionality. All the notions proposed share a common feature: they all impose linearity constraints on calculi, this way providing static information on the number of times a program can use its input (and thus on how much the program can amplify distances). The notions of program distance thus obtained are indeed compositional, but still have two major drawbacks. First, they are tailored for linear calculi and are not very informative when applied to non-linear calculi (relying, e.g., on standard translations [18]). Second, and most important, they do not account for the role of the environment in determining distances. Let us expand on this last point by means of an example. Consider the (linear) programs t ≜ λx.x and s ≜ λx.sin x for the identity and sine function, respectively. It is easy to see that, measuring the distance between t and s as we did when defining δ, we are forced to conclude such a distance to be ∞. In fact, for r → ∞ we have |r − sin(r)| → ∞. This is rather unsatisfactory, as such a distance does not take into account which input the environment will actually pass to t and s. For instance, if the environment feeds t and s with an input v close to zero, then the distance between tv and sv should be close to 0 too, and thus we would like to conclude that in all such cases the distance between t and s is itself close to zero. Summing up, ordinary notions of program distance are not sensitive to the context in which programs are used.
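The identity/sine example can be checked numerically; a minimal sketch of ours, using nothing beyond the standard math library:

```python
import math

t = lambda x: x            # the identity λx.x
s = lambda x: math.sin(x)  # the sine λx.sin x

# Globally, sup_r |t(r) - s(r)| is unbounded: |r - sin(r)| grows with r.
assert abs(t(1e6) - s(1e6)) > 1e5

# But for inputs close to 0 the observable difference is tiny, since
# sin(r) is approximately r near 0.
for r in (0.0, 0.001, 0.01):
    assert abs(t(r) - s(r)) < 1e-6
```

The single number sup_r |t(r) − s(r)| forgets this dependence on the input entirely.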
This ultimately relies on the fact that measuring the distance between two programs (regarded as functions) as just one single number leaves no way to convey information on how such programs interact with the environments in which they are used. Differential logical relations have been introduced in [21] as a way to define a context-sensitive notion of program distance on non-linear calculi. The main novelty of differential logical relations (which was previously theorized by Westbrook and Chaudhuri [32] in the setting of approximate program transformations [23]) is to consider richer notions of distance (also called differences) between programs, whereby the difference between two programs is, in general, not a number, but a function describing how differences between inputs turn into differences between outputs.

Differential logical relations take the form of (type-indexed) ternary relations D_σ relating pairs of programs together with differences between them. When dealing with programs of type σ → τ, differences take the form of functions mapping input programs of type σ and differences for such programs to differences for programs of type τ. This is why here we consider a computationally-oriented notion of difference whereby differences between programs are defined as programs themselves (cf. [32]) rather than as semantical objects. We formalize these ideas by assigning to each type σ a type ∆σ whose inhabitants are terms acting as differences between terms of type σ.

Definition 2. For any type σ, we define the type ∆σ of σ-differences as follows:

∆R ≜ R;    ∆(σ × τ) ≜ ∆σ × ∆τ;    ∆(σ → τ) ≜ σ → ∆σ → ∆τ.

Notice, in particular, that a difference between two programs of type σ → τ is a program taking an input of type σ and a σ-difference, and returning a τ-difference. Obviously, given two programs t, t′ of type σ, not all programs of type ∆σ can act as (meaningful) differences between t and t′.
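Definition 2 can be rendered as a function on a toy representation of types; the tuple encoding below is our own, for illustration only:

```python
# Types: 'R' for reals, ('prod', a, b) for a × b, ('arrow', a, b) for a → b.
def delta(sigma):
    if sigma == 'R':                     # ΔR = R
        return 'R'
    tag, a, b = sigma
    if tag == 'prod':                    # Δ(σ × τ) = Δσ × Δτ
        return ('prod', delta(a), delta(b))
    if tag == 'arrow':                   # Δ(σ → τ) = σ → Δσ → Δτ
        return ('arrow', a, ('arrow', delta(a), delta(b)))
    raise ValueError(tag)

# A difference between programs of type R → R takes the input and an
# input difference, and returns an output difference:
assert delta(('arrow', 'R', 'R')) == ('arrow', 'R', ('arrow', 'R', 'R'))
```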
Differential logical relations (DLRs for short) are ternary relations specifically designed to isolate meaningful differences between programs. More precisely, a DLR is a type-indexed family of ternary relations D ≜ (D^Λ_σ, D^V_σ)_σ, where D^Λ_σ ⊆ Λ•∆σ × Λ•σ × Λ•σ and D^V_σ ⊆ V•∆σ × V•σ × V•σ, such that D^Λ_σ(dt, t, t′) holds if and only if dt is a difference⁷ between t and t′ (and similarly for values).

Definition 3 (Asymmetric DLRs). A differential logical relation is a type-indexed family of ternary relations (D^Λ_σ ⊆ Λ•∆σ × Λ•σ × Λ•σ, D^V_σ ⊆ V•∆σ × V•σ × V•σ)_σ such that:

D^V_R(dr, r, r′)  ⟺  r′ − r = dr
D^V_{σ₁×σ₂}(dv, v, v′)  ⟺  ∀i ∈ {1, 2}. D^V_{σᵢ}(outᵢ dv, outᵢ v, outᵢ v′)
D^V_{σ→τ}(dv, v, v′)  ⟺  ∀dw, w, w′. D^V_σ(dw, w, w′) ⟹ D^Λ_τ(dv w dw, vw, v′w′)
D^Λ_σ(dt, t, t′)  ⟺  D^V_σ(dv, v, v′), where dt ⇓ dv, t ⇓ v, t′ ⇓ v′.

Remark 1. Contrary to the original formulation of DLRs [21], here we work with asymmetric DLRs: if dt is a difference between t and t′, then dt may not be a difference between t′ and t. For instance, 3 is a difference between 2 and 5, as by adding 3 to 2 we reach 5. Yet, according to such a reading, it is not true that 3 is a difference between 5 and 2 (the desired difference being, in fact, −3).

Example 1 ([21]). Let t ≜ λx.sin x and t′ ≜ λx.x. Then dt ≜ λx.λdx.x + dx − sin x is a difference between t and t′. For, proving D^V_{R→R}(dt, t, t′) requires proving that for all dr, r, r′ such that D^V_R(dr, r, r′) (meaning that r + dr = r′), we have D^V_R(r + dr − sin r, sin r, r′), i.e. sin r + (r + dr − sin r) = r′, which is indeed the case. Observe how dt r ε evaluates to a real number which is indeed small when the two arguments are themselves close to 0.

As already remarked, DLRs have been introduced with the goal of developing a compositional theory of program distance. This goal is achieved by the so-called Fundamental Lemma [21].

Lemma 1 (Fundamental Lemma, Version 1). For any program t ∈ Λ•σ there exists a self-difference dt for it.
That is, D_σ(dt, t, t).

⁷ Following conventions in, e.g., [7], we use the notation dt, ds, … for term differences—i.e. terms of type ∆σ.

Lemma 1 enables compositional reasoning on program differences. Informally, by regarding a context x : σ ⊢ t : τ as a term λx.t we are guaranteed a self-difference dt for t to exist, so that given two programs s, s′ of type σ with difference ds between them, one can compute the difference between t[s/x] and t[s′/x] starting from s, ds, and dt alone.

2.3 The Incremental λ-calculus and Program Derivatives

Albeit enabling compositional reasoning on program differences, Lemma 1 has the major drawback of guaranteeing the existence of self-differences without giving any explicit information on how to construct them. As we will see in the next section, the self-differences of Lemma 1 are precisely the program derivatives used in the incremental λ-calculus.

The incremental λ-calculus is a formalism introduced by Cai et al. [7] as a foundational calculus for incremental computation [26, 25]. Roughly, suppose we are given a program f regarded as a function, and an input a (think, for instance, of a as a database). Let us now suppose we compute f(a) and then modify the input a by a change da, this way obtaining a new input a ⊕ da (for instance, we may add a new entry to the database a). Incremental computing seeks ways to obtain the result of f(a ⊕ da) without computing f on the new input a ⊕ da from scratch. In fact, sometimes it is indeed possible to obtain such a result in terms of f(a) and f′(a, da), for a suitable function⁸ f′. For instance, let f(x) ≜ x² and suppose we have computed f(a), for some a. Let us now change a to a + da. When asked to compute f(a + da) we can take advantage of having already computed f(a) = a² by observing that f(a + da) = a² + 2a·da + da² = f(a) + f′(a, da), where f′(x, dx) ≜ 2x·dx + dx².

In order to provide a formal foundation for higher-order incremental computation, Cai et al.
[7] studied incrementalization of a simply-typed λ-calculus similar to the one introduced in the previous section. More precisely, for any type σ a type of σ-changes coinciding with ∆σ is introduced, as well as an operator ⊕ (called change update) building an expression t ⊕ dt ∈ Λσ from an expression t ∈ Λσ and a change dt ∈ Λ∆σ. In order to account for incrementalization, they also introduced the so-called derivative⁹ Dt ∈ Λ∆σ of an expression t ∈ Λσ, and showed that for all terms t ∈ Λ•_{σ→τ}, s ∈ Λ•σ, and σ-change ds, one has:

t (s ⊕ ds) ≡ (ts) ⊕ (Dt s ds),

where ≡ stands for denotational equality. All the aforementioned results are proved by means of denotational semantics, although some operationally-based proofs employing techniques resembling DLRs are given in Giarrusso's PhD thesis [16].

⁸ Obviously, to be practically useful, the map f′ should be such that computing f′(a, da) is cheaper than computing f(a ⊕ da).
⁹ The terminology is misleading. For instance, we should not think of the derivative Dϕ of a term ϕ : R → R as the syntactic counterpart of the derivative of ϕ : R → R. Rather, Dϕ represents the finite difference [28] of ϕ, i.e. the map ∆ϕ : R × R → R defined by ∆ϕ(x, dx) ≜ ϕ(x + dx) − ϕ(x).

In the next section, we will show how we can easily prove such results using DLRs, and, dually, how by identifying differences with changes we can improve on the current theory of program differences. In order to do so, however, we first need to formally introduce the update operator ⊕ and the notion of a program derivative. We begin by recalling a couple of basic definitions from finite difference calculus.

Definition 4. Given functions ϕ : Rⁿ → R and dϕ : (R × R)ⁿ → R, define the maps ϕ ⊕ dϕ : Rⁿ → R and ∆ϕ : (R × R)ⁿ → R by:

(ϕ ⊕ dϕ)(x₁, …, x_n) ≜ ϕ(x₁, …, x_n) + dϕ((x₁, 0), …, (x_n, 0));
∆ϕ((x₁, dx₁), …, (x_n, dx_n)) ≜ ϕ(x₁ + dx₁, …, x_n + dx_n) − ϕ(x₁, …, x_n).

The map ∆ϕ is known as the finite difference of ϕ.
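Definition 4, specialized to unary ϕ, can be sketched directly in code; the helper names are ours. The sketch also replays the f(x) = x² example above, where the change of the output is exactly the finite difference:

```python
import math

# Finite difference of Definition 4, unary case: Δϕ(x, dx) = ϕ(x + dx) - ϕ(x).
def finite_difference(phi):
    return lambda x, dx: phi(x + dx) - phi(x)

# Update operator, unary case: (ϕ ⊕ dϕ)(x) = ϕ(x) + dϕ(x, 0).
def oplus(phi, dphi):
    return lambda x: phi(x) + dphi(x, 0)

# The example f(x) = x^2: Δf(a, da) = 2*a*da + da^2, hence
# f(a + da) can be recovered incrementally as f(a) + Δf(a, da).
f = lambda x: x * x
df = finite_difference(f)
a, da = 1000, 3
assert df(a, da) == 2 * a * da + da * da
assert f(a) + df(a, da) == f(a + da)

# ϕ ⊕ Δϕ = ϕ, since Δϕ(x, 0) = 0.
g = math.sin
for x in (0.0, 1.0, -2.5):
    assert oplus(g, finite_difference(g))(x) == g(x)
```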
Notice that ϕ ⊕ ∆ϕ = ϕ.

Definition 5. Define the partial operator ⊕ : Λ → Λ → Λ as follows:

x ⊕ dx ≜ x;
r ⊕ dr ≜ r + dr;
ϕ ⊕ dϕ ≜ the constant for the updated function ϕ ⊕ dϕ of Definition 4;
⟨t, s⟩ ⊕ ⟨dt, ds⟩ ≜ ⟨t ⊕ dt, s ⊕ ds⟩;
(outᵢ t) ⊕ (outᵢ dt) ≜ outᵢ (t ⊕ dt);
(λx.t) ⊕ (λx.λdx.dt) ≜ λx.(t ⊕ dt);
(ts) ⊕ (dt s ds) ≜ (t ⊕ dt) (s ⊕ ds).

The definition of t ⊕ dt may appear weird at first, but it will become clear once the notion of a derivative is introduced. Intuitively, given a term t and a change dt, we can see t ⊕ dt as the term obtained by changing t according to dt. Clearly, this is possible only if dt has the 'right' structure (for instance, it would be meaningless to do something like changing a function according to a number). We immediately notice that if v is a value and v ⊕ dv is defined, then dv and v ⊕ dv are values too. Moreover, the following typing rule is admissible, whenever t ⊕ dt is defined:

Γ ⊢ t : σ and dΓ, Γ ⊢ dt : ∆σ  ⟹  Γ ⊢ t ⊕ dt : σ

where for Γ = x₁ : σ₁, …, x_n : σ_n, we have dΓ ≜ dx₁ : ∆σ₁, …, dx_n : ∆σ_n. Next, we define the notion of a derivative of a term.

Definition 6. The derivative Dt of a term t is thus defined:

Dx ≜ dx;
Dr ≜ 0;
Dϕ ≜ ∆ϕ;
D⟨t, s⟩ ≜ ⟨Dt, Ds⟩;
D(outᵢ t) ≜ outᵢ Dt;
D(λx.t) ≜ λx.λdx.Dt;
D(ts) ≜ Dt s Ds.

Observe that we can indeed think of Dt as the generalization of finite differences to arbitrary programs. Moreover, we easily see that if Γ ⊢ t : σ, then Γ, dΓ ⊢ Dt : ∆σ and that t ⊕ Dt is defined and equal to t itself.

3 Bridging the Gap

In this section we relate DLRs with the incremental λ-calculus we introduced in the previous section. We do so acting on two orthogonal axes. On the one hand, we show that derivatives are precisely the self-differences of Lemma 1. That is, for any program t, Dt is a self-difference for t. This result allows us to strengthen the fundamental lemma of DLRs (Lemma 1), as now self-differences can be effectively computed.
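The syntactic derivative of Definition 6 can be sketched on a toy abstract syntax; the tuple encoding of terms below is our own illustration, with primitives mapped to a symbolic finite difference:

```python
# Terms as tuples: ('var', x), ('const', r), ('prim', name),
# ('lam', x, body), ('app', t, s), ('pair', t, s), ('out', i, t).
def D(t):
    tag = t[0]
    if tag == 'var':      # Dx = dx
        return ('var', 'd' + t[1])
    if tag == 'const':    # Dr = 0
        return ('const', 0)
    if tag == 'prim':     # Dϕ = Δϕ, the finite difference of ϕ
        return ('prim', 'Δ' + t[1])
    if tag == 'pair':     # D⟨t, s⟩ = ⟨Dt, Ds⟩
        return ('pair', D(t[1]), D(t[2]))
    if tag == 'out':      # D(out_i t) = out_i Dt
        return ('out', t[1], D(t[2]))
    if tag == 'lam':      # D(λx.t) = λx.λdx.Dt
        return ('lam', t[1], ('lam', 'd' + t[1], D(t[2])))
    if tag == 'app':      # D(t s) = Dt s Ds
        return ('app', ('app', D(t[1]), t[2]), D(t[2]))
    raise ValueError(tag)

# D(λx. sq x) = λx. λdx. Δsq x dx, for a hypothetical primitive sq.
term = ('lam', 'x', ('app', ('prim', 'sq'), ('var', 'x')))
assert D(term) == ('lam', 'x', ('lam', 'dx',
    ('app', ('app', ('prim', 'Δsq'), ('var', 'x')), ('var', 'dx'))))
```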
On the other hand, we prove by means of DLRs a major result on the incremental λ-calculus, namely soundness of differentiation [7]. To the best of the authors' knowledge, all proofs of such a result rely on either denotational semantics or logical relations tailored for such purpose (see Remark 2).

Let us begin by proving that derivatives are actually self-differences. In order to achieve such a result, we first have to extend the notion of a DLR to open terms [21]. Given an environment Γ, we denote by S(Γ) the collection of Γ-substitutions, i.e. the collection of maps ρ mapping variables (x : σ) ∈ Γ to closed values ρ(x) ∈ V•σ. In particular, we use the notation dρ to denote substitutions in S(dΓ).

Definition 7. We extend the notion of a DLR to substitutions over an environment Γ as follows:

D_Γ(dρ, ρ, ρ′)  ⟺  ∀(x : σ) ∈ Γ. D^V_σ(dρ(dx), ρ(x), ρ′(x)),

where ρ, ρ′ ∈ S(Γ) and dρ ∈ S(dΓ). As is customary, we write t[ρ] for the application of the substitution ρ to the term t, and ρ[x ↦ v] for the substitution mapping x to v and behaving as ρ otherwise. Before proving our refinement of Lemma 1, let us observe that DLRs are closed under reduction, in the following sense.

Lemma 2. The following holds for all closed terms:

D^Λ_σ(dt, t, t′) ∧ t →* s ∧ t′ →* s′  ⟹  D^Λ_σ(dt, s, s′);
D^Λ_σ(dt, s, s′) ∧ s →* t ∧ s′ →* t′  ⟹  D^Λ_σ(dt, t, t′).

We are now ready to prove our new version of the Fundamental Lemma.

Lemma 3 (Fundamental Lemma, Version 2). For any program t ∈ Λ•σ we have D_σ(Dt, t, t).

Proof (sketch). The thesis follows from the stronger statement: for any term Γ ⊢ t : σ and value Γ ⊢ v : σ we have:

∀dρ, ρ, ρ′. D_Γ(dρ, ρ, ρ′)  ⟹  D^V_σ(Dv[ρ, dρ], v[ρ], v[ρ′]) ∧ D^Λ_σ(Dt[ρ, dρ], t[ρ], t[ρ′]).

The proof of the latter is a routine induction on t and v. ⊓⊔

Notice how Lemma 3 improves the compositionality principle of DLRs.
Given a term x : σ ⊢ t : τ and two values ⊢ v, v′ : σ such that D^V_σ(dv, v, v′), the impact of replacing v with v′ in t can be computed as Dt[v/x, dv/dx]. Next, we show how the incremental λ-calculus can benefit from DLRs by showing how the latter support an easy proof of soundness of differentiation.

Theorem 1 (Soundness of Differentiation [7, 17]). For all t ∈ Λ•_{σ→τ} and values v, v′, dv such that D^V_σ(dv, v, v′), we have:

tv′ ≅ (tv) ⊕ (Dt v dv).

Our proof of Theorem 1 is based on the following result, which states that changes indeed behave as such. Recall that ≅ extends to open terms by stipulating that for Γ ⊢ t, t′ : σ we have t ≅^Λ_σ t′ iff t[ρ] ≅^Λ_σ t′[ρ], for any substitution ρ ∈ S(Γ) (and similarly for values).

Proposition 1. The following holds for all (possibly open) terms t, t′, dt and values v, v′, dv such that t ⊕ dt and v ⊕ dv are defined:

D^Λ_σ(dt, t, t′)  ⟹  t′ ≅ t ⊕ dt;
D^V_σ(dv, v, v′)  ⟹  v′ ≅ v ⊕ dv.

Proof (sketch). The proof is by induction on σ, the relevant case being the one of values. We show how to handle the case of arrow types. Assume Γ ⊢ t, t′ : σ → τ. We have to show that for any ρ ∈ S(Γ), t′[ρ] ≅ (t ⊕ dt)[ρ]. First, observe that we have the following general result (by induction on t), where (Dρ)(dx) ≜ Dρ(x):

(t ⊕ dt)[ρ] = t[ρ] ⊕ dt[ρ, Dρ].

By Lemma 3, we have D_Γ(Dρ, ρ, ρ), hence D^V_{σ→τ}(dt[ρ, Dρ], t[ρ], t′[ρ]). In particular, we must have t[ρ] = λx.s, t′[ρ] = λx.s′, and dt[ρ, Dρ] = λx.λdx.ds, for some s, s′, and ds. Since (λx.s) ⊕ (λx.λdx.ds) = λx.(s ⊕ ds) (notice that this term is indeed defined, as it is obtained from t ⊕ dt, which is defined by hypothesis, by replacing variables y, dy with closed values v, Dv), in order to prove the thesis we have to show s′[v/x] ≅ (s ⊕ ds)[v/x] for any closed value v of type σ. Since (s ⊕ ds)[v/x] = s[v/x] ⊕ ds[v/x, Dv/dx], we obtain the thesis from D^V_{σ→τ}(dt[ρ, Dρ], t[ρ], t′[ρ]) and D^V_σ(Dv, v, v), the latter being a consequence of Lemma 3.
⊓⊔

We can finally prove soundness of differentiation.

Proof (Theorem 1). Assume D^V_σ(dv, v, v′). By Lemma 3 we have D_{σ→τ}(Dt, t, t), and thus D_τ(Dt v dv, tv, tv′), by Lemma 2. We conclude that tv′ ≅ (tv) ⊕ (Dt v dv) from Proposition 1. ⊓⊔

Remark 2. To the best of the authors' knowledge, all proofs of Theorem 1 in the literature are based on either denotational semantics or on logical relations resembling DLRs, but specifically extended with a clause requiring t ⊕ dt = t′ for all related terms dt, t, t′, at any type [17, 16]. Notice the use of syntactic equality: the reason behind such a choice is that the logical relation obtained is meant to relate only programs with their derivatives (in which case we indeed have t ⊕ Dt = t), rather than as a tool to reason about program differences.

4 Related Work

Differential logical relations have been introduced by the authors and Yoshimizu [21], building over intuitions by Westbrook and Chaudhuri [32], and are currently under investigation. Differently from the ones considered in this work, the first formulation of differential logical relations [21] is symmetric and considers semantical difference spaces, so that differences between programs are semantical objects (such as numbers and functions), rather than programs themselves. Whereas we have found that working with asymmetric DLRs makes proofs clearer (besides, asymmetry is in line with Lawvere's analysis of the notion of a distance [22]), working with syntactic difference spaces does not really affect our results. In fact, we could consider semantic-based difference spaces and show that the denotation of a derivative of a program is a self-difference for the program.

The incremental λ-calculus has been introduced by Cai et al. [7] as a simply-typed calculus, and by Giarrusso et al. [17] as an untyped calculus. The former work introduces the notions of a program derivative and change update, and gives a denotational proof of soundness of differentiation.
Operationally-based proofs of the same result are given in Giarrusso's PhD thesis [17, 16] by means of logical relations (see Remark 2). Remarkably, both Giarrusso's thesis [16] and the work by Giarrusso et al. [17] use ternary logical relations nearly identical to differential logical relations to relate programs with changes between them. Moreover, the logical relations introduced in the aforementioned papers have been mechanized in Coq. The authors believe it is important to stress how essentially the same technique has independently emerged in different fields (and with different purposes) to prove two different kinds of differential properties of programs.

5 Conclusion

In this work we have established a formal connection between differential logical relations and the incremental λ-calculus of Cai et al. [7], whereby the self-differences of the former are identified with the program derivatives of the latter. Albeit the results proved here are not technically involved, by establishing a formal connection between two different fields they improve the current understanding of differential properties of programs, such an understanding being still in its infancy. The fact that essentially the same technique has been independently developed in different fields, one looking at software optimization and the other studying semantical notions of distance between programs, witnesses, at least in the authors' opinion, that the technique deserves to be further investigated.

In addition to its conceptual relevance, the connection established in the present work also allows us to obtain technical improvements both on the theory of the incremental λ-calculus and on that of differential logical relations. Concerning the former, we have shown how differential logical relations constitute a lightweight operational technique for incremental computing, and we have witnessed that by giving a new, relatively easy proof of soundness of differentiation.
Concerning the latter, we have strengthened the fundamental lemma of DLRs [21] by showing that program derivatives constitute self-differences, thereby reaching a higher level of compositionality. A further consequence of such a connection is the extension of DLRs to calculi with full recursion by means of the step-indexed logical relations of Giarrusso, Régis-Gianas, and Schuster [17].

Acknowledgments

The authors are supported by the ERC Consolidator Grant DLV-818616 DIAPASoN, as well as by the ANR project 16CE250011 REPAS.

References

1. Martín Abadi and Gordon D. Plotkin. A simple differentiable programming language. PACMPL, 4(POPL):38:1–38:28, 2020.
2. Samson Abramsky. The lazy lambda calculus. In D. Turner, editor, Research Topics in Functional Programming, pages 65–117. Addison Wesley, 1990.
3. Hendrik Pieter Barendregt. The Lambda Calculus - Its Syntax and Semantics, volume 103 of Studies in Logic and the Foundations of Mathematics. North-Holland, 1985.
4. Michael Bartholomew-Biggs, Steven Brown, Bruce Christianson, and Laurence Dixon. Automatic differentiation of algorithms. Journal of Computational and Applied Mathematics, 124(1):171–190, 2000. Numerical Analysis 2000, Vol. IV: Optimization and Nonlinear Equations.
5. Frank Van Breugel and James Worrell. A behavioural pseudometric for probabilistic transition systems. Theoretical Computer Science, 331(1):115–142, 2005.
6. Aloïs Brunel, Damiano Mazza, and Michele Pagani. Backpropagation in the simply typed lambda-calculus with linear negation. PACMPL, 4(POPL):64:1–64:27, 2020.
7. Yufei Cai, Paolo G. Giarrusso, Tillmann Rendel, and Klaus Ostermann. A theory of changes for higher-order languages: incrementalizing λ-calculi by static differentiation. In Proc. of PLDI 2014, pages 145–155, 2014.
8. Konstantinos Chatzikokolakis, Daniel Gebler, Catuscia Palamidessi, and Lili Xu. Generalized bisimulation metrics. In Proc. of CONCUR 2014, pages 32–46, 2014.
9. Raphaëlle Crubillé and Ugo Dal Lago.
Metric reasoning about λ-terms: The general case. In Proc. of ESOP 2017, pages 341–367, 2017.
10. Arthur Azevedo de Amorim, Marco Gaboardi, Justin Hsu, Shin-ya Katsumata, and Ikram Cherigui. A semantic account of metric preservation. In Proc. of POPL 2017, pages 545–556, 2017.
11. Josee Desharnais, Vineet Gupta, Radha Jagadeesan, and Prakash Panangaden. Metrics for labelled Markov processes. Theoretical Computer Science, 318(3):323–354, 2004.
12. Wenjie Du, Yuxin Deng, and Daniel Gebler. Behavioural pseudometrics for nondeterministic probabilistic systems. In Proc. of SETTA 2016, pages 67–84, 2016.
13. Thomas Ehrhard and Laurent Regnier. The differential lambda-calculus. Theoretical Computer Science, 309(1-3):1–41, 2003.
14. Francesco Gavazzo. Quantitative behavioural reasoning for higher-order effectful programs: Applicative distances. In Proc. of LICS 2018, pages 452–461, 2018.
15. Daniel Gebler, Kim G. Larsen, and Simone Tini. Compositional bisimulation metric reasoning with probabilistic process calculi. Logical Methods in Computer Science, 12(4), 2016.
16. Paolo G. Giarrusso. Optimizing and incrementalizing higher-order collection queries by AST transformation. PhD thesis, University of Tübingen, 2018.
17. Paolo G. Giarrusso, Yann Régis-Gianas, and Philipp Schuster. Incremental lambda-calculus in cache-transfer style - static memoization by program transformation. In Proc. of ESOP 2019, pages 553–580, 2019.
18. Jean-Yves Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987.
19. Jean-Yves Girard. Normal functors, power series and λ-calculus. Annals of Pure and Applied Logic, 37(2):129–177, 1988.
20. Jean-Yves Girard, Yves Lafont, and Paul Taylor. Proofs and Types. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 1989.
21. Ugo Dal Lago, Francesco Gavazzo, and Akira Yoshimizu. Differential logical relations, part I: the simply-typed case. In Proc. of ICALP 2019, pages 111:1–111:14, 2019.
22. F. William Lawvere. Metric spaces, generalized logic, and closed categories. Rendiconti del Seminario Matematico e Fisico di Milano, 43:135–166, 1973.
23. Sasa Misailovic, Daniel M. Roy, and Martin C. Rinard. Probabilistically accurate program transformations. In Proc. of SAS 2011, pages 316–333, 2011.
24. Sparsh Mittal. A survey of techniques for approximate computing. ACM Computing Surveys, 48(4):62:1–62:33, 2016.
25. Robert Paige and Shaye Koenig. Finite differencing of computable expressions. ACM Trans. Program. Lang. Syst., 4(3):402–454, 1982.
26. Ganesan Ramalingam and Thomas W. Reps. A categorized bibliography on incremental computation. In Proc. of POPL 1993, pages 502–510, 1993.
27. Jason Reed and Benjamin C. Pierce. Distance makes the types grow stronger: a calculus for differential privacy. In Proc. of ICFP 2010, pages 157–168, 2010.
28. Clarence Hudson Richardson. An Introduction to the Calculus of Finite Differences. New York, 1954.
29. Amir Shaikhha, Andrew Fitzgibbon, Dimitrios Vytiniotis, and Simon Peyton Jones. Efficient differentiable programming in a functional array-processing language. PACMPL, 3(ICFP):97:1–97:30, 2019.
30. Michael Spivak. Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Avalon Publishing, 1971.
31. Lynn Arthur Steen and J. Arthur Seebach Jr. Counterexamples in Topology. Dover Books on Mathematics. Dover Publications, 1995.
32. Edwin M. Westbrook and Swarat Chaudhuri. A semantics for approximate program transformations. CoRR, abs/1304.5531, 2013. URL: http://arxiv.org/abs/1304.5531.