Differential Logical Relations, Part II: Increments and Derivatives

Ugo Dal Lago¹,² and Francesco Gavazzo¹,²

¹ University of Bologna, Italy
² INRIA Sophia Antipolis, France

Abstract. We study the deep relations existing between differential logical relations and incremental computing, by showing how self-differences in the former precisely correspond to derivatives in the latter. We also show how differential logical relations can be seen as a powerful meta-theoretical tool in the analysis of incremental computations, enabling an easy proof of soundness of differentiation.

1 Introduction

One of the major challenges programming language theory is facing these days is the development of adequate abstractions to deal with the (highly) increasing complexity and heterogeneity of modern software systems. Indeed, since the very birth of the discipline, researchers have focused on the design of compositional techniques for software analysis, whereby one can study the overall behavior of a system by inspecting its constituent parts in isolation. A prime example of the success of such an analysis is given by the theory of program equivalence. Notwithstanding its successful history, program equivalence reveals some weaknesses when applied to today's software, where exact reasoning about components is oftentimes not feasible, either because of the very nature of the software involved or because of the cost of such an analysis. Examples witnessing such weaknesses are given by the fields of probabilistic computing, where small perturbations in probabilities break equivalence; numerical computing, where programs implementing numerical functions can be optimized at the price of introducing an acceptable error in the output; and, more generally, approximate computing [24], where accuracy of the result is partially sacrificed to gain efficiency.
The common pattern behind all the aforementioned examples is the shift from an exact analysis of software, whereby equivalent pieces of software are interchangeable within any system, to an approximate analysis of software, whereby non-equivalent pieces of software are replaced within a system at the price of producing an acceptable error, and thus an approximately correct result. Moving from an exact to an approximate analysis of programs poses several challenges to programming language semantics, the main one arguably concerning compositionality. In fact, once we replace a program P with a non-equivalent one Q in a system C[−], then C[−] may amplify the error introduced by the replacement of P with Q, this way invalidating compositionality of the analysis. This point becomes evident when studying (higher-order) program metrics or program distances [27, 9, 14, 10]: if the distance between P and Q is upper bounded by a number ε > 0, then it may not be so for C[P] and C[Q].

Motivated by these general observations, researchers are showing an increasing interest in quantitative analyses of programs, with a special focus on differential properties of programs. Although the expression differential has no precise meaning in this context, we may identify it with the collection of properties relating local and global changes in software. Thus, for instance, we can think of the (error produced by the) replacement of P with Q as a local change, and investigate its relationship with the global change we observe between C[P] and C[Q].

The study of such differential properties has led, oftentimes abusing terminology, to the development of several notions of a program derivative. Among those, some of the main ones one encounters when looking at the relevant literature are the following.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
• Those coming from the field of automatic differentiation [4], which aim to extend the notion of a derivative one finds in mathematical analysis [30] to arbitrary programs. Examples are given by [6, 1, 29] (as well as by references therein).
• Those coming from the differential λ-calculus [13], whose original motivations rely on quantitative semantics [19] and linear logic [18].
• Those coming from incremental computing [26, 25], which aim to find ways to incrementally compute outputs as inputs change.
• Those coming from differential logical relations [21] via the notion of a self-difference, which aim to provide context-sensitive compositional distances between programs.

It is thus natural to ask whether there are connections between such notions. Although for some of the aforementioned cases the answer seems to be negative (for instance, the derivatives one finds in incremental computing are generalizations of finite differences [28], whereas the ones found in calculi for automatic differentiation are actual derivatives), others have conceptual similarities. This is the case for differential logical relations and incremental computing, as both study the relationship between input and output changes.

In this paper we study such similarities and establish a formal connection between differential logical relations and incremental computing, by showing how self-differences in the former precisely correspond to derivatives in the latter. In fact, as we will see, the derivative of a program P can be seen as a way to quantify how much changes in the input of P influence changes in its output, this way acting as a self-difference for P. Besides its conceptual implications, the advantages of such a correspondence are twofold. On the one hand, differential logical relations qualify as a lightweight operational technique for incremental computing: we witness that by giving a proof of soundness of differentiation for the incremental λ-calculus of Cai et al. [7].
On the other hand, we can use results coming from incremental computing to go beyond the current theory of differential logical relations. For instance, it is possible to read the work by Giarrusso, Régis-Gianas, and Schuster [17], where step-indexed logical relations are introduced for proving correctness of untyped program derivatives, as defining a form of step-indexed differential logical relations, this way giving differential semantics to calculi with full recursion (something not possible in the original formulation of differential logical relations [21]).

Structure of the Paper. Section 2 introduces the target calculus of this work, as well as differential logical relations and the incremental λ-calculus. In Section 3, we establish a formal connection between differential logical relations and the incremental λ-calculus by showing that program derivatives (in the sense of incremental computing) are self-distances (in the sense of differential logical relations). Additionally, we give a differential logical relation-based proof of soundness of differentiation, the main result in the theory of incremental computing.

2 Preliminaries: Differential Logical Relations and the Incremental λ-calculus

In this section we shortly review the main ideas behind differential logical relations (DLRs) and the incremental λ-calculus. In order to do so, we introduce the vehicle calculus of (the first part of) this work, namely a call-by-value simply typed λ-calculus with a primitive type for real numbers, which we denote by ST_R. The calculus is standard and is essentially the same calculus used in [21]. We summarize the syntax and static semantics of ST_R in Figure 1, where we assume to have constants ϕ for any function³ ϕ : Rⁿ → R and primitives r for any real number r. We use letters t, s, … to range over terms, and v, w, … to range over values. Additionally, we follow standard syntactic conventions as in, e.g., [3].
In particular, we work with terms modulo renaming of bound variables, and denote by t[v/x] the capture-avoiding substitution of v for x in t. Finally, we refer to terms in Λ• as closed terms or programs. Similarly, we refer to values in V• as closed values, and we use notations such as Λ•σ and V•σ with their natural meanings.

t, s ::= x | r | ϕ | ⟨t, s⟩ | λx.t | out₁ t | out₂ t | ts
σ, τ ::= R | σ × τ | σ → τ
v, w ::= x | r | ϕ | ⟨v, w⟩ | λx.t

Γ, x : σ ⊢ x : σ        Γ ⊢ r : R        Γ ⊢ ϕ : R → ··· → R → R   (n occurrences of R on the left)
Γ ⊢ t₁ : σ₁ and Γ ⊢ t₂ : σ₂  ⟹  Γ ⊢ ⟨t₁, t₂⟩ : σ₁ × σ₂
Γ ⊢ t : σ₁ × σ₂  ⟹  Γ ⊢ outᵢ t : σᵢ   (i ∈ {1, 2})
Γ, x : σ ⊢ t : τ  ⟹  Γ ⊢ λx.t : σ → τ
Γ ⊢ t : σ → τ and Γ ⊢ s : σ  ⟹  Γ ⊢ ts : τ

Fig. 1. Syntax and Statics of ST_R

The dynamics of ST_R is given by a standard call-by-value operational semantics, defined in Figure 2, where for a function ϕ : Rⁿ → R and a number r ∈ R we write ϕ_r : Rⁿ⁻¹ → R for the mapping (r₁, …, r_{n−1}) ↦ ϕ(r, r₁, …, r_{n−1}). Notice, in particular, that ϕ r₁ ⋯ r_n eventually reduces to ϕ(r₁, …, r_n). Since ST_R is simply-typed, standard reducibility [20] suffices to show that ST_R is strongly normalizing. In particular, any program t evaluates to a (unique) closed value v—which we indicate as nf(t)—in a finite number of →-steps (notation t ⇓ v). We write t ⇓_n v, for n ∈ N, to state that t evaluates to v in n →-steps, and →* for the reflexive and transitive closure of →.

³ When dealing with standard arithmetic operators, such as +, we will overload the notation and write + in place of the corresponding constant.

(λx.t)v → t[v/x]        ϕ r → ϕ_r        outᵢ ⟨v₁, v₂⟩ → vᵢ
t → t′  ⟹  ts → t′s        s → s′  ⟹  vs → vs′
t → t′  ⟹  ⟨t, s⟩ → ⟨t′, s⟩        s → s′  ⟹  ⟨v, s⟩ → ⟨v, s′⟩
t → t′  ⟹  outᵢ t → outᵢ t′

Fig. 2.
Dynamics of ST_R

2.1 Program Equivalence and Program Distance: an Overview

Despite its simplicity, ST_R allows us to justify the shift from program equivalence—that is, the family of notions concerning equality between programs—to program distance, in general, and to differential logical relations, in particular. First, let us define a suitable notion of program equivalence for ST_R programs. Due to its simple nature, ST_R allows us to choose among a large family of notions of program equivalence, ranging from denotationally-based equivalences to operationally-based ones. Here we choose extensional or applicative equivalence.⁴

⁴ This is nothing but a simplification of Abramsky's applicative bisimilarity [2] relying on strong normalization of ST_R and its simple type system.

Definition 1. Define the type-indexed family of relations (≅^Λ_σ ⊆ Λ•σ × Λ•σ, ≅^V_σ ⊆ V•σ × V•σ)_σ as follows (where i ∈ {1, 2}):

t ≅^Λ_σ t′  ⟺  nf(t) ≅^V_σ nf(t′)
r ≅^V_R r′  ⟺  r = r′
⟨v₁, v₂⟩ ≅^V_{σ₁×σ₂} ⟨w₁, w₂⟩  ⟺  ∀i. vᵢ ≅^V_{σᵢ} wᵢ
v ≅^V_{σ→τ} v′  ⟺  ∀w ∈ V•σ. vw ≅^Λ_τ v′w

It is well-known that extensional equivalence is a congruence relation, this way enabling compositional reasoning about program behaviors: s ≅_σ s′ entails t[s/x] ≅_τ t[s′/x] for any term x : σ ⊢ t : τ. Unfortunately, one soon realizes that, due to the presence of (constants for) real numbers, program equivalence is too coarse a notion for reasoning about ST_R programs. For it is desirable to regard two programs of type R whose outputs are ε apart to be themselves ε apart, rather than just state that the two are not equivalent.⁵ The natural way to overcome this problem is to refine ≅_σ into a map δ_σ : Λ•σ × Λ•σ → R following Lawvere's correspondence between ordered sets and (generalized) metric spaces [22].
Accordingly, we obtain the following maps:

δ^V_R(r, r′) ≜ r′ − r
δ^V_{σ₁×σ₂}(⟨v₁, v₂⟩, ⟨w₁, w₂⟩) ≜ max_{i∈{1,2}} δ^V_{σᵢ}(vᵢ, wᵢ)
δ^Λ_σ(t, t′) ≜ δ^V_σ(nf(t), nf(t′))
δ^V_{σ→τ}(v, v′) ≜ sup_{w∈V•σ} δ^Λ_τ(vw, v′w)

which can be easily proved to be generalized metrics⁶ [22]. Of course, in order for δ to serve its purpose, we also need it to support the (quantitative refinement of the) compositionality principle ensured by ≅. As compositionality of ≅ took the form of a congruence property, it is easy to realize that compositionality of δ takes the form of non-expansiveness: for all terms s, s′ ∈ Λ•σ and x : σ ⊢ t : τ we must have δ_σ(s, s′) ≥ δ_τ(t[s/x], t[s′/x]). That is, contexts cannot amplify distances. Unfortunately, we immediately see that δ fails to be non-expansive, and thus compositional. Even worse, any reasonable non-expansive metric-like map is bound to trivialize, meaning that it collapses to a congruence relation. Roughly, given two terms s, s′ which are ε ≠ 0 apart, for any positive real number c it is always possible to find an open term x : σ ⊢ t : τ such that t[s/x] and t[s′/x] are c apart. For it is sufficient to take a term t using its input x enough times: once the terms t[s/x] and t[s′/x] are evaluated, any time t uses x the distance between s and s′ is detected and added to the one previously measured.

⁵ A similar argument can be formulated for any language/calculus exhibiting, either in its syntax or in its semantics, some quantitative behavior. Typical examples of such behaviors are given by types for quantitative objects, such as numeric types, but also by probabilistic primitives, the latter making relevant semantic notions, such as termination, quantitative [8, 10–12, 15, 27, 5].
⁶ Additionally, by replacing r′ − r and sup_{r₁,…,r_n} ψ(r₁, …, r_n) − ϕ(r₁, …, r_n) with |r′ − r| and sup_{r₁,…,r_n} |ψ(r₁, …, r_n) − ϕ(r₁, …, r_n)|, respectively, δ_σ becomes a pseudometric [31].
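The trivialization argument can be seen concretely. The following is a minimal numeric sketch of ours (not from the paper), in which a context using its hole n times amplifies an initial distance ε by a factor of n:

```python
# Two closed "programs" of type R that are eps = 0.1 apart.
s, s_prime = 0.0, 0.1

# A context C[-] = [-] + [-] + ... + [-] that uses its hole n times.
def context(x, n):
    return sum(x for _ in range(n))

# Every use of the hole adds another eps to the observed distance,
# so the distance between C[s] and C[s'] is n * eps: no finite bound
# on delta can be preserved by all contexts.
eps = abs(s_prime - s)
for n in (1, 10, 100):
    amplified = abs(context(s_prime, n) - context(s, n))
    assert abs(amplified - n * eps) < 1e-9
```

Note that addition is non-expansive in each argument, yet the context as a whole amplifies the distance arbitrarily.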
Remarkably, this holds even if every ϕ is non-expansive (i.e. 1-Lipschitz).

2.2 Differential Logical Relations

The failure of non-expansiveness of quantitative refinements of notions of program equivalence has led researchers to propose several notions of program distance [9, 27, 14] in recent years, all aimed at restoring compositionality. All the notions proposed share a common feature: they all impose linearity constraints on calculi, this way providing static information on the number of times a program can use its input (and thus on how much the program can amplify distances). The notions of program distance thus obtained are indeed compositional, but still have two major drawbacks. First, they are tailored for linear calculi and are not very informative when applied to non-linear calculi (relying, e.g., on standard translations [18]). Second, and most important, they do not account for the role of the environment in determining distances. Let us expand on this last point by means of an example. Consider the (linear) programs t ≜ λx.x and s ≜ λx.sin x for the identity and sine function, respectively. It is easy to see that, measuring the distance between t and s as we did when defining δ, we are forced to conclude such a distance to be ∞. In fact, for r → ∞ we have |r − sin(r)| → ∞. This is rather unsatisfactory, as such a distance does not take into account which input the environment will actually pass to t and s. For instance, if the environment feeds t and s with an input v close to zero, then the distance between tv and sv should be close to 0 too, and thus we would like to conclude that in all such cases the distance between t and s is itself close to zero. Summing up, ordinary notions of program distance are not sensitive to the context in which programs are used.
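The identity/sine example can be checked numerically; a minimal sketch of ours, using nothing beyond the standard math library:

```python
import math

t = lambda x: x            # the identity λx.x
s = lambda x: math.sin(x)  # the sine λx.sin x

# Globally, sup_r |t(r) - s(r)| is unbounded: |r - sin(r)| grows with r.
assert abs(t(1e6) - s(1e6)) > 1e5

# But for inputs close to 0 the observable difference is tiny, since
# sin(r) is approximately r near 0.
for r in (0.0, 0.001, 0.01):
    assert abs(t(r) - s(r)) < 1e-6
```

The single number sup_r |t(r) − s(r)| forgets this dependence on the input entirely.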
This ultimately relies on the fact that measuring the distance between two programs (regarded as functions) as just one single number leaves no way to convey information on how such programs interact with the environments in which they are used. Differential logical relations have been introduced in [21] as a way to define a context-sensitive notion of program distance on non-linear calculi. The main novelty of differential logical relations (which was previously theorized by Westbrook and Chaudhuri [32] in the setting of approximate program transformations [23]) is to consider richer notions of distance (also called differences) between programs, whereby the difference between two programs is, in general, not a number, but a function describing how differences between inputs turn into differences between outputs.

Differential logical relations take the form of (type-indexed) ternary relations D_σ relating pairs of programs together with differences between them. When dealing with programs of type σ → τ, differences take the form of functions mapping input programs of type σ and differences for such programs to differences for programs of type τ. This is why here we consider a computationally-oriented notion of difference whereby differences between programs are defined as programs themselves (cf. [32]) rather than as semantical objects. We formalize these ideas by assigning to each type σ a type ∆σ whose inhabitants are terms acting as differences between terms of type σ.

Definition 2. For any type σ, we define the type ∆σ of σ-differences as follows:

∆R ≜ R;    ∆(σ × τ) ≜ ∆σ × ∆τ;    ∆(σ → τ) ≜ σ → ∆σ → ∆τ.

Notice, in particular, that a difference between two programs of type σ → τ is a program taking an input of type σ and a σ-difference, and returning a τ-difference. Obviously, given two programs t, t′ of type σ, not all programs of type ∆σ can act as (meaningful) differences between t and t′.
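Definition 2 can be rendered as a function on a toy representation of types; the tuple encoding below is our own, for illustration only:

```python
# Types: 'R' for reals, ('prod', a, b) for a × b, ('arrow', a, b) for a → b.
def delta(sigma):
    if sigma == 'R':                     # ΔR = R
        return 'R'
    tag, a, b = sigma
    if tag == 'prod':                    # Δ(σ × τ) = Δσ × Δτ
        return ('prod', delta(a), delta(b))
    if tag == 'arrow':                   # Δ(σ → τ) = σ → Δσ → Δτ
        return ('arrow', a, ('arrow', delta(a), delta(b)))
    raise ValueError(tag)

# A difference between programs of type R → R takes the input and an
# input difference, and returns an output difference:
assert delta(('arrow', 'R', 'R')) == ('arrow', 'R', ('arrow', 'R', 'R'))
```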
Differential logical relations (DLRs for short) are ternary relations specifically designed to isolate meaningful differences between programs. More precisely, a DLR is a type-indexed family of ternary relations D ≜ (D^Λ_σ, D^V_σ)_σ, where D^Λ_σ ⊆ Λ•∆σ × Λ•σ × Λ•σ and D^V_σ ⊆ V•∆σ × V•σ × V•σ, such that D^Λ_σ(dt, t, t′) holds if and only if dt is a difference⁷ between t and t′ (and similarly for values).

Definition 3 (Asymmetric DLRs). A differential logical relation is a type-indexed family of ternary relations (D^Λ_σ ⊆ Λ•∆σ × Λ•σ × Λ•σ, D^V_σ ⊆ V•∆σ × V•σ × V•σ)_σ such that:

D^V_R(dr, r, r′)  ⟺  r′ − r = dr
D^V_{σ₁×σ₂}(dv, v, v′)  ⟺  ∀i ∈ {1, 2}. D^V_{σᵢ}(outᵢ dv, outᵢ v, outᵢ v′)
D^V_{σ→τ}(dv, v, v′)  ⟺  ∀dw, w, w′. D^V_σ(dw, w, w′) ⟹ D^Λ_τ(dv w dw, vw, v′w′)
D^Λ_σ(dt, t, t′)  ⟺  D^V_σ(dv, v, v′), where dt ⇓ dv, t ⇓ v, t′ ⇓ v′.

Remark 1. Contrary to the original formulation of DLRs [21], here we work with asymmetric DLRs: if dt is a difference between t and t′, then dt may not be a difference between t′ and t. For instance, 3 is a difference between 2 and 5, as by adding 3 to 2 we reach 5. Yet, according to such a reading, it is not true that 3 is a difference between 5 and 2 (the desired difference being, in fact, −3).

Example 1 ([21]). Let t ≜ λx.sin x and t′ ≜ λx.x. Then dt ≜ λx.λdx.x + dx − sin x is a difference between t and t′. For, proving D^V_{R→R}(dt, t, t′) requires proving that for all dr, r, r′ such that D^V_R(dr, r, r′) (meaning that r + dr = r′), we have D^V_R(r + dr − sin r, sin r, r′), i.e. sin r + (r + dr − sin r) = r′, which is indeed the case. Observe how dt r ε evaluates to a real number which is indeed small when the two arguments are themselves close to 0.

As already remarked, DLRs have been introduced with the goal of developing a compositional theory of program distance. This goal is achieved by the so-called Fundamental Lemma [21].

Lemma 1 (Fundamental Lemma, Version 1). For any program t ∈ Λ•σ there exists a self-difference dt for it.
That is, D_σ(dt, t, t).

⁷ Following conventions in, e.g., [7], we use the notation dt, ds, … for term differences—i.e. terms of type ∆σ.

Lemma 1 enables compositional reasoning on program differences. Informally, by regarding a context x : σ ⊢ t : τ as a term λx.t we are guaranteed a self-difference dt for t to exist, so that given two programs s, s′ of type σ with difference ds between them, one can compute the difference between t[s/x] and t[s′/x] starting from s, ds, and dt alone.

2.3 The Incremental λ-calculus and Program Derivatives

Albeit enabling compositional reasoning on program differences, Lemma 1 has the major drawback of guaranteeing the existence of self-differences without giving any explicit information on how to construct them. As we will see in the next section, the self-differences of Lemma 1 are precisely the program derivatives used in the incremental λ-calculus.

The incremental λ-calculus is a formalism introduced by Cai et al. [7] as a foundational calculus for incremental computation [26, 25]. Roughly, suppose we are given a program f regarded as a function, and an input a (think, for instance, of a as a database). Let us now suppose we compute f(a) and then modify the input a by a change da, this way obtaining a new input a ⊕ da (for instance, we may add a new entry to the database a). Incremental computing seeks ways to obtain the result of f(a ⊕ da) without computing f on the new input a ⊕ da from scratch. In fact, sometimes it is indeed possible to obtain such a result in terms of f(a) and f′(a, da), for a suitable function⁸ f′. For instance, let f(x) ≜ x² and suppose we have computed f(a), for some a. Let us now change a to a + da. When asked to compute f(a + da) we can take advantage of having already computed f(a) = a² by observing that f(a + da) = a² + 2a·da + da² = f(a) + f′(a, da), where f′(x, dx) ≜ 2x·dx + dx².

In order to provide a formal foundation for higher-order incremental computation, Cai et al.
[7] studied incrementalization of a simply-typed λ-calculus similar to the one introduced in the previous section. More precisely, for any type σ a type of σ-changes coinciding with ∆σ is introduced, as well as an operator ⊕ (called change update) building an expression t ⊕ dt ∈ Λσ from an expression t ∈ Λσ and a change dt ∈ Λ∆σ. In order to account for incrementalization, they also introduced the so-called derivative⁹ Dt ∈ Λ∆σ of an expression t ∈ Λσ, and showed that for all terms t ∈ Λ•_{σ→τ}, s ∈ Λ•σ, and σ-change ds, one has:

t (s ⊕ ds) ≡ (ts) ⊕ (Dt s ds),

where ≡ stands for denotational equality. All the aforementioned results are proved by means of denotational semantics, although some operationally-based proofs employing techniques resembling DLRs are given in Giarrusso's PhD thesis [16].

⁸ Obviously, to be practically useful, the map f′ should be such that computing f′(a, da) is cheaper than computing f(a ⊕ da).
⁹ The terminology is misleading. For instance, we should not think of the derivative Dϕ of a term ϕ : R → R as the syntactic counterpart of the derivative of ϕ : R → R. Rather, Dϕ represents the finite difference [28] of ϕ, i.e. the map ∆ϕ : R × R → R defined by ∆ϕ(x, dx) ≜ ϕ(x + dx) − ϕ(x).

In the next section, we will show how we can easily prove such results using DLRs, and, dually, how by identifying differences with changes we can improve on the current theory of program differences. In order to do so, however, we first need to formally introduce the update operator ⊕ and the notion of a program derivative. We begin by recalling a couple of basic definitions from finite difference calculus.

Definition 4. Given functions ϕ : Rⁿ → R and dϕ : (R × R)ⁿ → R, define the maps ϕ ⊕ dϕ : Rⁿ → R and ∆ϕ : (R × R)ⁿ → R by:

(ϕ ⊕ dϕ)(x₁, …, x_n) ≜ ϕ(x₁, …, x_n) + dϕ((x₁, 0), …, (x_n, 0));
∆ϕ((x₁, dx₁), …, (x_n, dx_n)) ≜ ϕ(x₁ + dx₁, …, x_n + dx_n) − ϕ(x₁, …, x_n).

The map ∆ϕ is known as the finite difference of ϕ.
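Definition 4, specialized to unary ϕ, can be sketched directly in code; the helper names are ours. The sketch also replays the f(x) = x² example above, where the change of the output is exactly the finite difference:

```python
import math

# Finite difference of Definition 4, unary case: Δϕ(x, dx) = ϕ(x + dx) - ϕ(x).
def finite_difference(phi):
    return lambda x, dx: phi(x + dx) - phi(x)

# Update operator, unary case: (ϕ ⊕ dϕ)(x) = ϕ(x) + dϕ(x, 0).
def oplus(phi, dphi):
    return lambda x: phi(x) + dphi(x, 0)

# The example f(x) = x^2: Δf(a, da) = 2*a*da + da^2, hence
# f(a + da) can be recovered incrementally as f(a) + Δf(a, da).
f = lambda x: x * x
df = finite_difference(f)
a, da = 1000, 3
assert df(a, da) == 2 * a * da + da * da
assert f(a) + df(a, da) == f(a + da)

# ϕ ⊕ Δϕ = ϕ, since Δϕ(x, 0) = 0.
g = math.sin
for x in (0.0, 1.0, -2.5):
    assert oplus(g, finite_difference(g))(x) == g(x)
```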
Notice that ϕ ⊕ ∆ϕ = ϕ.

Definition 5. Define the partial operator ⊕ : Λ → Λ → Λ as follows:

x ⊕ dx ≜ x;
r ⊕ dr ≜ r + dr;
ϕ ⊕ dϕ ≜ the constant for the updated function ϕ ⊕ dϕ of Definition 4;
⟨t, s⟩ ⊕ ⟨dt, ds⟩ ≜ ⟨t ⊕ dt, s ⊕ ds⟩;
(outᵢ t) ⊕ (outᵢ dt) ≜ outᵢ (t ⊕ dt);
(λx.t) ⊕ (λx.λdx.dt) ≜ λx.(t ⊕ dt);
(ts) ⊕ (dt s ds) ≜ (t ⊕ dt) (s ⊕ ds).

The definition of t ⊕ dt may appear weird at first, but it will become clear once the notion of a derivative is introduced. Intuitively, given a term t and a change dt, we can see t ⊕ dt as the term obtained by changing t according to dt. Clearly, this is possible only if dt has the 'right' structure (for instance, it would be meaningless to do something like changing a function according to a number). We immediately notice that if v is a value and v ⊕ dv is defined, then dv and v ⊕ dv are values too. Moreover, the following typing rule is admissible, whenever t ⊕ dt is defined:

Γ ⊢ t : σ and dΓ, Γ ⊢ dt : ∆σ  ⟹  Γ ⊢ t ⊕ dt : σ

where for Γ = x₁ : σ₁, …, x_n : σ_n, we have dΓ ≜ dx₁ : ∆σ₁, …, dx_n : ∆σ_n. Next, we define the notion of a derivative of a term.

Definition 6. The derivative Dt of a term t is thus defined:

Dx ≜ dx;
Dr ≜ 0;
Dϕ ≜ ∆ϕ;
D⟨t, s⟩ ≜ ⟨Dt, Ds⟩;
D(outᵢ t) ≜ outᵢ Dt;
D(λx.t) ≜ λx.λdx.Dt;
D(ts) ≜ Dt s Ds.

Observe that we can indeed think of Dt as the generalization of finite differences to arbitrary programs. Moreover, we easily see that if Γ ⊢ t : σ, then Γ, dΓ ⊢ Dt : ∆σ and that t ⊕ Dt is defined and equal to t itself.

3 Bridging the Gap

In this section we relate DLRs with the incremental λ-calculus we introduced in the previous section. We do so acting on two orthogonal axes. On the one hand, we show that derivatives are precisely the self-differences of Lemma 1. That is, for any program t, Dt is a self-difference for t. This result allows us to strengthen the fundamental lemma of DLRs (Lemma 1), as now self-differences can be effectively computed.
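The syntactic derivative of Definition 6 can be sketched on a toy abstract syntax; the tuple encoding of terms below is our own illustration, with primitives mapped to a symbolic finite difference:

```python
# Terms as tuples: ('var', x), ('const', r), ('prim', name),
# ('lam', x, body), ('app', t, s), ('pair', t, s), ('out', i, t).
def D(t):
    tag = t[0]
    if tag == 'var':      # Dx = dx
        return ('var', 'd' + t[1])
    if tag == 'const':    # Dr = 0
        return ('const', 0)
    if tag == 'prim':     # Dϕ = Δϕ, the finite difference of ϕ
        return ('prim', 'Δ' + t[1])
    if tag == 'pair':     # D⟨t, s⟩ = ⟨Dt, Ds⟩
        return ('pair', D(t[1]), D(t[2]))
    if tag == 'out':      # D(out_i t) = out_i Dt
        return ('out', t[1], D(t[2]))
    if tag == 'lam':      # D(λx.t) = λx.λdx.Dt
        return ('lam', t[1], ('lam', 'd' + t[1], D(t[2])))
    if tag == 'app':      # D(t s) = Dt s Ds
        return ('app', ('app', D(t[1]), t[2]), D(t[2]))
    raise ValueError(tag)

# D(λx. sq x) = λx. λdx. Δsq x dx, for a hypothetical primitive sq.
term = ('lam', 'x', ('app', ('prim', 'sq'), ('var', 'x')))
assert D(term) == ('lam', 'x', ('lam', 'dx',
    ('app', ('app', ('prim', 'Δsq'), ('var', 'x')), ('var', 'dx'))))
```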
On the other hand, we prove by means of DLRs a major result on the incremental λ-calculus, namely soundness of differentiation [7]. To the best of the authors' knowledge, all proofs of such a result rely on either denotational semantics or logical relations tailored for such purpose (see Remark 2).

Let us begin by proving that derivatives are actually self-differences. In order to achieve such a result, we first have to extend the notion of a DLR to open terms [21]. Given an environment Γ, we denote by S(Γ) the collection of Γ-substitutions, i.e. the collection of maps ρ mapping variables (x : σ) ∈ Γ to closed values ρ(x) ∈ V•σ. In particular, we use the notation dρ to denote substitutions in S(dΓ).

Definition 7. We extend the notion of a DLR to substitutions over an environment Γ as follows:

D_Γ(dρ, ρ, ρ′)  ⟺  ∀(x : σ) ∈ Γ. D^V_σ(dρ(dx), ρ(x), ρ′(x)),

where ρ, ρ′ ∈ S(Γ) and dρ ∈ S(dΓ). As is customary, we write t[ρ] for the application of the substitution ρ to the term t, and ρ[x ↦ v] for the substitution mapping x to v and behaving as ρ otherwise. Before proving our refinement of Lemma 1, let us observe that DLRs are closed under reduction, in the following sense.

Lemma 2. The following holds for all closed terms:

D^Λ_σ(dt, t, t′) ∧ t →* s ∧ t′ →* s′  ⟹  D^Λ_σ(dt, s, s′);
D^Λ_σ(dt, s, s′) ∧ s →* t ∧ s′ →* t′  ⟹  D^Λ_σ(dt, t, t′).

We are now ready to prove our new version of the Fundamental Lemma.

Lemma 3 (Fundamental Lemma, Version 2). For any program t ∈ Λ•σ we have D_σ(Dt, t, t).

Proof (sketch). The thesis follows from the stronger statement: for any term Γ ⊢ t : σ and value Γ ⊢ v : σ we have:

∀dρ, ρ, ρ′. D_Γ(dρ, ρ, ρ′)  ⟹  D^V_σ(Dv[ρ, dρ], v[ρ], v[ρ′]) ∧ D^Λ_σ(Dt[ρ, dρ], t[ρ], t[ρ′]).

The proof of the latter is a routine induction on t and v. ⊓⊔

Notice how Lemma 3 improves the compositionality principle of DLRs.
Given a term x : σ ⊢ t : τ and two values ⊢ v, v′ : σ such that D^V_σ(dv, v, v′), the impact of replacing v with v′ in t can be computed as Dt[v/x, dv/dx]. Next, we show how the incremental λ-calculus can benefit from DLRs by showing how the latter support an easy proof of soundness of differentiation.

Theorem 1 (Soundness of Differentiation [7, 17]). For all t ∈ Λ•_{σ→τ} and values v, v′, dv such that D^V_σ(dv, v, v′), we have:

tv′ ≅ (tv) ⊕ (Dt v dv).

Our proof of Theorem 1 is based on the following result, which states that changes indeed behave as such. Recall that ≅ extends to open terms by stipulating that for Γ ⊢ t, t′ : σ we have t ≅^Λ_σ t′ iff t[ρ] ≅^Λ_σ t′[ρ], for any substitution ρ ∈ S(Γ) (and similarly for values).

Proposition 1. The following holds for all (possibly open) terms t, t′, dt and values v, v′, dv such that t ⊕ dt and v ⊕ dv are defined:

D^Λ_σ(dt, t, t′)  ⟹  t′ ≅ t ⊕ dt;
D^V_σ(dv, v, v′)  ⟹  v′ ≅ v ⊕ dv.

Proof (sketch). The proof is by induction on σ, the relevant case being the one of values. We show how to handle the case of arrow types. Assume Γ ⊢ t, t′ : σ → τ. We have to show that for any ρ ∈ S(Γ), t′[ρ] ≅ (t ⊕ dt)[ρ]. First, observe that we have the following general result (by induction on t), where (Dρ)(dx) ≜ Dρ(x):

(t ⊕ dt)[ρ] = t[ρ] ⊕ dt[ρ, Dρ].

By Lemma 3, we have D_Γ(Dρ, ρ, ρ), hence D^V_{σ→τ}(dt[ρ, Dρ], t[ρ], t′[ρ]). In particular, we must have t[ρ] = λx.s, t′[ρ] = λx.s′, and dt[ρ, Dρ] = λx.λdx.ds, for some s, s′, and ds. Since (λx.s) ⊕ (λx.λdx.ds) = λx.(s ⊕ ds) (notice that this term is indeed defined, as it is obtained from t ⊕ dt, which is defined by hypothesis, by replacing variables y, dy with closed values v, Dv), in order to prove the thesis we have to show s′[v/x] ≅ (s ⊕ ds)[v/x] for any closed value v of type σ. Since (s ⊕ ds)[v/x] = s[v/x] ⊕ ds[v/x, Dv/dx], we obtain the thesis from D^V_{σ→τ}(dt[ρ, Dρ], t[ρ], t′[ρ]) and D^V_σ(Dv, v, v), the latter being a consequence of Lemma 3.
⊓⊔

We can finally prove soundness of differentiation.

Proof (Theorem 1). Assume D^V_σ(dv, v, v′). By Lemma 3 we have D_{σ→τ}(Dt, t, t), and thus D_τ(Dt v dv, tv, tv′), by Lemma 2. We conclude that tv′ ≅ (tv) ⊕ (Dt v dv) from Proposition 1. ⊓⊔

Remark 2. To the best of the authors' knowledge, all proofs of Theorem 1 in the literature are based on either denotational semantics or on logical relations resembling DLRs, but specifically extended with a clause requiring t ⊕ dt = t′ for all related terms dt, t, t′, at any type [17, 16]. Notice the use of syntactic equality: the reason behind such a choice is that the logical relation obtained is meant to relate only programs with their derivatives (in which case we indeed have t ⊕ Dt = t), rather than as a tool to reason about program differences.

4 Related Work

Differential logical relations have been introduced by the authors and Yoshimizu [21], building over intuitions by Westbrook and Chaudhuri [32], and are currently under investigation. Differently from the ones considered in this work, the first formulation of differential logical relations [21] is symmetric and considers semantical difference spaces, so that differences between programs are semantical objects (such as numbers and functions), rather than programs themselves. Whereas we have found that working with asymmetric DLRs makes proofs clearer (besides, asymmetry is in line with Lawvere's analysis of the notion of a distance [22]), working with syntactic difference spaces does not really affect our results. In fact, we could consider semantic-based difference spaces and show that the denotation of a derivative of a program is a self-difference for the program.

The incremental λ-calculus has been introduced by Cai et al. [7] as a simply-typed calculus, and by Giarrusso et al. [17] as an untyped calculus. The former work introduces the notions of a program derivative and change update, and gives a denotational proof of soundness of differentiation.
Operationally-based proofs of the same result are given in Giarrusso's PhD thesis [17, 16] by means of logical relations (see Remark 2). Remarkably, both Giarrusso's thesis [16] and the work by Giarrusso et al. [17] use ternary logical relations nearly identical to differential logical relations to relate programs with changes between them. Moreover, the logical relations introduced in the aforementioned papers have been mechanized in Coq. The authors believe it is important to stress how essentially the same technique has independently emerged in different fields (and with different purposes) to prove two different kinds of differential properties of programs.

5 Conclusion

In this work we have established a formal connection between differential logical relations and the incremental λ-calculus of Cai et al. [7], whereby the self-differences of the former are identified with the program derivatives of the latter. Albeit the results proved here are not technically involved, by establishing a formal connection between two different fields they improve the current understanding of differential properties of programs, such an understanding being still in its infancy. The fact that essentially the same technique has been independently developed in different fields, one looking at software optimization and the other studying semantical notions of distance between programs, witnesses, at least in the authors' opinion, that the technique deserves to be further investigated.

In addition to its conceptual relevance, the connection established in the present work also allows us to obtain technical improvements both on the theory of the incremental λ-calculus and on that of differential logical relations. Concerning the former, we have shown how differential logical relations constitute a lightweight operational technique for incremental computing, and we have witnessed that by giving a new, relatively easy proof of soundness of differentiation.
Concerning the latter, we have strengthened the fundamental lemma of DLRs [21] by showing that program derivatives constitute self-differences, thereby reaching a higher level of compositionality. A further consequence of such a connection is the extension of DLRs to calculi with full recursion by means of the step-indexed logical relations of Giarrusso, Régis-Gianas, and Schuster [17].

Acknowledgments

The authors are supported by the ERC Consolidator Grant DLV-818616 DIAPASoN, as well as by the ANR project 16CE250011 REPAS.

References

1. Martín Abadi and Gordon D. Plotkin. A simple differentiable programming language. PACMPL, 4(POPL):38:1–38:28, 2020.
2. Samson Abramsky. The lazy lambda calculus. In D. Turner, editor, Research Topics in Functional Programming, pages 65–117. Addison Wesley, 1990.
3. Hendrik Pieter Barendregt. The Lambda Calculus - Its Syntax and Semantics, volume 103 of Studies in Logic and the Foundations of Mathematics. North-Holland, 1985.
4. Michael Bartholomew-Biggs, Steven Brown, Bruce Christianson, and Laurence Dixon. Automatic differentiation of algorithms. Journal of Computational and Applied Mathematics, 124(1):171–190, 2000. Numerical Analysis 2000, Vol. IV: Optimization and Nonlinear Equations.
5. Frank Van Breugel and James Worrell. A behavioural pseudometric for probabilistic transition systems. Theoretical Computer Science, 331(1):115–142, 2005.
6. Aloïs Brunel, Damiano Mazza, and Michele Pagani. Backpropagation in the simply typed lambda-calculus with linear negation. PACMPL, 4(POPL):64:1–64:27, 2020.
7. Yufei Cai, Paolo G. Giarrusso, Tillmann Rendel, and Klaus Ostermann. A theory of changes for higher-order languages: incrementalizing λ-calculi by static differentiation. In Proc. of PLDI 2014, pages 145–155, 2014.
8. Konstantinos Chatzikokolakis, Daniel Gebler, Catuscia Palamidessi, and Lili Xu. Generalized bisimulation metrics. In Proc. of CONCUR 2014, pages 32–46, 2014.
9. Raphaëlle Crubillé and Ugo Dal Lago.
Metric reasoning about λ-terms: The general case. In Proc. of ESOP 2017, pages 341–367, 2017.
10. Arthur Azevedo de Amorim, Marco Gaboardi, Justin Hsu, Shin-ya Katsumata, and Ikram Cherigui. A semantic account of metric preservation. In Proc. of POPL 2017, pages 545–556, 2017.
11. Josee Desharnais, Vineet Gupta, Radha Jagadeesan, and Prakash Panangaden. Metrics for labelled Markov processes. Theoretical Computer Science, 318(3):323–354, 2004.
12. Wenjie Du, Yuxin Deng, and Daniel Gebler. Behavioural pseudometrics for nondeterministic probabilistic systems. In Proc. of SETTA 2016, pages 67–84, 2016.
13. Thomas Ehrhard and Laurent Regnier. The differential lambda-calculus. Theoretical Computer Science, 309(1-3):1–41, 2003.
14. Francesco Gavazzo. Quantitative behavioural reasoning for higher-order effectful programs: Applicative distances. In Proc. of LICS 2018, pages 452–461, 2018.
15. Daniel Gebler, Kim G. Larsen, and Simone Tini. Compositional bisimulation metric reasoning with probabilistic process calculi. Logical Methods in Computer Science, 12(4), 2016.
16. Paolo G. Giarrusso. Optimizing and incrementalizing higher-order collection queries by AST transformation. PhD thesis, University of Tübingen, 2018.
17. Paolo G. Giarrusso, Yann Régis-Gianas, and Philipp Schuster. Incremental lambda-calculus in cache-transfer style - static memoization by program transformation. In Proc. of ESOP 2019, pages 553–580, 2019.
18. Jean-Yves Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987.
19. Jean-Yves Girard. Normal functors, power series and λ-calculus. Annals of Pure and Applied Logic, 37(2):129–177, 1988.
20. Jean-Yves Girard, Yves Lafont, and Paul Taylor. Proofs and Types. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 1989.
21. Ugo Dal Lago, Francesco Gavazzo, and Akira Yoshimizu. Differential logical relations, part I: the simply-typed case. In Proc. of ICALP 2019, pages 111:1–111:14, 2019.
22. F. William Lawvere. Metric spaces, generalized logic, and closed categories. Rendiconti del Seminario Matematico e Fisico di Milano, 43:135–166, 1973.
23. Sasa Misailovic, Daniel M. Roy, and Martin C. Rinard. Probabilistically accurate program transformations. In Proc. of SAS 2011, pages 316–333, 2011.
24. Sparsh Mittal. A survey of techniques for approximate computing. ACM Computing Surveys, 48(4):62:1–62:33, 2016.
25. Robert Paige and Shaye Koenig. Finite differencing of computable expressions. ACM Trans. Program. Lang. Syst., 4(3):402–454, 1982.
26. Ganesan Ramalingam and Thomas W. Reps. A categorized bibliography on incremental computation. In Proc. of POPL 1993, pages 502–510, 1993.
27. Jason Reed and Benjamin C. Pierce. Distance makes the types grow stronger: a calculus for differential privacy. In Proc. of ICFP 2010, pages 157–168, 2010.
28. Clarence Hudson Richardson. An Introduction to the Calculus of Finite Differences. New York, 1954.
29. Amir Shaikhha, Andrew Fitzgibbon, Dimitrios Vytiniotis, and Simon Peyton Jones. Efficient differentiable programming in a functional array-processing language. PACMPL, 3(ICFP):97:1–97:30, 2019.
30. Michael Spivak. Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Avalon Publishing, 1971.
31. Lynn Arthur Steen and J. Arthur Seebach Jr. Counterexamples in Topology. Dover Books on Mathematics. Dover Publications, 1995.
32. Edwin M. Westbrook and Swarat Chaudhuri. A semantics for approximate program transformations. CoRR, abs/1304.5531, 2013. URL: http://arxiv.org/abs/1304.5531.