Enhanced Regular Corecursion for Data Streams? Davide Ancona, Pietro Barbieri, and Elena Zucca DIBRIS, University of Genova Abstract. We propose a simple calculus for processing data streams (in- finite flows of data series), represented by finite sets of equations built on stream operators. Furthermore, functions defining streams are regularly corecursive, that is, cyclic calls are detected, avoiding non-termination as happens with ordinary recursion in the call-by-value evaluation strat- egy. As we illustrate by several examples, the combination of such two mechanisms provides a good compromise between expressive power and decidability. Notably, we provide an algorithm to check that the stream returned by a function call is represented by a well-defined set of equa- tions which actually admits a unique solution, hence access to an arbi- trary element of the returned stream will never diverge. Keywords: Operational semantics, stream programming, regular terms. 1 Introduction Applications often deal with data structures which are conceptually infinite, among those data streams (infinite flows of data series) are a mainstream exam- ple: as we venture deeper into the Internet of Things (IoT) era, stream processing is becoming increasingly important. Indeed, all main IoT platforms provide em- bedded and integrated engines for real time analysis of potentially infinite flowing data series; such a process occurs before the data is stored for efficiency and, as often happens in Computer Science, there is a trade-off between the expressive power of the language, the efficiency of its implementation and the decidability of properties important to guarantee reliability and tractability. Another related important problem is data stream generation, which is es- sential to test complex distributed IoT systems; the deterministic simulation of sensor data streams through a suitable language offers a practical solution to IoT testing and favors early detection of some kinds of bugs that can be fixed more easily before the deployment of the whole system. A well-established solution to data stream generation and processing is lazy evaluation, as supported, e.g., in Haskell, and most stream libraries offered by mainstream languages, as java.util.stream. In this approach, conceptually infi- nite data streams are the result of a function or method call, which is evaluated ? Copyright c 2021 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). 2 D. Ancona et al. according to the call-by-need strategy. For instance, in Haskell we can define one_two = 1:2:one_two, or even represent the list of natural numbers as from 0, where from n = n:from(n+1). However, such a great expressive power comes at a cost; let us consider, for instance, the definition bad_stream = 0:tail bad_stream. The Haskell compiler does not complain about this definition, and no problem arises at runtime as long as the manipulation of bad_stream requires only its first element to be accessed; anyway, any operation which needs to inspect bad_stream at a deeper level is deemed to diverge. Unfortunately, it is not decidable to check, even at runtime, whether the stream returned by a Haskell function is well- defined, that is, all of its elements can be computed1 ; indeed, the full expressive power of Haskell can be used to define streams by means of recursive functions. For similar reasons, it is not decidable to check at runtime whether the streams returned by two Haskell functions are equal. More recently, a complementary approach has been considered in different programming paradigms — functional [11], logic [17,1,7], and object-oriented [2] — based on the following two ideas: – Infinite streams can be finitely represented by finite sets of equations involv- ing only the stream constructor, e.g., x = 1 : 2 : x. Such a representation corresponds to what has been called by Courcelle in its seminal paper [6] a regular, a.k.a. rational, tree, that is, a tree with possibly infinite depth but a finite set of subtrees. – Functions are regularly corecursive, that is, execution keeps track of pending function calls, so that, when the same call is considered the second time, this is detected, avoiding non-termination as happens with ordinary recursion in the call-by-value evaluation strategy. In this way, the Haskell stream one_two can be equivalently obtained by the call2 one_two(), with the function one_two defined by one_two() = 1:2:one_two(). In- deed, with regular corecursion the result of this call is the value corresponding to the unique solution of the equation x = 1 : 2 : x. On the other hand, since the expressive power is limited to regular streams, it is not possible to define a core- cursive function whose call returns the stream of natural numbers, as happens for the from 0 Haskell example. However, there exist procedures for checking well-defined streams and their equality, even with tractable algorithms. In this paper, we propose a simple calculus of numeric streams which supports regular corecursion and goes beyond regular streams by extending equations with other typical stream operators besides the stream constructor: tail and pointwise operators can be contained in stream equations and are therefore not evaluated. In this way, we are able to achieve a good compromise between expressive power and decidability. Notably: – the extended shape of equations allows the definition of functions which return non-regular streams; for instance, it is possible to obtain the stream of 1 This is what is also known as a productive corecursive definition [5]. 2 Differently from Haskell, for simplicity in our calculus functions are uncurried, hence they take as arguments possibly empty tuples, delimited by parentheses. Enhanced Regular Corecursion for Data Streams 3 natural numbers as from(0), by defining from(n)=n:(from(n)[+]repeat(1)), with [+] the pointwise addition on numeric streams and repeat the function defined by repeat(n)=n:repeat(n); – there exists a decidable procedure to dynamically check whether the stream returned by a corecursive function is well-defined; – however, it is not possible to express all streams computable with the lazy evaluation approach, but only those which have a specific structure (that is, can be expressed as the unique solution of a set of equations built with the above mentioned operators). In Sect. 2 we define the calculus, in Sect. 3 we show examples, and in Sect. 4 we provide an operational characterization of well-defined streams, proved sufficient and necessary for an access to an arbitrary index to never diverge. In Sect. 5 we discuss related and further work. An extended version includ- ing more examples of derivations and complete proofs can be found at http: //arxiv.org/abs/2108.00281. 2 Stream calculus Fig. 1 shows the syntax of the calculus. fd :: = fd1 . . . fdn program fd :: = f(x) = se function declaration e :: = se | ne | be expression se :: = x | if be then se1 else se2 | ne : se | seˆ | se1 [op]se2 | f(e) stream expression ne :: = x | se(ne) | ne1 op ne2 | 0 | 1 | 2 | ... numeric expression be :: = x | true | false | ... boolean expression op :: = + | − | ∗ | / numeric operation Fig. 1. Stream calculus: syntax A program is a sequence of (mutually recursive) function declarations, for simplicity assumed to only return streams. Stream expressions are variables, conditional expressions, expressions built by stream operators, and function calls. We consider the following stream operators: constructor (prepending a numeric element), tail, and pointwise arithmetic operations. Numeric expressions include the access to the i-th3 element of a stream. We use fd to denote a sequence fd1 , . . . , fdn of function declarations, and analogously for other sequences. The operational semantics, given in Fig. 2, is based on two key ideas: 1. (some) infinite streams are represented in a finite way 2. evaluation keeps trace of already considered function calls 3 For simplicity, here indexing and numeric expressions coincide, even though indexes are expected to be natural numbers, while values in streams can range over a larger numeric domain. 4 D. Ancona et al. c :: = f(v) (evaluated) call v :: = s | n | b value s :: = x | n : s | sˆ | s1 [op]s2 (open) stream value i, n :: = 0 | 1 | 2 | ... index, numeric value b :: = true | false boolean value τ :: = c1 7→ x1 . . . cn 7→ xn (n ≥ 0) call trace ρ :: = x1 7→ s1 . . . xn 7→ sn (n ≥ 0) environment be, ρ, τ ⇓ (true, ρ) se1 , ρ, τ ⇓ (s, ρ0 ) be, ρ, τ ⇓ (false, ρ) se2 , ρ, τ ⇓ (s, ρ0 ) (val) (if-t) (if-f) v, ρ, τ ⇓ (v, ρ) if be then se1 else se2 , ρ, τ ⇓ (s, ρ0 ) if be then se1 else se2 , ρ, τ ⇓ (s, ρ0 ) ne, ρ, τ ⇓ (n, ρ) se, ρ, τ ⇓ (s, ρ0 ) se, ρ, τ ⇓ (s, ρ0 ) se1 , ρ, τ ⇓ (s1 , ρ1 ) se2 , ρ, τ ⇓ (s2 , ρ2 ) (cons) (tail) (pw) ne : se, ρ, τ ⇓ (n : s, ρ0 ) seˆ, ρ, τ ⇓ (sˆ, ρ0 ) se1 [op]se2 , ρ, τ ⇓ (s1 [op]s2 , ρ1 t ρ2 ) ei , ρ, τ ⇓ (vi , ρi ) ∀i ∈ 1..n f(v), ρb, τ ⇓ (s, ρ0 ) e = e1 , . . . , en not of shape v (args) v = vF1 , . . . , vn f(e), ρ, τ ⇓ (s, ρ0 ) ρb = i∈1..n ρi f(v) 6∈ dom(τ≈ρ ) se[v/x], ρ, τ {f(v) 7→ x} ⇓ (s, ρ0 ) x fresh (invk) (corec) τ≈ρ (f (v)) = x f(v), ρ, τ ⇓ (x, ρ0 {x 7→ s}) fbody(f) = (x, se) f(v), ρ, τ ⇓ (x, ρ) wd (ρ0 , x, s) se, ρ, τ ⇓ (s, ρ0 ) ne, ρ, τ ⇓ (i, ρ) (at) at ρ0 (s, i) = n se(ne), ρ, τ ⇓ (n, ρ) at ρ (ρ(x), i) = n0 at ρ (s, i − 1) = n0 (at-var) (at-cons-0) (at-cons-n) i>0 at ρ (x, i) = n0 at ρ (n : s, 0) = n at ρ (n : s, i) = n0 at ρ (s, i + 1) = n at ρ (s1 , i) = n1 at ρ (s2 , i) = n2 (at-tail) (at-pw) at ρ (sˆ, i) = n at ρ (s1 [op]s2 , i) = n1 op n2 Fig. 2. Stream calculus: operational semantics To obtain (1), our approach is inspired by capsules [10], which are essentially expressions supporting cyclic references. That is, the result of the evaluation of a stream expression is a pair (s, ρ), where s is an (open) stream value, built on top of stream variables, numeric values, the stream constructor, the tail destructor and the pointwise arithmetic operators, and ρ is an environment mapping a finite set of variables into stream values. In this way, cyclic streams can be obtained: for instance, (x, x 7→ n : x) denotes the stream constantly equal to n. We denote by vars(ρ) the set of variables occurring in ρ, by fv(ρ) the set of its free variables, that is, vars(ρ) \ dom(ρ), and say that ρ is closed if fv(ρ) = ∅, open otherwise, and analogously for a result (v, ρ). Enhanced Regular Corecursion for Data Streams 5 To obtain point (2) above, evaluation has an additional parameter which is a call trace, a map from function calls where arguments are values (dubbed calls for short in the following) into variables. Altogether, the semantic judgment has shape e, ρ, τ ⇓ (v, ρ0 ), where e is the expression to be evaluated, ρ the current environment defining possibly cyclic stream values that can occur in e, τ the call trace, and (v, ρ0 ) the result. The se- mantic judgments should be indexed by an underlying (fixed) program, omitted for sake of simplicity. Rules use the following auxiliary definitions: – ρ t ρ0 is the union of two environments, which is well-defined if they have disjoint domains; ρ{x 7→ s} is the environment which gives s on x, coincides with ρ elsewhere; we use analogous notations for call traces. – se[v/x] is obtained by parallel substitution of variables x with values v. – fbody(f) returns the pair of the parameters and the body of the declaration of f, if any, in the assumed program. Moreover, the rules are parametric in the following other judgments, for which different definitions will be discussed in Sect. 4: – wd (ρ, x, s), that is, by adding the association x 7→ s to the (well-defined) environment ρ, we still get a well-defined environment. – v ≈ρ v0 , that is, the two values are equivalent in the environment4 ρ. Then, τ≈ρ is the extension of τ up to equivalence in ρ: τ≈ρ (f (v1 , . . . , vn )) = x iff there exist v01 , . . . , v0n such that τ (f (v01 , . . . , v0n )) = x and vi ≈ρ v0i for i ∈ 1..n. Intuitively, a closed result (s, ρ) is well-defined if it denotes a unique stream (infinite sequence of numeric values), and a closed environment ρ is well-defined if, for each x ∈ dom(ρ), (x, ρ) is well-defined. In other words, the corresponding set of equations admits a unique solution. For instance, the environment {x 7→ x} is not well-defined, since it is undetermined (any stream satisfies the equation x = x); the environment {x 7→ x[+]y, y 7→ 1 : y} is not well-defined as well, since it is undefined (the two equations x = x 7→ x[+]y, y = 1 : y admit no solutions for x). Finally, two stream values s and s0 such that the results (s, ρ) and (s0 , ρ) are closed and well-defined are equivalent if they denote the same stream. These notions can be generalized to open results and environments, assuming that free variables denote unique streams, as will be formalized in Sect. 4. Rules for values and conditional are straightforward. In rules (cons), (tail) and (pw), arguments are evaluated, while the stream operator is applied without any further evaluation; the fact that the tail and pointwise operators are treated as the stream constructor : is crucial to get results which denote non-regular streams as shown in Sect. 3. However, when non-constructors are allowed to occur in values, ensuring well-defined results become more challenging, because the usual simple syntactic constraints that can be safely used for constructors [5] no longer work (see more details in Sect. 4 and 5). The rules for function call use a mechanism of cycle detection, similar to that in [2]. They are given in a modular way. That is, evaluation of arguments is handled by a separate rule (args). 4 This equivalence is assumed to be the identity on numeric and boolean values. 6 D. Ancona et al. Rule (invk) is applied when a call is considered for the first time, as expressed by the first side condition. The body is retrieved by using the auxiliary function fbody, and evaluated in a call trace where the call has been mapped into a fresh variable. Then, it is checked that adding the association from such variable to the result of the evaluation of the body keeps the environment well-defined. If the check succeeds, then the final result consists of the variable associated with the call and the updated environment. For simplicity, here execution is stuck if the check fails; an implementation should raise a runtime error instead. Rule (corec) is applied when a call is considered for the second time, as expressed by the first side condition (note that cycle detection takes place up to equivalence in the environment). The variable x is returned as result. However, there is no associated value in the environment yet; in other words, the result (x, ρ) is open at this point. This means that x is undefined until the environment is updated with the corresponding value in rule (invk). However, x can be safely used as long as the evaluation does not require x to be inspected; for instance, x can be safely passed as an argument to a function call. For instance, if we consider f()=g() g()=1:f(), then the judgment f(), ∅, ∅ ⇓ (x, ρ), with ρ = {x 7→ y, y 7→ 1 : x}, is derivable; however, while the final result (x, ρ) is closed, the derivation contains also judgments with open results, as, e.g., f(), ∅, {f() 7→ x, g() 7→ y} ⇓ (x, ∅) and g(), ∅, {f() 7→ x} ⇓ (y, {y 7→ 1 : x}). As another example, if we consider f()=g(2:f()) g(s)=1:s, then the deriva- tion of the judgment f(), ∅, ∅ ⇓ (x, ρ) with ρ = {x 7→ y, y 7→ 1 : 2 : x} is built on top of the derivation of g(2:x ), ∅, {f() 7→ x} ⇓ (y, {y 7→ 1 : 2 : x}), corresponding to the evaluation of g(2:x ) where x is an operand of the stream constructor whose result is passed as argument to the call to g, despite x is not defined yet. Finally, rule (at) computes the i-th element of a stream expression. After evaluation of the arguments, the numeric result is obtained by the auxiliary judgment at ρ (s, i) = n, inductively defined in the bottom part of the figure. If the stream value is a variable, rule (at-var), then the evaluation is propagated to the associated stream value in the environment, if any. If, instead, the variable is free in the environment, then execution is stuck; again, an implementation should raise a runtime error instead. If the stream value is built by the constructor, then the result is the first element of the stream if the index is 0, rule (at-cons-0); otherwise, the evaluation is recursively propagated to its tail with the predecessor index, rule (at-cons-n). Conversely, if the stream is built by the tail operator, rule (at-tail), then the evaluation is recursively propagated to the stream argument with the successor index. Finally, if the stream is built by a pointwise operation, rule (at-pw), then the evaluation is recursively propagated to the operands with the same index and then the corresponding arithmetic operation is computed on the results. Derivations of examples in this section can be found in the extended version. Enhanced Regular Corecursion for Data Streams 7 3 Examples First we show some simple examples, to explain how regular corecursion works. Then we provide some more significant examples. Consider the following function declarations: repeat ( n ) = n : repeat ( n ) one_two () = 1: two_one () two_one () = 2: one_two () With the standard semantics of recursion, the calls, e.g., repeat(0) and one_two() lead to non-termination. Thanks to regular corecursion, instead, these calls ter- minate, producing as result (x, {x 7→ 0 : x}), and (x, {x 7→ 1 : y, y 7→ 2 : x}), re- spectively. Indeed, when initially invoked, the call repeat(0) is added in the call trace with an associated fresh variable, say x. In this way, when evaluating the body of the function, the recursive call is detected as cyclic, the variable x is returned as its result, and, finally, the stream value 0 : x is associated in the environment with the result x of the initial call. The evaluation of one_two() is analogous, except that another fresh variable y is generated for the intermediate call two_one(). The formal derivations are given below. (value) (corec) repeat(0), ∅, {repeat(0) 7→ x} ⇓ (x, ∅) (cons) 0 : repeat(0), ∅, {repeat(0) 7→ x} ⇓ (0 : x, ∅) (invk) repeat(0), ∅, ∅ ⇓ (x, {x 7→ 0 : x}) (value) (corec) one two(), ∅, {one two() 7→ x, two one() 7→ y} ⇓ (x, ∅) (cons) 2 : one two(), ∅, {one two() 7→ x, two one() 7→ y} ⇓ (2 : x, ∅) (value) (invk) two one(), ∅, {one two() 7→ x} ⇓ (y, {y 7→ 2 : x}) (cons) 1 : two one(), ∅, {one two() 7→ x} ⇓ (1 : y, {y 7→ 2 : x}) (invk) one two(), ∅, ∅ ⇓ (x, {x 7→ 1 : y, y 7→ 2 : x}) For space reasons, we did not report the application of rule (value). In both derivations, note that rule (corec) is applied, without evaluating the body once more, when the cyclic call is detected. The following examples show function definitions whose calls return non- regular streams, notably, the natural numbers, the natural numbers raised to the power of a number, the factorials, the powers of a number, the Fibonacci numbers, and the stream obtained by pointwise increment by one. nat () = 0:( nat ()[+] repeat (1)) nat_to_pow ( n ) = // nat_to_pow ( n )( i )= i ^ n if n <= 0 then repeat (1) else nat_to_pow (n -1)[*] nat () fact () = 1:(( nat ()[+] repeat (1))[*] fact ()) pow ( n ) = 1:( repeat ( n )[*] pow ( n )) // pow ( n )( i )= n ^ i fib () = 0:1:( fib ()[+] fib ()^) incr ( s ) = s [+] repeat (1) 8 D. Ancona et al. The definition of nat uses regular corecursion, since the recursive call nat() is cyclic. Hence the call nat() returns (x, {x 7→ 0 : (x[+]y), y 7→ 1 : y}). The def- inition of nat_to_pow is a standard inductive one where the argument strictly decreases in the recursive call. Hence, the call, e.g., nat_to_pow(2), returns (x2 , {x2 7→ x1 [∗]x, x1 7→ x0 [∗]x, x0 7→ y, y 7→ 1 : y, x 7→ 0 : (x[+]y0 ), y0 7→ 1 : y0 }). The definitions of fact, pow, and fib are regularly corecursive. For instance, the call fact() returns (z, z 7→ (x[+]y)[∗]z, x 7→ 0 : (x[+]y0 ), y 7→ 1 : y, y0 7→ 1 : y0 ). The definition of incr is non-recursive, hence always converges, and the call incr(s) returns (x, {x 7→ s[+]y, y 7→ 1 : y}). The following alternative definition incr_reg ( s ) = ( s (0)+1): incr_reg ( s ^) relies, instead, on regular corecursion. Note the difference: the latter version ensures termination only for regular streams, as in incr_reg(one_two()), since, eventually, in the recursive call, the expression s^ turns out to denote the initial stream; however, the computation does not terminate for non-regular streams, as in incr_reg(nat()), which, however, converges with incr. The following function computes the stream of partial sums of the first i + 1 elements of a stream s, that is, sum(s)(i)= ik=0 s(k): P sum ( s ) = s (0):( s ^[+] sum ( s )) Such a function is useful for computing streams whose elements approximate a series with increasing precision; for instance, the following function returns the stream of partial sums of the first i + 1 elements of the Taylor series of the exponential function: sum_expn ( n ) = sum ( pow ( n )[/] fact ()) Function sum_expn calls sum with the argument pow(n)[/]fact() corresponding to the stream of all terms of the Taylor series of the exponential function; hence, by accessing the i-th element of the stream, we have the following approximation: nk n2 n3 n4 ni Pi sum expn(n)(i)= k=0 k! = 1 + n + 2! + 3! + 4! + · · · + i! Lastly, we present a couple of examples showing how it is possible to define primitive operations provided by IoT platforms for real time analysis of data streams; we start with aggr(n,s), which allows aggregation (by addition) of contiguous data in the stream s w.r.t. a frame of length n: aggr (n , s ) = if n <=0 then repeat (0) else s [+] aggr (n -1 , s ^) For instance, aggr(3,s) returns the stream s0 s.t. s0 (i) = s(i)+s(i+1)+s(i+2). On top of aggr, we can easily define avg(n,s) to compute the stream of average values of s in the frame of length n: avg (n , s ) = aggr (n , s )[/] repeat ( n ) Enhanced Regular Corecursion for Data Streams 9 4 Well-defined environments and equivalent streams In the semantic rules, we have left unspecified two notions: well-defined environ- ments, and equivalent streams. We provide now a formal definition in abstract terms. Then, we provide an operational definition of well-defined environments. Semantically, a stream σ is an infinite sequence of numeric values, that is, a function which returns, for each index i ≥ 0, the i-th element σ(i). Given a result (s, ρ), we get a stream by instantiating variables in s with streams, in a way consistent with ρ, and evaluating operators. To make this formal, we need some preliminary definitions. A substitution θ is a function from a finite set of variables to streams. We denote by JsKθ the stream obtained by applying θ to s, and evaluating operators, as formally defined below. JxKθ = θ(x) ( n i=0 (Jn : sKθ)(i) = (JsKθ)(i − 1) i≥1 (JsˆKθ)(i) = JsKθ(i + 1) i≥0 (Js1 [op]s2 Kθ)(i) = Js1 Kθ(i) op Js2 Kθ(i) i≥0 Given an environment ρ and a substitution θ with domain vars(ρ), the sub- stitution ρ[θ] is defined by: ( Jρ(x)Kθ x ∈ dom(ρ) ρ[θ](x) = θ(x) x ∈ fv(ρ) Then, a solution of ρ is a substitution θ with domain vars(ρ) such that ρ[θ] = θ. A closed environment ρ is well-defined if it has exactly one solution, de- noted sol(ρ). For instance, {x 7→ 1 : x} and {y 7→ 0 : (y[+]x), x 7→ 1 : x} are well- defined, since their unique solutions map x to the infinite stream of ones, and y to the stream of natural numbers, respectively. Instead, for {x 7→ 1[+]x} there are no solutions. Lastly, an environment can be undetermined: for instance, a substitution mapping x into an arbitrary stream is a solution of {x 7→ x}. An open environment ρ is well-defined if, for each θ with domain fv(ρ), it has exactly one solution θ0 such that θ ⊆ θ0 . For instance, the open environment {y 7→ 0 : (y[+]x)} is well-defined. Given a closed result (s, ρ), with ρ well-defined, we define its semantics by JsKρ = JsKθ for θ = sol(ρ). Then, two stream values s and s0 are semantically equivalent in ρ if JsKρ = Js0 Kρ0 . We now consider the non-trivial problem of ensuring that a closed environ- ment ρ is well-defined; if environments would be allowed to contain only the stream constructor, then it would suffice to require all non-free variables to be guarded by the stream constructor [5]. For instance, the environment {x 7→ 1 : x} satisfies such a syntactic condition, and is well-defined, while in the non well- defined environment {x 7→ x} the variable x is not guarded by the constructor. 10 D. Ancona et al. However, when non constructors as the tail and pointwise operators come into play, the fact that variables are guarded by the stream constructor no longer ensures that the environment is well-defined; for instance, the environment ρ = {x 7→ 0 : xˆ} corresponding to the definition of bad_stream in Sect. 1 is not well- defined since it admits infinite solutions (all streams starting with 0), although variable x is guarded by the stream constructor. A more complex check is needed: roughly, more constructors than tail operators should have been traversed when we find a cyclic reference, as formalized in Fig. 3. m :: = x1 7→ n1 . . . xn 7→ nk (n ≥ 0) map from variables to natural numbers wd(x, ρ{x 7→ v}, ∅) wd(ρ(x), ρ, m{x 7→ 0}) (main) (wf-var) x 6∈ dom(m) wd (ρ, x, v) wd(x, ρ, m) x ∈ dom(m) (wf-corec) (wf-fv) x 6∈ dom(ρ) wd(x, ρ, m) m(x) > 0 wd(x, ρ, m) wd(s, ρ, m +1 ) wd(s, ρ, m −1 ) wd(s1 , ρ, m) wd(s2 , ρ, m) (wf-cons) (wf-tail) (wf-pw) wd(n : s, ρ, m) wd(sˆ, ρ, m) wd(s1 [op]s2 , ρ, m) Fig. 3. Operational definition of well-defined environments The judgment wd (ρ, x, s) used in the side condition of rule (invk) holds if wd(x, ρ{x 7→ v}, ∅) holds. The judgment wd(s, ρ, ∅) means that a result is well- defined. That is, restricting the domain of ρ to the variables reachable from s (those either occurring in s, or, transitively, in values associated with reachable variables) we get a well-defined environment; thus, wd (ρ, x, s) holds if adding the association of s with x preserves well-definedness of ρ. The additional argument m in the judgment wd(s, ρ, m) is a map from vari- ables to natural numbers. We write m +1 and m −1 for the maps {(x, m(x) + 1) | x ∈ dom(m)}, and {(x, m(x) − 1) | x ∈ dom(m)}, respectively. In rule (main), this map is initially empty. In rule (wf-var), a variable x defined in the environment is added in the map, with initial value 0, the first time it is found. In rule (wf-corec), when it is found the second time, it is checked that more constructors than tail operators have been traversed. In rule (wf-fv), a free variable is considered well-defined.5 In rules (wf-cons), (wf-tail), and (wf-pw), the value associated with a variable is incremented/decremented by one each time a constructor and tail operator are traversed, respectively. As an example of derivation of well-definedness and access to the i-th element, in Fig. 4 we consider the result (x, {x 7→ 0 : (x [+] y), y 7→ 1 : y}), obtained by evaluating the call nat() with nat defined as in Sect. 3. 5 Indeed, non-well-definedness can only be detected on closed results. Enhanced Regular Corecursion for Data Streams 11 (wf-corec) wd(y, ρ, {x 7→ 1, y 7→ 1}) (wf-cons) wd(1 : y, ρ, {x 7→ 1, y 7→ 0}) (wf-corec) (wf-var) wd(x, ρ, {x 7→ 1}) wd(y, ρ, {x 7→ 1}) (wf-pw) wd(x [+] y, ρ, {x 7→ 1}) (wf-cons) wd(0 : (x [+] y), ρ, {x 7→ 0}) (wf-var) wd(x, ρ, ∅) (at-cons-0) (at-cons-0) at ρ (0 : (x [+] y), 0) = 0 at ρ (1 : y, 0) = 1 .. .. . . (at-var) (at-var) at ρ (x, i − 1) = i − 1 at ρ (y, i − 1) = 1 (at-op) at ρ (x [+] y, i − 1) = i (at-cons-n) at ρ (0 : (x [+] y), i) = i (at-var) at ρ (x, i) = i Fig. 4. Derivations for ρ = (x, {x 7→ 0 : (x [+] y), y 7→ 1 : y}). In the extended version we show the derivation for a trickier example, that is, the result (x, {x 7→ 0 : 1 : (2 : xˆ)ˆ}). Its semantics is the stream 0, 1, 1, 1, . . .. We show now that well-definedness of a result is a necessary and sufficient condition for termination of access to an arbitrary index. To formally express and prove this statement, we introduce some definitions and notations. First of all, since the numeric value obtained as result is not relevant for the following technical treatment, for simplicity we will write at ρ (s, i) rather than at ρ (s, i) = n. We call derivation an either finite or infinite proof tree. We write wd(s0 , ρ, m 0 ) ` wd(s, ρ, m) to mean that wd(s0 , ρ, m 0 ) is a premise of a (meta-)rule where wd(s, ρ, m) is the consequence, and `? for the reflexive and transitive closure of this relation. Moreover, wd(x, ρ, m 0 ) `?X wd(s, ρ, m), with x 6∈ X , means that in the path there can be nodes of shape wd(y, ρ, ) only for y ∈ X and non-repeated. We use analogous notations for the judgment at ρ (s, i). Lemma 1. 1. If at ρ (x, i0 ) `?X at ρ (s, i), then at ρ (x, i0 + k) `?X at ρ (s, i + k), for each k ≥ 0. 2. A judgment wd(s, ρ, ∅) has no derivation iff the following condition holds: (wf-stuck) wd(x, ρ, m 0 ) `?X 0 wd(ρ(x), ρ, m{x 7→ 0}) ` wd(x, ρ, m) `?X wd(s, ρ, ∅) for some x ∈ dom(ρ), X 0 , X , and m 0 , m s.t. x 6∈ dom(m), m 0 (x) = k ≤ 0. 3. The derivation of at ρ (s, j) is infinite iff the following condition holds: (at-∞) at ρ (x, i + k) `?X 0 at ρ (ρ(x), i) ` at ρ (x, i) `?X at ρ (s, j) for some x ∈ dom(ρ), X 0 , X , and i, k ≥ 0. Lemma 2. For x ∈ dom(m), the following conditions are equivalent: 1. at ρ (x, i0 ) `?X at ρ (s, i) for some i0 , i 2. wd(x, ρ, m 0 ) `?X wd(s, ρ, m) for some m 0 such that m 0 (x) = m(x) + i − i0 . 12 D. Ancona et al. Proof. 1⇒ 2 By induction on the length of the path in at ρ (x, i0 ) `?X at ρ (s, i). Base The length of the path is 0, hence we have at ρ (x, i) `?∅ at ρ (x, i). We also have wd(x, ρ, m) `?∅ wd(x, ρ, m), and m(x) = m(x) + i − i, as requested. Inductive step By cases on the rule applied to derive at ρ (s, i). We show the most significant cases. (at-var) We have at ρ (y, i), with y 6= x since the length of the path is > 0, and at ρ (x, i0 ) `?χ\{y} at ρ (ρ(y), i). Moreover, we can derive wd(y, ρ, m) by rule (wf-var), and by induc- tive hypothesis we also have wd(x, ρ, m 0 ) `?χ\{y} wd(ρ(y), ρ, m{y 7→ 0}), and m 0 (x) = m{y 7→ 0}(x) + i − i0 , hence we get the thesis. (at-cons-0) Empty case, since the derivation for at ρ (n : s, 0) does not contain a node at ρ (x, i0 ). (at-cons) We have at ρ (n : s, i), and at ρ (x, i0 ) `?X at ρ (s, i − 1). More- over, we can derive wd(n : s, ρ, m) by rule (wf-cons), and by in- ductive hypothesis we also have wd(x, ρ, m 0 ) `?X wd(s, ρ, m +1 ), with m 0 (x) = m +1 (x) + (i − 1) − i0 , hence we get the thesis. 2⇒ 1 By induction on the length of the path in wd(x, ρ, m 0 ) `? wd(s, ρ, m). Base The length of the path is 0, hence we have wd(x, ρ, m) `?∅ wd(x, ρ, m). We also have, for an arbitrary i, at ρ (x, i) `?∅ at ρ (x, i), and m(x) = m(x)+ i − i, as requested. Inductive step By cases on the rule applied to derive wd(s, ρ, m). We show the most significant cases. (wf-var) We have wd(y, ρ, m), with y ∈ / dom(m), y 6= x since x ∈ dom(m), and wd(x, ρ, m 0 ) `?X \{y} wd(ρ(y), ρ, m{y 7→ 0}). By inductive hy- pothesis we have at ρ (x, i0 ) `?X \{y} at ρ (ρ(y), i) for some i0 , i such that m 0 (x) = m(x) + i − i0 . Moreover, since y ∈ dom(ρ), at ρ (ρ(y), i) ` at ρ (y, i) by rule (at-var), hence we get at ρ (x, i0 ) `?χ at ρ (y, i). (wf-corec) Empty case, since the derivation for wd(y, ρ, m) would not contain a node wd(x, ρ, m). (wf-fv) Empty case, since the derivation for wd(y, ρ, m) would not con- tain a node wd(x, ρ, m). (wf-cons) We have wd(n : s, ρ, m), and wd(x, ρ, m 0 ) `?X wd(s, ρ, m +1 ). By inductive hypothesis we have at ρ (x, i0 ) `?X at ρ (s, i) for some i0 , i such that m 0 (x) = m +1 (x)+i−i0 . Moreover, at ρ (s, i) ` at ρ (n:s, i+1) by rule (at-cons-n), hence we get at ρ (x, i0 ) `?χ at ρ (n : s, i + 1) with m 0 (x) = m(x) + i + 1 − i0 , as requested. Lemma 3. For x 6∈ dom(m), the following conditions are equivalent: 1. at ρ (x, i0 ) `?X at ρ (s, i) for some i0 , i 2. wd(x, ρ, m 0 ) `?X wd(s, ρ, m) for some m 0 such that x 6∈ dom(m 0 ). Proof. Easy variant of the proof of Lemma 2. Theorem 1. wd(s, ρ, ∅) is derivable iff, for all j, at ρ (s, j) either has no deriva- tion or a finite derivation. Enhanced Regular Corecursion for Data Streams 13 Proof. We prove that at ρ (s, j) has an infinite derivation for some j iff wd(s, ρ, ∅) has no derivation. ⇒ By Lemma 1-(3),we have that the following condition holds: (at-∞) at ρ (x, i + k) `?X 0 at ρ (ρ(x), i) ` at ρ (x, i) `?X at ρ (s, j) for some x ∈ dom(ρ), X 0 , X , and i, k ≥ 0. Then, starting from the right, by Lemma 3 we have wd(x, ρ, m) `?X wd(s, ρ, ∅) for some m such that x 6∈ dom(m); by rule (wf-var) we have wd(ρ(x), ρ, m{x 7→ 0}) ` wd(x, ρ, m), and finally by Lemma 2 we have: (wf-stuck) wd(x, ρ, m 0 ) `?X 0 wd(ρ(x), ρ, m{x 7→ 0}) ` wd(x, ρ, m) `?X wd(s, ρ, ∅) for x ∈ dom(ρ), X 0 , X , and m 0 , m s.t. x 6∈ dom(m), m 0 (x) = k ≤ 0. hence we get the thesis. ⇐ By Lemma 1-(2),we have that the condition (wf-stuck) above holds. Then, starting from the left, by Lemma 2 we have at ρ (x, i0 ) `?X 0 at ρ (ρ(x), i) for some i0 , i such that i − i0 = k ≤ 0; by rule (at-var) we have at ρ (ρ(x), i) ` at ρ (x, i), and by Lemma 3 we have at ρ (x, j 0 ) `?X at ρ (s, j) for some j 0 , j. If i = j 0 + h, h ≥ 0, then by Lemma 1-(1) we have at ρ (x, i + k) `?X 0 at ρ (ρ(x), i) ` at ρ (x, i) `?X at ρ (s, j + h) 0 If j = i + h, h ≥ 0, then by Lemma 1-(1) we have at ρ (x, i + k + h) `?X 0 at ρ (ρ(x), i) ` at ρ (x, i + h) `?X at ρ (s, j). In both cases, the derivation of at ρ (s, j) is infinite. 5 Related and future work As mentioned in Sect. 1, our approach extends regular corecursion, where the semantics keeps track of method/function calls. Regular corecursion originated from co-SLD resolution [16,17,1,3], where already considered goals (up to unifi- cation), called coinductive hypotheses, are considered successful. Language con- structs that support this programming style have also been proposed in the functional [11] and object-oriented [4,2] paradigm. There have been a few attempts of extending the expressive power of regu- lar corecursion. Notably, structural resolution [12,13] is a proposed operational semantics for logic programming where infinite derivations that cannot be built in finite time are generated lazily, and only partial answers are shown. Another approach is the work on infinite trees [6], where Courcelle introduces algebraic trees and equations as generalizations of regular ones. For the operators considered in the calculus and some examples, our main sources of inspiration have been the works of Rutten [14], where a coinductive calculus of streams of real numbers is defined, and Hinze [9], where a calculus of generic streams is defined in a constructive way and implemented in Haskell. The problem of ensuring well-defined corecursive definitions has been also considered in the context of type theory and proof assistants. We have shown in Sect. 4 that simple guarded definitions [5] do not work properly in case values are allowed to contain non constuctors as the tail operator; a more complex approach based on a type system has been proposed by Sacchini [15] for an extension of the calculus of constructions which is more expressive than that considered here; 14 D. Ancona et al. however, as opposed to what happens with the judgment wd defined in Sect. 4, corecursive calls to the result of an application of tail are never well-typed even in case of well-defined streams as happens for the definition of fib as given in Sect. 3. Lastly, D’Angelo et al. presented LOLA [8], a specification language for run- time monitoring that manipulates streams. The general idea behind the frame- work is to generate a set of output streams, starting from a given set of input streams. The main difference with respect to our work is that LOLA only allows streams with a finite number of elements. In this framework, well-formedness is checked by relying on a dependency graph, which keeps track of relations be- tween the processed streams. The vertices of this graph are the streams, while the edges represent the dependencies between them. Each edge is weighted with a value ω to point the fact that a stream depends on another one shifted by ω positions. Then, the well-formedness constraint is that each closed-walk inside the graph must have a total weight different from 0. These syntactic constraints appear to be very similar to the approach we used for predicate wd . Our main technical result is Theorem 1, stating that passing the well-definedness check performed at runtime for each function call is necessary and sufficient to prevent non-termination in accessing elements in the resulting stream at an ar- bitrary index. In future work, we plan to also prove soundness of the operational well-definedness with respect to its abstract definition. Completeness does not hold, as shown by the example zeros() = repeat(0) [*] zeros() which is not well-defined operationally, but admits as unique solution the stream of all zeros. On the other hand, the simplest operational characterization of equivalence of stream values is syntactic equivalence. This works for all the examples presented in this paper, but is, again, not complete with respect to the abstract definition, as illustrated below: first ( s ) = s (0): first ( s ) // works with syntactic equivalence first2 ( s ) = s (0): first2 ( s (0): s ^) // does not work Indeed, we get an infinite derivation for first2(repeat(1)) with call traces of increasing shape {first2(x) 7→ y1 , first2(1:x^) 7→ y2 , first2(1:(1:x^)^) 7→ y3 , . . .} in the environment {x 7→ 1 : x}. In future work we plan to investigate more ex- pressive operational characterizations of equivalence. References 1. Davide Ancona. Regular corecursion in Prolog. Computer Languages, Systems & Structures, 39(4):142–162, 2013. 2. Davide Ancona, Pietro Barbieri, Francesco Dagnino, and Elena Zucca. Sound regu- lar corecursion in coFJ. In Robert Hirschfeld and Tobias Pape, editors, ECOOP’20 - Object-Oriented Programming, volume 166 of LIPIcs, pages 1:1–1:28. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. 3. Davide Ancona and Agostino Dovier. A theoretical perspective of coinductive logic programming. Fundamenta Informaticae, 140(3-4):221–246, 2015. 4. Davide Ancona and Elena Zucca. Corecursive Featherweight Java. In FTfJP’12 - Formal Techniques for Java-like Programs, pages 3–10. ACM Press, 2012. Enhanced Regular Corecursion for Data Streams 15 5. Thierry Coquand. Infinite objects in type theory. In Types for Proofs and Programs, International Workshop TYPES’93, Nijmegen, The Netherlands, May 24-28, 1993, Selected Papers, pages 62–78, 1993. 6. Bruno Courcelle. Fundamental properties of infinite trees. Theoretical Computer Science, 25:95–169, 1983. 7. Francesco Dagnino, Davide Ancona, and Elena Zucca. Flexible coinductive logic programming. Theory and Practice of Logic Programming, 20(6):818–833, 2020. Issue for ICLP 2020. 8. Ben D’Angelo, Sriram Sankaranarayanan, César Sánchez, Will Robinson, Bernd Finkbeiner, Henny B. Sipma, Sandeep Mehrotra, and Zohar Manna. LOLA: run- time monitoring of synchronous systems. In 12th International Symposium on Temporal Representation and Reasoning (TIME 2005), pages 166–174, 2005. 9. Ralf Hinze. Concrete stream calculus: An extended study. Journal of Functional Programming, 20(56):463535, 2010. 10. Jean-Baptiste Jeannin and Dexter Kozen. Computing with capsules. Journal of Automata, Languages and Combinatorics, 17(2-4):185–204, 2012. 11. Jean-Baptiste Jeannin, Dexter Kozen, and Alexandra Silva. CoCaml: Functional programming with regular coinductive types. Fundamenta Informaticae, 150:347– 377, 2017. 12. Ekaterina Komendantskaya, Patricia Johann, and Martin Schmidt. A productivity checker for logic programming. In Manuel V. Hermenegildo and Pedro López- Garcı́a, editors, Logic-Based Program Synthesis and Transformation - LOPSTR 2016, Revised Selected Papers, volume 10184 of Lecture Notes in Computer Science, pages 168–186. Springer, 2016. 13. Ekaterina Komendantskaya, John Power, and Martin Schmidt. Coalgebraic logic programming: from semantics to implementation. J. Log. Comput., 26(2):745–783, 2016. 14. Jan J. M. M. Rutten. A coinductive calculus of streams. Mathematical Structures in Computer Science, 15(1):93–147, 2005. 15. Jorge Luis Sacchini. Type-based productivity of stream definitions in the calculus of constructions. In Symposium on Logic in Computer Science, LICS 2013, pages 233–242, 2013. 16. Luke Simon. Extending logic programming with coinduction. PhD thesis, University of Texas at Dallas, 2006. 17. Luke Simon, Ajay Bansal, Ajay Mallya, and Gopal Gupta. Co-logic programming: Extending logic programming with coinduction. In Lars Arge, Christian Cachin, Tomasz Jurdzinski, and Andrzej Tarlecki, editors, Automata, Languages and Pro- gramming, 34th International Colloquium, ICALP 2007, volume 4596 of Lecture Notes in Computer Science, pages 472–483. Springer, 2007.