=Paper= {{Paper |id=Vol-3072/paper22 |storemode=property |title=Enhanced Regular Corecursion for Data Streams |pdfUrl=https://ceur-ws.org/Vol-3072/paper22.pdf |volume=Vol-3072 |authors=Davide Ancona,Pietro Barbieri,Elena Zucca |dblpUrl=https://dblp.org/rec/conf/ictcs/AnconaBZ21 }} ==Enhanced Regular Corecursion for Data Streams== https://ceur-ws.org/Vol-3072/paper22.pdf
                 Enhanced Regular Corecursion
                      for Data Streams?

                 Davide Ancona, Pietro Barbieri, and Elena Zucca

                              DIBRIS, University of Genova



        Abstract. We propose a simple calculus for processing data streams (in-
        finite flows of data series), represented by finite sets of equations built on
        stream operators. Furthermore, functions defining streams are regularly
        corecursive, that is, cyclic calls are detected, avoiding non-termination
        as happens with ordinary recursion in the call-by-value evaluation strat-
        egy. As we illustrate by several examples, the combination of such two
        mechanisms provides a good compromise between expressive power and
        decidability. Notably, we provide an algorithm to check that the stream
        returned by a function call is represented by a well-defined set of equa-
        tions which actually admits a unique solution, hence access to an arbi-
        trary element of the returned stream will never diverge.

        Keywords: Operational semantics, stream programming, regular terms.


1     Introduction

Applications often deal with data structures which are conceptually infinite,
among those data streams (infinite flows of data series) are a mainstream exam-
ple: as we venture deeper into the Internet of Things (IoT) era, stream processing
is becoming increasingly important. Indeed, all main IoT platforms provide em-
bedded and integrated engines for real time analysis of potentially infinite flowing
data series; such a process occurs before the data is stored for efficiency and, as
often happens in Computer Science, there is a trade-off between the expressive
power of the language, the efficiency of its implementation and the decidability
of properties important to guarantee reliability and tractability.
    Another related important problem is data stream generation, which is es-
sential to test complex distributed IoT systems; the deterministic simulation of
sensor data streams through a suitable language offers a practical solution to
IoT testing and favors early detection of some kinds of bugs that can be fixed
more easily before the deployment of the whole system.
    A well-established solution to data stream generation and processing is lazy
evaluation, as supported, e.g., in Haskell, and most stream libraries offered by
mainstream languages, as java.util.stream. In this approach, conceptually infi-
nite data streams are the result of a function or method call, which is evaluated
?
    Copyright c 2021 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
2         D. Ancona et al.

according to the call-by-need strategy. For instance, in Haskell we can define
one_two = 1:2:one_two, or even represent the list of natural numbers as from 0,
where from n = n:from(n+1). However, such a great expressive power comes at a
cost; let us consider, for instance, the definition bad_stream = 0:tail bad_stream.
The Haskell compiler does not complain about this definition, and no problem
arises at runtime as long as the manipulation of bad_stream requires only its first
element to be accessed; anyway, any operation which needs to inspect bad_stream
at a deeper level is deemed to diverge. Unfortunately, it is not decidable to check,
even at runtime, whether the stream returned by a Haskell function is well-
defined, that is, all of its elements can be computed1 ; indeed, the full expressive
power of Haskell can be used to define streams by means of recursive functions.
For similar reasons, it is not decidable to check at runtime whether the streams
returned by two Haskell functions are equal.
    More recently, a complementary approach has been considered in different
programming paradigms — functional [11], logic [17,1,7], and object-oriented [2]
— based on the following two ideas:
    – Infinite streams can be finitely represented by finite sets of equations involv-
      ing only the stream constructor, e.g., x = 1 : 2 : x. Such a representation
      corresponds to what has been called by Courcelle in its seminal paper [6] a
      regular, a.k.a. rational, tree, that is, a tree with possibly infinite depth but
      a finite set of subtrees.
    – Functions are regularly corecursive, that is, execution keeps track of pending
      function calls, so that, when the same call is considered the second time, this
      is detected, avoiding non-termination as happens with ordinary recursion in
      the call-by-value evaluation strategy.
In this way, the Haskell stream one_two can be equivalently obtained by the call2
one_two(), with the function one_two defined by one_two() = 1:2:one_two(). In-
deed, with regular corecursion the result of this call is the value corresponding
to the unique solution of the equation x = 1 : 2 : x. On the other hand, since the
expressive power is limited to regular streams, it is not possible to define a core-
cursive function whose call returns the stream of natural numbers, as happens
for the from 0 Haskell example. However, there exist procedures for checking
well-defined streams and their equality, even with tractable algorithms.
    In this paper, we propose a simple calculus of numeric streams which supports
regular corecursion and goes beyond regular streams by extending equations with
other typical stream operators besides the stream constructor: tail and pointwise
operators can be contained in stream equations and are therefore not evaluated.
    In this way, we are able to achieve a good compromise between expressive
power and decidability. Notably:
    – the extended shape of equations allows the definition of functions which
      return non-regular streams; for instance, it is possible to obtain the stream of
1
    This is what is also known as a productive corecursive definition [5].
2
    Differently from Haskell, for simplicity in our calculus functions are uncurried, hence
    they take as arguments possibly empty tuples, delimited by parentheses.
                               Enhanced Regular Corecursion for Data Streams                3

   natural numbers as from(0), by defining from(n)=n:(from(n)[+]repeat(1)),
   with [+] the pointwise addition on numeric streams and repeat the function
   defined by repeat(n)=n:repeat(n);
 – there exists a decidable procedure to dynamically check whether the stream
   returned by a corecursive function is well-defined;
 – however, it is not possible to express all streams computable with the lazy
   evaluation approach, but only those which have a specific structure (that is,
   can be expressed as the unique solution of a set of equations built with the
   above mentioned operators).
    In Sect. 2 we define the calculus, in Sect. 3 we show examples, and in
Sect. 4 we provide an operational characterization of well-defined streams, proved
sufficient and necessary for an access to an arbitrary index to never diverge.
In Sect. 5 we discuss related and further work. An extended version includ-
ing more examples of derivations and complete proofs can be found at http:
//arxiv.org/abs/2108.00281.


2     Stream calculus
Fig. 1 shows the syntax of the calculus.


fd :: = fd1 . . . fdn                                                    program
fd :: = f(x) = se                                                        function declaration
e :: = se | ne | be                                                      expression
se :: = x | if be then se1 else se2 | ne : se | seˆ | se1 [op]se2 | f(e) stream expression
ne :: = x | se(ne) | ne1 op ne2 | 0 | 1 | 2 | ...                        numeric expression
be :: = x | true | false | ...                                           boolean expression
op :: = + | − | ∗ | /                                                    numeric operation

                             Fig. 1. Stream calculus: syntax


     A program is a sequence of (mutually recursive) function declarations, for
simplicity assumed to only return streams. Stream expressions are variables,
conditional expressions, expressions built by stream operators, and function calls.
We consider the following stream operators: constructor (prepending a numeric
element), tail, and pointwise arithmetic operations. Numeric expressions include
the access to the i-th3 element of a stream. We use fd to denote a sequence
fd1 , . . . , fdn of function declarations, and analogously for other sequences.
     The operational semantics, given in Fig. 2, is based on two key ideas:
 1. (some) infinite streams are represented in a finite way
 2. evaluation keeps trace of already considered function calls

3
    For simplicity, here indexing and numeric expressions coincide, even though indexes
    are expected to be natural numbers, while values in streams can range over a larger
    numeric domain.
4             D. Ancona et al.

c :: = f(v)                            (evaluated) call
v :: = s | n | b                       value
s :: = x | n : s | sˆ | s1 [op]s2      (open) stream value
i, n :: = 0 | 1 | 2 | ...              index, numeric value
b :: = true | false                    boolean value
τ :: = c1 7→ x1 . . . cn 7→ xn (n ≥ 0) call trace
ρ :: = x1 7→ s1 . . . xn 7→ sn (n ≥ 0) environment


                                     be, ρ, τ ⇓ (true, ρ) se1 , ρ, τ ⇓ (s, ρ0 )                        be, ρ, τ ⇓ (false, ρ) se2 , ρ, τ ⇓ (s, ρ0 )
(val)                       (if-t)                                                            (if-f)
        v, ρ, τ ⇓ (v, ρ)             if be then se1 else se2 , ρ, τ ⇓ (s, ρ0 )                         if be then se1 else se2 , ρ, τ ⇓ (s, ρ0 )


         ne, ρ, τ ⇓ (n, ρ) se, ρ, τ ⇓ (s, ρ0 )                         se, ρ, τ ⇓ (s, ρ0 )                  se1 , ρ, τ ⇓ (s1 , ρ1 ) se2 , ρ, τ ⇓ (s2 , ρ2 )
(cons)                                                       (tail)                                  (pw)
             ne : se, ρ, τ ⇓ (n : s, ρ0 )                             seˆ, ρ, τ ⇓ (sˆ, ρ0 )                  se1 [op]se2 , ρ, τ ⇓ (s1 [op]s2 , ρ1 t ρ2 )

         ei , ρ, τ ⇓ (vi , ρi ) ∀i ∈ 1..n        f(v), ρb, τ ⇓ (s, ρ0 ) e = e1 , . . . , en not of shape v
(args)                                                                  v = vF1 , . . . , vn
                              f(e), ρ, τ ⇓ (s, ρ0 )
                                                                        ρb = i∈1..n ρi

                                            f(v) 6∈ dom(τ≈ρ )
       se[v/x], ρ, τ {f(v) 7→ x} ⇓ (s, ρ0 ) x fresh
(invk)                                                                                 (corec)                            τ≈ρ (f (v)) = x
          f(v), ρ, τ ⇓ (x, ρ0 {x 7→ s})     fbody(f) = (x, se)                                    f(v), ρ, τ ⇓ (x, ρ)
                                            wd (ρ0 , x, s)
       se, ρ, τ ⇓ (s, ρ0 ) ne, ρ, τ ⇓ (i, ρ)
(at)                                                     at ρ0 (s, i) = n
                se(ne), ρ, τ ⇓ (n, ρ)


            at ρ (ρ(x), i) = n0                                                                   at ρ (s, i − 1) = n0
(at-var)                                 (at-cons-0)                                (at-cons-n)                        i>0
             at ρ (x, i) = n0                          at ρ (n : s, 0) = n                        at ρ (n : s, i) = n0

            at ρ (s, i + 1) = n                    at ρ (s1 , i) = n1 at ρ (s2 , i) = n2
(at-tail)                                (at-pw)
             at ρ (sˆ, i) = n                         at ρ (s1 [op]s2 , i) = n1 op n2

                             Fig. 2. Stream calculus: operational semantics




    To obtain (1), our approach is inspired by capsules [10], which are essentially
expressions supporting cyclic references. That is, the result of the evaluation of a
stream expression is a pair (s, ρ), where s is an (open) stream value, built on top
of stream variables, numeric values, the stream constructor, the tail destructor
and the pointwise arithmetic operators, and ρ is an environment mapping a finite
set of variables into stream values. In this way, cyclic streams can be obtained:
for instance, (x, x 7→ n : x) denotes the stream constantly equal to n.

    We denote by vars(ρ) the set of variables occurring in ρ, by fv(ρ) the set of
its free variables, that is, vars(ρ) \ dom(ρ), and say that ρ is closed if fv(ρ) = ∅,
open otherwise, and analogously for a result (v, ρ).
                                 Enhanced Regular Corecursion for Data Streams                     5

    To obtain point (2) above, evaluation has an additional parameter which is
a call trace, a map from function calls where arguments are values (dubbed calls
for short in the following) into variables.
    Altogether, the semantic judgment has shape e, ρ, τ ⇓ (v, ρ0 ), where e is the
expression to be evaluated, ρ the current environment defining possibly cyclic
stream values that can occur in e, τ the call trace, and (v, ρ0 ) the result. The se-
mantic judgments should be indexed by an underlying (fixed) program, omitted
for sake of simplicity. Rules use the following auxiliary definitions:
 – ρ t ρ0 is the union of two environments, which is well-defined if they have
   disjoint domains; ρ{x 7→ s} is the environment which gives s on x, coincides
   with ρ elsewhere; we use analogous notations for call traces.
 – se[v/x] is obtained by parallel substitution of variables x with values v.
 – fbody(f) returns the pair of the parameters and the body of the declaration
   of f, if any, in the assumed program.
Moreover, the rules are parametric in the following other judgments, for which
different definitions will be discussed in Sect. 4:
 – wd (ρ, x, s), that is, by adding the association x 7→ s to the (well-defined)
   environment ρ, we still get a well-defined environment.
 – v ≈ρ v0 , that is, the two values are equivalent in the environment4 ρ. Then,
   τ≈ρ is the extension of τ up to equivalence in ρ: τ≈ρ (f (v1 , . . . , vn )) = x iff
   there exist v01 , . . . , v0n such that τ (f (v01 , . . . , v0n )) = x and vi ≈ρ v0i for i ∈ 1..n.
     Intuitively, a closed result (s, ρ) is well-defined if it denotes a unique stream
(infinite sequence of numeric values), and a closed environment ρ is well-defined
if, for each x ∈ dom(ρ), (x, ρ) is well-defined. In other words, the corresponding
set of equations admits a unique solution. For instance, the environment {x 7→ x}
is not well-defined, since it is undetermined (any stream satisfies the equation
x = x); the environment {x 7→ x[+]y, y 7→ 1 : y} is not well-defined as well, since
it is undefined (the two equations x = x 7→ x[+]y, y = 1 : y admit no solutions
for x). Finally, two stream values s and s0 such that the results (s, ρ) and (s0 , ρ)
are closed and well-defined are equivalent if they denote the same stream.
     These notions can be generalized to open results and environments, assuming
that free variables denote unique streams, as will be formalized in Sect. 4.
     Rules for values and conditional are straightforward. In rules (cons), (tail)
and (pw), arguments are evaluated, while the stream operator is applied without
any further evaluation; the fact that the tail and pointwise operators are treated
as the stream constructor : is crucial to get results which denote non-regular
streams as shown in Sect. 3. However, when non-constructors are allowed to
occur in values, ensuring well-defined results become more challenging, because
the usual simple syntactic constraints that can be safely used for constructors
[5] no longer work (see more details in Sect. 4 and 5).
     The rules for function call use a mechanism of cycle detection, similar to
that in [2]. They are given in a modular way. That is, evaluation of arguments
is handled by a separate rule (args).
4
    This equivalence is assumed to be the identity on numeric and boolean values.
6       D. Ancona et al.

    Rule (invk) is applied when a call is considered for the first time, as expressed
by the first side condition. The body is retrieved by using the auxiliary function
fbody, and evaluated in a call trace where the call has been mapped into a fresh
variable. Then, it is checked that adding the association from such variable to
the result of the evaluation of the body keeps the environment well-defined. If
the check succeeds, then the final result consists of the variable associated with
the call and the updated environment. For simplicity, here execution is stuck if
the check fails; an implementation should raise a runtime error instead.

    Rule (corec) is applied when a call is considered for the second time, as
expressed by the first side condition (note that cycle detection takes place up to
equivalence in the environment). The variable x is returned as result. However,
there is no associated value in the environment yet; in other words, the result
(x, ρ) is open at this point. This means that x is undefined until the environment
is updated with the corresponding value in rule (invk). However, x can be safely
used as long as the evaluation does not require x to be inspected; for instance, x
can be safely passed as an argument to a function call.

    For instance, if we consider f()=g() g()=1:f(), then the judgment f(), ∅, ∅ ⇓
(x, ρ), with ρ = {x 7→ y, y 7→ 1 : x}, is derivable; however, while the final result
(x, ρ) is closed, the derivation contains also judgments with open results, as, e.g.,
f(), ∅, {f() 7→ x, g() 7→ y} ⇓ (x, ∅) and g(), ∅, {f() 7→ x} ⇓ (y, {y 7→ 1 : x}).

    As another example, if we consider f()=g(2:f()) g(s)=1:s, then the deriva-
tion of the judgment f(), ∅, ∅ ⇓ (x, ρ) with ρ = {x 7→ y, y 7→ 1 : 2 : x} is built on
top of the derivation of g(2:x ), ∅, {f() 7→ x} ⇓ (y, {y 7→ 1 : 2 : x}), corresponding
to the evaluation of g(2:x ) where x is an operand of the stream constructor
whose result is passed as argument to the call to g, despite x is not defined yet.

    Finally, rule (at) computes the i-th element of a stream expression. After
evaluation of the arguments, the numeric result is obtained by the auxiliary
judgment at ρ (s, i) = n, inductively defined in the bottom part of the figure. If
the stream value is a variable, rule (at-var), then the evaluation is propagated to
the associated stream value in the environment, if any. If, instead, the variable is
free in the environment, then execution is stuck; again, an implementation should
raise a runtime error instead. If the stream value is built by the constructor, then
the result is the first element of the stream if the index is 0, rule (at-cons-0);
otherwise, the evaluation is recursively propagated to its tail with the predecessor
index, rule (at-cons-n). Conversely, if the stream is built by the tail operator, rule
(at-tail), then the evaluation is recursively propagated to the stream argument
with the successor index. Finally, if the stream is built by a pointwise operation,
rule (at-pw), then the evaluation is recursively propagated to the operands with
the same index and then the corresponding arithmetic operation is computed on
the results.

    Derivations of examples in this section can be found in the extended version.
                                                       Enhanced Regular Corecursion for Data Streams         7

3        Examples

First we show some simple examples, to explain how regular corecursion works.
Then we provide some more significant examples.
   Consider the following function declarations:
repeat ( n ) = n : repeat ( n )
one_two () = 1: two_one ()
two_one () = 2: one_two ()

With the standard semantics of recursion, the calls, e.g., repeat(0) and one_two()
lead to non-termination. Thanks to regular corecursion, instead, these calls ter-
minate, producing as result (x, {x 7→ 0 : x}), and (x, {x 7→ 1 : y, y 7→ 2 : x}), re-
spectively. Indeed, when initially invoked, the call repeat(0) is added in the call
trace with an associated fresh variable, say x. In this way, when evaluating the
body of the function, the recursive call is detected as cyclic, the variable x is
returned as its result, and, finally, the stream value 0 : x is associated in the
environment with the result x of the initial call. The evaluation of one_two() is
analogous, except that another fresh variable y is generated for the intermediate
call two_one(). The formal derivations are given below.
                                          (value)        (corec)
                                                         repeat(0), ∅, {repeat(0) 7→ x} ⇓ (x, ∅)
                                 (cons)
                                               0 : repeat(0), ∅, {repeat(0) 7→ x} ⇓ (0 : x, ∅)
                        (invk)
                                                    repeat(0), ∅, ∅ ⇓ (x, {x 7→ 0 : x})

                                                   (value)     (corec)
                                                              one two(), ∅, {one two() 7→ x, two one() 7→ y} ⇓ (x, ∅)
                                          (cons)
                                                    2 : one two(), ∅, {one two() 7→ x, two one() 7→ y} ⇓ (2 : x, ∅)
                  (value)        (invk)
                                                         two one(), ∅, {one two() 7→ x} ⇓ (y, {y 7→ 2 : x})
         (cons)
                                              1 : two one(), ∅, {one two() 7→ x} ⇓ (1 : y, {y 7→ 2 : x})
(invk)
                                                  one two(), ∅, ∅ ⇓ (x, {x 7→ 1 : y, y 7→ 2 : x})

   For space reasons, we did not report the application of rule (value). In both
derivations, note that rule (corec) is applied, without evaluating the body once
more, when the cyclic call is detected.
   The following examples show function definitions whose calls return non-
regular streams, notably, the natural numbers, the natural numbers raised to
the power of a number, the factorials, the powers of a number, the Fibonacci
numbers, and the stream obtained by pointwise increment by one.
nat () = 0:( nat ()[+] repeat (1))
nat_to_pow ( n ) =                          // nat_to_pow ( n )( i )= i ^ n
  if n <= 0 then repeat (1) else nat_to_pow (n -1)[*] nat ()
fact () = 1:(( nat ()[+] repeat (1))[*] fact ())
pow ( n ) = 1:( repeat ( n )[*] pow ( n )) // pow ( n )( i )= n ^ i
fib () = 0:1:( fib ()[+] fib ()^)
incr ( s ) = s [+] repeat (1)
8        D. Ancona et al.

    The definition of nat uses regular corecursion, since the recursive call nat()
is cyclic. Hence the call nat() returns (x, {x 7→ 0 : (x[+]y), y 7→ 1 : y}). The def-
inition of nat_to_pow is a standard inductive one where the argument strictly
decreases in the recursive call. Hence, the call, e.g., nat_to_pow(2), returns
  (x2 , {x2 7→ x1 [∗]x, x1 7→ x0 [∗]x, x0 7→ y, y 7→ 1 : y, x 7→ 0 : (x[+]y0 ), y0 7→ 1 : y0 }).
The definitions of fact, pow, and fib are regularly corecursive. For instance,
the call fact() returns (z, z 7→ (x[+]y)[∗]z, x 7→ 0 : (x[+]y0 ), y 7→ 1 : y, y0 7→ 1 : y0 ).
The definition of incr is non-recursive, hence always converges, and the call
incr(s) returns (x, {x 7→ s[+]y, y 7→ 1 : y}). The following alternative definition

incr_reg ( s ) = ( s (0)+1): incr_reg ( s ^)

relies, instead, on regular corecursion. Note the difference: the latter version
ensures termination only for regular streams, as in incr_reg(one_two()), since,
eventually, in the recursive call, the expression s^ turns out to denote the initial
stream; however, the computation does not terminate for non-regular streams,
as in incr_reg(nat()), which, however, converges with incr.
    The following function computes the stream of partial sums of the first i + 1
elements of a stream s, that is, sum(s)(i)= ik=0 s(k):
                                              P

sum ( s ) = s (0):( s ^[+] sum ( s ))

Such a function is useful for computing streams whose elements approximate
a series with increasing precision; for instance, the following function returns
the stream of partial sums of the first i + 1 elements of the Taylor series of the
exponential function:
sum_expn ( n ) = sum ( pow ( n )[/] fact ())

Function sum_expn calls sum with the argument pow(n)[/]fact() corresponding to
the stream of all terms of the Taylor series of the exponential function; hence, by
accessing the i-th element of the stream, we have the following approximation:

                                  nk           n2   n3   n4           ni
                           Pi
    sum expn(n)(i)=           k=0 k! = 1 + n + 2! + 3! + 4! + · · · + i!


Lastly, we present a couple of examples showing how it is possible to define
primitive operations provided by IoT platforms for real time analysis of data
streams; we start with aggr(n,s), which allows aggregation (by addition) of
contiguous data in the stream s w.r.t. a frame of length n:
aggr (n , s ) = if n <=0 then repeat (0) else s [+] aggr (n -1 , s ^)

   For instance, aggr(3,s) returns the stream s0 s.t. s0 (i) = s(i)+s(i+1)+s(i+2).
On top of aggr, we can easily define avg(n,s) to compute the stream of average
values of s in the frame of length n:
avg (n , s ) = aggr (n , s )[/] repeat ( n )
                                Enhanced Regular Corecursion for Data Streams      9

4   Well-defined environments and equivalent streams

In the semantic rules, we have left unspecified two notions: well-defined environ-
ments, and equivalent streams. We provide now a formal definition in abstract
terms. Then, we provide an operational definition of well-defined environments.
    Semantically, a stream σ is an infinite sequence of numeric values, that is,
a function which returns, for each index i ≥ 0, the i-th element σ(i). Given a
result (s, ρ), we get a stream by instantiating variables in s with streams, in a
way consistent with ρ, and evaluating operators. To make this formal, we need
some preliminary definitions.
    A substitution θ is a function from a finite set of variables to streams. We
denote by JsKθ the stream obtained by applying θ to s, and evaluating operators,
as formally defined below.

    JxKθ = θ(x)
                      (
                          n               i=0
    (Jn : sKθ)(i) =
                          (JsKθ)(i − 1)   i≥1
    (JsˆKθ)(i) = JsKθ(i + 1)        i≥0
    (Js1 [op]s2 Kθ)(i) = Js1 Kθ(i) op Js2 Kθ(i)   i≥0

    Given an environment ρ and a substitution θ with domain vars(ρ), the sub-
stitution ρ[θ] is defined by:
               (
                 Jρ(x)Kθ x ∈ dom(ρ)
    ρ[θ](x) =
                 θ(x)      x ∈ fv(ρ)

Then, a solution of ρ is a substitution θ with domain vars(ρ) such that ρ[θ] = θ.
    A closed environment ρ is well-defined if it has exactly one solution, de-
noted sol(ρ). For instance, {x 7→ 1 : x} and {y 7→ 0 : (y[+]x), x 7→ 1 : x} are well-
defined, since their unique solutions map x to the infinite stream of ones, and y
to the stream of natural numbers, respectively. Instead, for {x 7→ 1[+]x} there
are no solutions. Lastly, an environment can be undetermined: for instance, a
substitution mapping x into an arbitrary stream is a solution of {x 7→ x}.
    An open environment ρ is well-defined if, for each θ with domain fv(ρ), it
has exactly one solution θ0 such that θ ⊆ θ0 . For instance, the open environment
{y 7→ 0 : (y[+]x)} is well-defined.
    Given a closed result (s, ρ), with ρ well-defined, we define its semantics by
JsKρ = JsKθ for θ = sol(ρ). Then, two stream values s and s0 are semantically
equivalent in ρ if JsKρ = Js0 Kρ0 .
    We now consider the non-trivial problem of ensuring that a closed environ-
ment ρ is well-defined; if environments would be allowed to contain only the
stream constructor, then it would suffice to require all non-free variables to be
guarded by the stream constructor [5]. For instance, the environment {x 7→ 1 : x}
satisfies such a syntactic condition, and is well-defined, while in the non well-
defined environment {x 7→ x} the variable x is not guarded by the constructor.
10           D. Ancona et al.

    However, when non constructors as the tail and pointwise operators come
into play, the fact that variables are guarded by the stream constructor no longer
ensures that the environment is well-defined; for instance, the environment ρ =
{x 7→ 0 : xˆ} corresponding to the definition of bad_stream in Sect. 1 is not well-
defined since it admits infinite solutions (all streams starting with 0), although
variable x is guarded by the stream constructor. A more complex check is needed:
roughly, more constructors than tail operators should have been traversed when
we find a cyclic reference, as formalized in Fig. 3.


m :: = x1 7→ n1 . . . xn 7→ nk (n ≥ 0) map from variables to natural numbers


         wd(x, ρ{x 7→ v}, ∅)                 wd(ρ(x), ρ, m{x 7→ 0})
(main)                            (wf-var)                          x 6∈ dom(m)
            wd (ρ, x, v)                          wd(x, ρ, m)

                         x ∈ dom(m)
(wf-corec)                                      (wf-fv)                 x 6∈ dom(ρ)
             wd(x, ρ, m) m(x) > 0                         wd(x, ρ, m)


            wd(s, ρ, m +1 )                 wd(s, ρ, m −1 )         wd(s1 , ρ, m) wd(s2 , ρ, m)
(wf-cons)                       (wf-tail)                   (wf-pw)
            wd(n : s, ρ, m)                 wd(sˆ, ρ, m)               wd(s1 [op]s2 , ρ, m)

                   Fig. 3. Operational definition of well-defined environments



    The judgment wd (ρ, x, s) used in the side condition of rule (invk) holds if
wd(x, ρ{x 7→ v}, ∅) holds. The judgment wd(s, ρ, ∅) means that a result is well-
defined. That is, restricting the domain of ρ to the variables reachable from s
(those either occurring in s, or, transitively, in values associated with reachable
variables) we get a well-defined environment; thus, wd (ρ, x, s) holds if adding the
association of s with x preserves well-definedness of ρ.
    The additional argument m in the judgment wd(s, ρ, m) is a map from vari-
ables to natural numbers. We write m +1 and m −1 for the maps {(x, m(x) + 1) |
x ∈ dom(m)}, and {(x, m(x) − 1) | x ∈ dom(m)}, respectively.
    In rule (main), this map is initially empty. In rule (wf-var), a variable x
defined in the environment is added in the map, with initial value 0, the first
time it is found. In rule (wf-corec), when it is found the second time, it is
checked that more constructors than tail operators have been traversed. In rule
(wf-fv), a free variable is considered well-defined.5 In rules (wf-cons), (wf-tail),
and (wf-pw), the value associated with a variable is incremented/decremented
by one each time a constructor and tail operator are traversed, respectively.
    As an example of derivation of well-definedness and access to the i-th element,
in Fig. 4 we consider the result (x, {x 7→ 0 : (x [+] y), y 7→ 1 : y}), obtained by
evaluating the call nat() with nat defined as in Sect. 3.

5
     Indeed, non-well-definedness can only be detected on closed results.
                                                      Enhanced Regular Corecursion for Data Streams                                    11

                                                                                                (wf-corec)
                                                                                   wd(y, ρ, {x 7→ 1, y 7→ 1})
                                                                                    (wf-cons)
                                                                               wd(1 : y, ρ, {x 7→ 1, y 7→ 0})
                           (wf-corec)                        (wf-var)
                                      wd(x, ρ, {x 7→ 1})                         wd(y, ρ, {x 7→ 1})
                   (wf-pw)
                                                          wd(x [+] y, ρ, {x 7→ 1})
         (wf-cons)
                                                      wd(0 : (x [+] y), ρ, {x 7→ 0})
(wf-var)
                                                            wd(x, ρ, ∅)

                                              (at-cons-0)                                                    (at-cons-0)
                                                            at ρ (0 : (x [+] y), 0) = 0                                    at ρ (1 : y, 0) = 1
                                                                       ..                                                        ..
                                                                        .                                                         .
                                   (at-var)                                                       (at-var)
                                                       at ρ (x, i − 1) = i − 1                                     at ρ (y, i − 1) = 1
                         (at-op)
                                                                         at ρ (x [+] y, i − 1) = i
           (at-cons-n)
                                                                      at ρ (0 : (x [+] y), i) = i
(at-var)
                                                                         at ρ (x, i) = i

                    Fig. 4. Derivations for ρ = (x, {x 7→ 0 : (x [+] y), y 7→ 1 : y}).


     In the extended version we show the derivation for a trickier example, that
is, the result (x, {x 7→ 0 : 1 : (2 : xˆ)ˆ}). Its semantics is the stream 0, 1, 1, 1, . . ..
     We show now that well-definedness of a result is a necessary and sufficient
condition for termination of access to an arbitrary index. To formally express
and prove this statement, we introduce some definitions and notations.
     First of all, since the numeric value obtained as result is not relevant for the
following technical treatment, for simplicity we will write at ρ (s, i) rather than
at ρ (s, i) = n. We call derivation an either finite or infinite proof tree.
     We write wd(s0 , ρ, m 0 ) ` wd(s, ρ, m) to mean that wd(s0 , ρ, m 0 ) is a premise of
a (meta-)rule where wd(s, ρ, m) is the consequence, and `? for the reflexive and
transitive closure of this relation. Moreover, wd(x, ρ, m 0 ) `?X wd(s, ρ, m), with
x 6∈ X , means that in the path there can be nodes of shape wd(y, ρ, ) only for
y ∈ X and non-repeated. We use analogous notations for the judgment at ρ (s, i).

Lemma 1.
 1. If at ρ (x, i0 ) `?X at ρ (s, i), then at ρ (x, i0 + k) `?X at ρ (s, i + k), for each k ≥ 0.
 2. A judgment wd(s, ρ, ∅) has no derivation iff the following condition holds:
    (wf-stuck) wd(x, ρ, m 0 ) `?X 0 wd(ρ(x), ρ, m{x 7→ 0}) ` wd(x, ρ, m) `?X wd(s, ρ, ∅)
                      for some x ∈ dom(ρ), X 0 , X , and m 0 , m s.t.
                      x 6∈ dom(m), m 0 (x) = k ≤ 0.
 3. The derivation of at ρ (s, j) is infinite iff the following condition holds:
    (at-∞) at ρ (x, i + k) `?X 0 at ρ (ρ(x), i) ` at ρ (x, i) `?X at ρ (s, j)
               for some x ∈ dom(ρ), X 0 , X , and i, k ≥ 0.

Lemma 2. For x ∈ dom(m), the following conditions are equivalent:
 1. at ρ (x, i0 ) `?X at ρ (s, i) for some i0 , i
 2. wd(x, ρ, m 0 ) `?X wd(s, ρ, m) for some m 0 such that m 0 (x) = m(x) + i − i0 .
12       D. Ancona et al.

Proof.
1⇒ 2 By induction on the length of the path in at ρ (x, i0 ) `?X at ρ (s, i).
  Base The length of the path is 0, hence we have at ρ (x, i) `?∅ at ρ (x, i). We also
      have wd(x, ρ, m) `?∅ wd(x, ρ, m), and m(x) = m(x) + i − i, as requested.
  Inductive step By cases on the rule applied to derive at ρ (s, i). We show
      the most significant cases.
      (at-var) We have at ρ (y, i), with y 6= x since the length of the path is
          > 0, and at ρ (x, i0 ) `?χ\{y} at ρ (ρ(y), i).
          Moreover, we can derive wd(y, ρ, m) by rule (wf-var), and by induc-
          tive hypothesis we also have wd(x, ρ, m 0 ) `?χ\{y} wd(ρ(y), ρ, m{y 7→ 0}),
          and m 0 (x) = m{y 7→ 0}(x) + i − i0 , hence we get the thesis.
      (at-cons-0) Empty case, since the derivation for at ρ (n : s, 0) does not
          contain a node at ρ (x, i0 ).
      (at-cons) We have at ρ (n : s, i), and at ρ (x, i0 ) `?X at ρ (s, i − 1). More-
          over, we can derive wd(n : s, ρ, m) by rule (wf-cons), and by in-
          ductive hypothesis we also have wd(x, ρ, m 0 ) `?X wd(s, ρ, m +1 ), with
          m 0 (x) = m +1 (x) + (i − 1) − i0 , hence we get the thesis.
2⇒ 1 By induction on the length of the path in wd(x, ρ, m 0 ) `? wd(s, ρ, m).
  Base The length of the path is 0, hence we have wd(x, ρ, m) `?∅ wd(x, ρ, m).
      We also have, for an arbitrary i, at ρ (x, i) `?∅ at ρ (x, i), and m(x) = m(x)+
      i − i, as requested.
  Inductive step By cases on the rule applied to derive wd(s, ρ, m). We show
      the most significant cases.
      (wf-var) We have wd(y, ρ, m), with y ∈       / dom(m), y 6= x since x ∈ dom(m),
          and wd(x, ρ, m 0 ) `?X \{y} wd(ρ(y), ρ, m{y 7→ 0}). By inductive hy-
          pothesis we have at ρ (x, i0 ) `?X \{y} at ρ (ρ(y), i) for some i0 , i such that
          m 0 (x) = m(x) + i − i0 . Moreover, since y ∈ dom(ρ), at ρ (ρ(y), i) `
          at ρ (y, i) by rule (at-var), hence we get at ρ (x, i0 ) `?χ at ρ (y, i).
      (wf-corec) Empty case, since the derivation for wd(y, ρ, m) would not
          contain a node wd(x, ρ, m).
      (wf-fv) Empty case, since the derivation for wd(y, ρ, m) would not con-
          tain a node wd(x, ρ, m).
      (wf-cons) We have wd(n : s, ρ, m), and wd(x, ρ, m 0 ) `?X wd(s, ρ, m +1 ).
          By inductive hypothesis we have at ρ (x, i0 ) `?X at ρ (s, i) for some i0 , i
          such that m 0 (x) = m +1 (x)+i−i0 . Moreover, at ρ (s, i) ` at ρ (n:s, i+1)
          by rule (at-cons-n), hence we get at ρ (x, i0 ) `?χ at ρ (n : s, i + 1) with
          m 0 (x) = m(x) + i + 1 − i0 , as requested.
Lemma 3. For x 6∈ dom(m), the following conditions are equivalent:
1. at ρ (x, i0 ) `?X at ρ (s, i) for some i0 , i
2. wd(x, ρ, m 0 ) `?X wd(s, ρ, m) for some m 0 such that x 6∈ dom(m 0 ).
Proof. Easy variant of the proof of Lemma 2.
Theorem 1. wd(s, ρ, ∅) is derivable iff, for all j, at ρ (s, j) either has no deriva-
tion or a finite derivation.
                             Enhanced Regular Corecursion for Data Streams          13

Proof. We prove that at ρ (s, j) has an infinite derivation for some j iff wd(s, ρ, ∅)
has no derivation.
⇒ By Lemma 1-(3),we have that the following condition holds:
          (at-∞) at ρ (x, i + k) `?X 0 at ρ (ρ(x), i) ` at ρ (x, i) `?X at ρ (s, j)
                 for some x ∈ dom(ρ), X 0 , X , and i, k ≥ 0.
  Then, starting from the right, by Lemma 3 we have wd(x, ρ, m) `?X wd(s, ρ, ∅)
  for some m such that x 6∈ dom(m); by rule (wf-var) we have
  wd(ρ(x), ρ, m{x 7→ 0}) ` wd(x, ρ, m), and finally by Lemma 2 we have:
  (wf-stuck) wd(x, ρ, m 0 ) `?X 0 wd(ρ(x), ρ, m{x 7→ 0}) ` wd(x, ρ, m) `?X wd(s, ρ, ∅)
                  for x ∈ dom(ρ), X 0 , X , and m 0 , m s.t. x 6∈ dom(m), m 0 (x) = k ≤ 0.
  hence we get the thesis.
⇐ By Lemma 1-(2),we have that the condition (wf-stuck) above holds. Then,
  starting from the left, by Lemma 2 we have at ρ (x, i0 ) `?X 0 at ρ (ρ(x), i) for some
  i0 , i such that i − i0 = k ≤ 0; by rule (at-var) we have at ρ (ρ(x), i) ` at ρ (x, i),
  and by Lemma 3 we have at ρ (x, j 0 ) `?X at ρ (s, j) for some j 0 , j. If i = j 0 + h,
  h ≥ 0, then by Lemma 1-(1) we have
                   at ρ (x, i + k) `?X 0 at ρ (ρ(x), i) ` at ρ (x, i) `?X at ρ (s, j + h)
        0
  If j = i + h, h ≥ 0, then by Lemma 1-(1) we have
               at ρ (x, i + k + h) `?X 0 at ρ (ρ(x), i) ` at ρ (x, i + h) `?X at ρ (s, j).
  In both cases, the derivation of at ρ (s, j) is infinite.


5    Related and future work
As mentioned in Sect. 1, our approach extends regular corecursion, where the
semantics keeps track of method/function calls. Regular corecursion originated
from co-SLD resolution [16,17,1,3], where already considered goals (up to unifi-
cation), called coinductive hypotheses, are considered successful. Language con-
structs that support this programming style have also been proposed in the
functional [11] and object-oriented [4,2] paradigm.
    There have been a few attempts of extending the expressive power of regu-
lar corecursion. Notably, structural resolution [12,13] is a proposed operational
semantics for logic programming where infinite derivations that cannot be built
in finite time are generated lazily, and only partial answers are shown. Another
approach is the work on infinite trees [6], where Courcelle introduces algebraic
trees and equations as generalizations of regular ones.
    For the operators considered in the calculus and some examples, our main
sources of inspiration have been the works of Rutten [14], where a coinductive
calculus of streams of real numbers is defined, and Hinze [9], where a calculus of
generic streams is defined in a constructive way and implemented in Haskell.
    The problem of ensuring well-defined corecursive definitions has been also
considered in the context of type theory and proof assistants. We have shown in
Sect. 4 that simple guarded definitions [5] do not work properly in case values are
allowed to contain non constuctors as the tail operator; a more complex approach
based on a type system has been proposed by Sacchini [15] for an extension of
the calculus of constructions which is more expressive than that considered here;
14      D. Ancona et al.

however, as opposed to what happens with the judgment wd defined in Sect. 4,
corecursive calls to the result of an application of tail are never well-typed even
in case of well-defined streams as happens for the definition of fib as given in
Sect. 3.
    Lastly, D’Angelo et al. presented LOLA [8], a specification language for run-
time monitoring that manipulates streams. The general idea behind the frame-
work is to generate a set of output streams, starting from a given set of input
streams. The main difference with respect to our work is that LOLA only allows
streams with a finite number of elements. In this framework, well-formedness is
checked by relying on a dependency graph, which keeps track of relations be-
tween the processed streams. The vertices of this graph are the streams, while
the edges represent the dependencies between them. Each edge is weighted with
a value ω to point the fact that a stream depends on another one shifted by ω
positions. Then, the well-formedness constraint is that each closed-walk inside
the graph must have a total weight different from 0. These syntactic constraints
appear to be very similar to the approach we used for predicate wd .
    Our main technical result is Theorem 1, stating that passing the well-definedness
check performed at runtime for each function call is necessary and sufficient to
prevent non-termination in accessing elements in the resulting stream at an ar-
bitrary index. In future work, we plan to also prove soundness of the operational
well-definedness with respect to its abstract definition. Completeness does not
hold, as shown by the example zeros() = repeat(0) [*] zeros() which is not
well-defined operationally, but admits as unique solution the stream of all zeros.
On the other hand, the simplest operational characterization of equivalence of
stream values is syntactic equivalence. This works for all the examples presented
in this paper, but is, again, not complete with respect to the abstract definition,
as illustrated below:
first ( s ) = s (0): first ( s ) // works with syntactic equivalence
first2 ( s ) = s (0): first2 ( s (0): s ^) // does not work
    Indeed, we get an infinite derivation for first2(repeat(1)) with call traces of
increasing shape {first2(x) 7→ y1 , first2(1:x^) 7→ y2 , first2(1:(1:x^)^) 7→ y3 , . . .}
in the environment {x 7→ 1 : x}. In future work we plan to investigate more ex-
pressive operational characterizations of equivalence.

References
 1. Davide Ancona. Regular corecursion in Prolog. Computer Languages, Systems &
    Structures, 39(4):142–162, 2013.
 2. Davide Ancona, Pietro Barbieri, Francesco Dagnino, and Elena Zucca. Sound regu-
    lar corecursion in coFJ. In Robert Hirschfeld and Tobias Pape, editors, ECOOP’20
    - Object-Oriented Programming, volume 166 of LIPIcs, pages 1:1–1:28. Schloss
    Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
 3. Davide Ancona and Agostino Dovier. A theoretical perspective of coinductive logic
    programming. Fundamenta Informaticae, 140(3-4):221–246, 2015.
 4. Davide Ancona and Elena Zucca. Corecursive Featherweight Java. In FTfJP’12 -
    Formal Techniques for Java-like Programs, pages 3–10. ACM Press, 2012.
                             Enhanced Regular Corecursion for Data Streams          15

 5. Thierry Coquand. Infinite objects in type theory. In Types for Proofs and Programs,
    International Workshop TYPES’93, Nijmegen, The Netherlands, May 24-28, 1993,
    Selected Papers, pages 62–78, 1993.
 6. Bruno Courcelle. Fundamental properties of infinite trees. Theoretical Computer
    Science, 25:95–169, 1983.
 7. Francesco Dagnino, Davide Ancona, and Elena Zucca. Flexible coinductive logic
    programming. Theory and Practice of Logic Programming, 20(6):818–833, 2020.
    Issue for ICLP 2020.
 8. Ben D’Angelo, Sriram Sankaranarayanan, César Sánchez, Will Robinson, Bernd
    Finkbeiner, Henny B. Sipma, Sandeep Mehrotra, and Zohar Manna. LOLA: run-
    time monitoring of synchronous systems. In 12th International Symposium on
    Temporal Representation and Reasoning (TIME 2005), pages 166–174, 2005.
 9. Ralf Hinze. Concrete stream calculus: An extended study. Journal of Functional
    Programming, 20(56):463535, 2010.
10. Jean-Baptiste Jeannin and Dexter Kozen. Computing with capsules. Journal of
    Automata, Languages and Combinatorics, 17(2-4):185–204, 2012.
11. Jean-Baptiste Jeannin, Dexter Kozen, and Alexandra Silva. CoCaml: Functional
    programming with regular coinductive types. Fundamenta Informaticae, 150:347–
    377, 2017.
12. Ekaterina Komendantskaya, Patricia Johann, and Martin Schmidt. A productivity
    checker for logic programming. In Manuel V. Hermenegildo and Pedro López-
    Garcı́a, editors, Logic-Based Program Synthesis and Transformation - LOPSTR
    2016, Revised Selected Papers, volume 10184 of Lecture Notes in Computer Science,
    pages 168–186. Springer, 2016.
13. Ekaterina Komendantskaya, John Power, and Martin Schmidt. Coalgebraic logic
    programming: from semantics to implementation. J. Log. Comput., 26(2):745–783,
    2016.
14. Jan J. M. M. Rutten. A coinductive calculus of streams. Mathematical Structures
    in Computer Science, 15(1):93–147, 2005.
15. Jorge Luis Sacchini. Type-based productivity of stream definitions in the calculus
    of constructions. In Symposium on Logic in Computer Science, LICS 2013, pages
    233–242, 2013.
16. Luke Simon. Extending logic programming with coinduction. PhD thesis, University
    of Texas at Dallas, 2006.
17. Luke Simon, Ajay Bansal, Ajay Mallya, and Gopal Gupta. Co-logic programming:
    Extending logic programming with coinduction. In Lars Arge, Christian Cachin,
    Tomasz Jurdzinski, and Andrzej Tarlecki, editors, Automata, Languages and Pro-
    gramming, 34th International Colloquium, ICALP 2007, volume 4596 of Lecture
    Notes in Computer Science, pages 472–483. Springer, 2007.