=Paper=
{{Paper
|id=Vol-1335/wflp2014_paper6
|storemode=property
|title=A Partial Evaluator for Curry
|pdfUrl=https://ceur-ws.org/Vol-1335/wflp2014_paper6.pdf
|volume=Vol-1335
|dblpUrl=https://dblp.org/rec/conf/wlp/HanusP14
}}
==A Partial Evaluator for Curry==
<pdf width="1500px">https://ceur-ws.org/Vol-1335/wflp2014_paper6.pdf</pdf>
<pre>
                     A Partial Evaluator for Curry

                           Michael Hanus and Björn Peemöller

                  Institut für Informatik, CAU Kiel, D-24098 Kiel, Germany
                         {mh|bjp}@informatik.uni-kiel.de


       Abstract. We present a partial evaluator for functional logic programs written in
       Curry. In contrast to previous approaches to the partial evaluation of functional
       logic programs, we take into account the features used in contemporary Curry
       programs, in particular, non-deterministic operations and recursive let expres-
       sions. For this purpose, we base our partial evaluator on FlatCurry, an intermedi-
       ate language for the representation of Curry programs. We sketch our approach
       and present initial benchmarks of our implementation.


1   Introduction
Partial evaluation of programs is a technique to anticipate the evaluation of computa-
tions once at compile time instead of performing them (possibly several times) at run
time. This is possible if some part of the input data, also called static data, is known at
compile time. In this case, some parts of the program are evaluated so that a residual
program, i.e., a specialized version of the original one, is returned. Since some compu-
tations have been performed at compile time, the run time of the specialized program
could be considerably decreased. The static data does not need to be some user input,
but can also be subexpressions in the original program. Offline partial evaluators ob-
tain information about static data from a separate static analysis phase (binding-time
analysis), whereas online partial evaluators obtain this information on the fly and prop-
agate it during the partial evaluation process. In this work we follow the online partial
evaluation approach.
    Partial evaluation has already been studied for different kinds of programming lan-
guages, like functional languages, logic languages, as well as for combined functional
logic languages. An interesting aspect of the partial evaluation of functional logic pro-
grams is the fact that the effects of supercompilation [23] can be obtained by applying
the operational semantics of the source language (narrowing) at partial evaluation time
[5]: if a function is called with unknown arguments, narrowing instantiates these ar-
guments such that the rules defining this function can be applied. Hence, one mainly
needs to control the partial evaluator, i.e., avoiding infinite unfoldings and instantiations
of logic variables, in order to obtain residual programs.
    Thanks to this insight, partial evaluators for functional logic languages can be con-
structed with techniques similarly to the implementation of these languages. For in-
stance, Albert et al. [3] proposed a partial evaluator for Curry [17] based on the inter-
mediate language FlatCurry. Since FlatCurry makes the evaluation strategy of Curry
programs explicit [1], the use of FlatCurry led to a partial evaluator able to optimize
practical Curry programs. Unfortunately, when this partial evaluator was constructed,
the use of non-deterministic operations, although proposed some years ago [14], was
not well established. Therefore, the partial evaluation scheme was based on term rewrit-
ing and restricted to confluent programs, i.e., all operations were required to be deter-
ministic, and recursive let expressions were also not taken into account. Thus, if this
partial evaluator is applied to programs containing non-deterministic operations, which
is a useful programming pattern in contemporary functional logic programs [6,8], the
resulting programs are not semantically equivalent to the source programs.
    In order to deal with realistic Curry programs, it is crucial for a partial evaluator to
cover the full source language, including both logic features such as non-determinism
and functional features such as recursive let expressions. Therefore, we extend in this
work the partial evaluator of Albert et al. [3] to cover the full language of FlatCurry. In
contrast to [3], we base our partial evaluator on an operational semantics [1] which is
adequate for contemporary Curry programs.
    We start with an introduction to the functional logic language Curry in Sect. 2 before
we sketch the structure of the partial evaluator in Sect. 3. The partial evaluation scheme
is presented in Sect. 4, whereas control issues are discussed in Sect. 5. We evaluate our
implementation with some benchmarks in Sect. 6 before we conclude in Sect. 7.

2     Curry
We briefly review the basic concepts of the functional logic language Curry. More de-
tails can be found in recent surveys on functional logic programming [7,15] and in the
language report [17].
     The syntax of Curry [17] is close to Haskell [21], i.e., type variables and names
of defined operations usually start with lowercase letters and the names of type and
data constructors start with an uppercase letter. The application of an operation f to an
expression e is denoted by juxtaposition (“f e”). In addition to Haskell, Curry allows
free (logic) variables in rules and initial expressions. If the data type of Booleans and a
negation operation are defined by
    data Bool = False | True
    not True = False
    not False = True
the expression “not x where x free” non-deterministically reduces to False with
the binding x = True, and to True with the binding x = False. A further kind of non-
determinism is supported in Curry by the choice operator “?”, which can be considered
as predefined by the overlapping rules
    x ? _ = x
    _ ? y = y
Thus, we can define a non-deterministic operation coin yielding the values 0 and 1 by
    coin = 0 ? 1
    If non-deterministic operations are used as arguments in other operations, a seman-
tical ambiguity might occur. Consider the operation
    double x = x + x
and the expression “double coin”. If we evaluated this expression by term rewriting,
we could have the reduction
    double coin    →    coin + coin       →    0 + coin      →    0 + 1     →    1
leading to the unintended result 1. Note that this result cannot be obtained with a strict
reduction strategy where arguments are evaluated prior to the function calls. In order
to avoid dependencies on the evaluation strategies and exclude such unintended results,
Curry is based on the rewriting logic CRWL, proposed by González-Moreno et al. [14]
as a logical (execution- and strategy-independent) foundation for declarative program-
ming with non-strict and non-deterministic operations. This logic specifies the call-time
choice semantics [18] where values of the arguments of an operation are determined be-
fore the operation is evaluated. In a lazy strategy, this can be enforced by sharing actual
arguments. For instance, the expression above can be lazily evaluated provided that all
occurrences of coin are shared so that all of them consistently reduce to either 0 or 1.

3     Overview of the Partial Evaluator
Before describing the details of the partial evaluation process, we provide an overview
of the partial evaluator and its usage. Since our partial evaluator is an extension of the
first partial evaluator for Curry described in [3], our representation is oriented towards
the original description.
     Our partial evaluator is intended to specialize some parts of a given input program
in order to create an optimized, residual program. In order to support the specification
of expressions to be optimized, we assume that these expressions are annotated with
PEVAL. For example, we assume a program which contains the function definition
    main xs = map (twice square) xs
We can then annotate the main expression (or parts of it) as follows:
    main xs = PEVAL (map (twice square) xs)
Actually, PEVAL is the identity function, i.e., it has the type a → a. As a consequence,
annotations with PEVAL do not change the semantics of the original program. After
annotating the program, the process of partial evaluation is fully automatic. The process
itself consists of the following phases (depicted in Fig. 1):
 1. The partial evaluator is called for a given program, containing annotated expres-
    sions as described above. This source program is converted into the standard inter-
    mediate representation for Curry programs, called FlatCurry (see Sect. 4.1).
 2. The process continues by extracting the set of annotated expressions and creating a
    copy of the original program without annotations.
 3. Both form the input for the partial evaluation phase, which is later described.
 4. The output of the partial evaluation is a set of semantically equivalent, potentially
    more efficient expressions. These expressions are converted to new function defini-
    tions to allow reuse, a process called renaming.
 5. The evaluation process tends to produce some “intermediate” functions which only
    pass their parameters to another function. Therefore, the process is finished by a
    compression phase which removes such intermediate functions by inlining and sim-
    plifies expressions to produce a more efficient and legible result.
 6. Finally, the annotated expressions of the form (PEVAL e) are replaced with their
    (hopefully more efficient) equivalents e′ , where e′ is the renaming of e. This opti-
    mized program is then stored as a FlatCurry program.
                                                                                     Partial Evaluator
                                                            Annotated
                                                           expressions
       Annotated        (1)       Annotated                                          Residual
                                                  (2)                      (3)
      Curry program           FlatCurry program                                     expressions
                                                        Program without
                                                          annotations                      (4)
                        (6)
          Final                  Compressed                                        New function
    FlatCurry program             definitions                  (5)                  definitions


               Fig. 1. Overview of the partial evaluation process with phases (1) to (6)


For instance, with the usual definitions of map, twice, and square, the example above
is transformed into
    main xs = map0 xs
    map0 xs = case xs of []                → []
                         y:ys              → let z = (y*y) in (z*z) : map0 ys
so that the overhead of the higher-order operations map and twice is eliminated.
    The fact that the partial evaluator internally operates on the FlatCurry format is no
restriction, since this format is used by current Curry compilers anyway, e.g., PAKCS
[16] or KiCS2 [11]. Hence, the partial evaluator can easily be incorporated into a com-
pilation chain.

4      The Partial Evaluation Scheme
As already mentioned, the partial evaluator described in [3] lacks support for two lan-
guage features, namely non-deterministic operations and let expressions. For instance,
consider the definition
    main = PEVAL (double coin)
w.r.t. the definitions of coin and double shown in Sect. 2. The partial evaluator [3]
unfolds the call to double in the body of main to (0 ? 1) + (0 ? 1), so that the
residual program yields the values 0, 1, 1, and 2 for main. However, according to the
call-time choice semantics of Curry [14,18], the correct result would be 0 or 2 but not
1. This problem arises from the residual semantics of the original partial evaluator,
which is based on term-rewriting so that non-determinism in shared subexpressions is
duplicated in the residual programs.
    The second missing feature are (mutually recursive) let expressions, i.e., bindings
where the variables to be bound might occur in the right-hand side of the bindings. For
example, it is not possible to partially evaluate the program
    ones = let ones = 1 : ones in ones
    main = PEVAL (take 2 ones)
One might encounter that this does not impose a real restriction because recursive let-
bindings could be interpreted by recursive function definitions (at the cost of some
overhead). While this is possible for the example above, it is not whenever a non-
deterministic value should be shared. For instance, consider the following program:
    digits = let digits = (0 ? 1) : digits in digits
    main   = PEVAL (take 2 digits)
          P ∶∶= Dm                                                 (program)
          D ∶∶= f (xn ) = e                                (defined function)
          e ∶∶= x                                                   (variable)
              ∣ c(ek )                                      (constructor call)
              ∣ f (ek )                                        (function call)
              ∣ let { xn = en } in e                  (recursive let binding)
              ∣ let xn free in e                              (free variables)
              ∣ e1 ? e2                                         (disjunction)
              ∣ case e of { pk → ek } (case expression, pi pairwise different)
          p ∶∶= c(xn )                                  (constructor pattern)
                       Fig. 2. The FlatCurry representation of programs

Because of the let binding, the decision to bind digit to either 0 or 1 is shared, and, in
consequence, the expression main evaluates to either [0,0] or [1,1]. If we replaced
the definition of digits by a top-level operation, as in
  digits = (0 ? 1) : digits
  main   = PEVAL (take 2 digits)
the expression main would produce the additional results [0,1] and [1,0]. Thus,
recursive let expressions cannot be transformed into operations but must be explicitly
considered by a partial evaluator.
    The usage of both features in contemporary Curry programs is the motivation for us
to develop a new partial evaluator. In contrast to [3], we do not use a semantics based
on term rewriting. Instead, we base our work on the natural semantics for FlatCurry
proposed in [1] which is intended to specify the call-time choice semantics of non-
deterministic operations by modeling a heap structure to express sharing. A similar
semantics has been used in [13] in a partial evaluator for first-order functional programs.
In contrast to our approach, non-determinism, which is essential for Curry, has not been
considered there.


4.1   FlatCurry

FlatCurry is a simple intermediate language used by Curry compilers [11,16]. More-
over, it is also the basis of precise descriptions of the semantics of Curry [3] and
semantics-based tools for Curry (e.g., [2,3,4]). The syntax of this representation is de-
picted in Fig. 2, where we denote a sequence of objects o1 , . . . , on by on . A FlatCurry
program P consists of a sequence of function definitions D such that each function
must be defined by a single rule with a linear left-hand side, i.e., the variables xn must
be pairwise different. The right-hand side of a function definition is an expression e
composed of variables (x, y, z, . . . ), constructors (A, B, C, . . . ), and function calls (f ,
g, h, . . . ). In the following, we denote by φ a constructor c or a function f . For the
sake of simplicity, we assume that literals occurring in the source program, like num-
bers or characters, are represented as nullary constructors. Additionally, we allow local
(mutually recursive) bindings of variables, the introduction of free (logic) variables,
disjunctions (to represent overlapping left-hand sides in the source language), and pat-
tern matching. The patterns pi in case expressions are required to be pairwise different
and only consist of constructors applied to variables. In consequence, nested patterns in
the source language are represented by nested case expressions. For example, the list
concatenation conc is represented in FlatCurry as
     conc(xs,ys) = case xs of { []             → ys
                              ; z:zs           → z : conc(zs,ys) }
    Note that, in contrast to [1], we do not distinguish between flexible and rigid case
expressions. Although they behave differently on free (logic) variables [17], this differ-
ence is not relevant for partial evaluation [3]. Furthermore, we omit the representation
of external functions like arithmetics, which are implemented in the partial evaluator
but do not play a significant role in the evaluation scheme. Finally, we do not consider
higher-order applications in the syntax of FlatCurry since they can be represented by an
operation apply where partial applications are interpreted as constructor calls [3].


4.2     Natural Semantics

We base our partial evaluator on a variant of the operational semantics of FlatCurry
[1], also referred to as the natural semantics of FlatCurry. The semantics uses a heap
structure to specify sharing of expressions and computes the (flat) value of an expression
which is either a logic variable (w.r.t. the associated heap) or a constructor applied to
variables.
                Heap = V → {free, ∎} ⊎ Exp               V alue ∶∶= x ∣ c(xn )
A heap is a partial mapping from a set of variables V to either an expression (Exp is
the set of expressions according to the syntax of FlatCurry), a special symbol “free”
to represent a free variable,1 or a symbol “∎” representing a black hole.2 We denote
the empty heap by [], and the value associated to a variable x in a heap Γ by Γ [x].
Γ [x ↦ e] denotes a heap Γ ′ with Γ ′ [x] = e and Γ ′ [y] = Γ [y] for all y ≠ x.
    We use judgements of the form Γ ∶ e ⇓ ∆ ∶ v which express the fact that “the
expression e under the heap Γ evaluates to the value v and the (possibly modified) heap
∆ ”. The basic inference rules of the natural semantics are depicted in Fig. 3. We briefly
describe these rules and explain the differences to the original version of [1].

(Value) Evaluation of a value directly returns the value without modifying the heap.
(VarExp) This rule implements sharing of subexpressions. If a variable to be evaluated
    is bound to an expression, the expression is evaluated and its value is returned.
    In addition, the heap is updated with the value. During evaluation of the expres-
    sion, the binding is replaced by ∎, in contrast to [1]. This allows the detection of
    black holes and is necessary for the correctness of the semantics [10] (see also
    Appendix A for a detailed explanation).
(Flatten) To correctly implement sharing, arguments of function or constructor calls
    must be represented in the heap. This is usually achieved by a preprocessing step
    called flattening or normalization [1,19], but it can also be performed on demand.
 1
   [1] represents free variables by circular let bindings of the form let {x = x} in e, but
   this prohibits the correct representation of such bindings occurring in the source code.
 2
   We use a special symbol for black holes instead of simply removing the binding for a variable
   (as in [19]) in order to distinguish black holes from unbound variables.
   (Value)     Γ ∶v ⇓ Γ ∶v          where v = c(xn ) or v ∈ V with Γ [v] = free

                   Γ [x ↦ ∎] ∶ e ⇓ ∆ ∶ v
   (VarExp)                                             where e ∉ {free, ∎}
               Γ [x ↦ e] ∶ x ⇓ ∆[x ↦ v] ∶ v
               Γ [y ↦ ei ] ∶ φ(x1 , . . . , xi−1 , y, ei+1 , . . . , ek ) ⇓ ∆ ∶ v
   (Flatten)                                                                          where ei ∉ V, y fresh
                   Γ ∶ φ(x1 , . . . , xi−1 , ei , ei+1 , . . . , ek ) ⇓ ∆ ∶ v
               Γ ∶ σ(e) ⇓ ∆ ∶ v
   (Fun)                                  where f (xn ) = e ∈ P, σ = {xn ↦ yn }
               Γ ∶ f (yn ) ⇓ ∆ ∶ v

                Γ [yk ↦ σ(ek )] ∶ σ(e) ⇓ ∆ ∶ v
   (Let)                                                      where σ = {xk ↦ yk }, yk fresh
               Γ ∶ let { xk = ek } in e ⇓ ∆ ∶ v
                 Γ ∶ ei ⇓ ∆ ∶ v
   (Or)                                     where i ∈ {1, 2}
               Γ ∶ e1 ? e2 ⇓ ∆ ∶ v

                Γ [yn ↦ free] ∶ σ(e) ⇓ ∆ ∶ v
   (Free)                                                  where σ = {xn ↦ yn }, yn fresh
               Γ ∶ let xn free in e ⇓ ∆ ∶ v
               Γ ∶ e ⇓ ∆ ∶ c(yn )    ∆ ∶ σ(ei ) ⇓ Θ ∶ v                             pi = c(xn ),
   (Select)                                                             where
                  Γ ∶ case e of { pk → ek } ⇓ Θ ∶ v                                 σ = {xn ↦ yn }

               Γ ∶ e ⇓ ∆[x ↦ free] ∶ x            ∆[x ↦ σ(pi ), yn ↦ free] ∶ σ(ei ) ⇓ Θ ∶ v
   (Guess)                       Γ ∶ case e of { pk → ek } ⇓ Θ ∶ v
                      where i ∈ {1, . . . , k}, pi = c(xn ), σ = {xn ↦ yn }, yn fresh
                                        Fig. 3. Natural semantics

(Fun) This rule unfolds a function call, where the result is obtained by evaluation of
    the function’s right-hand side. We assume that the program P is a global parameter
    of the calculus. Generally, whenever new variables are introduced by the program,
    we apply a renaming substitution σ to prohibit name clashes.
(Let) The bindings of a let construct are added to the heap after all variables have
    been renamed to fresh variable names.
(Or) This rule non-deterministically chooses one of the arguments to be futher evalu-
    ated. In consequence, this rule introduces non-determinism into the calculus itself.
(Free) Like variables bound to expressions, logic variables are renamed and afterwards
    bound in the heap.
(Select) For case expressions whose argument evaluates to a constructor-rooted term
    the right-hand side of the corresponding alternative is selected and evaluated.
(Guess) For case expressions whose argument evaluates to a logic variable, one of
    the alternatives is non-deterministically chosen to be evaluated. The variable is then
    bound to the corresponding pattern where the variables inside the pattern are also
    bound as logic variables.

4.3   Ensuring Termination
Following the general idea of partial evaluation of functional logic programs [5] as
well as logic programs [20], we evaluate an annotated expression e with a (possibly
incomplete) standard derivation [] ∶ e ⇓ ∆ ∶ e′ . In order to ensure the termination of the
partial evaluation process, we defer the evaluation of some expressions. For example,
consider the program
     loop xs = loop xs
     main xs = PEVAL (loop xs)
The evaluation of the expression main does not terminate due to the recursive function
call to loop. To achieve termination of the partial evaluation process, we modify the
natural semantics as follows:

 1. The evaluation of an expression can be deferred to avoid non-termination.
 2. An operation proceed is used to decide whether a function call should be unfolded
    or deferred.
This residualizing natural semantics is similar to [3,4] but more complex due to the
use of a heap for sharing instead of term rewriting. Regarding the first modification, we
extend the representation of values with a new symbol ⟪⋅⟫ which encloses expressions
whose evaluation should be deferred.
                          V alue ∶∶= . . . ∣ ⟪e⟫    (annotated expression)

This annotation directly corresponds to the PEVAL annotation in source programs. Sec-
ond, we extend the inference system with the operation proceed , deciding whether a
function call should be unfolded, and replace the rule Fun with:3
                      Γ ∶ σ(e) ⇓ ∆ ∶ v                         f (xn ) = e ∈ P, σ = {xn ↦ yn },
      (FunEval)                                        where
                      Γ ∶ f (yn ) ⇓ ∆ ∶ v                      proceed (Γ, f (yn )) = true

                                                               f (xn ) = e ∈ P, σ = {xn ↦ yn },
      (FunDefer)      Γ ∶ f (yn ) ⇓ Γ ∶ ⟪f (yn )⟫      where
                                                               proceed (Γ, f (yn )) = false

Approaches for the concrete definition of proceed will be discussed in Sect. 5.1. Fur-
thermore, we extend the rule Value to also return deferred expressions unchanged and
constrain the rule VarExp in that the value must not be a deferred expression. Finally,
we add two more rules for deferred expressions where the annotation is lifted upwards:
                              Γ [x ↦ ∎] ∶ e ⇓ ∆ ∶ ⟪e′ ⟫
        (VarDefer)                                               where e ∉ {free, ∎}
                          Γ [x ↦ e] ∶ x ⇓ ∆[x ↦ e′ ] ∶ ⟪x⟫
                                                    Γ ∶ e ⇓ ∆ ∶ ⟪e′ ⟫
        (CaseDefer)
                          Γ ∶ case e of { pk → ek } ⇓ ∆ ∶ ⟪case e′ of { pk → ek }⟫


4.4     Dealing with Partial Information

In contrast to the evaluation performed in a standard interpreter, the partial evalua-
tion process has to deal with partial knowledge in the form of unbound variables. For
instance, if the right-hand side of a function declaration like f x = PEVAL (g x)
should be evaluated, there is no binding information for the parameter variable x.
 3
     Actually, the operation proceed also takes into account the context of reductions already per-
     formed, but we omit them here for the sake of simplicity.
    A possible solution is to handle such unbound variables as logic variables, as done
in [5], so that they are bound to appropriate values by the partial evaluator. Since it
has been shown in [2] that the back-propagation of these bindings can lead to incorrect
residual programs, [3] uses a residualizing semantics which represents such bindings
by case expressions in the residual program. However, this is only necessary for un-
bound variables. Explicitly introduced logic variables are known to be free during the
actual evaluation so that they can be bound during partial evaluation time. For instance,
consider the expression
 let x free in case x of { True             → 1 }
Here we can bind x to True, since this binding is not visible outside the scope of this
expression, select the (single) branch as the value of the case expression, and continue
by evaluating its right-hand side. In consequence, our implementation evaluates this ex-
pression to 1, while the partial evaluator described in [3] cannot evaluate the expression
any further.
    Hence, we distinguish unbound variables from logic variables by not binding them
in the heap. Furthermore, we assume that rule Value is also applicable to variables not
bound in the heap so that unknown variables reduce to themselves. Thus, only the rules
for case expressions have to be changed, where it is now also possible that the scru-
tinized value is an unknown variable. Following the idea of [3], we generate residual
case expressions to defer the inspection of the variable to the run time of the specialized
program. Therefore, we extend the definition of values to

             V alue ∶∶= . . . ∣ case x of { pk → vk } (residual case expression)

where the variable x inspected in the case expression is not bound in the corresponding
heap. Because case expressions are now contained in the set of values, we also have
to consider them as the value of a variable or an expression examined by another case
expression. Hence, we add the following rules:
                                          Γ ∶e ⇓ ∆∶x
(CaseUnbound)     Γ ∶ case e of { pk → ek } ⇓ ∆ ∶ case x of { pk → ⟪ek ⟫ }
                                      where x ∉ dom(∆)

                                   Γ ∶ e ⇓ ∆ ∶ case x of { p′j → ⟪e′j ⟫ }
(CaseCase)
                                                                        case e′j of
                  Γ ∶ case e of { pk → ek } ⇓ ∆ ∶ case x of { p′j → ⟪                ⟫}
                                                                         { pk → ek }

                     Γ [x ↦ ∎] ∶ e ⇓ ∆ ∶ case y of { pk → ⟪ek ⟫ }
(VarCase)                                                               where e ∉ {free, ∎}
                  Γ [x ↦ e] ∶ x ⇓ ∆[x ↦ case y of { pk → ek }] ∶ x
                                               Γ ∶e ⇓ ∆∶x
                                                                     case x of
(CaseVarCase)     Γ ∶ case e of { pk → ek } ⇓ ∆ ∶ case y of { p′j → ⟪              ⟫}
                                                                      { pk → e k }
                                   where ∆[x] = case y of { pj → ej }
                                                             ′    ′


The general idea is to lift case expressions inspecting an unbound variable upwards
and to defer the evaluation of the alternatives. Such deferred expressions are not further
evaluated in the residual semantics but later extracted by the global iterative process
as the initial expressions of a new specialization run. Because the alternatives are then
evaluated independently, it will be possible to take the binding information of the case
expression into account. For instance, if we consider the expression
    case x of { True      → not x }
a subsequent evaluation of the right-hand side not x may respect the binding of x to
True and, thus, directly evaluate to False.


4.5    Dereferencing the Heap
After evaluating an expression to a residual value, this value might contain variables
which are either free or bound to expressions in the corresponding heap. To be able
to replace parts of the input program with residual values, these bindings have to be
added to the values to form valid expressions, a process we call dereferencing the heap.
Conceptually, for a given configuration Γ ∶ e, we retrieve the set of variables transitively
reachable from e and bound in Γ and add the corresponding bindings to the expression.
For residual case expressions, we also respect the bindings represented by the case
expression. The bindings are divided into logic variables (fv ) and variables bound to
expressions (bv ) and added to the original expression:
             ⎧
             ⎪
             ⎪case x of { pk → drf (Γ [x ↦ pk ], ek ) }        if e = case x of { pk → ek }
drf (Γ, e) = ⎨
             ⎪
             ⎩⟪let fv (Γ, e) free in let bv (Γ, e) in e⟫
             ⎪                                                 otherwise

For instance, if we consider the configuration
                [y ↦ free] ∶ case x of {True -> x; False -> y}

then dereferencing will produce the expression
    case x of { True → ⟪let { x = True } in x⟫;False → ⟪let y free in y⟫ }


5     Control

Our partial evaluation algorithm follows the general procedure of Alpuente et. al. [5],
which is parametric w.r.t. an unfolding rule used to construct a finite derivation for an
expression and an abstraction operator used to guarantee that only finitely many ex-
pressions are evaluated. The basic algorithm is depicted in Fig. 4 and works as follows.
Given an input program P and a set of annotated expressions E, the algorithm starts by
applying an unfolding rule which evaluates each expression according to the residual
semantics presented in the previous section and extracts the results by drf . If there is
more than one derivation in the residual semantics due to the non-deterministic infer-
ence rules Or and Guess, the different extracted results of the derivation are combined
by the choice operator “?”. If there is no derivation at all, the result is represented by the
predefined operation failed. In the next step, an abstraction operator is applied to this
set, adding the new expressions to the set of already evaluated expressions. This phase
yields a new set which may need further evaluation, hence, this process is iteratively re-
peated until no more expressions are added to the set. This iteration is necessary for the
correctness of partial deduction [20] in order to achieve a “closed” set of expressions
                 Input: A program P and a set of expressions E
                 Output: A set of expressions S
                 i ∶= 0; E0 ∶= E;
                 repeat
                      E ′ ∶= unfold (Ei , P );
                      Ei+1 ∶= abstract(Ei , E ′ );
                      i ∶= i + 1;
                 until Ei = Ei+1 (modulo renaming);
                 return S ∶= Ei
                          Fig. 4. Basic algorithm for partial evaluation

that covers all expressions possibly occurring in the residual program. To generate the
resulting program, the same unfolding rule has to be applied to the resulting set of ex-
pressions to generate the corresponding resultants, i.e., the rules of the residual program
(in our implementation, this step is integrated into the algorithm). Finally, the set of gen-
erated resultants are compressed to eliminate intermediate and redundant functions (see
[5] for details).
    This procedure distinguishes two levels of control, namely the local level, managed
by the unfolding rule to avoid infinite evaluations, and the global level, managed by the
abstraction operator to avoid infinitely repetitions of the partial evaluation algorithm. To
ensure termination of the whole process, both local and global termination is required.


5.1   Local Control

Termination of the unfolding rule directly corresponds to termination of the residual se-
mantics presented in Sect. 4. For this purpose, the semantics has already been extended
by an oracle proceed (Γ, e) responsible for the decision whether a function call should
be unfolded or not. There exist several well-known techniques in the literature to come
to this decision, e. g., depth-bounds, loop-checks [9], well-founded orderings [12], or
well-quasi orderings [22]. Our implementation currently supports the following simple
strategies:

None No unfolding is performed for user-defined functions.
One Only one function call is unfolded for each evaluation.
Each At most one call is unfolded for each user-defined function, subsequent calls are
    deferred.
All All function calls are unfolded, which corresponds to the original inference system.
    This does not guarantee termination but may be useful if the user is sure that the
    process terminates.

Note that, regardless of the chosen strategy, built-in functions (such as arithmetics) are
evaluated in any case, since they are known to terminate.
    Expressions that have been deferred during evaluation will be extracted and eventu-
ally added to the set of expressions to be evaluated, depending on the operation abstract
(see Sect. 5.2 for details). Generally, a strategy that allows more evaluation steps in
one derivation than another strategy might seem superior. If an evaluation is split into
multiple derivations with deferred subexpressions, each of these subexpressions has to
be evaluated anew and leads to a new residual function to be generated. In contrast,
longer derivations will produce less deferred subexpressions and, hence, less residual
functions. Nevertheless, although a simpler strategy may produce more intermediate
expressions, there are better chances that some of these expressions have already been
encountered before, reducing the overall number of expressions to be evaluated. Fur-
thermore, the final compression phase will eliminate intermediate functions so that even
the simple strategies perform very well in practice.

5.2   Global Control
The local control is parametric w.r.t. the decision whether to stop or to proceed with the
evaluation, since it is safe to terminate the evaluation at any point. This flexibility does
not apply to the global control because we cannot stop the iterative extension of the set
of expressions until all function calls in this set are “closed” w.r.t. the set of expressions.
An expression e is closed w.r.t. a set of expressions if it is an instance of an expression
in the set and all expressions in the matching substitution are recursively closed (see [5]
for details). This condition is necessary to ensure the correctness of the partial evaluator
so that the specialized program computes the same solutions as the original program.
In order to avoid the construction of infinite sets of expressions, expressions in this set
are generalized to ensure termination of this process.
    Hence, the operation abstract returns a safe approximation of Ei ∪ E ′ so that each
expression in the set of Ei ∪ E ′ is closed w.r.t. the result of abstract(Ei , E ′ ). More
precisely, an expression e′ ∈ E ′ is added to the set Ei according to the following rules
(note that the result of unfolding is either a variable, a deferred expression, a constructor
appplication, a case expression, or a choice of these results):
 1. If e′ is a variable, it is discarded.
 2. If e′ has the form ⟪e⟫, one of the following options is considered:
    (a) add e to the set Ei ,
    (b) discard the expression e, or
    (c) compute the most specific generalization of e and some expression e′ ∈ E ′ , say
         ê, and try to add both ê and the expressions in the corresponding substitutions
         σ and θ, where e = σ(ê) and e′ = θ(ê).
 3. For all other cases (constructor calls, case expressions, choices), the corresponding
    subexpressions are considered.
Like for the unfolding rule, the abstraction can be parameterized by a criterion to decide
the option taken in (2). Our implementation currently supports abstractions using a well-
founded ordering or an embedding ordering to distinguish between (2a) and (2c), i.e.,
smaller expressions are added but larger expressions are generalized.
    To achieve a good level of specialization, it is crucial to recognize different variants
of one expression as equivalent in order to discard them in (2b). This is more com-
plex in our framework compared to [3], since we take let expressions into account.
For example, consider the equivalent expressions “map(square,xs)” and “let {f
= square} in map(f,xs)”. If we do not recognize them as variants, they might be
generalized to map(f,xs) which could not further be specialized. Therefore, we nor-
malize expressions by applying α-conversion and flattening [19] before computing their
abstractions.
 Benchmark                                 Time for PE    Original   Specialized    Speedup
 allOnes                                           280        180           140        1.29
 doubleApp                                         330        190           160        1.19
 doubleFlip                                        330        230           210        1.10
 lengthApp                                         320        120            90        1.33
 kmp                                              6100       1000            50       20.00
 foldr (+) 0 xs (sum)                              300        600           400        1.50
 foldr (+) 0 (map square xs)                       340       1280           800        1.60
 foldr (++) [] xs (concat)                         400        440           220        2.00
 map (twice square) xs                             420       1600          1410        1.13
 foldr (?) failed ys            (choose)           330         50             40       1.25
 head (perm ys)                                    410       2960             40      74.00

            Table 1. Benchmarks of selected partial evaluation examples (in msec)


6   Experimental Results
In this section we evaluate the implementation of our partial evaluator by some bench-
marks. We compile both the partial evaluator and the benchmarks with the PAKCS
Curry compiler (version 1.11.3, based on SICStus Prolog 4.2.3). All benchmarks were
executed on a Linux machine (Debian Wheezy) with an Intel Core i5-750 (2.66GHz)
processor and 4GiB of memory. The timings were performed using the profiling oper-
ation profileTimeNF of PAKCS and denote the time required for computing the nor-
mal form of the respective result in milliseconds (the arguments passed to the various
functions were evaluated before to bring out the speedup obtained by partial evaluation).
The benchmark examples have been specialized with one unfolding per evaluation and
without any abstraction, since all examples terminated. Experiments with both a well-
founded ordering or a well-quasi ordering resulted in the same or worse performance.
Table 1 presents the time required for the partial evaluation process itself, for executing
the original and the specialized program, and the gained speedup.
     In the first group of benchmarks, we consider some typical examples of partial de-
duction and functional program transformations. These are simple functions working
on lists or trees as (intermediate) data structures: allOnes computes the length of its
input list, represented as Peano numbers, and constructs a new list of the same length
with 1 as all elements, doubleApp is the concatenation of three lists, doubleFlip flips
a tree structure twice, returning the same tree, lengthApp computes the length of the
concatenation of two lists, and kmp implements a generic string pattern matcher. The
first four functions were specialized without static input data, while the kmp example
was specialized w.r.t. a fixed pattern of length 4, explaining both the time needed for
partial evaluation and the gained speedup.
     In the second group, we benchmark some examples with higher-order functions: the
computation of the sum of list elements using foldr, the sum of squared numbers, the
concatenation of a list of lists, and repeatedly applying a function to a list. All functions
are applied to an input list xs containing 200,000 elements. The speedup is generally
achieved because of the removal of intermediate data structures. For instance, the Curry
expression “foldr (+) 0 (map square xs)” is specialized to the following resid-
ual FlatCurry definition:
    sumSquare(xs) = case xs of { []          → 0
                               ; y:ys        → (y*y) + sumSquare(ys) }
Finally, we evaluate two (complicated) variants of the function choose, which non-
deterministically chooses one element of a given list ys containing 10,000 elements:
  choose (x:xs) = x ? choose xs
Our partial evaluator computes this simple implementation of choose for the first ex-
ample. The result for the second example only differs from choose in the order in
which the two non-deterministic alternatives are taken, which stems from the imple-
mentation of perm. The huge speedup is achieved because of the omission of the non-
deterministic intermediate list structure.
    To summarize, our partial evaluator shows promising results and is capable of per-
forming optimizations such as deforestation [24] and transformation of higher-order
functions to first-order ones. In addition, non-deterministic operations are correctly
specialized in contrast to [3], and the results for deterministic operations are almost
identical.


7     Conclusions and Future Work
We have presented a new partial evaluation scheme for the functional logic language
Curry based on its intermediate representation FlatCurry. The partial evaluator is based
on an adaptation of the natural semantics of FlatCurry, extending the semantics to deal
with the requirements of partial evaluation such as ensuring termination. In contrast
to the original partial evaluator [3], which is based on term rewriting without shar-
ing, the new implementation correctly handles both recursive let expressions and
non-deterministic operations and, thus, supports full (Flat)Curry. As our benchmarks
demonstrate, the implementation is capable of powerful optimizations both to deter-
ministic and non-deterministic programs.
    For future work, we intend to formally prove the correctness of the partial evalu-
ation scheme, which should be manageable due to the similarity of the original and
residual semantics. Another aspect for further investigations is the improvement of the
abstraction operator. While the abstraction is necessary to ensure termination, a too
general abstraction reduces the quality of the specialization. Thus, more sophisticated
abstraction operators might be beneficial.


References
 1. E. Albert, M. Hanus, F. Huch, J. Oliver, and G. Vidal. Operational semantics for declarative
    multi-paradigm languages. Journal of Symbolic Computation, 40(1):795–829, 2005.
 2. E. Albert, M. Hanus, and G. Vidal. Using an abstract representation to specialize functional
    logic programs. In Proc. of the 7th International Conference on Logic for Programming and
    Automated Reasoning (LPAR 2000), pages 381–398. Springer LNCS 1955, 2000.
 3. E. Albert, M. Hanus, and G. Vidal. A practical partial evaluator for a multi-paradigm declar-
    ative language. Journal of Functional and Logic Programming, 2002(1), 2002.
 4. E. Albert, M. Hanus, and G. Vidal. A residualizing semantics for the partial evaluation of
    functional logic programs. Information Processing Letters, 85(1):19–25, 2003.
 5. M. Alpuente, M. Falaschi, and G. Vidal. Partial evaluation of functional logic programs.
    ACM Transactions on Programming Languages and Systems, 20(4):768–844, 1998.
 6. S. Antoy. Optimal non-deterministic functional logic computations. In Proc. International
    Conference on Algebraic and Logic Programming (ALP’97), pages 16–30. Springer LNCS
    1298, 1997.
 7. S. Antoy and M. Hanus. Functional logic programming. Communications of the ACM,
    53(4):74–85, 2010.
 8. S. Antoy and M. Hanus. New functional logic design patterns. In Proc. of the 20th Interna-
    tional Workshop on Functional and (Constraint) Logic Programming (WFLP 2011), pages
    19–34. Springer LNCS 6816, 2011.
 9. R.N. Bol. Loop checking in partial deduction. Journal of Logic Programming, 16(1&2):25–
    46, 1993.
10. B. Braßel. Implementing Functional Logic Programs by Translation into Purely Functional
    Programs. PhD thesis, Christian-Albrechts-Universität zu Kiel, 2011.
11. B. Braßel, M. Hanus, B. Peemöller, and F. Reck. KiCS2: A new compiler from Curry to
    Haskell. In Proc. of the 20th International Workshop on Functional and (Constraint) Logic
    Programming (WFLP 2011), pages 1–18. Springer LNCS 6816, 2011.
12. M. Bruynooghe, D. De Schreye, and B. Martens. A general criterion for avoiding infinite
    unfolding. New Generation Computing, 11(1):47–79, 1992.
13. S. Fischer, J. Silva, S. Tamarit, and G. Vidal. Preserving sharing in the partial evaluation of
    lazy functional programs. In A. King, editor, Logic-based Program Synthesis and Transfor-
    mation (revised and selected papers from LOPSTR’07), pages 74–89. Springer LNCS 4915,
    2008.
14. J.C. González-Moreno, M.T. Hortalá-González, F.J. López-Fraguas, and M. Rodríguez-
    Artalejo. An approach to declarative programming based on a rewriting logic. Journal
    of Logic Programming, 40:47–87, 1999.
15. M. Hanus. Functional logic programming: From theory to Curry. In Programming Logics -
    Essays in Memory of Harald Ganzinger, pages 123–168. Springer LNCS 7797, 2013.
16. M. Hanus, S. Antoy, B. Braßel, M. Engelke, K. Höppner, J. Koj, P. Niederau, R. Sadre,
    and F. Steiner. PAKCS: The Portland Aachen Kiel Curry System. Available at http:
    //www.informatik.uni-kiel.de/~pakcs/, 2013.
17. M. Hanus (ed.). Curry: An integrated functional logic language (vers. 0.8.3). Available at
    http://www.curry-language.org, 2012.
18. H. Hussmann. Nondeterministic algebraic specifications and nonconfluent term rewriting.
    Journal of Logic Programming, 12:237–255, 1992.
19. J. Launchbury. A natural semantics for lazy evaluation. In Proc. 20th ACM Symposium on
    Principles of Programming Languages (POPL’93), pages 144–154. ACM Press, 1993.
20. J.W. Lloyd and J.C. Shepherdson. Partial evaluation in logic programming. Journal of Logic
    Programming, 11:217–242, 1991.
21. S. Peyton Jones, editor. Haskell 98 Language and Libraries—The Revised Report. Cam-
    bridge University Press, 2003.
22. M.H. Sørensen and R. Glück. An algorithm of generalization in positive supercompilation.
    In Proc. of the 1995 International Logic Programming Symposium, pages 465–479. MIT
    Press, 1995.
23. V.F. Turchin. The concept of a supercompiler. ACM Transactions on Programming Lan-
    guages and Systems, 8(3), 1986.
24. P. Wadler. Deforestation: Transforming programs to eliminate trees. Theoretical Computer
    Science, 73:231–248, 1990.
A    Black Hole Detection
As mentioned in Sect. 4.2, rule VarExp of the natural semantics shown in Fig. 3 re-
places the variable binding x ↦ e by x ↦ ∎ in the heap when evaluating the associated
expression e. This allows the detection of black holes (a self-dependent infinite loop)
[19], as done in some implementations of functional (logic) languages. For instance,
an attempt to evaluate the expression “let {x = x} in x” would result in a finite
but incomplete derivation tree, whereas it would trigger the construction of an infinite
derivation tree if the binding x ↦ e was kept.
    The detection of black holes included in the semantics seems to be an optimization
that could be omitted for deterministic programs [19]. However, it is crucial in combi-
nation with non-determinism in order to prevent the binding of a variable to different
values in the same derivation, as shown in [10]. For example, consider the expression
“let { x = T ? case x of { T → F }} in x”. If we do not replace the vari-
able binding in rule VarExp, the following derivation would be possible:
                        Γ ∶T ⇓ Γ ∶T
          Γ ∶ T ? case x of { T → F } ⇓ Γ ∶ T
                    Γ ∶ x ⇓ [x ↦ T] ∶ T              [x ↦ T] ∶ F ⇓ [x ↦ T] ∶ F
                              Γ ∶ case x of { T → F } ⇓ [x ↦ T] ∶ F
                            Γ ∶ T ? case x of { T → F } ⇓ [x ↦ T] ∶ F
                                        Γ ∶ x ⇓ [x ↦ F] ∶ F
                   [] ∶ let { x = T ? case x of { T → F } } in x ⇓ [x ↦ F] ∶ F
                        where Γ = [x ↦ T ? case x of { T → F }]

In this derivation, the variable x is looked up in the heap twice, where at first the right
(non-deterministic) branch is chosen and afterwards the left branch. Hence, x is bound
to T as well as F, which violates the single assignment property of call-time choice.
    With our semantics, there is one successful and one failing derivation but no deriva-
tion where x is bound to T as well as F:
                                 [x ↦ ∎] ∶ T ⇓ [x ↦ ∎] ∶ T
                     [x ↦ ∎] ∶ T ? case x of { T → F } ⇓ [x ↦ ∎] ∶ T
                     [x ↦ T ? case x of { T → F }] ∶ x ⇓ [x ↦ T] ∶ T
              [] ∶ let { x = T ? case x of { T → F } } in x ⇓ [x ↦ T] ∶ T
                                    [x ↦ ∎] ∶ x ⇓ failure
                             [x ↦ ∎] ∶ case x of { T → F } ⇓
                           [x ↦ ∎] ∶ T ? case x of { T → F } ⇓
                           [x ↦ T ? case x of { T → F }] ∶ x ⇓
                    [] ∶ let { x = T ? case x of { T → F } } in x ⇓


B    Residualizing Semantics
Since the various rules of the residualizing semantics used in our partial evaluator are
distributed over the paper and some of them were only informally sketched, we sum-
marize in the following the complete set of rules of our residualizing semantics.
                                              v = c(xn ) or v = ⟪e′ ⟫ or v ∈ V
(Value)         Γ ∶v ⇓ Γ ∶v          where
                                               with (v ∉ dom(Γ ) or Γ [v] = free)
                    Γ [x ↦ ∎] ∶ e ⇓ ∆ ∶ v                         e ∉ {free, ∎}
(VarExp)                                                 where
                Γ [x ↦ e] ∶ x ⇓ ∆[x ↦ v] ∶ v                      and (v ∈ V or v = c(xn ))
                     Γ [x ↦ ∎] ∶ e ⇓ ∆ ∶ ⟪e′ ⟫
(VarDefer)                                                    where e ∉ {free, ∎}
                Γ [x ↦ e] ∶ x ⇓ ∆[x ↦ e′ ] ∶ ⟪x⟫
                   Γ [x ↦ ∎] ∶ e ⇓ ∆ ∶ case y of { pk → ⟪ek ⟫ }
(VarCase)                                                                               where e ∉ {free, ∎}
                Γ [x ↦ e] ∶ x ⇓ ∆[x ↦ case y of { pk → ek }] ∶ x
                Γ [y ↦ ei ] ∶ φ(x1 , . . . , xi−1 , y, ei+1 , . . . , ek ) ⇓ ∆ ∶ v
(Flatten)                                                                              where ei ∉ V, y fresh
                    Γ ∶ φ(x1 , . . . , xi−1 , ei , ei+1 , . . . , ek ) ⇓ ∆ ∶ v
                Γ ∶ σ(e) ⇓ ∆ ∶ v                    f (xn ) = e ∈ P, σ = {xn ↦ yn },
(FunEval)                                  where
                Γ ∶ f (yn ) ⇓ ∆ ∶ v                 proceed (Γ, f (yn )) = true
                                                              f (xn ) = e ∈ P, σ = {xn ↦ yn },
(FunDefer)      Γ ∶ f (yn ) ⇓ Γ ∶ ⟪f (yn )⟫          where
                                                              proceed (Γ, f (yn )) = false

                 Γ [yk ↦ σ(ek )] ∶ σ(e) ⇓ ∆ ∶ v
(Let)                                                          where σ = {xk ↦ yk }, yk fresh
                Γ ∶ let { xk = ek } in e ⇓ ∆ ∶ v
                  Γ ∶ ei ⇓ ∆ ∶ v
(Or)                                         where i ∈ {1, 2}
                Γ ∶ e1 ? e2 ⇓ ∆ ∶ v
                 Γ [yn ↦ free] ∶ σ(e) ⇓ ∆ ∶ v
(Free)                                                      where σ = {xn ↦ yn }, yn fresh
                Γ ∶ let xn free in e ⇓ ∆ ∶ v
                Γ ∶ e ⇓ ∆ ∶ c(yn )    ∆ ∶ σ(ei ) ⇓ Θ ∶ v                             pi = c(xn ),
(Select)                                                                 where
                   Γ ∶ case e of { pk → ek } ⇓ Θ ∶ v                                 σ = {xn ↦ yn }
                Γ ∶ e ⇓ ∆[x ↦ free] ∶ x            ∆[x ↦ σ(pi ), yn ↦ free] ∶ σ(ei ) ⇓ Θ ∶ v
(Guess)                           Γ ∶ case e of { pk → ek } ⇓ Θ ∶ v
                       where i ∈ {1, . . . , k}, pi = c(xn ), σ = {xn ↦ yn }, yn fresh
                                             Γ ∶ e ⇓ ∆ ∶ ⟪e′ ⟫
(CaseDefer)
                Γ ∶ case e of { pk → ek } ⇓ ∆ ∶ ⟪case e′ of { pk → ek }⟫
                                               Γ ∶e ⇓ ∆∶x
(CaseUnbound)   Γ ∶ case e of { pk → ek } ⇓ ∆ ∶ case x of { pk → ⟪ek ⟫ }
                                    where x ∉ dom(∆)

                                 Γ ∶ e ⇓ ∆ ∶ case x of { p′j → ⟪e′j ⟫ }
(CaseCase)           case e of          case x of
                Γ∶                 ⇓ ∆∶
                      { pk → e k }       { p′j → ⟪case e′j of { pk → ek }⟫ }
                                                 Γ ∶e ⇓ ∆∶x
                     case e of          case y of
(CaseVarCase)   Γ∶                 ⇓ ∆∶
                      { pk → e k }       { p′j → ⟪case x of { pk → ek }⟫ }
                             where ∆[x] = case y of { p′j → e′j }

</pre>