A taxonomy of program analyses

           Gianluca Amato, Maria Chiara Meo, and Francesca Scozzari

            Dipartimento di Economia, Università di Chieti-Pescara, Italy
        {gianluca.amato, mariachiara.meo, francesca.scozzari}@unich.it


        Abstract. The design of static analyses of programs in the abstract
        interpretation theory starts with the choice of a collecting semantics,
        which is the strongest property we can derive for a program. Starting
        from well-known collecting semantics for functional programs in the lit-
        erature, we propose a taxonomy of program properties by considering
        the sets of abstract interpretations for which the collecting semantics is
        initial and show that they can be constructively characterized in terms
        of the abstraction functions.


1     Introduction
Abstract interpretation [7, 8] is a framework for approximating the behavior of
discrete systems. The basic idea is to replace the formal semantics of a system
with an abstract semantics computed over a domain of abstract objects, which
describe the properties of the system we are interested in. Given a discrete system
e, we assume that its semantics JeK is an element of a set C called the concrete
domain. For instance, consider a simple setting for functional programs1 where
the concrete domain C is the poset of functions D⊥ → D⊥ ordered pointwise and
D⊥ has a distinguished value ⊥ which represents non-terminating computations.
Given the following program over natural numbers:
      l e t rec prog n = i f ( n = 2 ) then prog ( n ) e l s e n
its semantics is                       (
                                           ⊥ if n = 2 or n = ⊥;
                        JprogK = λn.
                                           n otherwise,
where D⊥ = N ∪ {⊥}.
    In most cases we are only interested in determining some properties of the
semantics of a system specified by a set A of abstract objects. Formally, an
abstract interpretation 2 for a concrete domain C is a pair (A, α) where A is par-
tially ordered by ≤A (the approximation ordering) and α : C → A maps every
semantic property to the strongest (smallest) abstract property it enjoys. An
1
    For easiness of presentation, we do not consider here higher-order functions.
2
    The abstract interpretation framework here used is the one presented in [9] under
    the “existence of a best abstract approximation assumption”. Not all the abstract
    interpretations may be formalized in this way, such as polyhedral analysis [11, 3, 2,
    4, 5].
example of a simple and useful abstraction is strictness [13]. We say that a func-
tion f : D⊥ → D⊥ is strict if f (⊥) = ⊥. The strictness abstract interpretation
is Str = ({str , >}, αstr ) where str ≤ > and
                                        (
                                          str if f (⊥) = ⊥,
                            αstr (f ) =
                                          > otherwise.

Clearly, αstr (JprogK) = str since prog describes a strict function.

1.1   Collecting Semantics
In most cases abstract interpretations are defined starting from a so-called col-
lecting semantics. According to [9], a collecting semantics is “a version of the
standard semantics reduced to essentials in order to ignore irrelevant details
about program execution”. Thus, a collecting semantics should be an abstrac-
tion precise enough that many other abstractions may be derived from it. We
say that the abstraction (B, αB ) may be derived from (A, αA ), or that (A, αA )
is more precise than (B, αB ), when for each b ∈ B there exists a ∈ A such that,
for any c ∈ C, αB (c) ≤ b ⇐⇒ αA (c) ≤ a.
    For example, consider the collecting semantics CS1 for the concrete domain
                                                                  ∪
C = D⊥ → D⊥ defined in [10]: the abstract domain is P(D⊥ ) −      → P(D⊥ ), the
set of complete join-morphisms from P(D⊥ ) to itself ordered pointwise, and
αCS1 (f ) = λX ∈ P(D⊥ ).f (X), where f (X) is the image of f through X. It
turns out that strictness may be derived from CS1 . Actually, we have that:
 – αstr (f ) ≤ > is always true, hence it is equivalent to αCS1 (f ) ≤ λX.D⊥ ;
 – αstr (f ) ≤ str means that f is a strict function, which is equivalent to
   αCS1 ({⊥}) ⊆ {⊥}, i.e., αCS1 (f ) ≤ φstr by defining
                                      (
                                         X     if X ⊆ {⊥},
                           φstr = λX.
                                         D⊥ otherwise.
   However, not all the abstractions may be recovered from CS1 . Consider the
property of absence. We say that a function is absent if it ignores its arguments.
This gives origin to the absence abstract interpretation [16] defined as Abs =
({abs, >}, αabs ) where abs ≤ > and
                               (
                                abs if ∀x ∈ D⊥ . f (x) = f (⊥),
                   αabs (f ) =
                                >   otherwise.
It turns out that absence cannot be recovered from CS1 , since there is no element
in the abstract domain of CS1 which exactly corresponds to the set of all the
absent functions. However, more precise collecting semantics may be used to
derive abs, such as CS2 [10] which has P(D⊥ → D⊥ ) as the abstract domain
and αCS2 (f ) = {f } as the abstraction function. Then, αabs (c) ≤ abs is equivalent
to αCS2 (f ) ⊆ {f | αabs (f ) ≤ abs}. We can also compare the relative precision of
two collecting semantics. It is easily shown that CS1 may be derived from CS2 ,
since αCS1 (f ) ≤ φ iff αCS2 (f ) ⊆ {f 0 | αCS1 (f 0 ) ≤ φ}.


                                         2
2     A category of abstract interpretations

We now formalize the notion of precision which arises from the previous consid-
erations and characterize the collecting semantics using the category theory3 . We
recall that a Galois connection (Gc) is a pair of maps hα, γi : A     B such that
α(a) ≤ b ⇔ a ≤ γ(b). Given a set C, we define the category AI(C) whose objects
are abstract interpretations and whose morphisms from (A, αA ) to (B, αB ) are
Galois connections hα, γi : A     B such that αB = α ◦ αA . Identity and compo-
sition of arrows work as in the category of Galois connections. This definition of
the category of abstract interpretations naturally arises from the practical uses
of abstract interpretation and represents a strengthening of the notion of deriv-
ability between abstractions, since the existence of a morphism hα, γi : A       B
implies that B is derivable from A. The following proposition shows that there
is a Gc from CS2 to CS1 but not from CS1 to CS2 , which reflects our intuition
that CS2 is strictly more precise than CS1 , and that CS1 is a suitable collecting
semantics for strictness but not for the absence property.

Proposition 1. The following properties hold:

 – there is no Gc hα12 , γ12 i : CS1     CS2 such that αCS2 = α12 ◦ αCS1 ;
 – there is a Gc hα21 , γ21 i : CS2     CS1 such that αCS1 = α21 ◦ αCS2 ;
 – there is a Gc hα1str , γ1str i : CS1   Str such that αstr = α1str ◦ αCS1 and
                 (                                             (
                   str if φ({⊥}) ⊆ {⊥}                           X  if X ⊆ {⊥}
     α1str (φ) =                             γ1str (str ) =λX.
                   >     otherwise.                              D⊥ otherwise
                                                    γ1str (>) =λX.D⊥

 – there is no Gc hα1abs , γ1abs i : CS1          CSabs such that αabs = α1abs ◦ αCS1 .


3     Subcategories of program analyses

When an abstract interpretation (A, αA ) is designed starting from the collecting
semantics (S, αS ), it means that (A, αA ) is derivable from (S, αS ), hence there
is a map from (S, αS ) to (A, αA ) in our category AI(C). In addition, it would
be preferable to have a canonical way of deriving one from the other. This leads
to the notion of initial object in a category: if (S, αS ) is initial for a given full
subcategory D of AI(C), it means that all the abstract interpretations in D may
be reduced to (S, αS ) in a unique canonical way.
    Given a collecting semantics (S, αS ), we are interested in characterizing the
maximal full subcategory D of AI(C) such that (S, αS ) is initial for D. Such sub-
categories immediately induce a taxonomy on program analysis which precisely
characterizes the program properties suitable for a given collecting semantics.
3
    Since the very beginning of the abstract interpretation theory, there have been work
    on categorical approaches to abstract interpretation (see, for instance, [1, 6, 14, 15]).


                                              3
In the rest of the section, we show that for the two collecting semantics CS1 e
CS2 , such a full subcategory can be constructively described.
    We first show that CS2 is initial for all the abstract interpretations which
have enough joins, and that this is the largest class of abstract interpretations
which enjoys this property.
Definition 2 (Having enough   Wjoins). We say that the abstract interpretation
(A, αA ) has enough joins when A X exists for each X which is a subset of the
image of αA .
Obviously, if A is a complete join-semilattice, than (A, αA ) has enough joins.
Theorem 3. The full subcategory of all the abstract interpretations which have
enough joins is the largest class of abstractions for which the collecting semantics
CS2 is initial.
The maximality of the class of abstract interpretations which have enough joins
allows us to completely characterize the analyses which reduce to the collecting
semantics CS2 . We now show that the collecting semantics CS1 is initial for a
large class of abstract interpretations, which can be constructively characterized.
Definition 4 (Mix of functions). We say that a function g : D⊥ → D⊥ is a
mix of the set of functions F ⊆ D⊥ → D⊥ iff for each x ∈ D⊥ there exists f ∈ F
such that g(x) = f (x).
Definition 5 (Mixable interpretations). Let (A, αA ) be an abstract inter-
pretation such that A has enough joins. We call (A, αA ) mixable if, for any set
         W F ⊆ D⊥ → D⊥ , whenever g is a mix of functions in F , we have
of functions
αA (g) ≤ f ∈F αA (f ).
   An interpretation is mixable when deciding whether α(f ) ≤ a may be done
by looking at the values of f for a single element of the domain at a time. This
observation is formalized by the following lemma.
Lemma 6. Let (A, αA ) be a mixable interpretation. Then αA (f ) ≤ a ⇐⇒ ∀x ∈
D⊥ ∃f 0 s.t. f 0 (x) = f (x) ∧ αA (f 0 ) ≤ a.
This lemma can be exploited to characterize mixable abstract interpretations:
 – the strictness abstract interpretation is mixable, since we only need to check
   the single value of f (⊥) in order to decide whether a function is strict;
 – the absence abstract interpretation is not mixable, since we need to compare
   the values computed by f for different arguments;
 – the totality abstract interpretation, which decides whether a function is total,
   i.e., for all x ∈ D⊥ \ {⊥} we have that f (x) 6= ⊥ is mixable, since we just
   need to see the value of f for (many) single arguments, without the need of
   comparing them.
We now show that all the mixable interpretations may be designed starting from
the collecting semantics CS1 .
Theorem 7. The full subcategory of all the mixable abstract interpretations is
the largest class of abstractions for which the collecting semantics CS1 is initial.


                                         4
4    Conclusion
Static analyses formalized in the abstract interpretation framework are defined
starting from a collecting semantics, which is a fundamental choice in the design
of any abstract interpretation, but few works in the literature systematically
study collecting semantics. [9, 10] present many collecting semantics, without
giving a general method for choosing the right one. Mostly authors simply use
one of the already defined collecting semantics or invent a new one for a specific
abstraction (e.g., [12] for logic programs). On the contrary, we start from the def-
inition of two most commonly used collecting semantics and derive a taxonomy
on program analysis. Most importantly we show that the sets of abstract inter-
pretations that can be derived from them can be constructively characterized.

References
 1. S. Abramsky. Abstract interpretation, logical relations, and Kan extensions. Jour-
    nal of Logic and Computation, 1(1):5–40, 1990.
 2. G. Amato, S. Di Nardo Di Maio, M. C. Meo, F. Scozzari. Descending chains and
    narrowing on template abstract domains. Acta Informatica, 55(6):521–545, 2018.
 3. G. Amato, S. Di Nardo Di Maio and F. Scozzari Numerical static analysis with
    Soot. In Proc. ACM SIGPLAN SOAP. ACM Press, 2013.
 4. Gianluca Amato and Francesca Scozzari. Observational completeness on abstract
    interpretation. Fundamenta Informaticae, 106(2–4):149–173, 2011.
 5. Gianluca Amato and Francesca Scozzari. Random: R-based Analyzer for Numerical
    Domains. In Proc. LPAR-18, LNCS, vol. 7180:375–382. Springer, 2012.
 6. K. Backhouse and R. Backhouse. Safety of abstract interpretations for free, via logi-
    cal relations and galois connections. Science of Computer Programming, 51(1):153–
    196, 2004.
 7. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for
    static analysis of programs by construction or approximation of fixpoints. In Proc.
    POPL ’77, pages 238–252. ACM Press, 1977.
 8. P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In
    Proc. POPL ’79, pages 269–282. ACM Press, 1979.
 9. P. Cousot and R. Cousot. Abstract interpretation frameworks. Journal of Logic
    and Computation, 2(4):511–549, 1992.
10. P. Cousot and R. Cousot. Higher-order abstract interpretation (and application
    to comportment analysis generalizing strictness, termination, projection and PER
    analysis of functional languages). In Proc. ICCL, pages 95–112. IEEE Press, 1994.
11. P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among
    variables of a program. In Proc. POPL ’78, pages 84–97, 1978. ACM Press.
12. R. Giacobazzi. ”Optimal” collecting semantics for analysis in a hierarchy of logic
    program semantics. In Proc. STACS, LNCS, vol. 1046:503–514. Springer, 1996.
13. A. Mycroft. The theory and practice of transforming call-by-need into call-by-value.
    In Proc. Int. Symposium on Programming, LNCS, vol. 83:269–281. Springer, 1980.
14. P. Panangaden and P. Mishra. A category theoretic formalism for abstract inter-
    pretation. Technical Report UUCS-84-005, University of Utah, 1984.
15. A. Venet. Abstract cofibered domains: Application to the alias analysis of untyped
    programs. In Proc. SAS ’96, LNCS, vol. 1145:366–382. Springer, 1996.
16. P. Wadler and R. J. M. Hughes. Projections for strictness analysis. In Proc. FPCA,
    LNCS, vol. 274:385–407. Springer, 1987.


                                           5