On the Properties of Partial Completeness in Abstract Interpretation (Short Paper)⋆ Marco Campion, Mila Dalla Preda and Roberto Giacobazzi University of Verona, Italy Abstract We present a weakening of the completeness property in abstract interpretation. Completeness of a static analyzer represents the ideal situation where no false alarms are produced when answering queries on program behavior. Completeness, however, is a very rare condition to be satisfied in practice. We introduce the notion of partial completeness as a weakening of precision, namely, the abstract interpreter may produce a bounded number of false alarms, and then we show the key recursive properties of the class of programs for which an abstract interpreter is partially complete with a given bound of imprecision. Keywords Abstract Interpretation, Program Analysis, Computability, Partial Completeness 1. Introduction Abstract interpretation [2, 3, 4] is a general theory for the design of sound-by-construction program analysis tools. The abstract interpretation of a program 𝑃 ∈ Programs1 , denoted by J𝑃 K𝐴 , consists of an abstract domain 𝐴, often specified by a pair of abstraction 𝛼𝐴 : 𝐢 β†’ 𝐴 and concretization 𝛾𝐴 : 𝐴 β†’ 𝐢 monotone maps, abstracting some concrete properties of interest 𝐢 and an interpreter J𝑃 K𝐴 : 𝐴 β†’ 𝐴, designed for the language used to specify 𝑃 and on the abstract domain 𝐴. The structure of the abstract domain is given by a partial order ≀𝐴 that expresses the relative precision of its objects: If π‘Ž, 𝑏 ∈ 𝐴 and π‘Ž ≀𝐴 𝑏 then 𝑏 over approximates π‘Ž. We are interested in inferring program invariants, hence our concrete semantics J𝑃 K : β„˜(S) β†’ β„˜(S) will be the standard collecting denotational semantics (e.g., see [5, Chapter 5] or [6, Chapter 3.5]) operating on a domain of sets of memory of states β„˜(S). Soundness means that if program 𝑃 satisfies the condition J𝑃 K𝐴 𝛼𝐴 (𝑆) ≀𝐴 𝑄 for the input 𝑆 ∈ β„˜(S) and output specification 𝑄 ∈ 𝐴, then J𝑃 K𝑆 βŠ† 𝛾𝐴 (𝑄). When the converse holds we have precision, or completeness of the abstract interpreter, and therefore of the analysis. This Proceedings of the 23rd Italian Conference on Theoretical Computer Science, Rome, Italy, September 7-9, 2022 ⋆ This work is an extended abstract of our recent results published in [1]. " marco.campion@univr.it (M. Campion); mila.dallapreda@univr.it (M. Dalla Preda); roberto.giacobazzi@univr.it (R. Giacobazzi)  0000-0002-1099-3494 (M. Campion); 0000-0003-2761-4347 (M. Dalla Preda); 0000-0002-9582-3960 (R. Giacobazzi) Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop CEUR Workshop Proceedings (CEUR-WS.org) Proceedings http://ceur-ws.org ISSN 1613-0073 1 For our purposes, Programs contains programs written in the simple while-language defined in [5, Chapter 2, page 12] 1 Marco Campion et al. CEUR Workshop Proceedings 1–7 represents the ideal situation where no false alarms are produced. Completeness, however, is a very rare condition to be satisfied in practice, therefore, static program analysis needs to, in some way, deal with incompleteness. Moreover, the experience tells us that there are results that are β€œmore incomplete”, i.e., less precise, than others, and this depends upon the way the program is written and the way the abstract interpreter is implemented [7]. As an example, consider the following program 𝑃 ::= while π‘₯ > 1 do π‘₯ := π‘₯ βˆ’ 2 if π‘₯ = 1 then π‘₯ := 10 if π‘₯ = 2 then π‘₯ := 100 and the set of values {2, 4}. Clearly J𝑃 K{2, 4} = {0}. However, when 𝑃 is analyzed by an abstract interpreter J𝑃 KInt (e.g., see [6, Chapter 4.5]) which considers abstract arithmetical operators as the best correct approximation (bca for short), defined on the abstract domain of intervals Int β‰œ {[π‘Ž, 𝑏] | π‘Ž, 𝑏 ∈ Z βˆͺ {βˆ’βˆž, +∞}, π‘Ž ≀ 𝑏} βˆͺ {βŠ₯Int } [2] or defined as the bca for Int, namely, J𝑃 KInt bca β‰œ 𝛼Int ∘ J𝑃 K ∘ 𝛾Int ∘ 𝛼Int , then it may exhibit different levels of imprecision. Recall that Int abstracts sets of integer values and it contains all intervals [π‘Ž, 𝑏] such that π‘Ž, 𝑏 ∈ Z±∞ where Z±∞ β‰œ Z βˆͺ {βˆ’βˆž, +∞} , and π‘Ž ≀ 𝑏. So, for the input of 𝑃 we get 𝛼Int ({2, 4}) = [2, 4], while for the output we obtain the single point 𝛼Int (J𝑃 K{2, 4}) = [0, 0]. Then, for the two mentioned analysis of 𝑃 we get J𝑃 KInt bca {2, 4} = [0, 10] and J𝑃 KInt 𝛼Int ({2, 4}) = [0, 100]. Note the three different interval properties obtained by the concrete, bca and abstract evaluations of 𝑃 over Int with input {2, 4}: [0, 0] ≀Int [0, 10] ≀Int [0, 100]. These discrepancies are what we want to measure. To this end, we consider a weaker form of metric function that is made specific for the elements of an abstract domain, namely, it is compatible with the underlying ordering relation ≀𝐴 , hence taking into account the presence of incomparable elements. This distance function incorporates both the qualitative comparison given by the partial order and a quantitative comparison (Section 2). By exploiting this idea, we can introduce the notion of πœ€-partial completeness of an abstract domain 𝐴 with respect to a given program and a given (set of) input values (Section 3). A partially complete abstract interpretation allows some false-alarms to be reported, but their number is bounded. In this case, the imprecision of the abstract interpreter is bounded by πœ€, namely, the distance between the property represented by the abstraction of the concrete semantics and the result of the abstract interpretation on the given input, is at most πœ€. Given a tolerance πœ€ of imprecision and a set of inputs 𝐼, we want to answer the following questions (Section 4): Can we implement a program analyzer such that all programs having 𝐼 as input are πœ€-partial complete? Can we prove if our program analysis has or not a bounded level of imprecision? 2. Quasi-metrics on Abstract Domains Our goal is to derive the bound of imprecision of an abstract interpreter with respect to a given measure over the abstract domain. For this reason, we need a metric to compare the elements of the abstract domain according to their relative degree of precision. We refer to the 2 Marco Campion et al. CEUR Workshop Proceedings 1–7 weaker notion of quasi-metric introduced in [8] on a non-empty set 𝑆. This is a metric function 𝛿 : 𝑆 Γ— 𝑆 β†’ Qβ‰₯0 whose symmetry property may not hold. A set endowed with a quasi-metric is called quasi-metric space. Let π’œ(𝐢) be the set of all abstract domains 𝐴 abstracting some set of concrete properties 𝐢 and having ordering relation ≀𝐴 2 . Definition 2.1 (Quasi-metrics 𝐴-compatible). We say that the distance function 𝛿𝐴 : 𝐴×𝐴 β†’ N∞β‰₯0 βˆͺ {βŠ₯} is a quasi-metric 𝐴-compatible for 𝐴 ∈ π’œ(𝐢) if for all π‘Ž1 , π‘Ž2 , π‘Ž3 ∈ 𝐴, it satisfies the following axioms: (i) π‘Ž1 = π‘Ž2 ⇔ 𝛿𝐴 (π‘Ž1 , π‘Ž2 ) = 0 (ii) π‘Ž1 ≀𝐴 π‘Ž2 ⇔ 𝛿𝐴 (π‘Ž1 , π‘Ž2 ) ΜΈ= βŠ₯ (iii) π‘Ž1 ≀𝐴 π‘Ž2 ≀𝐴 π‘Ž3 β‡’ 𝛿𝐴 (π‘Ž1 , π‘Ž3 ) ≀ 𝛿𝐴 (π‘Ž1 , π‘Ž2 ) + 𝛿𝐴 (π‘Ž2 , π‘Ž3 ) (iv) βˆ€π‘Ž1 , π‘Ž2 ∈ 𝐴, πœ€ ∈ Nβ‰₯0 the predicate 𝛿𝐴 (π‘Ž1 , π‘Ž2 ) ≀ πœ€ is decidable. We allow the quasi-metric between two uncomparable elements to be βŠ₯, which represents an undefined distance. An abstract domain 𝐴 ∈ π’œ(𝐢) endowed with a quasi-metric 𝐴-compatible 𝛿𝐴 , forms an abstract quasi-metric space, denoted by 𝐴 β‰œ (𝐴, 𝛿𝐴 ). We use A(𝐢) to refer to the set of all abstract quasi-metric spaces, and write 𝐴 ∈ A(𝐢). As an example of quasi-metric 𝐴-compatible, we define the weighted path-length 𝛿𝐴 w as the minimum weighted path, w.r.t. a weight function w, of intermediate elements between two comparable elements of an abstract domain 𝐴. Here 𝛿𝐴 w considers the lattice of 𝐴 as a weighted directed graph where each edge corresponds to two adjacent abstract elements π‘Ž, 𝑏, that is, π‘Ž ≀𝐴 𝑏 and there are no elements 𝑐 ∈ 𝐴 such that π‘Ž ≀𝐴 𝑐 ≀𝐴 𝑏. So for example, 𝛿Int w over the intervals abstraction having w(Β·, Β·) = 1, i.e., setting to 1 the weight of each edge of Int, returns 9 between [0, 0] and [0, 10] meaning that there are exactly 9 more elements in [0, 10] respect to [0, 0], while 𝛿Int w ([0, 0], [0, +∞]) = ∞ and 𝛿 w ([0, 0], [1, 10]) = βŠ₯ because [0, 0] ̸≀ Int Int [1, 10]. 3. Partial Completeness Standard completeness in program analysis3 , e.g., see [2, 3, 13], means that no false alarms are returned by analyzing the program with an abstract interpreter on any possible input state. Local completeness, instead, is a recently introduced property [14] that requires completeness only with respect to specific inputs. An πœ€-partially complete program analyzer allows some false alarms to be reported over a considered (set of) input, but their amount is bounded by a constant πœ€ which is determined according to a quasi-metric which is compatible with the abstract domain used by the analysis. Formally, given a program 𝑃 ∈ Programs, a constant bound πœ€ ∈ Nβ‰₯0 , a non-empty set of stores 𝑆 ∈ β„˜(S) and 𝐴 ∈ A(β„˜(S)), we say that 𝐴 is πœ€-partially complete for 𝑃 in 𝑆 if 𝛼𝐴 (J𝑃 K𝑆) β‰€πœ€π›Ώπ΄ J𝑃 K𝐴 𝛼𝐴 (𝑆), where π‘Ž β‰€πœ€π›Ώπ΄ 𝑏 iff 𝛿𝐴 (π‘Ž, 𝑏) ≀ πœ€. We now have all the ingredients to introduce the notion of πœ€-partial completeness class of programs. 2 We consider abstract domains in π’œ(𝐢) that are either recursive or trivial [9, 1]. 3 In some topics concerning abstract interpretation, completeness is typically recurrent, e.g., in comparative seman- tics [10, 11] or formal languages [12]. Here we are considering only the field of program analysis. 3 Marco Campion et al. CEUR Workshop Proceedings 1–7 Definition 3.1 (πœ€-Partial completeness class). The partial completeness class of an abstract quasi-metric space 𝐴 ∈ A(β„˜(S)), a constant πœ€ ∈ Nβ‰₯0 and a non-empty set of inputs 𝑆 ∈ β„˜(S), denoted C(𝐴, πœ€, 𝑆) βŠ† Programs, is defined as: C(𝐴, πœ€, 𝑆) β‰œ {𝑃 ∈ Programs | 𝛼𝐴 (J𝑃 K𝑆) β‰€πœ€π›Ώπ΄ J𝑃 K𝐴 𝛼𝐴 (𝑆)}. Similarly to the completeness case [15], for every πœ€, the πœ€-partial completeness class is infinite and non-extensional. It is infinite because for all πœ€ ∈ Nβ‰₯0 and 𝑆 ∈ β„˜(S), C(𝐴) βŠ† C(𝐴, πœ€, 𝑆) and C(𝐴) is infinite, where C(𝐴) denotes the (global) completeness class of programs. It is also non-extensional because there always exist programs 𝑃 and 𝑄 such that: 𝑃 is partially complete for 𝐴, J𝑃 K = J𝑄K, and 𝑄 is not partially complete for 𝐴. 4. Recursive Properties of Partial Complete Programs It is well known that for trivial abstractions (i.e., either the identity or an abstraction having one element only) the corresponding completeness class turns out to be the whole set of programs [13]. Moreover, for all non-trivial abstractions in π’œ(β„˜(S)), its completeness class is strictly contained in Programs [15]. In this section, we study the counterpart of these results for the case of partial completeness. It turns out that the quasi-metric 𝐴-compatible chosen for measuring the imprecision of the analysis, plays an important rule here in order to determine the recursive properties of C(𝐴, πœ€, 𝑆) and, therefore, of the considered program analysis. We first consider the simplest case of abstract quasi-metric spaces of stores 𝐴 ∈ A(β„˜(S)) satisfying the property of having a limited imprecision by πœ€ ∈ Nβ‰₯0 , i.e., abstractions such that βˆ€π‘Ž ∈ 𝐴 : 𝛿𝐴 (βŠ₯𝐴 , π‘Ž) ≀ πœ€. These include, for instance, the case of finite height lattices with the weighted path-length quasi-metricβ€”complete lattices which are both ACC and DCC. Theorem 4.1. If 𝐴 ∈ A(β„˜(S)) has limited imprecision, then we have for all 𝑆 ∈ β„˜(S): βˆƒπœ€ ∈ Nβ‰₯0 . C(𝐴, πœ€, 𝑆) = Programs . For example, the Sign β‰œ {Z, βˆ’, 0, +, βˆ…} abstraction [16] measured with 𝛿Sign w having w(Β·, Β·) = 1, has limited imprecision because by for any πœ€ β‰₯ 3, the partial completeness class turns out C((Sign, 𝛿Sign w ), πœ€, 𝑆) = Programs. The difference with respect to the case of standard completeness class C(𝐴) is that, thanks to the possibility of admitting an upper margin to imprecision (i.e., possible false alarms), there always exists a class of partial completeness with respect to a given bound which includes all programs. Conversely, an abstract quasi-metric space of stores 𝐴 ∈ A(β„˜(S)) has unlimited imprecision when: βˆ€πœ€ ∈ Nβ‰₯0 , βˆƒπ‘Ž ∈ 𝐴. 𝛿𝐴 (βŠ₯𝐴 , π‘Ž) > πœ€. The abstract quasi-metric space (Int, 𝛿Int w ) is such an example. Theorem 4.2. Let 𝐴 ∈ A(β„˜(S)) be any abstract quasi-metric space of stores with unlimited imprecision 𝛿𝐴 and 𝑆 ∈ β„˜(S). Then, the following equivalence holds: βˆƒπœ€ ∈ Nβ‰₯0 . C(𝐴, πœ€, 𝑆) = Programs ⇔ 𝐴 = β„˜(S). Informally, if we consider a non-trivial abstract quasi-metric space that has unlimited imprecision and an input 𝑆, then independently of how we set a threshold πœ€ of false alarms acceptance, there always exists a program 𝑃 for which the abstract analysis over 𝐴 with input 𝑆, is not 4 Marco Campion et al. CEUR Workshop Proceedings 1–7 πœ€-partially complete, namely 𝑃 ̸∈ C(𝐴, πœ€, 𝑆). By a straightforward padding argument, any of these programs can be extended to an infinite set of programs for which the abstraction is πœ€-partially incomplete. The class of πœ€-partial incomplete programs for 𝐴 ∈ A(β„˜(S)) with input 𝑆, is the complement set of C(𝐴, πœ€, 𝑆), formally: C(𝐴, πœ€, 𝑆) = {𝑃 ∈ Programs | 𝛼𝐴 (J𝑃 K𝑆) ΜΈβ‰€πœ€π›Ώπ΄ J𝑃 K𝐴 𝛼𝐴 (𝑆)}. Theorem 4.2 implies that any non-trivial abstract domain of stores endowed with an unlimited imprecision 𝛿𝐴 , has an infinite set of programs for which the abstract interpreter is πœ€-partially incomplete. Conversely, if 𝛿𝐴 has limited imprecision, then, trivially, we can always find a certain level of tolerance that makes the analysis πœ€-partially complete for all programs. We now study the computational limits of the class of partially complete and incomplete programs. In this case, the topological structure of 𝐴 in terms of ascending chains, is fundamental. We say that an abstract quasi-metric space of stores 𝐴 ∈ A(β„˜(S)) is πœ€-trivial for some πœ€ ∈ Nβ‰₯0 if C(𝐴, πœ€, 𝑆) = Programs. The following theorem shows that the class of programs C(𝐴, πœ€, 𝑆) turns out to be recursively enumerable (r.e.) whenever the abstract domain of stores 𝐴 satisfies the Ascending Chain Condition (ACC). Theorem 4.3. If 𝐴 ∈ A(β„˜(S)) is ACC, then for every 𝑆 ∈ β„˜(S) and πœ€ ∈ Nβ‰₯0 , C(𝐴, πœ€, 𝑆) is r.e.. Sketch. Run a dovetail algorithm constructing the set 𝑏 β‰œ π‘–βˆˆN {𝛼𝐴 (J𝑃 K𝑠𝑖 )} starting from ⨆︀ 𝑏 = βŠ₯, where 𝑠𝑖 ∈ 𝑆 is an enumeration of 𝑆. If, at some point of the iterates, there exists 𝑠𝑖 ∈ 𝑆 such that J𝑃 K𝑠𝑖 ΜΈ= βˆ…, 𝑏 = 𝑏 βŠ”π΄ 𝛼𝐴 (J𝑃 K𝑠𝑖 ) and 𝛿𝐴 (𝑏, J𝑃 K𝐴 𝛼𝐴 (𝑆)) ≀ πœ€, then the algorithm terminates. If 𝑃 ∈ C(𝐴, πœ€, 𝑆) then the convergence of the above algorithms is guaranteed by the ACC of 𝐴. The following theorem states that both C(𝐴, πœ€, 𝑆) and C(𝐴, πœ€, 𝑆) are non-r.e. sets when 𝐴 is not πœ€-trivial and not ACC. Theorem 4.4. If 𝐴 ∈ A(β„˜(S)) is not πœ€-trivial, then C(𝐴, πœ€, 𝑆) is non-r.e.. Moreover, if 𝐴 is also not ACC, then C(𝐴, πœ€, 𝑆) is non-r.e.. As a straightforward corollary, the local completeness class C(𝐴, 𝑆) for 𝐴 satisfying the ACC property is r.e., while non-r.e. for non-ACC abstractions. Furthermore, if 𝐴 is not trivial, then C(𝐴, 𝑆) is non-r.e. even if 𝐴 is ACC. Let us notice that Theorems 4.3 and 4.4 provide a further insight into the structure of C(𝐴, πœ€, 𝑆) and its complement class C(𝐴, πœ€, 𝑆). These theorems prove that, given any non-ACC abstract quasi-metric space of stores 𝐴, whenever we limit the expected imprecision of our analysis to a bound πœ€ of possible false alarms w.r.t. an input 𝑆, we cannot build a procedure that enumerates all programs satisfying that bound or that do not respect that bound, unless the abstract domain is πœ€-trivial. Therefore, deciding whether a static program analysis can produce or cannot produce some bounded set of false alarms is in general impossible, unless 𝐴 satisfies the ACC. In this last case, we can at least answer β€œyes” if the static analysis has a bounded imprecision over a program with a specific input. The πœ€-partial completeness and incompleteness class of an abstraction are therefore a non-trivial property of programs for which no recursively enumerable procedure may exist which is able to enumerate all of their elements. 5 Marco Campion et al. CEUR Workshop Proceedings 1–7 References [1] M. Campion, M. Dalla Preda, R. Giacobazzi, Partial (in)completeness in abstract interpreta- tion: Limiting the imprecision in program analysis, Proc. ACM Program. Lang. 6 (2022). doi:10.1145/3498721. [2] P. Cousot, R. Cousot, Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints, in: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, ACM Press, 1977, pp. 238–252. doi:10.1145/512950.512973. [3] P. Cousot, R. Cousot, Systematic design of program analysis frameworks, in: Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, ACM Press, 1979, pp. 269–282. doi:10.1145/567752.567778. [4] P. Cousot, Principles of Abstract Interpretation, The MIT Press, Cambridge, Mass., 2021. [5] G. Winskel, The formal semantics of programming languages: an introduction, MIT press, 1993. [6] A. MinΓ©, Tutorial on static inference of numeric invariants by abstract interpretation, Foundations and Trends in Programming Languages 4 (2017) 120–372. URL: https://doi. org/10.1561/2500000034. doi:10.1561/2500000034. [7] R. Bruni, R. Giacobazzi, R. Gori, I. Garcia-Contreras, D. Pavlovic, Abstract extensionality: on the properties of incomplete abstract interpretations, PACMPL 4 (2020) 28:1–28:28. doi:10.1145/3371096. [8] W. A. Wilson, On quasi-metric spaces, American Journal of Mathematics 53 (1931) 675–684. doi:10.2307/2371174. [9] P. Cousot, R. Giacobazzi, F. Ranzato, Program analysis is harder than verification: A computability perspective, in: International Conference on Computer Aided Verification, Springer, 2018, pp. 75–95. doi:10.1007/978-3-319-96142-2_8. [10] P. Cousot, R. Cousot, Inductive definitions, semantics and abstract interpretation, in: Conference Record of the 19th ACM Symp. on Principles of Programming Languages (POPL ’92), ACM Press, 1992, pp. 83–94. [11] P. Cousot, R. Cousot, Compositional and inductive semantic definitions in fixpoint, equational, constraint, closure-condition, rule-based and game-theoretic form (Invited Paper), in: P. Wolper (Ed.), Proc. of the 7th Internat. Conf. on Computer Aided Verification (CAV ’95), volume 939 of Lecture Notes in Computer Science, Springer-Verlag, 1995, pp. 293–308. [12] M. Campion, M. Dalla Preda, R. Giacobazzi, Abstract interpretation of indexed grammars, in: International Static Analysis Symposium, Springer, 2019, pp. 121–139. doi:10.1007/ 978-3-030-32304-2_7. [13] R. Giacobazzi, F. Ranzato, F. Scozzari., Making abstract interpretation complete, Journal of the ACM 47 (2000) 361–416. doi:10.1145/333979.333989. [14] R. Bruni, R. Giacobazzi, R. Gori, F. Ranzato, A logic for locally complete abstract interpre- tations, in: Proc. 36th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2021), IEEE Computer Society, 2021, pp. 1–13. doi:10.1109/LICS52264.2021.9470608, distinguished paper. [15] R. Giacobazzi, F. Logozzo, F. Ranzato, Analyzing program analyses, in: Proceedings 6 Marco Campion et al. CEUR Workshop Proceedings 1–7 of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015, 2015, pp. 261–273. doi:10. 1145/2676726.2676987. [16] P. Cousot, R. Cousot, Static determination of dynamic properties of programs, in: Pro- ceedings of the 2nd International Symposium on Programming, Dunod, Paris, 1976, pp. 106–130. doi:10.1145/390019.808314. 7