=Paper=
{{Paper
|id=Vol-2047/BENEVOL_2017_paper_9
|storemode=property
|title=Incrementalizing Abstract Interpretation
|pdfUrl=https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_9.pdf
|volume=Vol-2047
|authors=Noah Van Es,Maarten Vandercammen,Coen De Roover
|dblpUrl=https://dblp.org/rec/conf/benevol/EsVR17
}}
==Incrementalizing Abstract Interpretation==
<pdf width="1500px">https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_9.pdf</pdf>
<pre>
                Incrementalizing Abstract Interpretation
                                                         Position Paper

                                  Noah Van Es, Maarten Vandercammen, Coen De Roover
                           Software Languages Lab – Vrije Universiteit Brussel – Brussels, Belgium
                                    {noahves, mvdcamme, cderoove}@vub.ac.be


   Abstract—A powerful approach to design static analysers is            We explore the incrementalization of the Abstracting Ab-
abstract interpretation, which reasons over an approximation          stract Machines (AAM) technique [1], an approach to sound
of the program’s behaviour. The Abstracting Abstract Machines         and decidable program analysis. Currently, when this analysis
(AAM) technique, introduced by Van Horn & Might, presents a
systematic approach to derive abstract interpreters. However, it      technique analyses a given program and this program’s source
is often not useable in the development and evolution of large-       code is later edited, the analysis must throw away its original
scale applications, as with the current state-of-the-art, an AAM      result and recompute the analysis from scratch. Given a series
analysis can not incrementally update its results upon changes in     of small modifications to a program, which are typical for
the program. That is, whenever minor modifications occur in the       an application under development, this often results in time-
program’s source code, one needs to recompute the entire AAM
analysis from scratch, which can easily get time-consuming.           consuming recomputation of the analysis. Indeed, we expect a
   We therefore propose an incremental approach to abstract           minor change to the program to only have a limited impact on
interpretation, more precisely the AAM technique. That is, we         the resulting state graph, making recomputation of the entire
modify the technique so that the result of an AAM analysis,           state graph redundant. Furthermore, analyses developed using
the abstract state graph, can incrementally be updated following      the AAM technique are notorious for being slow due to the
a change in the program’s source code. Our algorithm tracks
dependencies between the nodes in the abstract syntax tree
                                                                      large state graph that needs to be computed, so efficiency and
(AST) of the program and the transitions in the abstract state        performance are critical for these kinds of analyses [2].
graph to invalidate and recompute new transitions in the state           We therefore hypothesise that an incremental approach
graph upon a change in the AST. Our experiments using a               for the AAM technique can be developed, where as much
set of Scheme micro-benchmarks reveal that in practice this           as possible of the original analysis result is reused while
approach is often limited, as only states that are identical
in both state graphs are reusable. We therefore introduce an
                                                                      analysing the updated program.
improvement to the original incremental algorithm, which we              Concretely, we make the following contributions:
refer to as state adaptation. State adaptation also enables reusing      • We design an incremental variant of the AAM approach
states that are not identical, but similar. Both the original and           to abstract interpretation that enables updating the ab-
improved algorithm are integrated and evaluated in the Scala-
AM framework. Our current implementations already show
                                                                            stract state graph given a change in the AST of the
good results in terms of incremental efficiency, although more              program.
optimization is required to achieve actual gains in run time             • We present state adaptation as an improvement to this
performance with our approaches.                                            algorithm, so that more reuse can be exploited from the
   Index Terms—static analysis, incremental computation, ab-                previously computed state graph.
stract interpretation, AAM                                               • An implementation of both the original algorithm as well
                                                                            as the improved version with state adaptation is made
                       I. I NTRODUCTION                                     available in the Scala-AM framework.
                                                                         • We evaluate our implementation using a series of Scheme
   Static analysis tools are often an integral part in the                  micro-benchmarks that enable us to point out strengths
evolution of software artefacts to detect bugs early on in                  and weaknesses of our incrementalization approach.
the development process. Today’s IDEs are equipped with
powerful analysis tools that developers have come to rely on                               II. BACKGROUND
to detect bugs in their programs early on. Static analysis is the        Abstract interpretation [3] is a program analysis technique,
foundation of these tools, making it possible to check prop-          used to statically check some property of a given program. The
erties of a program without actually executing the program.           main idea behind abstract interpretation is that the analysis
Static analyses that are used in this context not only require        reasons over an approximation of the program’s execution
precision and correctness, but also performance. In particular,       behaviour. That is, the program’s semantics are abstracted
when developers make incremental changes to the source code,          using an over-approximation to obtain an analysis that is sound
they expect responsive feedback from the program analysis.            and decidable. In particular, we focus on the AAM technique.
This results in a need for performant static analysis tools that         This technique starts from a concrete CESKt⇤ abstract
can efficiently update their results in response to changes in        machine, a state machine which acts as a concrete interpreter
the program.                                                          for a language and hence models the concrete execution of


                                                                                                                                 31
a program. The AAM technique systematically abstracts this
                           \ abstract machine modelling
machine to an abstract CESKt⇤
the abstract execution behaviour of a program. Running the
abstracted abstract machine on a given program produces an
abstract state graph approximating the execution of the pro-
gram. More precisely, a given program expression is injected
into an initial state s0 . A transition relation modelling the
abstract semantics of the programming language then defines
                                                                                          Fig. 2. Illustration of an initially computed state graph.
how we step from one state to another, until the entire state
graph is computed. Figure 1 depicts an example of such a
resulting abstract state graph.                                                    •   In the first phase, transitions that may be affected by
                                                                                       the change are invalidated, resulting in an invalidated
                                                                                       state graph. Finding out which transitions need to be
                                                                                       invalidated is done by examining the tracked dependen-
                                                                                       cies. An example of the resulting state graph is given in
                                                                                       Figure 3, where states whose transitions were invalidated
                                                                                       are highlighted in grey.


 Fig. 1. Illustration of a state graph resulting from abstract interpretation.


                  III. I NCREMENTALIZING AAM
   We now present the foundation of our approach to incre-
mentalizing abstract interpretation, more precisely the AAM
technique. The goal of this incrementalization is to avoid full                           Fig. 3. Illustration of the state graph after invalidation.
recomputation of the analysis given a small change in the
source code of the analysed program. Therefore, the resulting                      •   In the second phase, new transitions are recomputed from
algorithm is able to incrementally update the abstract state                           those invalidated states, resulting in the updated state
graph produced by the analysis, given a change to the pro-                             graph. An example is given in Figure 4, where newly
gram’s abstract syntax tree (AST). That is, given the previously                       computed states and transitions are highlighted in grey.
computed state graph and a list of changes in the AST as input,
it produces a new, updated state graph.
   Our approach operates on the level of the abstract state
graph. It does not rely on any language-specific features or
characteristics. Instead, it reuses the state graph computed
from the original program (henceforth referred to as the
previous state graph) as a cache that is kept up-to-date with
modifications in the AST through the invalidation of outdated
transitions. When recomputing the new state graph, transitions
that are still valid are reused directly, while invalidated tran-
sitions have to be recomputed.
   To support our incrementalization approach, we need to
explicitly track dependencies between the nodes in the AST
and transitions in the abstract state graph. These dependencies
are registered and maintained in a data structure during the                             Fig. 4. Illustration of the state graph after recomputation.
analysis when building up the state graph.
   By explicitly tracking these dependencies, we can efficiently                   Given these two steps, one ends up with a state graph
update the state graph given a change in the program’s source                    identical to the one obtained from recomputing it from scratch.
code. Once such a change occurs, we first require an AST
differencer that matches nodes between the new and the old                                            IV. S TATE A DAPTATION
AST and points out which of these nodes have been modified.                         We now present an improvement to the previous incremen-
Afterwards, our algorithm works in two phases to update the                      talization technique. More precisely, we introduce the concept
previously computed state graph. An example of such an initial                   of state adaptation to overcome a shortcoming of the original
state graph is given in Figure 2.                                                approach. A key observation is that when recomputing new


                                                                                                                                                        32
transitions, we often end up with states that are similar, but          in updating the state graph. While state adaptation can avoid
not entirely identical to corresponding states in the previously        recomputation of transitions, it itself requires searching for
computed state graph. The original algorithm is not able to             similar states, computing a state delta, checking for violations
exploit this, as these program states no longer appear identical        of the delta with the effects of a transition, and then reapplying
between the new and the old state graph. Indeed, transitions            the delta to some state. The approach presented here provides
that are not completely identical will be invalidated and               a solid and safe technique to further improve the incremental
recomputed, and the initial approach’s effectiveness becomes            AAM algorithm.
limited by the amount of states that are shared identically
between both graphs. Using state adaptation, we can now also                                 V. I MPLEMENTATION
reuse the transitions of states that are not identical, but similar        Both the original incrementalization approach, as well as
to a state of the new graph.                                            the improved algorithm augmented with state adaptation have
   The main idea is that if the difference, or delta, between           been integrated into the Scala-AM framework [4]. All source
both states is unimportant to the computation of their succes-          code has been made available publicly in an online repository1 .
sors, the transitions of one can be reused for the other. Doing            Integrating our novel incremental approach into this existing
so avoids recomputation of transitions from scratch because             framework as an additional component offers several benefits.
of minor differences in the program state after recomputation.          It enables reusing most components of the abstract machine
The high-level idea of state adaptation is shown in Figure 5.           that already exist in the framework. In addition, we avoid
                                                                        producing an isolated artefact by implementing our approach
                                                                        as a component of the framework, so that we can present it
                                                                        as a flexible addition to an established environment.
                                                                           At the time of writing, our implementation does not yet
                                                                        feature an advanced AST differencer. Instead, it uses a simple
                                                                        differencer that expects the same structure for the new and
                                                                        old AST of the program. While the AST differencing is just
                                                                        a front-end to our algorithm, we envision that future work
                                                                        could integrate a more advanced AST matcher [5] in our
                                                                        implementation to get more accurate differencing on larger,
                                                                        real-world applications.
                 Fig. 5. Illustration of state adaptation.
                                                                                               VI. E VALUATION
   We refer to state adaptation as an indirect form of reuse, as
                                                                           To evaluate the effectiveness of our incrementalization ap-
it does not enable direct reuse of states from the previous state
                                                                        proach, we study to what extent the previous state graph can
graph. Rather, it is more of a hybrid approach, as some work
                                                                        be reused. More precisely, when recomputing the updated state
is still required before a transition can be reused. That is, to
                                                                        graph after a change in the AST, we measure how many of
reuse the transition (s01 , s02 ) when determining the transition
                                                                        those states are recomputed from scratch compared to how
(s1 , s2 ) we require the following steps:
                                                                        many are reused or adapted from the state graph which was
   • Given the current state s1 for which we have to determine
                                                                        computed in the previous run of the analysis. We expect that
       the successor state, a similar state s01 is looked up the        lowering the amount of states that need to be recomputed
       previous state graph. If such a state exists, we compute         results in an improvement in the total run-time performance for
       the delta between s1 and s01 . In Figure 5, this is indicated    successive runs of the analysis given incremental modifications
       by an arrow from s1 to s01 .                                     to a program’s source code. Figure 6 shows how many states
   • Next, we examine the effects and dependencies of the
                                                                        need to be recomputed for a set of micro benchmarks.
       transition (s01 , s02 ). If we can determine that the computed      Clearly, the original – or ‘naive’ algorithm – often falls
       delta does not violate any of these, the transition is ready     short, as for most benchmarks the majority of states have to be
       for indirect reuse using state adaptation.                       recomputed. The reason for this is that after recomputation, we
   • This is done by adapting the successor of s1 (i.e. s2 ),
                                                            0      0
                                                                        often notice minor changes in all subsequent program states,
       so that it can be reused as the successor of s1 (i.e. s2 ).      so that only states occurring before the recomputation can be
       As indicated by the arrow from s02 to s2 , this involves         properly reused. A resulting insight here from this observation
       applying the computed delta to s02 to obtain s2 .                is that direct reuse – i.e. reuse based on identical states between
   The main motivation for state adaptation is to avoid redun-          both graphs – is limited in the context of incremental updates
dant computation of transitions. Instead, an efficient incremen-        to the state graph of an AAM analysis. This points out the need
talization strategy should maximise reuse from the previous             for an alternative approach to reusability. Indeed, we observe
state graph. State adaptation offers more flexibility in reusing        greater incremental efficiency for the ‘improved’ algorithm
transitions, since it no longer requires states to be completely        that employs state adaptation. Using state adaptation, small
identical. Of course, the current formulation of state adaptation
does not necessarily translate to better run time performance             1 http://github.com/noahvanes/scala-am


                                                                                                                                       33
                                                                                                                                                    on our incremental AAM algorithm is entirely novel in that
                                  100
                                                                                                                                                    regard. Nevertheless, a large body of research already exists for
                                                                                                                                        Naive
                                                                                                                                        Improved    incremental computation (IC) [7]. Looking at existing work in
                                                                                                                                                    the incrementalization of programs, in particular that of static
                                   80
                                                                                                                                                    analyses, we discern two incrementalization approaches.
Percentage of recomputed states


                                                                                                                                                       On one hand, manual approaches, where for a particular
                                   60
                                                                                                                                                    kind of analysis an incremental version is designed manually.
                                                                                                                                                    Examples of such manual incrementalization efforts include
                                   40                                                                                                               incremental static taint analysis in Andromeda [8] and the
                                                                                                                                                    static analysis of web applications in Gulfstream [9].
                                   20
                                                                                                                                                       On the other hand, automatic approaches where an analysis
                                                                                                                                                    is automatically made incremental by specifying the analysis
                                                                                                                                                    in some framework. For instance, analyses specified in the
                                    0
                                                                                                                                                    DSL of IncA [10] are automatically made incremental, and the
                                          countA

                                                   countB

                                                            countC

                                                                      countD

                                                                               gcipd

                                                                                       fAdd

                                                                                              fRem

                                                                                                     fPerm

                                                                                                             rotateA

                                                                                                                       rotateB

                                                                                                                                 var

                                                                                                                                       fact

                                                                                                                                              eta
                                                                                                                                                    Reviser framework [11] enables automatic incrementalization
                                                                                                                                                    of analyses specified in the IFDS framework. Language sup-
                                   Fig. 6. Recomputation ratios for both algorithms (lower is better).                                              port can also offer automatic incrementalization, using some
                                                                                                                                                    form of self-adjusting computation [12]. Examples include
                                                                                                                                                    adaptive functional programming [13] and incremental compu-
differences that occur after recomputation can be accommo-                                                                                          tation using Adapton [14]. In particular, incremental evaluation
dated for. As a result, using state adaptation fewer states need                                                                                    of tabled Prolog [15] has already been employed to develop
to be recomputed from scratch.                                                                                                                      incremental versions of existing static analysers [16].
   Currently, the focus of our evaluation is not yet to measure                                                                                        Our own incremental AAM algorithm can be seen as an ad
absolute run time performance of the analysis. With our                                                                                             hoc incrementalization approach. Manually incrementalizing
current implementation, our gains in incremental efficiency                                                                                         AAM allows for explicit, fine-grained control over the incre-
do not yet result in consistent gains in performance. The                                                                                           mentalization mechanism which is usually lost in automatic
main reason for this is that our implementation still copies                                                                                        incrementalization approaches.
reused transitions from the previous state graph, and hence still
requires traversal of the entire state graph. We aim to solve this
issue by restarting recomputation only from invalidated states;                                                                                                          VIII. C ONCLUSION
however, in order to do so, an improved implementation that
takes into account dynamic connectivity issues [6] is required.                                                                                        We presented a brief overview of our work on designing an
This is necessary to avoid recomputation in graph components                                                                                        incremental approach to abstract interpretation, more precisely
that are disconnected from the initial state after invalidation                                                                                     the AAM technique. The main idea is that we aim to avoid full
and therefore not reachable during the execution of the pro-                                                                                        recomputation of the abstract state graph, given a change in
gram. Another major issue is that the current formulation of                                                                                        the source code of our program. Instead, our algorithm is able
state adaptation is not that efficient. As previously mentioned,                                                                                    to efficiently update the previously computed state graph by
performing state adaptation can require several costly steps,                                                                                       invalidating affected transitions and recomputing new transi-
which can be more computationally expensive than simply                                                                                             tions as required. We observed that this approach can fall short,
recomputing a transition from scratch. However, we believe                                                                                          as states are often no longer identical after recomputation.
that the core idea of state adaptation offers a foundation that                                                                                     In general, we concluded that to efficiently update the state
enables exploiting more reuse in the context of incremental                                                                                         graph, we can not only rely on reusing identical transitions
AAM, and that such a foundation is more amendable to future                                                                                         from the previous state graph. Therefore, state adaptation
optimization than from scratch recomputation. It is clear that                                                                                      was presented as a novel technique to exploit more indirect
evaluating the incremental effectiveness enables to assess the                                                                                      reuse from the previous state graph. Our experiments reveal
potential performance benefits that could be achieved with a                                                                                        that state adaptation can overcome the main weakness of the
more optimized implementation.                                                                                                                      original algorithm, since it can accommodate for insignificant
   Finally, we asserted the correctness of our implementation                                                                                       different in program states after recomputation. As a result,
by comparing the results of our incremental analysis with those                                                                                     the improved algorithm achieves great incremental efficiency,
of the original Scala-AM framework. For all the experiments                                                                                         i.e. it greatly reduces the amount of transitions that have to
we conducted, the output of the incremental analysis was                                                                                            be recomputed from scratch. However, future work should
identical to that of the original one.                                                                                                              aim to further optimise our current implementation, so that
                                                                                                                                                    these gains in incremental efficiency can also be translated into
                                                                     VII. R ELATED W ORK                                                            actual performance gains. Nevertheless the results demonstrate
  At the time of writing, no other incremental version of AAM                                                                                       that our approach can provide a solid foundation for an
has yet been proposed or developed, so the work we presented                                                                                        efficient and incremental approach to AAM.


                                                                                                                                                                                                                 34
                             R EFERENCES
 [1] Van Horn, D., & Might, M. (2010, September). Abstracting abstract
     machines. In ACM Sigplan Notices (Vol. 45, No. 9, pp. 51-62). ACM.
 [2] Johnson, J. I., Labich, N., Might, M., & Van Horn, D. (2013). Optimizing
     abstract abstract machines. ACM SIGPLAN Notices, 48(9), 443-454.
 [3] Cousot, P., & Cousot, R. (1977, January). Abstract interpretation: a
     unified lattice model for static analysis of programs by construction or
     approximation of fixpoints. In Proceedings of the 4th ACM SIGACT-
     SIGPLAN symposium on Principles of programming languages (pp.
     238-252). ACM.
 [4] Stiévenart, Q., Vandercammen, M., De Meuter, W., & De Roover, C.
     (2016, October). Scala-am: A modular static analysis framework. In
     Source Code Analysis and Manipulation (SCAM), 2016 IEEE 16th
     International Working Conference on (pp. 85-90). IEEE.
 [5] Falleri, J. R., Morandat, F., Blanc, X., Martinez, M., & Monperrus, M.
     (2014, September). Fine-grained and accurate source code differencing.
     In Proceedings of the 29th ACM/IEEE international conference on
     Automated software engineering (pp. 313-324). ACM.
 [6] Thorup, M. (2000, May). Near-optimal fully-dynamic graph connectiv-
     ity. In Proceedings of the thirty-second annual ACM symposium on
     Theory of computing (pp. 343-350). ACM.
 [7] Ramalingam, G.,& Reps, T. (1993, March). A categorized bibliogra-
     phy on incremental computation. In Proceedings of the 20th ACM
     SIGPLAN-SIGACT symposium on Principles of programming lan-
     guages (pp. 502-510). ACM.
 [8] Tripp, O., Pistoia, M., Cousot, P., Cousot, R., & Guarnieri, S. (2013,
     March). Andromeda: Accurate and Scalable Security Analysis of Web
     Applications. In FASE (Vol. 7793, pp. 210-225).
 [9] Guarnieri, S., & Livshits, B. (2010). GULFSTREAM: Staged Static
     Analysis for Streaming JavaScript Applications. WebApps, 10, 6-6.
[10] Szab, T., Erdweg, S., & Voelter, M. (2016, August). IncA: A DSL
     for the definition of incremental program analyses. In Proceedings of
     the 31st IEEE/ACM International Conference on Automated Software
     Engineering (pp. 320-331). ACM.
[11] Arzt, S., & Bodden, E. (2014, May). Reviser: efficiently updating IDE-
     /IFDS-based data-flow analyses in response to incremental program
     changes. In Proceedings of the 36th International Conference on Soft-
     ware Engineering (pp. 288-298). ACM.
[12] Acar, U. A. (2009, January). Self-adjusting computation:(an overview).
     In Proceedings of the 2009 ACM SIGPLAN workshop on Partial
     evaluation and program manipulation (pp. 1-6). ACM.
[13] Acar, U. A., Blelloch, G. E., & Harper, R. (2006). Adaptive functional
     programming. ACM Transactions on Programming Languages and Sys-
     tems (TOPLAS), 28(6), 990-1034.
[14] Hammer, M. A., Phang, K. Y., Hicks, M., & Foster, J. S. (2014,
     June). Adapton: Composable, demand-driven incremental computation.
     In ACM SIGPLAN Notices (Vol. 49, No. 6, pp. 156-166). ACM.
[15] Saha, D., & Ramakrishnan, C. R. (2006, January). Incremental eval-
     uation of tabled prolog: Beyond pure logic programs. In International
     Symposium on Practical Aspects of Declarative Languages (pp. 215-
     229). Springer, Berlin, Heidelberg.
[16] Saha, D., & Ramakrishnan, C. R. (2005, July). Incremental and demand-
     driven points-to analysis using logic programming. In Proceedings of the
     7th ACM SIGPLAN international conference on Principles and practice
     of declarative programming (pp. 117-128). ACM.


                                                                                35

</pre>