=Paper= {{Paper |id=Vol-2047/BENEVOL_2017_paper_7 |storemode=property |title=Employing Run-time Static Analysis to Improve Concolic Execution |pdfUrl=https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_7.pdf |volume=Vol-2047 |authors=Maarten Vandercammen,Coen De Roover |dblpUrl=https://dblp.org/rec/conf/benevol/VandercammenR17 }} ==Employing Run-time Static Analysis to Improve Concolic Execution== https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_7.pdf
     Employing Run-time Static Analysis to Improve
                 Concolic Execution
                                                         Position Paper

                            Maarten Vandercammen                                         Coen De Roover
                          Software Languages Lab                                     Software Languages Lab
                          Vrije Universiteit Brussel                                 Vrije Universiteit Brussel
                              Brussels, Belgium                                         Brussels, Belgium
                        Maarten.Vandercammen@vub.be                                  Coen.De.Roover@vub.be




   Abstract—Dynamic symbolic execution, or concolic execution,            The remainder of this paper is organized as follows. Sec-
is a program testing technique that systematically executes a          tion II introduces the concept of run-time static analysis.
program with the aim of exploring all feasible program paths,          Section III gives an introduction to concolic execution and
and locating and reporting all errors encountered in these paths.
However, as the complexity of the program grows, the number of         explains how a concolic tester may benefit from (run-time)
program paths explodes, making it infeasible for concolic testers      static analysis. Section IV briefly describes an implementation
to explore all of them. To reduce the number of paths to explore,      of a prototype of this technique. Section V concludes.
several concolic testing tools have recently started employing
static analysis to prune paths guaranteed by the static analysis                     II. RUN - TIME S TATIC A NALYSIS
to be safe. The concolic tester must then only focus on those
paths that might contain an error, as reported by the analysis.           Our proposed technique focuses on the abstracting abstract
However, due to imprecisions in the analysis’ result, the reported     machines (AAM) technique, developed by Van Horn and
errors may just be false positives, and it is up to the tester to      Might [13], where an abstract interpreter is systematically
verify whether a reported alarm is an actual error or merely a         derived from a concrete interpreter represented as a ‘concrete’
false positive. In this position paper, we propose to increase the     abstract machine written in a small-step style. AAM system-
precision of these analyses by not only performing an initial static
analysis before starting concolic testing of the program, but also     atically threads all unbounded components of the concrete
by launching incremental static analyses over the program at run       machine through the machine’s store and limits the amount of
time, and incorporating into the analyses run-time information         store addresses that can be allocated. As a result, the resulting
observed by the tester. The increased precision that results from      ‘abstract’ abstract machine, or static analysis, explores a finite
incorporating such run-time information should enable further          number of program states during its evaluation of a program,
pruning of the program paths that must be explored by the
concolic tester.                                                       thus ensuring that analysis terminates.
   Index Terms—Concolic Testing, Static Analysis, Blended Anal-           Our run-time static analysis technique builds on the close
ysis                                                                   relationship between the interpreter and its derived static
                                                                       analysis. All run-time information on the program’s execution
                       I. I NTRODUCTION                                is encoded in the concrete machine’s program state. This
                                                                       information is made available to the analysis by abstracting the
   In recent years, there has been an interest in blending static      program state in a similar manner as the abstraction of the ma-
and dynamic analyses to combine the best of both worlds. The           chine itself. In general, this technique therefore first involves
results of a sound static analysis over-approximate all possible       executing a program and observing the interpreter’s concrete
program executions, at the cost of precision. The results of           program state. When an external client decides to commence
a dynamic analysis are precise, yet only valid for a subset            static analysis, the concrete state is abstracted, and the abstract
of all program executions. In this position paper, we propose          interpreter is initialized with this abstract state and runs until
a novel mechanism to blend an abstract interpretation-based            it reaches a fixpoint. Figure 1 illustrates how the run-time
static analysis [5] with a dynamic analysis in the form of             information enables refining static analysis. As the predicate
a concolic tester [9] [10]. The proposed blended technique             depicted in Figure 1a involves a variable with a random value,
employs an initial static analysis, performed before starting          the flow graph (1b) computed by a traditional static analysis
concolic execution of the program, and thereafter launches             must contain both branches. However, should our run-time
static analyses at run time over the program. The run-time             static analysis technique be launched after binding the variable
analyses incorporate observed run-time information to increase         r to its evaluated value, and before evaluating the predicate’s
the precision of the analysis with respect to the program              value, the run-time analysis has the variable’s value available,
execution from which the analysis was launched.                        and can produce a flow graph (1c) that precisely predicts the



                                                                                                                                      26
                           (let ((r (random 10)))                       1    (let* ((a (random 100))
                              (if (> r 5))                              2           (b (random 100))
                                  (then)
                                  (else)))           (if (> r 5))       3           (c (+ a b))
(let ((r (random 10)))                                   (then)
                                                         (else))        4           (d (+ 30 20)))
   (if (> r 5))
       (then)                  (if (> r 5))                             5       (if (> a d)
       (else)))                    (then)
                                   (else))                              6           (if (> b c))
                                                        (then)                          "ok"
          (a)                                                           7


                                            (else)
                                                                        8               (error))
                             (then)                    (c)
                                                                        9           "ok"))
                                      (b)
                                                                                      Fig. 2: An example Scheme program.
  Fig. 1: A Scheme program to be analyzed (a), its initial (b),
  and its refined flow graph (c).                                                                (> a d)
                                                                                       a0 > 50               a0 <= 50

  branch to be be taken. Such a run-time analysis only soundly                        (> b c)                    "ok"
                                                                       a0 > 50 ∧                 a0 > 50 ∧
  predicts the future execution given the current execution of the     b0 > a0 + b0              b0 <= a0 + b0
  program. Should the program be restarted, a different value for             "ok"          (error)
  r may be generated, thus invalidating the result of the run-time
  analysis.                                                            Fig. 3: The symbolic execution tree of the program depicted
                                                                       in Figure 2.
                     III. C ONCOLIC T ESTING
     A concolic tester iteratively explores all feasible paths in      b0 so that the resulting path constraint is true. Note that, as
  a program by manipulating the values of input parameters to          a0 is guaranteed to be a positive number, the second conjunct
  the program, such as program arguments, random numbers,              b0 > a0 + b0 can never be true. Hence no values for a0 and b0
  values that are read from files etc. [3] [1].                        can be found so that the constraint is satisfied. The concolic
     In the first iteration, the tester assigns a random value         tester thus concludes that the constraint is unsatisfiable and
  to these input parameters while simultaneously collecting all        that the program statement at line 7 is unreachable. At this
  conditional predicates it encounters into a path constraint that     point, the concolic tester backtracks further into the execution
  symbolically represents the conditions that must be true in          and tries to negate the first conjunct in the path constraint,
  order for the program to execute that particular path. After         resulting in the new constraint a0 <= 50, and attempts to
  completing the first run of the program, a part of the path          solve this constraint, for example by assigning the value 0 to
  constraint is negated and the resulting path constraint is given     a0 . The concolic tester re-executes the program and assigns
  to a SMT solver. If the solver can find some value for the input     the value 0 to a. It completes the program by reaching the
  parameters so that this new path constraint is satisfied, concolic   expression at line 9. As no new branches were encountered
  testing continues by restarting the program and assigning these      by the tester during this last run, the tester concludes that it
  newly computed values to the input parameters. This process          has explored all feasible program paths and concolic testing
  continues until all feasible program paths have been explored,       terminates. The three path constraints that were constructed
  or until the tester exceeds some given time constraint.              can be collected in a symbolic execution tree that represents
     We illustrate the working of a concolic tester in more detail     all possible executions of the program, as shown in Figure 3.
  via the example Scheme program depicted in Figure 2. In
  this program, the variables a and b are considered input             A. Combining Static Analysis and Concolic Testing
  parameters. Hence, they are symbolically represented as the             The number of program paths is exponential in the number
  input variables a0 and b0 . Suppose now that in the first            of conditional branches of the program. As it is therefore
  iteration, the concolic tester randomly assigns 67 to a and          often infeasible for concolic testers to explore more than a
  31 to b. These values cause the concolic tester to execute the       fraction of all possible program paths in a real-world appli-
  program statement on line 8, at which point the tester reports       cation, testers employ sophisticated search strategies to select
  the error encountered there. Simultaneously with this concrete       a program path that can be explored and that is most likely
  execution, the tester also collects the symbolic representation      to result in a non-trivial error [2]. Nevertheless, in practical
  of the conditional predicates that were encountered at line 5        applications, concolic testers are not guaranteed to identify
  and 6, into a so-called path constraint, i.e., a0 > 50 ^ b0 <=       all errors located in a program. Recently, some concolic
  a0 +b0 . After completing this run, the tester attempts to explore   testers [8], [6], [4], [7] have started employing static analysis
  another path, such as the path leading to the program statement      to greatly prune the number of program paths that must be
  at line 7. To do this, the solver negates the last conditional       explored by the analysis. As a sound static analysis safely
  predicate in the path constraint, resulting in the constraint        proves the absence of errors, any program path not reported
  a0 > 50 ^ b0 > a0 + b0 and feeds this resulting path to a SMT        to contain an error can be avoided by the tester. However,
  solver, which subsequently attempts to assign values to a0 and       errors that were reported by the static analysis may actually



                                                                                                                                    27
 1   (define (f) (error))                                             at line 7, after the reassignment to h but before the paths
 2   (define (g) "ok")                                                to the calls of f and h are split. At this point, the run-time
 3   (let ((a (random 10))                                            analysis can incorporate the knowledge that h equals the safe
 4         (b (random 10))                                            function g, thereby avoiding the imprecision caused by the
 5         (h f))                                                     reassignment of h at line 7. In this case, the abstract state graph
 6      (if (> a 5)                                                   produced by the run-time analysis, as depicted in Figure 6, is
 7          (begin (set! h g)                                         precise enough to determine that calling h will not result in
 8                  (if (> b 5)                                       an error, hence this path should not be explored by the tester.
 9                      (f)   ; error                                 In general, a run-time analysis could be launched over the
10                      (h))) ; ok                                    program when the tester has reached a common ancestor node
11          "ok"))                                                    to two or more paths that are potentially unsafe, in the hope
       Fig. 4: Another example Scheme program.                        that the run-time information available at that point enables
                                                                      the static analysis to be precise enough to determine whether
                 (> a 5)                      (> b 5)                 or not any of the potentially unsafe paths can be pruned.

           (> b 5)         "ok"                                                 IV. I MPLEMENTATION AND F UTURE W ORK
                                        (f)             (h)
                                                                         We have implemented a prototype of this technique1 in the
     (f)         (h)
                                                                      Scala-AM framework [12], [11] to test Scheme programs.We
                                     (error)           "ok"           are currently in the process of extending and improving the
  (error)        ???                                                  prototype and evaluating its effectiveness. To overcome the
                                  Fig. 6: The abstract state          performance overhead incurred by performing (potentially
Fig. 5: The initial abstract      graph resulting from run-time       multiple) static analyses over the program at run time, we
state graph of the program        static analysis of the same         aim to make these analyses incremental with respect to each
depicted in Figure 4.             program.                            other, so that one analysis can reuse part of the work computed
                                                                      by another. Furthermore, we expect this technique to be most
                                                                      effective when it employs simple-but-fast analyses that have a
be false positives that are the result of imprecision in the static   low execution time but a high degree of imprecision, as these
analysis. The tester therefore exercises these paths in order to      imprecisions can hopefully be resolved by incorporating run-
verify whether the reported error is an actual error. Note that       time information.
the tester relies on the static analysis to be sound: as the tester
only explores program paths highlighted by the analysis, a                                       V. C ONCLUSION
false negative produced by the analysis would not be caught              We have proposed a novel technique for improving precision
by the tester.                                                        of a static analysis at run time, using run-time information on
   Consider the program listed in Figure 4 and its abstract state     the program’s execution. We propose to couple this technique
graph, produced by performing a simple initial static analysis        to a concolic tester, to reduce the number of program paths
over this program before starting concolic testing, in Figure 5.      that the tester must explore.
It can be seen from the state graph that the analysis considers
the path where a is not larger than 5 to be safe, the path where                                   R EFERENCES
both a and b are larger than 5 to be unsafe, and the path
where a is larger than 5 but b is not to be potentially unsafe.        [1] Roberto Baldoni, Emilio Coppa, Daniele Cono D’Elia, Camil Deme-
                                                                           trescu, and Irene Finocchi. A survey of symbolic execution techniques.
The conclusion for this last path is due to imprecision that               arXiv preprint arXiv:1610.00502, 2016.
arose in the static analysis: if we employ e.g., a very simple-        [2] Jacob Burnim and Koushik Sen. Heuristics for scalable dynamic test
but-fast constant propagation analysis that does not precisely             generation. In Automated Software Engineering, 2008. ASE 2008. 23rd
                                                                           IEEE/ACM International Conference on, pages 443–446. IEEE, 2008.
track reassignments to variables, such as the reassignment to          [3] Ting Chen, Xiao-song Zhang, Shi-ze Guo, Hong-yuan Li, and Yue
the function h at line 7, then the analysis cannot conclude                Wu. State of the art: Dynamic symbolic execution for automated test
that calling h at line 10 is safe. As a result, any concolic               generation. Future Generation Computer Systems, 29(7):1758–1773,
                                                                           2013.
tester relying on this analysis can only prune the first path          [4] Maria Christakis, Peter Müller, and Valentin Wüstholz. Guiding dynamic
and must still explore the other two paths.                                symbolic execution toward unverified program executions. In Proceed-
                                                                           ings of the 38th International Conference on Software Engineering,
B. Combining Run-time Static Analysis and Concolic Testing                 ICSE ’16, pages 144–155, New York, NY, USA, 2016. ACM.
                                                                       [5] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified
   The safe path where h is called can be pruned by employing              lattice model for static analysis of programs by construction or approx-
                                                                           imation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN
a run-time static analysis, in combination with an initial static          Symposium on Principles of Programming Languages, POPL ’77, 1977.
analysis performed before starting testing of the program. If
the tester starts exploring the error path that results in the call     1 Available   at    https://github.com/mvdcamme/scala-am/tree/concolic
to f at line 9, the tester could launch a run-time static analysis    prototype




                                                                                                                                               28
 [6] Christoph Csallner and Yannis Smaragdakis. Check ’n’ crash: Combin-
     ing static checking and testing. In Proceedings of the 27th International
     Conference on Software Engineering, ICSE ’05, pages 422–431, New
     York, NY, USA, 2005. ACM.
 [7] Kostas Ferles, Valentin Wüstholz, Maria Christakis, and Isil Dillig.
     Failure-directed program trimming. In Proceedings of the 2017 11th
     Joint Meeting on Foundations of Software Engineering, ESEC/FSE
     2017, pages 174–185, New York, NY, USA, 2017. ACM.
 [8] Xi Ge, Kunal Taneja, Tao Xie, and Nikolai Tillmann. Dyta: Dynamic
     symbolic execution guided with static verification results. In Proceedings
     of the 33rd International Conference on Software Engineering, ICSE
     ’11, pages 992–994, New York, NY, USA, 2011. ACM.
 [9] Patrice Godefroid, Nils Klarlund, and Koushik Sen. Dart: Directed
     automated random testing. SIGPLAN Not., 40(6):213–223, June 2005.
[10] Koushik Sen, Darko Marinov, and Gul Agha. Cute: A concolic unit
     testing engine for c. In Proceedings of the 10th European Software
     Engineering Conference Held Jointly with 13th ACM SIGSOFT Interna-
     tional Symposium on Foundations of Software Engineering, ESEC/FSE-
     13, pages 263–272, New York, NY, USA, 2005. ACM.
[11] Quentin Stiévenart, Jens Nicolay, Wolfgang De Meuter, and Coen
     De Roover. Building a modular static analysis framework in scala (tool
     paper). In Proceedings of the 2016 7th ACM SIGPLAN Symposium on
     Scala, SCALA 2016, pages 105–109, New York, NY, USA, 2016. ACM.
[12] Quentin Stiévenart, Maarten Vandercammen, Wolfgang De Meuter, and
     Coen De Roover. Scala-am: A modular static analysis framework.
     In Source Code Analysis and Manipulation (SCAM), 2016 IEEE 16th
     International Working Conference on, pages 85–90. IEEE, 2016.
[13] David Van Horn and Matthew Might. Abstracting abstract machines.
     SIGPLAN Not., 45(9):51–62, September 2010.




                                                                                  29