Employing Run-time Static Analysis to Improve Concolic Execution Position Paper Maarten Vandercammen Coen De Roover Software Languages Lab Software Languages Lab Vrije Universiteit Brussel Vrije Universiteit Brussel Brussels, Belgium Brussels, Belgium Maarten.Vandercammen@vub.be Coen.De.Roover@vub.be Abstract—Dynamic symbolic execution, or concolic execution, The remainder of this paper is organized as follows. Sec- is a program testing technique that systematically executes a tion II introduces the concept of run-time static analysis. program with the aim of exploring all feasible program paths, Section III gives an introduction to concolic execution and and locating and reporting all errors encountered in these paths. However, as the complexity of the program grows, the number of explains how a concolic tester may benefit from (run-time) program paths explodes, making it infeasible for concolic testers static analysis. Section IV briefly describes an implementation to explore all of them. To reduce the number of paths to explore, of a prototype of this technique. Section V concludes. several concolic testing tools have recently started employing static analysis to prune paths guaranteed by the static analysis II. RUN - TIME S TATIC A NALYSIS to be safe. The concolic tester must then only focus on those paths that might contain an error, as reported by the analysis. Our proposed technique focuses on the abstracting abstract However, due to imprecisions in the analysis’ result, the reported machines (AAM) technique, developed by Van Horn and errors may just be false positives, and it is up to the tester to Might [13], where an abstract interpreter is systematically verify whether a reported alarm is an actual error or merely a derived from a concrete interpreter represented as a ‘concrete’ false positive. In this position paper, we propose to increase the abstract machine written in a small-step style. AAM system- precision of these analyses by not only performing an initial static analysis before starting concolic testing of the program, but also atically threads all unbounded components of the concrete by launching incremental static analyses over the program at run machine through the machine’s store and limits the amount of time, and incorporating into the analyses run-time information store addresses that can be allocated. As a result, the resulting observed by the tester. The increased precision that results from ‘abstract’ abstract machine, or static analysis, explores a finite incorporating such run-time information should enable further number of program states during its evaluation of a program, pruning of the program paths that must be explored by the concolic tester. thus ensuring that analysis terminates. Index Terms—Concolic Testing, Static Analysis, Blended Anal- Our run-time static analysis technique builds on the close ysis relationship between the interpreter and its derived static analysis. All run-time information on the program’s execution I. I NTRODUCTION is encoded in the concrete machine’s program state. This information is made available to the analysis by abstracting the In recent years, there has been an interest in blending static program state in a similar manner as the abstraction of the ma- and dynamic analyses to combine the best of both worlds. The chine itself. In general, this technique therefore first involves results of a sound static analysis over-approximate all possible executing a program and observing the interpreter’s concrete program executions, at the cost of precision. The results of program state. When an external client decides to commence a dynamic analysis are precise, yet only valid for a subset static analysis, the concrete state is abstracted, and the abstract of all program executions. In this position paper, we propose interpreter is initialized with this abstract state and runs until a novel mechanism to blend an abstract interpretation-based it reaches a fixpoint. Figure 1 illustrates how the run-time static analysis [5] with a dynamic analysis in the form of information enables refining static analysis. As the predicate a concolic tester [9] [10]. The proposed blended technique depicted in Figure 1a involves a variable with a random value, employs an initial static analysis, performed before starting the flow graph (1b) computed by a traditional static analysis concolic execution of the program, and thereafter launches must contain both branches. However, should our run-time static analyses at run time over the program. The run-time static analysis technique be launched after binding the variable analyses incorporate observed run-time information to increase r to its evaluated value, and before evaluating the predicate’s the precision of the analysis with respect to the program value, the run-time analysis has the variable’s value available, execution from which the analysis was launched. and can produce a flow graph (1c) that precisely predicts the 26 (let ((r (random 10))) 1 (let* ((a (random 100)) (if (> r 5)) 2 (b (random 100)) (then) (else))) (if (> r 5)) 3 (c (+ a b)) (let ((r (random 10))) (then) (else)) 4 (d (+ 30 20))) (if (> r 5)) (then) (if (> r 5)) 5 (if (> a d) (else))) (then) (else)) 6 (if (> b c)) (then) "ok" (a) 7 (else) 8 (error)) (then) (c) 9 "ok")) (b) Fig. 2: An example Scheme program. Fig. 1: A Scheme program to be analyzed (a), its initial (b), and its refined flow graph (c). (> a d) a0 > 50 a0 <= 50 branch to be be taken. Such a run-time analysis only soundly (> b c) "ok" a0 > 50 ∧ a0 > 50 ∧ predicts the future execution given the current execution of the b0 > a0 + b0 b0 <= a0 + b0 program. Should the program be restarted, a different value for "ok" (error) r may be generated, thus invalidating the result of the run-time analysis. Fig. 3: The symbolic execution tree of the program depicted in Figure 2. III. C ONCOLIC T ESTING A concolic tester iteratively explores all feasible paths in b0 so that the resulting path constraint is true. Note that, as a program by manipulating the values of input parameters to a0 is guaranteed to be a positive number, the second conjunct the program, such as program arguments, random numbers, b0 > a0 + b0 can never be true. Hence no values for a0 and b0 values that are read from files etc. [3] [1]. can be found so that the constraint is satisfied. The concolic In the first iteration, the tester assigns a random value tester thus concludes that the constraint is unsatisfiable and to these input parameters while simultaneously collecting all that the program statement at line 7 is unreachable. At this conditional predicates it encounters into a path constraint that point, the concolic tester backtracks further into the execution symbolically represents the conditions that must be true in and tries to negate the first conjunct in the path constraint, order for the program to execute that particular path. After resulting in the new constraint a0 <= 50, and attempts to completing the first run of the program, a part of the path solve this constraint, for example by assigning the value 0 to constraint is negated and the resulting path constraint is given a0 . The concolic tester re-executes the program and assigns to a SMT solver. If the solver can find some value for the input the value 0 to a. It completes the program by reaching the parameters so that this new path constraint is satisfied, concolic expression at line 9. As no new branches were encountered testing continues by restarting the program and assigning these by the tester during this last run, the tester concludes that it newly computed values to the input parameters. This process has explored all feasible program paths and concolic testing continues until all feasible program paths have been explored, terminates. The three path constraints that were constructed or until the tester exceeds some given time constraint. can be collected in a symbolic execution tree that represents We illustrate the working of a concolic tester in more detail all possible executions of the program, as shown in Figure 3. via the example Scheme program depicted in Figure 2. In this program, the variables a and b are considered input A. Combining Static Analysis and Concolic Testing parameters. Hence, they are symbolically represented as the The number of program paths is exponential in the number input variables a0 and b0 . Suppose now that in the first of conditional branches of the program. As it is therefore iteration, the concolic tester randomly assigns 67 to a and often infeasible for concolic testers to explore more than a 31 to b. These values cause the concolic tester to execute the fraction of all possible program paths in a real-world appli- program statement on line 8, at which point the tester reports cation, testers employ sophisticated search strategies to select the error encountered there. Simultaneously with this concrete a program path that can be explored and that is most likely execution, the tester also collects the symbolic representation to result in a non-trivial error [2]. Nevertheless, in practical of the conditional predicates that were encountered at line 5 applications, concolic testers are not guaranteed to identify and 6, into a so-called path constraint, i.e., a0 > 50 ^ b0 <= all errors located in a program. Recently, some concolic a0 +b0 . After completing this run, the tester attempts to explore testers [8], [6], [4], [7] have started employing static analysis another path, such as the path leading to the program statement to greatly prune the number of program paths that must be at line 7. To do this, the solver negates the last conditional explored by the analysis. As a sound static analysis safely predicate in the path constraint, resulting in the constraint proves the absence of errors, any program path not reported a0 > 50 ^ b0 > a0 + b0 and feeds this resulting path to a SMT to contain an error can be avoided by the tester. However, solver, which subsequently attempts to assign values to a0 and errors that were reported by the static analysis may actually 27 1 (define (f) (error)) at line 7, after the reassignment to h but before the paths 2 (define (g) "ok") to the calls of f and h are split. At this point, the run-time 3 (let ((a (random 10)) analysis can incorporate the knowledge that h equals the safe 4 (b (random 10)) function g, thereby avoiding the imprecision caused by the 5 (h f)) reassignment of h at line 7. In this case, the abstract state graph 6 (if (> a 5) produced by the run-time analysis, as depicted in Figure 6, is 7 (begin (set! h g) precise enough to determine that calling h will not result in 8 (if (> b 5) an error, hence this path should not be explored by the tester. 9 (f) ; error In general, a run-time analysis could be launched over the 10 (h))) ; ok program when the tester has reached a common ancestor node 11 "ok")) to two or more paths that are potentially unsafe, in the hope Fig. 4: Another example Scheme program. that the run-time information available at that point enables the static analysis to be precise enough to determine whether (> a 5) (> b 5) or not any of the potentially unsafe paths can be pruned. (> b 5) "ok" IV. I MPLEMENTATION AND F UTURE W ORK (f) (h) We have implemented a prototype of this technique1 in the (f) (h) Scala-AM framework [12], [11] to test Scheme programs.We (error) "ok" are currently in the process of extending and improving the (error) ??? prototype and evaluating its effectiveness. To overcome the Fig. 6: The abstract state performance overhead incurred by performing (potentially Fig. 5: The initial abstract graph resulting from run-time multiple) static analyses over the program at run time, we state graph of the program static analysis of the same aim to make these analyses incremental with respect to each depicted in Figure 4. program. other, so that one analysis can reuse part of the work computed by another. Furthermore, we expect this technique to be most effective when it employs simple-but-fast analyses that have a be false positives that are the result of imprecision in the static low execution time but a high degree of imprecision, as these analysis. The tester therefore exercises these paths in order to imprecisions can hopefully be resolved by incorporating run- verify whether the reported error is an actual error. Note that time information. the tester relies on the static analysis to be sound: as the tester only explores program paths highlighted by the analysis, a V. C ONCLUSION false negative produced by the analysis would not be caught We have proposed a novel technique for improving precision by the tester. of a static analysis at run time, using run-time information on Consider the program listed in Figure 4 and its abstract state the program’s execution. We propose to couple this technique graph, produced by performing a simple initial static analysis to a concolic tester, to reduce the number of program paths over this program before starting concolic testing, in Figure 5. that the tester must explore. It can be seen from the state graph that the analysis considers the path where a is not larger than 5 to be safe, the path where R EFERENCES both a and b are larger than 5 to be unsafe, and the path where a is larger than 5 but b is not to be potentially unsafe. [1] Roberto Baldoni, Emilio Coppa, Daniele Cono D’Elia, Camil Deme- trescu, and Irene Finocchi. A survey of symbolic execution techniques. The conclusion for this last path is due to imprecision that arXiv preprint arXiv:1610.00502, 2016. arose in the static analysis: if we employ e.g., a very simple- [2] Jacob Burnim and Koushik Sen. Heuristics for scalable dynamic test but-fast constant propagation analysis that does not precisely generation. In Automated Software Engineering, 2008. ASE 2008. 23rd IEEE/ACM International Conference on, pages 443–446. IEEE, 2008. track reassignments to variables, such as the reassignment to [3] Ting Chen, Xiao-song Zhang, Shi-ze Guo, Hong-yuan Li, and Yue the function h at line 7, then the analysis cannot conclude Wu. State of the art: Dynamic symbolic execution for automated test that calling h at line 10 is safe. As a result, any concolic generation. Future Generation Computer Systems, 29(7):1758–1773, 2013. tester relying on this analysis can only prune the first path [4] Maria Christakis, Peter Müller, and Valentin Wüstholz. Guiding dynamic and must still explore the other two paths. symbolic execution toward unverified program executions. In Proceed- ings of the 38th International Conference on Software Engineering, B. Combining Run-time Static Analysis and Concolic Testing ICSE ’16, pages 144–155, New York, NY, USA, 2016. ACM. [5] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified The safe path where h is called can be pruned by employing lattice model for static analysis of programs by construction or approx- imation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN a run-time static analysis, in combination with an initial static Symposium on Principles of Programming Languages, POPL ’77, 1977. analysis performed before starting testing of the program. If the tester starts exploring the error path that results in the call 1 Available at https://github.com/mvdcamme/scala-am/tree/concolic to f at line 9, the tester could launch a run-time static analysis prototype 28 [6] Christoph Csallner and Yannis Smaragdakis. Check ’n’ crash: Combin- ing static checking and testing. In Proceedings of the 27th International Conference on Software Engineering, ICSE ’05, pages 422–431, New York, NY, USA, 2005. ACM. [7] Kostas Ferles, Valentin Wüstholz, Maria Christakis, and Isil Dillig. Failure-directed program trimming. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pages 174–185, New York, NY, USA, 2017. ACM. [8] Xi Ge, Kunal Taneja, Tao Xie, and Nikolai Tillmann. Dyta: Dynamic symbolic execution guided with static verification results. In Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11, pages 992–994, New York, NY, USA, 2011. ACM. [9] Patrice Godefroid, Nils Klarlund, and Koushik Sen. Dart: Directed automated random testing. SIGPLAN Not., 40(6):213–223, June 2005. [10] Koushik Sen, Darko Marinov, and Gul Agha. Cute: A concolic unit testing engine for c. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT Interna- tional Symposium on Foundations of Software Engineering, ESEC/FSE- 13, pages 263–272, New York, NY, USA, 2005. ACM. [11] Quentin Stiévenart, Jens Nicolay, Wolfgang De Meuter, and Coen De Roover. Building a modular static analysis framework in scala (tool paper). In Proceedings of the 2016 7th ACM SIGPLAN Symposium on Scala, SCALA 2016, pages 105–109, New York, NY, USA, 2016. ACM. [12] Quentin Stiévenart, Maarten Vandercammen, Wolfgang De Meuter, and Coen De Roover. Scala-am: A modular static analysis framework. In Source Code Analysis and Manipulation (SCAM), 2016 IEEE 16th International Working Conference on, pages 85–90. IEEE, 2016. [13] David Van Horn and Matthew Might. Abstracting abstract machines. SIGPLAN Not., 45(9):51–62, September 2010. 29