=Paper=
{{Paper
|id=Vol-2047/BENEVOL_2017_paper_7
|storemode=property
|title=Employing Run-time Static Analysis to Improve Concolic Execution
|pdfUrl=https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_7.pdf
|volume=Vol-2047
|authors=Maarten Vandercammen,Coen De Roover
|dblpUrl=https://dblp.org/rec/conf/benevol/VandercammenR17
}}
==Employing Run-time Static Analysis to Improve Concolic Execution==
Employing Run-time Static Analysis to Improve
Concolic Execution
Position Paper
Maarten Vandercammen Coen De Roover
Software Languages Lab Software Languages Lab
Vrije Universiteit Brussel Vrije Universiteit Brussel
Brussels, Belgium Brussels, Belgium
Maarten.Vandercammen@vub.be Coen.De.Roover@vub.be
Abstract—Dynamic symbolic execution, or concolic execution, The remainder of this paper is organized as follows. Sec-
is a program testing technique that systematically executes a tion II introduces the concept of run-time static analysis.
program with the aim of exploring all feasible program paths, Section III gives an introduction to concolic execution and
and locating and reporting all errors encountered in these paths.
However, as the complexity of the program grows, the number of explains how a concolic tester may benefit from (run-time)
program paths explodes, making it infeasible for concolic testers static analysis. Section IV briefly describes an implementation
to explore all of them. To reduce the number of paths to explore, of a prototype of this technique. Section V concludes.
several concolic testing tools have recently started employing
static analysis to prune paths guaranteed by the static analysis II. RUN - TIME S TATIC A NALYSIS
to be safe. The concolic tester must then only focus on those
paths that might contain an error, as reported by the analysis. Our proposed technique focuses on the abstracting abstract
However, due to imprecisions in the analysis’ result, the reported machines (AAM) technique, developed by Van Horn and
errors may just be false positives, and it is up to the tester to Might [13], where an abstract interpreter is systematically
verify whether a reported alarm is an actual error or merely a derived from a concrete interpreter represented as a ‘concrete’
false positive. In this position paper, we propose to increase the abstract machine written in a small-step style. AAM system-
precision of these analyses by not only performing an initial static
analysis before starting concolic testing of the program, but also atically threads all unbounded components of the concrete
by launching incremental static analyses over the program at run machine through the machine’s store and limits the amount of
time, and incorporating into the analyses run-time information store addresses that can be allocated. As a result, the resulting
observed by the tester. The increased precision that results from ‘abstract’ abstract machine, or static analysis, explores a finite
incorporating such run-time information should enable further number of program states during its evaluation of a program,
pruning of the program paths that must be explored by the
concolic tester. thus ensuring that analysis terminates.
Index Terms—Concolic Testing, Static Analysis, Blended Anal- Our run-time static analysis technique builds on the close
ysis relationship between the interpreter and its derived static
analysis. All run-time information on the program’s execution
I. I NTRODUCTION is encoded in the concrete machine’s program state. This
information is made available to the analysis by abstracting the
In recent years, there has been an interest in blending static program state in a similar manner as the abstraction of the ma-
and dynamic analyses to combine the best of both worlds. The chine itself. In general, this technique therefore first involves
results of a sound static analysis over-approximate all possible executing a program and observing the interpreter’s concrete
program executions, at the cost of precision. The results of program state. When an external client decides to commence
a dynamic analysis are precise, yet only valid for a subset static analysis, the concrete state is abstracted, and the abstract
of all program executions. In this position paper, we propose interpreter is initialized with this abstract state and runs until
a novel mechanism to blend an abstract interpretation-based it reaches a fixpoint. Figure 1 illustrates how the run-time
static analysis [5] with a dynamic analysis in the form of information enables refining static analysis. As the predicate
a concolic tester [9] [10]. The proposed blended technique depicted in Figure 1a involves a variable with a random value,
employs an initial static analysis, performed before starting the flow graph (1b) computed by a traditional static analysis
concolic execution of the program, and thereafter launches must contain both branches. However, should our run-time
static analyses at run time over the program. The run-time static analysis technique be launched after binding the variable
analyses incorporate observed run-time information to increase r to its evaluated value, and before evaluating the predicate’s
the precision of the analysis with respect to the program value, the run-time analysis has the variable’s value available,
execution from which the analysis was launched. and can produce a flow graph (1c) that precisely predicts the
26
(let ((r (random 10))) 1 (let* ((a (random 100))
(if (> r 5)) 2 (b (random 100))
(then)
(else))) (if (> r 5)) 3 (c (+ a b))
(let ((r (random 10))) (then)
(else)) 4 (d (+ 30 20)))
(if (> r 5))
(then) (if (> r 5)) 5 (if (> a d)
(else))) (then)
(else)) 6 (if (> b c))
(then) "ok"
(a) 7
(else)
8 (error))
(then) (c)
9 "ok"))
(b)
Fig. 2: An example Scheme program.
Fig. 1: A Scheme program to be analyzed (a), its initial (b),
and its refined flow graph (c). (> a d)
a0 > 50 a0 <= 50
branch to be be taken. Such a run-time analysis only soundly (> b c) "ok"
a0 > 50 ∧ a0 > 50 ∧
predicts the future execution given the current execution of the b0 > a0 + b0 b0 <= a0 + b0
program. Should the program be restarted, a different value for "ok" (error)
r may be generated, thus invalidating the result of the run-time
analysis. Fig. 3: The symbolic execution tree of the program depicted
in Figure 2.
III. C ONCOLIC T ESTING
A concolic tester iteratively explores all feasible paths in b0 so that the resulting path constraint is true. Note that, as
a program by manipulating the values of input parameters to a0 is guaranteed to be a positive number, the second conjunct
the program, such as program arguments, random numbers, b0 > a0 + b0 can never be true. Hence no values for a0 and b0
values that are read from files etc. [3] [1]. can be found so that the constraint is satisfied. The concolic
In the first iteration, the tester assigns a random value tester thus concludes that the constraint is unsatisfiable and
to these input parameters while simultaneously collecting all that the program statement at line 7 is unreachable. At this
conditional predicates it encounters into a path constraint that point, the concolic tester backtracks further into the execution
symbolically represents the conditions that must be true in and tries to negate the first conjunct in the path constraint,
order for the program to execute that particular path. After resulting in the new constraint a0 <= 50, and attempts to
completing the first run of the program, a part of the path solve this constraint, for example by assigning the value 0 to
constraint is negated and the resulting path constraint is given a0 . The concolic tester re-executes the program and assigns
to a SMT solver. If the solver can find some value for the input the value 0 to a. It completes the program by reaching the
parameters so that this new path constraint is satisfied, concolic expression at line 9. As no new branches were encountered
testing continues by restarting the program and assigning these by the tester during this last run, the tester concludes that it
newly computed values to the input parameters. This process has explored all feasible program paths and concolic testing
continues until all feasible program paths have been explored, terminates. The three path constraints that were constructed
or until the tester exceeds some given time constraint. can be collected in a symbolic execution tree that represents
We illustrate the working of a concolic tester in more detail all possible executions of the program, as shown in Figure 3.
via the example Scheme program depicted in Figure 2. In
this program, the variables a and b are considered input A. Combining Static Analysis and Concolic Testing
parameters. Hence, they are symbolically represented as the The number of program paths is exponential in the number
input variables a0 and b0 . Suppose now that in the first of conditional branches of the program. As it is therefore
iteration, the concolic tester randomly assigns 67 to a and often infeasible for concolic testers to explore more than a
31 to b. These values cause the concolic tester to execute the fraction of all possible program paths in a real-world appli-
program statement on line 8, at which point the tester reports cation, testers employ sophisticated search strategies to select
the error encountered there. Simultaneously with this concrete a program path that can be explored and that is most likely
execution, the tester also collects the symbolic representation to result in a non-trivial error [2]. Nevertheless, in practical
of the conditional predicates that were encountered at line 5 applications, concolic testers are not guaranteed to identify
and 6, into a so-called path constraint, i.e., a0 > 50 ^ b0 <= all errors located in a program. Recently, some concolic
a0 +b0 . After completing this run, the tester attempts to explore testers [8], [6], [4], [7] have started employing static analysis
another path, such as the path leading to the program statement to greatly prune the number of program paths that must be
at line 7. To do this, the solver negates the last conditional explored by the analysis. As a sound static analysis safely
predicate in the path constraint, resulting in the constraint proves the absence of errors, any program path not reported
a0 > 50 ^ b0 > a0 + b0 and feeds this resulting path to a SMT to contain an error can be avoided by the tester. However,
solver, which subsequently attempts to assign values to a0 and errors that were reported by the static analysis may actually
27
1 (define (f) (error)) at line 7, after the reassignment to h but before the paths
2 (define (g) "ok") to the calls of f and h are split. At this point, the run-time
3 (let ((a (random 10)) analysis can incorporate the knowledge that h equals the safe
4 (b (random 10)) function g, thereby avoiding the imprecision caused by the
5 (h f)) reassignment of h at line 7. In this case, the abstract state graph
6 (if (> a 5) produced by the run-time analysis, as depicted in Figure 6, is
7 (begin (set! h g) precise enough to determine that calling h will not result in
8 (if (> b 5) an error, hence this path should not be explored by the tester.
9 (f) ; error In general, a run-time analysis could be launched over the
10 (h))) ; ok program when the tester has reached a common ancestor node
11 "ok")) to two or more paths that are potentially unsafe, in the hope
Fig. 4: Another example Scheme program. that the run-time information available at that point enables
the static analysis to be precise enough to determine whether
(> a 5) (> b 5) or not any of the potentially unsafe paths can be pruned.
(> b 5) "ok" IV. I MPLEMENTATION AND F UTURE W ORK
(f) (h)
We have implemented a prototype of this technique1 in the
(f) (h)
Scala-AM framework [12], [11] to test Scheme programs.We
(error) "ok" are currently in the process of extending and improving the
(error) ??? prototype and evaluating its effectiveness. To overcome the
Fig. 6: The abstract state performance overhead incurred by performing (potentially
Fig. 5: The initial abstract graph resulting from run-time multiple) static analyses over the program at run time, we
state graph of the program static analysis of the same aim to make these analyses incremental with respect to each
depicted in Figure 4. program. other, so that one analysis can reuse part of the work computed
by another. Furthermore, we expect this technique to be most
effective when it employs simple-but-fast analyses that have a
be false positives that are the result of imprecision in the static low execution time but a high degree of imprecision, as these
analysis. The tester therefore exercises these paths in order to imprecisions can hopefully be resolved by incorporating run-
verify whether the reported error is an actual error. Note that time information.
the tester relies on the static analysis to be sound: as the tester
only explores program paths highlighted by the analysis, a V. C ONCLUSION
false negative produced by the analysis would not be caught We have proposed a novel technique for improving precision
by the tester. of a static analysis at run time, using run-time information on
Consider the program listed in Figure 4 and its abstract state the program’s execution. We propose to couple this technique
graph, produced by performing a simple initial static analysis to a concolic tester, to reduce the number of program paths
over this program before starting concolic testing, in Figure 5. that the tester must explore.
It can be seen from the state graph that the analysis considers
the path where a is not larger than 5 to be safe, the path where R EFERENCES
both a and b are larger than 5 to be unsafe, and the path
where a is larger than 5 but b is not to be potentially unsafe. [1] Roberto Baldoni, Emilio Coppa, Daniele Cono D’Elia, Camil Deme-
trescu, and Irene Finocchi. A survey of symbolic execution techniques.
The conclusion for this last path is due to imprecision that arXiv preprint arXiv:1610.00502, 2016.
arose in the static analysis: if we employ e.g., a very simple- [2] Jacob Burnim and Koushik Sen. Heuristics for scalable dynamic test
but-fast constant propagation analysis that does not precisely generation. In Automated Software Engineering, 2008. ASE 2008. 23rd
IEEE/ACM International Conference on, pages 443–446. IEEE, 2008.
track reassignments to variables, such as the reassignment to [3] Ting Chen, Xiao-song Zhang, Shi-ze Guo, Hong-yuan Li, and Yue
the function h at line 7, then the analysis cannot conclude Wu. State of the art: Dynamic symbolic execution for automated test
that calling h at line 10 is safe. As a result, any concolic generation. Future Generation Computer Systems, 29(7):1758–1773,
2013.
tester relying on this analysis can only prune the first path [4] Maria Christakis, Peter Müller, and Valentin Wüstholz. Guiding dynamic
and must still explore the other two paths. symbolic execution toward unverified program executions. In Proceed-
ings of the 38th International Conference on Software Engineering,
B. Combining Run-time Static Analysis and Concolic Testing ICSE ’16, pages 144–155, New York, NY, USA, 2016. ACM.
[5] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified
The safe path where h is called can be pruned by employing lattice model for static analysis of programs by construction or approx-
imation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN
a run-time static analysis, in combination with an initial static Symposium on Principles of Programming Languages, POPL ’77, 1977.
analysis performed before starting testing of the program. If
the tester starts exploring the error path that results in the call 1 Available at https://github.com/mvdcamme/scala-am/tree/concolic
to f at line 9, the tester could launch a run-time static analysis prototype
28
[6] Christoph Csallner and Yannis Smaragdakis. Check ’n’ crash: Combin-
ing static checking and testing. In Proceedings of the 27th International
Conference on Software Engineering, ICSE ’05, pages 422–431, New
York, NY, USA, 2005. ACM.
[7] Kostas Ferles, Valentin Wüstholz, Maria Christakis, and Isil Dillig.
Failure-directed program trimming. In Proceedings of the 2017 11th
Joint Meeting on Foundations of Software Engineering, ESEC/FSE
2017, pages 174–185, New York, NY, USA, 2017. ACM.
[8] Xi Ge, Kunal Taneja, Tao Xie, and Nikolai Tillmann. Dyta: Dynamic
symbolic execution guided with static verification results. In Proceedings
of the 33rd International Conference on Software Engineering, ICSE
’11, pages 992–994, New York, NY, USA, 2011. ACM.
[9] Patrice Godefroid, Nils Klarlund, and Koushik Sen. Dart: Directed
automated random testing. SIGPLAN Not., 40(6):213–223, June 2005.
[10] Koushik Sen, Darko Marinov, and Gul Agha. Cute: A concolic unit
testing engine for c. In Proceedings of the 10th European Software
Engineering Conference Held Jointly with 13th ACM SIGSOFT Interna-
tional Symposium on Foundations of Software Engineering, ESEC/FSE-
13, pages 263–272, New York, NY, USA, 2005. ACM.
[11] Quentin Stiévenart, Jens Nicolay, Wolfgang De Meuter, and Coen
De Roover. Building a modular static analysis framework in scala (tool
paper). In Proceedings of the 2016 7th ACM SIGPLAN Symposium on
Scala, SCALA 2016, pages 105–109, New York, NY, USA, 2016. ACM.
[12] Quentin Stiévenart, Maarten Vandercammen, Wolfgang De Meuter, and
Coen De Roover. Scala-am: A modular static analysis framework.
In Source Code Analysis and Manipulation (SCAM), 2016 IEEE 16th
International Working Conference on, pages 85–90. IEEE, 2016.
[13] David Van Horn and Matthew Might. Abstracting abstract machines.
SIGPLAN Not., 45(9):51–62, September 2010.
29