6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)


    Boost Symbolic Execution Using Dynamic State
               Merging and Forking
                               Chao Zhang                                                Weiliang Yin∗          Zhiqiang Lin†
                 Beijing National Research Center for                                              HUAWEI, China
            Information Science and Technology(BNRist)                            {yinweiliang.ywliang∗ ,linzhiqiang† }@huawei.com
   School of Software, Tsinghua University, Beijing 100084, China
                zhang-chao13@mails.tsinghua.edu.cn


   Abstract—Symbolic execution has achieved wide application in               However on the other hand, path explosion is still the most
software testing and analysis. However, path explosion remains             significant bottleneck which limits scalability of symbolic
the bottleneck limiting scalability of most symbolic execution             execution. When traversing a program, each path encodes
engines in practice. One of the promising solutions to address this
issue is to merge explored states and decrease number of paths.            decisions on all its historical branches and maintains them
Nevertheless, state merging leads to increase in complexity of             in its path predicate [14]. Thus, the number of paths to be
path predicates at the same time, especially in the situation where        explored grows exponentially with the number of branches
variables with concrete values are turned symbolic and chances of          and exhaustively covering all paths could be extremely time-
concretely executing some statements are dissipated. As a result,          consuming even for a small program [15].
calculating expressions and constraints becomes much more time-
consuming and thus, the performance of symbolic execution                     State-of-the-art solutions to resolve path explosion mainly
is weakened in contrast. To resolve the problem, we propose                aim at directly decreasing number of paths or optimizing the
a merge-fork framework enabling states under exploration to                process within limited time budgets. A number of articles
switch automatically between merging mode and forking mode.                introduce various search heuristics to improve performance
First, active state forking is introduced to enable forking a              towards particular targets. For example, ideas from random
state into multiple ones as if a certain merging action taken
before were eliminated. Second, we perform dynamic merge-                  testing and evolutionary algorithms are frequently used to
fork analysis to cut source code into pieces and continuously              guide the search order of path exploration so that coverage
evaluate efficiency of different merging strategies for each piece.        gets higher in the same time [16], [17]. The second way
Our approach dynamically combines paths under exploration                  is to apply some imprecise techniques from static analysis,
to maximize opportunities for concrete execution and ease the              such as abstract interpretation and summary. These techniques
burden on underlying solvers. We implement the framework
on the foundation of the symbolic execution engine KLEE, and               calculate approximations to simplify analysis of code and de-
conduct experiments on GNU Coreutils code using our prototype              crease produced states [18], [19]. State merging is another way
to present the effect of our proposition. Experiments show up to           designed to cut down number of paths to be explored. When
30% speedup and 80% decrease in queries compared to existing               path exploration encounters a branch statement, there are two
works.                                                                     succeeding states constructed and in turn the original path is
   Index Terms—State Merging, Active State Forking, Local
Merge-fork Analysis                                                        split into to two sub-paths, corresponding to true and false case
                                                                           respectively. If the two states join at the same location later,
                                                                           they can be merged into one again, with path predicate updated
                       I. I NTRODUCTION
                                                                           using disjunction and variable expressions using ITE operation
   The innovation of symbolic execution has been proposed                  [20]. Fig.1 shows an example where there are two paths in the
over four decades ago [1], [2]. It shows great potential in                given program: 1 → 2 → 3 → 5 → 6 corresponding to true
software analysis and testing, and underlies a growing number              case of if-statement and 1 → 2 → 4 → 5 → 6 to false
of tools which are widely applied in practical cases [3]–[7]. For          case. Succeeding states of 3 and 4 are merged at 5, with path
example, in [5], [6], they apply symbolic execution to greatly             predicate set to n < 0 ∨ ¬(n < 0) or simply T RU E, and
improve the performance of traditional fuzz testing [8], [9]. In           expression of ret set to IT E(n < 0, −n, n).
[7], they incorporate symbolic execution as a bridge to reduce                State merging is intended to decrease the number of paths
the false positive of static analysis [10] and the false negative          to be explored while no imprecision is introduced. Meanwhile,
of dynamic analysis and run-time verification [11]The key                  it faces the challenge that path predicates and variable expres-
idea behind symbolic execution is to replace input values with             sions may get more complex. Constraints involving disjunction
symbols, update symbolic expressions for program variables,                and IT E (actually another form of disjunction) are extremely
and calculate path predicates along explored execution paths               unfriendly to underlying solvers [21]. Such negative impact
to determine their feasibility [12], [13]. In this way, symbolic           on performance is rather serious when expressions which can
execution simulates how a program runs and explore feasible                be evaluated concretely turn symbolic after merging. Modern
paths in an efficient way.                                                 symbolic execution engines commonly adopt concrete execu-


      Copyright © 2018 for this paper by its authors.                 14
                   6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

                                                                         state merging. Section III introduces active state forking,
                                                                         followed by description of dynamic merge-fork analysis in
                                              1                          Section IV. Next we display experimental evaluation of a
    int abs(int n) {                              int ret = 0;           prototype implementing our merge-fork framework in Section
      int ret = 0;                                                       V. Finally we give our conclusion in Section VI.
                                 if (n < 0)   2      if (n < 0)
      if (n < 0)
                                n < 0, true         n < 0, false
        ret = -n;                                                             II. S YMBOLIC E XECUTION AND S TATE M ERGING
      else                             3               4
        ret = n;                                                           In this section, we first present a general process of symbolic
                                  ret = ‐n;          ret = n;
      return ret;                                                        execution and then have a discussion on state merging.
                                              5
    }                                             return ret;            A. Classical Symbolic Execution
                                              6                             Symbolic execution starts from the program entry and
                                                                         keeps updating variable expressions and path predicates to
                                                                         systematically explore feasible paths. In this way the executor
            Fig. 1: An example of state merging.                         can reveal code reachability, find bugs or generate test inputs.
                                                                         Descriptions of a general work-flow of classical symbolic
                                                                         execution techniques are given in many works [14], [15].
tion when code to be interpreted only involves concrete values.          We refer to these works and present a general algorithm in
Improper merging actions may dismiss chances of concrete                 Algorithm 1. To be concise but without loss of generality,
execution and lead to extra solving cost, which eliminates the           we consider a language made up of three kinds of statements,
benefit of decrease in path scale in reverse [20]. Actually, a           i.e. assignments, conditional branches and exit statements with
merging action will speed up symbolic execution or not, or               exit code to indicate normal termination or failed assertion/er-
even speed down it, is determined by the future code which is            ror. By unwinding loops and inlining function calls we can
still unexplored. Kuznetsov et al. have proposed an estimation           transform a program to this form. In the paper, all programs
algorithm in [14] to statically find variables frequently used           used to describe our method are in such form if not explained
in branch conditions and prevent any merging action erasing              particularly.
concrete values of these variables. In this way, state merging              In symbolic execution, a state of the program is usually
is selectively performed to exclude inefficient cases. We are            encoded as a triple (l, P, M ) where l refers to the current
inspired by this work and try to improve merging strategies              program location, P to path predicate corresponding to the
to handle more complicated situations. We build a framework              path from the entry to l, and M to symbolic store mapping
dynamically merging and forking states according to the code             from variables to their current expressions. The algorithm
under execution. During execution, combinations of paths vary            keeps taking out a state from W orklist, extending the state
to keep chances of concrete execution while exploration of               and pushing newly found ones back to W orklist. Call to
states and paths are joint as much as possible.                          selectState at line 3 determines which state is chosen to
   Contributions. In this paper, we present a framework                  be visited next, and thus implies the searching strategies of
boosting symbolic execution by automatically switching be-               path exploration. The chosen state is then updated according
tween merging states together and forking them separately                to the next statement just after its location (line 5-21). Call to
throughout path exploration. We implement a prototype on the             eval and checkCond in this procedure symbolically calculates
foundation of KLEE [22] and conduct experiments on GNU                   expressions and checks satisfiability.
Coreutils code to evaluate the method. Our main contributions               Code at line 22-30 of Algorithm 1 is designed for merging
are:                                                                     states. For a newly found state s, if there is another state
   • We introduce active state forking to eliminate a historical         s0 in W orklist sharing the same program location, s could
      merging action and restore the state into original separate        be merged with s0 rather than directly added to W orklist.
      ones. By integrating symbolic execution with state merg-           Definition of shouldM erge makes decisions on whether to
      ing and forking, we can change the way how states are              merge and thus realizes strategies of state merging.
      combined and paths are explored at any moment during
      execution.                                                         B. State Merging: SSE vs. DSE
   • We present dynamic merge-fork analysis to evaluate                     By defining selectState and shouldM erge differently, the
      performance trade-offs for different path combinations.            classical symbolic execution algorithm presented in Section
      Decisions are made by estimating chances of concrete               II-A acts as two extremes in regard to design space of
      execution and efforts in reasoning future statements. The          state merging. Dynamic symbolic execution (DSE, adopted
      source code is cut into pieces so analysis is simplified           in [3], [22]–[24] etc.) skips merging and every execution state
      and conclusions can be drawn in each piece locally.                encodes a unique path from the entry to its corresponding
   The remainder of this paper is organized as follows. In               location. Static symbolic execution (SSE) ensures all paths
Section II we have an overview of symbolic execution and                 joining at each junction node in the control flow structure


     Copyright © 2018 for this paper by its authors.                15
                   6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

Algorithm 1 Classical symbolic execution                                  from concrete execution.
 1: W orklist = {(l0 , true, M0 )}                                           Taking both SSE and DSE into consideration, it requires a
 2: while W orklist 6= ∅ do                                               compromise of the two extremes to use state merging to deal
 3:   (l, P, M ) = selectState(W orklist)                                 with path explosion in real code. While dynamic techniques
 4:   W orklist = W orklist \ {(l, P, M )}                                remain to keep symbolic execution benefiting from concrete
 5:   switch nextInst(l) do                                               execution and heuristic search order, states are selectively
 6:      case v := e:                                                     merged to eliminate duplicated efforts exploring paths joining
 7:         M 0 = M \ {(v, M [v])} ∪ {(v, eval(e, M ))}                   at a certain location. The work of [14] tries to introduce state
 8:         S = {(succ(l), P, M 0 )}                                      merging in DSE and switch between different searching strate-
 9:      case if cond goto l0 :                                           gies. The executor keeps discovering merging opportunities
10:         sat = checkCond(cond, P, M )                                  with positive effects on performance and prioritizes visit of
11:         if sat == CON ST T RU E then                                  involving states. Veritesting [27] proposes combination of SSE
12:            S = {(l0 , P, M )}                                         and DSE by changing execution mode in process. SSE makes
13:         else if sat == CON ST F ALSE then                             code analyzed one-pass instead of visiting all feasible paths
14:            S = {(succ(l), P, M )}                                     separately in DSE. In this way the executor is able to benefit
15:         else if sat == N ON CON ST then                               from solvers and boost path exploration.
16:            S = {(l0 , P ∧ eval(cond, M ), M ), (succ(l), P ∧             Our proposition tries to mix SSE and DSE on the foundation
               ¬eval(cond, M ), M )}                                      of [14]. In this article, Kuznetsov et al. have constructed
17:         end if                                                        a framework to estimate whether it is worth driving path
18:      case exit:                                                       exploration towards a potential merging action. They present
19:         S=∅                                                           a static bottom-up analysis to find variables which may be
20:         printP ath()                                                  frequently used later and suggest merging actions with no
21:      end switch                                                       impact on values of these variables. Path exploration is then
22:   for all s = (l, P, M ) ∈ S do                                       under control of an SSE-like module temporally instead of a
23:      for all s0 = (l0 , P 0 , M 0 ) ∈ W orklist do                    normal DSE process. We take a more complicated case into
24:         if l = l0 ∧ shouldM erge(s, s0 ) then                         consideration where evaluation on a merging action varies
25:            W orklist = W orklist \ {s0 }                              across the code. A static decision - either merging or not
26:            M 00 = {(v, IT E(P, M [v], M 0 [v]))|(v, M [v]) ∈          - improves efficiency for some pieces of code but burdens
               M}                                                         performance for others. Actually, an optimal solution usually
27:            S = S \ {s} ∪ {(l, P ∨ P 0 , M 00 )}                       consists of multiple parts which are drawn in different pieces
28:         end if                                                        of code locally and combined together dynamically.
29:      end for
                                                                                          III. ACTIVE S TATE F ORKING
30:   end for
31:   W orklist = W orklist ∪ S                                              In this section, we depict our active state forking technique.
32: end while                                                             First, we use a series of examples to explain motivation for
                                                                          forking and then describe definitions and method of it.
                                                                          A. Motivating Examples
are merged completely, which is common in symbolic model                     As is discussed in last chapter, influence of a certain
checking and verification condition generation [25], [26].                merging action varies throughout different pieces of code. We
   SSE systematically explores all feasible control paths with            use some example code in Fig.2 to explain the issue in detail
all possible input values taken into consideration [20]. It visits        and present effect of active state forking. To be concise, some
states in topological order to ensure that all corresponding              function calls are not inlined.
states are merged at every junction node in control flow of the              In function f 1 state merging dismisses the opportunity of
program. However, its major trade-offs are great difficulties in          concrete execution. There are two if-blocks in f 1, namely
calculating path predicates and strict restrictions on searching          line 3 and line 4-5. State forks and path splits when reaching
strategies. DSE techniques are then proposed to interleave con-           the first if-block (this is passive state forking caused by non-
crete and symbolic execution and focus on exploring possible              concrete if-condition), and then two sub-paths join after it.
execution paths one by one. A statement can be executed                   Expression of f lag gets different along two sub-paths, and
concretely rather than symbolically if all involving variables            is instantly used in if-condition at line 4. Expression of f lag
are concrete. For an assignment, the result is calculated                 is 1 along true case of the first if-block, 0 along false case,
directly if possible. For a conditional branch if cond goto l0 ,          and IT E(a == b, 1, 0) if states are merged before line 4. It
concrete execution is performed if cond can be determined                 can be seen that both the two separate states lead concrete
concretely. As a result, interpreting the statement does not              execution on the second if-block since its condition can be
need to split the current path. DSE techniques such as [3],               decided concretely, but the merged one erases concrete value
[23], [24] introduce concrete inputs to make analysis benefit             of f lag and results in passive state forking. Therefore, the


      Copyright © 2018 for this paper by its authors.                16
                   6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

          1   void f1(int a, int b) {                                     possible to change how paths are combined during exploration.
          2     int flag = 0;                                             To maintain information of historical merging actions and
          3     if (a == b) flag = 1;                                     support forking states, we make extensions to the formalism
          4     if (flag) g1();                                           in classical symbolic execution.
          5     else g2();                                                   A DSE method as shown in Algorithm 1 conducts path
          6     exit(0);                                                  exploration by visiting execution states and creating new ones.
          7   }                                                           This process can be efficiently represented as a graph in
          8                                                               which nodes are execution states and edges indicate transitions
          9   void f2(int a, int b) {                                     between them. Since DSE does not merge states, such a graph
         10     int flag = 0;                                             would be a tree, which is so called symbolic execution tree.
         11     if (a == b) flag = 1;                                     Definitions of execution state and symbolic execution tree are
         12     if (flag) g1();                                           given in Definition 1 and 2.
         13     else g2();                                                   Definition 1: An execution state is defined as a tuple es =
         14     g3(a, b);                                                 (l, P, M ) where
         15   }                                                              - l ∈ L is the current program location.
         16                                                                  - P ∈ F(Σ, Const) is the path predicate, which is a formula
         17   void f3(int a, int b) {                                     over symbolic inputs Σ and constants Const encoding all
         18     int flag = 0;                                             decisions of branches along the path from entry to l.
         19     if (a == b) flag = 1;                                        - M : V → F(Σ, Const) is the symbolic store, which
         20     g3(a, b);                                                 maps a program variable to the formula corresponding to its
         21     if (flag) g1();                                           expression.
         22     else g2();                                                   Definition 2: A symbolic execution tree is a tree T =
         23   }                                                           (ES, T ) where
                                                                             - ES is the set of execution states.
                 Fig. 2: Motivating examples.                                - T ⊆ ES × ES is the set of transitions between execution
                                                                          states.
                                                                             Symbolic execution tree records every explored state and
merging action after the first if-block just makes expressions            path, and is widely used to characterize the process of sym-
and path predicate more complex while number of paths to                  bolic execution [25]. Our extended formalism is built on top
explore is not changed.                                                   of it. While state merging is performed, an execution state
   In function f 2, exit(0) at line 6 in f 1 is replaced with             could encode multiple nodes in the corresponding symbolic
another function call g3(a, b). While things go the same as               execution tree. Such relations naturally extract information of
for f 1, it is required to reason about call to g3 on each                merging actions, and thus, we introduce virtual states to hold
path. If paths are merged before line 14, call to g3 will be              the similar structure. Definition 3 presents how virtual states
analyzed only once rather than once along each path. Hence,               are defined.
it is expected that states forked in the first if-block should be            Definition 3: A virtual state is defined as a tuple vs =
merged at the exit of the second if-block.                                (es, CV ) where
   For function f 3, call to g3 goes before the second if-                   - es ∈ ES is the corresponding execution state.
block. As is discussed for f 1 and f 2, we can make a similar                - CV ⊆ V × Const is the set of variables with concrete
conclusion that states forked in the first if-block should be             values together with their values.
merged immediately to avoid repeated efforts reasoning about                 When two execution states are merged to be one, we create
call to g3, but later ”restored” to two separate paths with               virtual state for each sub-state respectively and map them
concrete value for f lag along each. Common methods try                   to the produced state. Every virtual state tracks its exclusive
to compare advantages and disadvantages to reveal a static                concrete variables. Our formalism then constructs transitions
decision on state merging, and our approach, in another way,              between execution states and virtual states, resulting in a
makes it possible to fork a merged state as if a historical               directed acyclic graph. An execution state contains data of path
merging action were not taken. In f 3, paths are made to join             predicate and symbolic store, and is used to actually execute
after line 19 and fork actively before line 21 with f lag restored        statement. It appears as a node in the graph only if it never
to concrete along each forked path. In this way, the benefit              experiences merging or we are not concerned with its historical
of state merging is maximized while its negative impact on                merging actions anymore. Otherwise, its corresponding virtual
concrete execution is avoid as far as possible.                           states are used instead to make up the graph. Definition 4
                                                                          describes our extended execution graph formally.
B. Merge and Fork States with Extended Execution Graph                       Definition 4: An extended execution graph is a directed
   The motivating examples figure out that an optimal merging             acyclic graph G = (S, T, ψ, ∆) where
strategy may promote contrast decisions for different pieces of              - S ⊆ ES ∪ V S is the set of states.
code. Active state forking together with state merging makes it              - T ⊆ S × S denotes transitions between states.


      Copyright © 2018 for this paper by its authors.                17
                   6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

   - ψ : ES → 2V S maps a execution state to the set of its                influence any concrete variable and hence, will never invoke
corresponding virtual states.                                              active forking.
   - ∆ : V S → F(Σ, Const) maps a virtual state to its newly                  Forking. For an execution state es in G, passive forking by
added conditions compared to its predecessor’s path predicate.             conditional branch is the same as in traditional DSE. Other-
   While execution states hold essential data and drive execu-             wise, forking action is applied to every sub-state vs ∈ ψ(es)
tion to explore paths, virtual states simulate the structure of            and ∆ is updated with branch decision accordingly for forked
symbolic execution tree and preserve merging history. If there             productions.
is no further operation on virtual state, our extended execution              In the case of active forking aimed at eliminating historical
graph would degenerate into a tree. However, virtual states                merging actions, the work flow gets more complicated: it is
are merged and swapped for execution states when updates                   required to partitioning sub-states, constructing path predicates
on variables efface complexity introduced by merging. In the               and updating expressions.
remaining part of this section, we explain in detail how to                   The execution state es to be forked encodes multiple sub-
perform state merging and forking with our extended execution              states which are represented by virtual states in ψ(es). Active
graph.                                                                     forking is invoked because some variables are turned from
   Merging. When trying to merge es1 = (l, P1 , M1 ) and                   concrete to symbolic by merging actions. Thus, the first step
es2 = (l, P2 , M2 ) into es3 = (l, P1 ∨ P2 , M ) where M (v) =             is to identify concerned variables. Every virtual state vs ∈
IT E(P1 , M1 (v), M2 (v)) for each v ∈ V , we first create                 ψ(es) where these variables are concrete is separated to be a
virtual states and then update the graph G = (S, T, ψ, ∆).                 forked state. For sake of efficiency, states with same concrete
   If es1 appears as a node in G (in other word es1 ∈ S and                values for these variables can be joined. In this way, all sub-
ψ(es1 ) = ∅), we create a virtual state directly to represent              states of es are partitioned into several groups and each group
its corresponding sub-state encoded by es3 : vs = (es3 , CV ).             corresponds to a potential forked state.
CV denotes exclusive concrete variables of this sub-state, so                 For each sub-state vs, its path predicate can be calculated
it is made up of variables concrete in es1 but not in es2 . That           by backtracking to its ancestor execution state and combin-
is, CV = χ(M1 ) \ χ(M2 ) where χ(M ) = {(v, M (v)|M (v) ∈                  ing with newly added conditions denoted by ∆ along the
F(Const)}.                                                                 path. The results are used to construct path predicate for
   If es1 encodes multiple sub-states so that es1 ∈        / S or          each potential forked state and decide IT E selections of
ψ(es1 ) 6= ∅, we create a virtual state for every sub-state in             variables expressions. The original state es is then forked
ψ(es1 ) and add up CV of sub-state and exclusive concrete                  into several execution states as if some historical merging
variables of es1 to assign to newly created one.                           actions were eliminated. Besides, it is remarkable in real
   Creating virtual states for es2 is in the same way, except that         case that optimization of expressions and constraints based
exclusive concrete variables of es2 is calculated as χ(M2 ) \              on data structure, constant propagation, query cache and other
χ(M1 ).                                                                    practical techniques is adopted to cut down efforts and boost
   All these generated virtual states are then added to G,                 speed of calculation in state forking.
together with corresponding transitions. ψ(es3 ) is set to be
all these new states. ∆ of them can be simply set to empty                           IV. DYNAMIC M ERGE - FORK A NALYSIS
since there is no change between their and their predecessors’               Active state forking makes it possible to raise dynamic
path predicates.                                                           merging strategies, and next task is to identify ”good” merging
   Update of Assignment. Updating an execution state es in G               actions. In this section, we will explain how to identify
can be done immediately. However, if es ∈      / S, set of concrete        possibilities of concrete execution and how to make decisions
variables of each virtual state vs = (es, CV ) ∈ ψ(es) requires            on merging by analyzing code piece by piece.
update. Those not concrete or not exclusively concrete any-
more should be removed from concrete variables, while those                A. Identify Concrete Values and Concrete Execution
becoming exclusively concrete should be added to concrete                     State merging avoids repeated efforts in analysis but may
variables.                                                                 burden underlying solvers as trade-off. Considering examples
   E.g. for assignment v := e, (v, cv ) ∈ CV is removed if e               in Fig.2, the ideal solution would be that we can selectively
is not concrete, or it is but calculation does not involve any             merge states so that 1) variables turned symbolic are used to
concrete variables in CV . Otherwise v and its value is added              calculate future if-conditions as little as possible; 2) merging
to CV if there is no (v, cv ) ∈ CV , and e is concrete and                 actions with no influence on future if-conditions are as many
depends on certain variables and values in CV .                            as possible. The former prevents state merging from being
   When updating virtual states, ones might point to the                   obstacle to concrete execution while the latter indicates greater
same execution state and possess equal concrete variables                  benefit from merging.
and values. These states can be merged in the graph since                     A bottom-up estimation identifying variables frequently
distinguishing such sub-states makes no sense to merging and               used is presented in [14]. The basic idea is tracking variables
forking. If all virtual states of an execution state are merged,           used in if-conditions along use-def chain. Our method also
we can swap the production for the execution state itself,                 pays attention to use in if-condition, but works in top-down
which means that all its historical merging actions do not                 order instead because analysis will be break down in different


      Copyright © 2018 for this paper by its authors.                 18
                   6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

pieces of code. A list of variables with concrete values is                                                      100
maintained and updated when executing. Thus, we can identify
possible concrete values and in turn, find out chances of                                                         10
concrete execution.


                                                                                  Path exploration ratio (%)
                                                                                                                   1
B. Local Merge-fork Analysis
   Active state forking makes it possible to redraw decisions                                                     0.1
on merging actions, so we can cut a program into pieces
                                                                                                                 0.01
and analyze code piece by piece. The division is randomly
made under these rules: difference in scale between pieces                                                                          echo
                                                                                                                0.001
is limited within a proper bound; cut points are located at                                                                         shuf
                                                                                                                                    seq
exit of if-blocks, end of inlined functions, or terminators of                                                 0.0001
sequential blocks. And a piece is divided further if it contains                                                        Locations
greatly different parts, e.g. variables used in one part are
totally disjoint with ones in another. Nevertheless, too detailed        Fig. 3: Ratio of explored paths for 3 representative programs.
division achieves little increase in performance while goes
time-consuming, so we limit the times of further division.
   After code is divided into pieces, we can make decision for           path exploration with no approximation, and path feasibility is
states whether they should be merged or not in a piece. There            checked at every conditional branch. State forking is invoked
are still situations where different parts of code have opposite         for each non-determined branch, and states at the same control
votes for a merging action in the same piece. To resolve the             location are never merged. KLEE guides exploration with a
issue, we propose another two types of decision: merge until             number of optional search strategies including random search,
a location and merge after a location. The former performs               traditional breadth/depth first search, strategy prioritizing vis-
merging temporally and invokes analysis again later, while the           iting previously unexplored code, and combination of various
latter make a delay to merging. Because of these choices of              strategies.
merging decision, analysis of how to combine paths is invoked               KLEE has built in models of execution state and symbolic
at junction node in control flow structure, at the beginning of          execution tree. We extend them to implement our active
each piece, and at locations specified by ”merge until” and              state forking method. Thanks to implementation of expres-
”merge after” decisions.                                                 sions and optimization of constraints in KLEE, it is easy
   Hence, our merge-fork analysis determines locally how to              to track updates of path predicates and we could perform
handle each merging opportunities within a piece of code. For            state forking by directly imposing conditions of dismissed
the simplified language used in our discussion, this process can         merging action on path predicates and symbolic store. To
be implemented by counting occurrences of if-condition and in            support dynamic merge-fork analysis, we make a call in state
turn estimating number of queries needed. Besides, unresolved            selection (selectState) to an LLVM Pass constructing con-
function call and code block difficult to reason can also be             crete dependence graph and estimating merging effects locally.
counted to influence strategy of merging and forking. When               If merging is suggested, search strategy is temporally fixed
decided to merge two states, path exploration is temporally              to continuously extending paths to target merging location.
taken over from searching heuristics and involving states are            Otherwise, states are explored according to original search
prioritized to visit.                                                    strategies of KLEE.
                                                                            The original work of KLEE presents an experiment on GNU
               V. E XPERIMENTAL E VALUATION                              Coreutils programs which are widely used UNIX command-
   In this section we illustrate some experimental evaluation on         line utilities. There are 101 files in total adding up to over
our proposition. We build a prototype to implement our merge-            70,000 lines of code. We test these programs using our
fork techniques (Section V-A), and run it with Coreutils pro-            prototype and collect information from output files.
grams to analyze performance in two scenarios: exhaustively
exploring all feasible paths (Section V-B) and incomplete                B. Speedup Path Exploration
exploration within certain time budget (Section V-C). Besides,              Our merge-fork framework is aimed at identifying valuable
we also make a comparison with previous work of Kuznetsov                merging opportunities and conducting state merging, so we are
et al. [14] (Section V-D).                                               first concerned with the influence of our method on number
                                                                         of explored paths.
A. Prototype                                                                A direct comparison is measuring number of explored paths
  To evaluate the effect of our merge-fork framework, we                 with different time spent. However, the number is hard to
implement active state forking and dynamic merge-fork anal-              figure out since there are paths explored partially if exhaustive
ysis in a prototype on the foundation of the famous symbolic             exploration is not completed. We therefore choose to calculate
execution engine KLEE. It takes LLVM bitcode compiled from               the number of paths at a specific location instead of with some
source code as input. The original executor of KLEE performs             time spent during the execution. The size of symbolic inputs


     Copyright © 2018 for this paper by its authors.                19
                                                                     6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

                                                       3600                                                                                                100


         Completion time of merge-fork prototype (s)
                                                                                                                                                            80
                                                        500


                                                                                                                                 Decrease in queries (%)
                                                                                                                                                            60


                                                         50                                                                                                 40

                                                                                                                                                            20

                                                             5
                                                                 5             50               500         3600                                             0
                                                                        Completion time of plain KLEE (s)                                                        Files

Fig. 4: Compare completion time with plain KLEE for ex-                                                                       Fig. 7: Decrease in queries comparing with [14].
haustive exploration tasks.
                                                                                                                        efficiently joins branches and decrease paths. It is remarkable
                                                       15                                                               that paths getting through a certain location might enclose a
                                                                                                                        large gap in time.
         Statement coverage increase (%)


                                                       10                                                                   Fig.3 shows the growth of path exploration ratio correspond-
                                                                                                                        ing to program locations for 3 representative files. The final
                                                                                                                        ratio is obtained at the exit of code. It can be seen that the
                                                        5
                                                                                                                        effect in decreasing paths varies among the programs. While
                                                                                                                        a large part of paths are merged in echo, state merging makes
                                                        0                                                               little influence on seq. The trend of growth indicates structure
                                                                                                                        of control flow and involvement of active state forking. Since
                                                        -5                                                              merging and forking work simultaneously, change in path
                                                                                                                        numbers appear smooth.
                                                                                                                            The target of decreasing paths is to free underlying solvers’
                                                       -10
                                                                                      Files                             burden and achieve faster execution by omitting duplicated
                                                                                                                        analysis efforts. Hence, we run Coreutils under symbolic
Fig. 5: Compare statement coverage with plain KLEE for                                                                  inputs with growing size and compare completion time of plain
uncompleted exploration tasks.                                                                                          KLEE and our prototype. In order to compare with [14] easily,
                                                                                                                        we draw a similar graph Fig.4 showing the result within 1 hour
                                                       40                                                               time limit. The black dots correspond to experiment instances
                                                                                                                        and the gray disks indicate the relative size of the symbolic
                                                                                                                        inputs used in each instance. The result presents that in the
                                                       20                                                               majority of cases, our prototype achieves positive speedup over
                                                                                                                        KLEE. The rightmost dots represent those exceed time limit
         Speedup (%)


                                                                                                                        in KLEE but can be solved completely in our framework.
                                                        0
                                                                                                                        C. Exploration Guided by Statement Coverage
                                                                                                                           We now look into the situation where tasks can not be
                                                       -20                                                              finished completely. We choose a proper size of symbolic
                                                                                                                        inputs to keep the engine busy and measure statement coverage
                                                                                                                        in limited time. Plain KLEE uses coverage-oriented search
                                                       -40                                                              strategy to guide path exploration towards uncovered locations.
                                                                                      Files
                                                                                                                        Our merge-fork framework would interrupt the process and
    Fig. 6: Speedup over efficient state merging of [14].                                                               temporally drive exploration towards merging states. Fig.5
                                                                                                                        shows its influence on statement coverage. While state merging
                                                                                                                        saves analysis efforts, there are more paths can be visited in
is properly set to make it possible for exhaustive exploration,                                                         some cases, which leads to increase in coverage. However,
so the accurate number of feasible paths of the program is                                                              other cases suffer from change in search order and in turn
known. Then we run our prototype and count paths at given                                                               coverage is decreased. Data in Fig.5 confirms that estimation
control locations. In this way we can reveal how state merging                                                          of merging opportunities and fast-forwarding some states do


     Copyright © 2018 for this paper by its authors.                                                               20
                      6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

not cause any obvious troubles to the original exploration goal                        [6] J. Liang, M. Wang, Y. Chen, Y. Jiang, and R. Zhang, “Fuzz testing in
of searching heuristics.                                                                   practice: Obstacles and solutions,” in 2018 IEEE 25th International Con-
                                                                                           ference on Software Analysis, Evolution and Reengineering (SANER).
D. Compare to Existed Work                                                                 IEEE, 2018, pp. 562–566.
                                                                                       [7] C. Wang, Y. Jiang, X. Zhao, X. Song, M. Gu, and J. Sun, “Weak-assert:
   While our proposition is on the foundation of the previous                              a weakness-oriented assertion recommendation toolkit for program anal-
work of Kuznetsov et al. [14], we make some comparison                                     ysis,” in Proceedings of the 40th International Conference on Software
                                                                                           Engineering: Companion Proceeedings. ACM, 2018, pp. 69–72.
with their experiment results. All the data is obtained from                           [8] J. GLiang, M. Wang, Y. Chen, and Y. Jiang, “Pafl: Extend fuzzing
their official website http://cloud9.epfl.ch. Since experimental                           optimizations of single mode to industrial parallel mode,” 2018.
environment and KLEE version may vary, all the data used in                            [9] J. Guo, Y. Jiang, Y. Zhao, Q. Chen, and J. Sun, “Dlfuzz: Differential
                                                                                           fuzzing testing of deep learning systems,” 2018.
our comparison is increase/decrease over our respective KLEE                          [10] J. Gao, X. Yang, Y. Fu, Y. Jiang, and J. Sun, “Vulseeker: a semantic
baseline.                                                                                  learning based vulnerability seeker for cross-platform binary,” in Pro-
   First we compare the completion time under the same size of                             ceedings of the 33rd ACM/IEEE International Conference on Automated
                                                                                           Software Engineering. ACM, 2018, pp. 896–899.
symbolic inputs. Fig.6 illustrates the result. We use completion                      [11] Y. Jiang, H. Song, Y. Yang, H. Liu, M. Gu, Y. Guan, J. Sun, and
time of plain KLEE to normalize data of both tools and then                                L. Sha, “Dependable model-driven development of cps: From stateflow
calculate the effect of speedup. Over 80% of the tasks get                                 simulation to verified implementation,” ACM Transactions on Cyber-
                                                                                           Physical Systems, vol. 3, no. 1, p. 12, 2018.
faster with our merge-fork method and average speedup is                              [12] C. Cadar and K. Sen, “Symbolic execution for software testing: three
around 10%. Most instances with negative speedup are finished                              decades later,” Communications of the ACM, vol. 56, no. 2, pp. 82–90,
in a rather short time (101 magnitude in seconds).                                         2013.
   We also collect queries generated during the execution.                            [13] C. Cadar, P. Godefroid, S. Khurshid, C. S. Păsăreanu, K. Sen, N. Till-
                                                                                           mann, and W. Visser, “Symbolic execution for software testing in prac-
Fig.7 shows the comparison. The number of queries is also                                  tice: preliminary assessment,” in Proceedings of the 33rd International
normalized against data from plain KLEE. It confirms the                                   Conference on Software Engineering. ACM, 2011, pp. 1066–1071.
effect of active state forking since some queries are omitted                         [14] V. Kuznetsov, J. Kinder, S. Bucur, and G. Candea, “Efficient state
                                                                                           merging in symbolic execution,” Acm Sigplan Notices, vol. 47, no. 6,
on separate paths by concrete execution. The improvement in                                pp. 193–204, 2012.
performance is achieved through decreasing paths via state                            [15] Q. Yi, Z. Yang, S. Guo, C. Wang, J. Liu, and C. Zhao, “Eliminating path
merging and invoking concrete execution by restoring merged                                redundancy via postconditioned symbolic execution,” IEEE Transactions
                                                                                           on Software Engineering, 2017.
paths when needed.                                                                    [16] T. Xie, N. Tillmann, J. de Halleux, and W. Schulte, “Fitness-guided path
                                                                                           exploration in dynamic symbolic execution,” in Dependable Systems
                      VI. C ONCLUSIONS                                                     & Networks, 2009. DSN’09. IEEE/IFIP International Conference on.
   In symbolic execution, state merging decreases the number                               IEEE, 2009, pp. 359–368.
                                                                                      [17] J. Burnim and K. Sen, “Heuristics for scalable dynamic test generation,”
of paths but eliminates the benefit of concrete execution in                               in Automated Software Engineering, 2008. ASE 2008. 23rd IEEE/ACM
contrast. To resolve the conflict and improve the performance,                             International Conference on. IEEE, 2008, pp. 443–446.
we proposed a merge-fork framework to enable a bi-directional                         [18] H. Wang, T. Liu, X. Guan, C. Shen, Q. Zheng, and Z. Yang, “De-
                                                                                           pendence guided symbolic execution,” IEEE Transactions on Software
switch for states to be merged and forked. State merging and                               Engineering, vol. 43, no. 3, pp. 252–271, 2017.
active state forking can make state and path combinations fit                         [19] T. Chen, X.-s. Zhang, S.-z. Guo, H.-y. Li, and Y. Wu, “State of the
different code dynamically. This in turn makes it possible                                 art: Dynamic symbolic execution for automated test generation,” Future
                                                                                           Generation Computer Systems, vol. 29, no. 7, pp. 1758–1773, 2013.
to divide code into pieces and dynamically draw decision                              [20] T. Hansen, P. Schachte, and H. Søndergaard, “State joining and splitting
on state merging piece by piece. Experiment on a prototype                                 for the symbolic execution of binaries,” in International Workshop on
trivially displays the potential of our approach. While explo-                             Runtime Verification. Springer, 2009, pp. 76–92.
                                                                                      [21] V. Ganesh and D. L. Dill, “A decision procedure for bit-vectors and ar-
sive development in dynamic symbolic execution and search-                                 rays,” in Proceedings of the 19th International Conference on Computer
based techniques has a huge impact on classical state merging                              Aided Verification, ser. CAV’07, 2007, pp. 519–531.
methodology, it shows promise combining our approach with                             [22] C. Cadar, D. Dunbar, D. R. Engler et al., “Klee: Unassisted and auto-
                                                                                           matic generation of high-coverage tests for complex systems programs.”
these techniques and making state merging an efficient method                              in OSDI, vol. 8, 2008, pp. 209–224.
to solve practical issues in symbolic execution.                                      [23] C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler,
                                                                                           “Exe: automatically generating inputs of death,” ACM Transactions on
                              R EFERENCES                                                  Information and System Security (TISSEC), vol. 12, no. 2, p. 10, 2008.
 [1] L. A. Clarke, “A system to generate test data and symbolically execute           [24] P. Godefroid, M. Y. Levin, D. A. Molnar et al., “Automated whitebox
     programs,” IEEE Transactions on software engineering, no. 3, pp. 215–                 fuzz testing.” in NDSS, vol. 8, 2008, pp. 151–166.
     222, 1976.                                                                       [25] S. Khurshid, C. S. Păsăreanu, and W. Visser, “Generalized symbolic
 [2] J. C. King, “Symbolic execution and program testing,” Communications                  execution for model checking and testing,” in International Conference
     of the ACM, vol. 19, no. 7, pp. 385–394, 1976.                                        on Tools and Algorithms for the Construction and Analysis of Systems.
 [3] P. Godefroid, N. Klarlund, and K. Sen, “Dart: directed automated                      Springer, 2003, pp. 553–568.
     random testing,” in ACM Sigplan Notices, vol. 40, no. 6. ACM, 2005,              [26] E. M. Clarke, D. Kroening, and F. Lerda, “A tool for checking ansi-
     pp. 213–223.                                                                          c programs,” tools and algorithms for construction and analysis of
 [4] V. Chipounov, V. Kuznetsov, and G. Candea, “S2e: A platform for in-                   systems, vol. 2988, pp. 168–176, 2004.
     vivo multi-path analysis of software systems,” ACM SIGPLAN Notices,              [27] T. Avgerinos, A. Rebert, S. K. Cha, and D. Brumley, “Enhancing
     vol. 46, no. 3, pp. 265–278, 2011.                                                    symbolic execution with veritesting,” in Proceedings of the 36th In-
 [5] M. Wang, J. Liang, Y. Chen, Y. Jiang, X. Jiao, H. Liu, X. Zhao, and                   ternational Conference on Software Engineering. ACM, 2014, pp.
     J. Sun, “Safl: increasing and accelerating testing coverage with symbolic             1083–1094.
     execution and guided fuzzing,” in Proceedings of the 40th International
     Conference on Software Engineering: Companion Proceeedings. ACM,
     2018, pp. 61–64.


      Copyright © 2018 for this paper by its authors.                            21