6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) Boost Symbolic Execution Using Dynamic State Merging and Forking Chao Zhang Weiliang Yin∗ Zhiqiang Lin† Beijing National Research Center for HUAWEI, China Information Science and Technology(BNRist) {yinweiliang.ywliang∗ ,linzhiqiang† }@huawei.com School of Software, Tsinghua University, Beijing 100084, China zhang-chao13@mails.tsinghua.edu.cn Abstract—Symbolic execution has achieved wide application in However on the other hand, path explosion is still the most software testing and analysis. However, path explosion remains significant bottleneck which limits scalability of symbolic the bottleneck limiting scalability of most symbolic execution execution. When traversing a program, each path encodes engines in practice. One of the promising solutions to address this issue is to merge explored states and decrease number of paths. decisions on all its historical branches and maintains them Nevertheless, state merging leads to increase in complexity of in its path predicate [14]. Thus, the number of paths to be path predicates at the same time, especially in the situation where explored grows exponentially with the number of branches variables with concrete values are turned symbolic and chances of and exhaustively covering all paths could be extremely time- concretely executing some statements are dissipated. As a result, consuming even for a small program [15]. calculating expressions and constraints becomes much more time- consuming and thus, the performance of symbolic execution State-of-the-art solutions to resolve path explosion mainly is weakened in contrast. To resolve the problem, we propose aim at directly decreasing number of paths or optimizing the a merge-fork framework enabling states under exploration to process within limited time budgets. A number of articles switch automatically between merging mode and forking mode. introduce various search heuristics to improve performance First, active state forking is introduced to enable forking a towards particular targets. For example, ideas from random state into multiple ones as if a certain merging action taken before were eliminated. Second, we perform dynamic merge- testing and evolutionary algorithms are frequently used to fork analysis to cut source code into pieces and continuously guide the search order of path exploration so that coverage evaluate efficiency of different merging strategies for each piece. gets higher in the same time [16], [17]. The second way Our approach dynamically combines paths under exploration is to apply some imprecise techniques from static analysis, to maximize opportunities for concrete execution and ease the such as abstract interpretation and summary. These techniques burden on underlying solvers. We implement the framework on the foundation of the symbolic execution engine KLEE, and calculate approximations to simplify analysis of code and de- conduct experiments on GNU Coreutils code using our prototype crease produced states [18], [19]. State merging is another way to present the effect of our proposition. Experiments show up to designed to cut down number of paths to be explored. When 30% speedup and 80% decrease in queries compared to existing path exploration encounters a branch statement, there are two works. succeeding states constructed and in turn the original path is Index Terms—State Merging, Active State Forking, Local Merge-fork Analysis split into to two sub-paths, corresponding to true and false case respectively. If the two states join at the same location later, they can be merged into one again, with path predicate updated I. I NTRODUCTION using disjunction and variable expressions using ITE operation The innovation of symbolic execution has been proposed [20]. Fig.1 shows an example where there are two paths in the over four decades ago [1], [2]. It shows great potential in given program: 1 → 2 → 3 → 5 → 6 corresponding to true software analysis and testing, and underlies a growing number case of if-statement and 1 → 2 → 4 → 5 → 6 to false of tools which are widely applied in practical cases [3]–[7]. For case. Succeeding states of 3 and 4 are merged at 5, with path example, in [5], [6], they apply symbolic execution to greatly predicate set to n < 0 ∨ ¬(n < 0) or simply T RU E, and improve the performance of traditional fuzz testing [8], [9]. In expression of ret set to IT E(n < 0, −n, n). [7], they incorporate symbolic execution as a bridge to reduce State merging is intended to decrease the number of paths the false positive of static analysis [10] and the false negative to be explored while no imprecision is introduced. Meanwhile, of dynamic analysis and run-time verification [11]The key it faces the challenge that path predicates and variable expres- idea behind symbolic execution is to replace input values with sions may get more complex. Constraints involving disjunction symbols, update symbolic expressions for program variables, and IT E (actually another form of disjunction) are extremely and calculate path predicates along explored execution paths unfriendly to underlying solvers [21]. Such negative impact to determine their feasibility [12], [13]. In this way, symbolic on performance is rather serious when expressions which can execution simulates how a program runs and explore feasible be evaluated concretely turn symbolic after merging. Modern paths in an efficient way. symbolic execution engines commonly adopt concrete execu- Copyright © 2018 for this paper by its authors. 14 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) state merging. Section III introduces active state forking, followed by description of dynamic merge-fork analysis in 1 Section IV. Next we display experimental evaluation of a int abs(int n) { int ret = 0; prototype implementing our merge-fork framework in Section int ret = 0; V. Finally we give our conclusion in Section VI. if (n < 0) 2 if (n < 0) if (n < 0) n < 0, true n < 0, false ret = -n; II. S YMBOLIC E XECUTION AND S TATE M ERGING else 3 4 ret = n; In this section, we first present a general process of symbolic ret = ‐n; ret = n; return ret; execution and then have a discussion on state merging. 5 } return ret; A. Classical Symbolic Execution 6 Symbolic execution starts from the program entry and keeps updating variable expressions and path predicates to systematically explore feasible paths. In this way the executor Fig. 1: An example of state merging. can reveal code reachability, find bugs or generate test inputs. Descriptions of a general work-flow of classical symbolic execution techniques are given in many works [14], [15]. tion when code to be interpreted only involves concrete values. We refer to these works and present a general algorithm in Improper merging actions may dismiss chances of concrete Algorithm 1. To be concise but without loss of generality, execution and lead to extra solving cost, which eliminates the we consider a language made up of three kinds of statements, benefit of decrease in path scale in reverse [20]. Actually, a i.e. assignments, conditional branches and exit statements with merging action will speed up symbolic execution or not, or exit code to indicate normal termination or failed assertion/er- even speed down it, is determined by the future code which is ror. By unwinding loops and inlining function calls we can still unexplored. Kuznetsov et al. have proposed an estimation transform a program to this form. In the paper, all programs algorithm in [14] to statically find variables frequently used used to describe our method are in such form if not explained in branch conditions and prevent any merging action erasing particularly. concrete values of these variables. In this way, state merging In symbolic execution, a state of the program is usually is selectively performed to exclude inefficient cases. We are encoded as a triple (l, P, M ) where l refers to the current inspired by this work and try to improve merging strategies program location, P to path predicate corresponding to the to handle more complicated situations. We build a framework path from the entry to l, and M to symbolic store mapping dynamically merging and forking states according to the code from variables to their current expressions. The algorithm under execution. During execution, combinations of paths vary keeps taking out a state from W orklist, extending the state to keep chances of concrete execution while exploration of and pushing newly found ones back to W orklist. Call to states and paths are joint as much as possible. selectState at line 3 determines which state is chosen to Contributions. In this paper, we present a framework be visited next, and thus implies the searching strategies of boosting symbolic execution by automatically switching be- path exploration. The chosen state is then updated according tween merging states together and forking them separately to the next statement just after its location (line 5-21). Call to throughout path exploration. We implement a prototype on the eval and checkCond in this procedure symbolically calculates foundation of KLEE [22] and conduct experiments on GNU expressions and checks satisfiability. Coreutils code to evaluate the method. Our main contributions Code at line 22-30 of Algorithm 1 is designed for merging are: states. For a newly found state s, if there is another state • We introduce active state forking to eliminate a historical s0 in W orklist sharing the same program location, s could merging action and restore the state into original separate be merged with s0 rather than directly added to W orklist. ones. By integrating symbolic execution with state merg- Definition of shouldM erge makes decisions on whether to ing and forking, we can change the way how states are merge and thus realizes strategies of state merging. combined and paths are explored at any moment during execution. B. State Merging: SSE vs. DSE • We present dynamic merge-fork analysis to evaluate By defining selectState and shouldM erge differently, the performance trade-offs for different path combinations. classical symbolic execution algorithm presented in Section Decisions are made by estimating chances of concrete II-A acts as two extremes in regard to design space of execution and efforts in reasoning future statements. The state merging. Dynamic symbolic execution (DSE, adopted source code is cut into pieces so analysis is simplified in [3], [22]–[24] etc.) skips merging and every execution state and conclusions can be drawn in each piece locally. encodes a unique path from the entry to its corresponding The remainder of this paper is organized as follows. In location. Static symbolic execution (SSE) ensures all paths Section II we have an overview of symbolic execution and joining at each junction node in the control flow structure Copyright © 2018 for this paper by its authors. 15 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) Algorithm 1 Classical symbolic execution from concrete execution. 1: W orklist = {(l0 , true, M0 )} Taking both SSE and DSE into consideration, it requires a 2: while W orklist 6= ∅ do compromise of the two extremes to use state merging to deal 3: (l, P, M ) = selectState(W orklist) with path explosion in real code. While dynamic techniques 4: W orklist = W orklist \ {(l, P, M )} remain to keep symbolic execution benefiting from concrete 5: switch nextInst(l) do execution and heuristic search order, states are selectively 6: case v := e: merged to eliminate duplicated efforts exploring paths joining 7: M 0 = M \ {(v, M [v])} ∪ {(v, eval(e, M ))} at a certain location. The work of [14] tries to introduce state 8: S = {(succ(l), P, M 0 )} merging in DSE and switch between different searching strate- 9: case if cond goto l0 : gies. The executor keeps discovering merging opportunities 10: sat = checkCond(cond, P, M ) with positive effects on performance and prioritizes visit of 11: if sat == CON ST T RU E then involving states. Veritesting [27] proposes combination of SSE 12: S = {(l0 , P, M )} and DSE by changing execution mode in process. SSE makes 13: else if sat == CON ST F ALSE then code analyzed one-pass instead of visiting all feasible paths 14: S = {(succ(l), P, M )} separately in DSE. In this way the executor is able to benefit 15: else if sat == N ON CON ST then from solvers and boost path exploration. 16: S = {(l0 , P ∧ eval(cond, M ), M ), (succ(l), P ∧ Our proposition tries to mix SSE and DSE on the foundation ¬eval(cond, M ), M )} of [14]. In this article, Kuznetsov et al. have constructed 17: end if a framework to estimate whether it is worth driving path 18: case exit: exploration towards a potential merging action. They present 19: S=∅ a static bottom-up analysis to find variables which may be 20: printP ath() frequently used later and suggest merging actions with no 21: end switch impact on values of these variables. Path exploration is then 22: for all s = (l, P, M ) ∈ S do under control of an SSE-like module temporally instead of a 23: for all s0 = (l0 , P 0 , M 0 ) ∈ W orklist do normal DSE process. We take a more complicated case into 24: if l = l0 ∧ shouldM erge(s, s0 ) then consideration where evaluation on a merging action varies 25: W orklist = W orklist \ {s0 } across the code. A static decision - either merging or not 26: M 00 = {(v, IT E(P, M [v], M 0 [v]))|(v, M [v]) ∈ - improves efficiency for some pieces of code but burdens M} performance for others. Actually, an optimal solution usually 27: S = S \ {s} ∪ {(l, P ∨ P 0 , M 00 )} consists of multiple parts which are drawn in different pieces 28: end if of code locally and combined together dynamically. 29: end for III. ACTIVE S TATE F ORKING 30: end for 31: W orklist = W orklist ∪ S In this section, we depict our active state forking technique. 32: end while First, we use a series of examples to explain motivation for forking and then describe definitions and method of it. A. Motivating Examples are merged completely, which is common in symbolic model As is discussed in last chapter, influence of a certain checking and verification condition generation [25], [26]. merging action varies throughout different pieces of code. We SSE systematically explores all feasible control paths with use some example code in Fig.2 to explain the issue in detail all possible input values taken into consideration [20]. It visits and present effect of active state forking. To be concise, some states in topological order to ensure that all corresponding function calls are not inlined. states are merged at every junction node in control flow of the In function f 1 state merging dismisses the opportunity of program. However, its major trade-offs are great difficulties in concrete execution. There are two if-blocks in f 1, namely calculating path predicates and strict restrictions on searching line 3 and line 4-5. State forks and path splits when reaching strategies. DSE techniques are then proposed to interleave con- the first if-block (this is passive state forking caused by non- crete and symbolic execution and focus on exploring possible concrete if-condition), and then two sub-paths join after it. execution paths one by one. A statement can be executed Expression of f lag gets different along two sub-paths, and concretely rather than symbolically if all involving variables is instantly used in if-condition at line 4. Expression of f lag are concrete. For an assignment, the result is calculated is 1 along true case of the first if-block, 0 along false case, directly if possible. For a conditional branch if cond goto l0 , and IT E(a == b, 1, 0) if states are merged before line 4. It concrete execution is performed if cond can be determined can be seen that both the two separate states lead concrete concretely. As a result, interpreting the statement does not execution on the second if-block since its condition can be need to split the current path. DSE techniques such as [3], decided concretely, but the merged one erases concrete value [23], [24] introduce concrete inputs to make analysis benefit of f lag and results in passive state forking. Therefore, the Copyright © 2018 for this paper by its authors. 16 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) 1 void f1(int a, int b) { possible to change how paths are combined during exploration. 2 int flag = 0; To maintain information of historical merging actions and 3 if (a == b) flag = 1; support forking states, we make extensions to the formalism 4 if (flag) g1(); in classical symbolic execution. 5 else g2(); A DSE method as shown in Algorithm 1 conducts path 6 exit(0); exploration by visiting execution states and creating new ones. 7 } This process can be efficiently represented as a graph in 8 which nodes are execution states and edges indicate transitions 9 void f2(int a, int b) { between them. Since DSE does not merge states, such a graph 10 int flag = 0; would be a tree, which is so called symbolic execution tree. 11 if (a == b) flag = 1; Definitions of execution state and symbolic execution tree are 12 if (flag) g1(); given in Definition 1 and 2. 13 else g2(); Definition 1: An execution state is defined as a tuple es = 14 g3(a, b); (l, P, M ) where 15 } - l ∈ L is the current program location. 16 - P ∈ F(Σ, Const) is the path predicate, which is a formula 17 void f3(int a, int b) { over symbolic inputs Σ and constants Const encoding all 18 int flag = 0; decisions of branches along the path from entry to l. 19 if (a == b) flag = 1; - M : V → F(Σ, Const) is the symbolic store, which 20 g3(a, b); maps a program variable to the formula corresponding to its 21 if (flag) g1(); expression. 22 else g2(); Definition 2: A symbolic execution tree is a tree T = 23 } (ES, T ) where - ES is the set of execution states. Fig. 2: Motivating examples. - T ⊆ ES × ES is the set of transitions between execution states. Symbolic execution tree records every explored state and merging action after the first if-block just makes expressions path, and is widely used to characterize the process of sym- and path predicate more complex while number of paths to bolic execution [25]. Our extended formalism is built on top explore is not changed. of it. While state merging is performed, an execution state In function f 2, exit(0) at line 6 in f 1 is replaced with could encode multiple nodes in the corresponding symbolic another function call g3(a, b). While things go the same as execution tree. Such relations naturally extract information of for f 1, it is required to reason about call to g3 on each merging actions, and thus, we introduce virtual states to hold path. If paths are merged before line 14, call to g3 will be the similar structure. Definition 3 presents how virtual states analyzed only once rather than once along each path. Hence, are defined. it is expected that states forked in the first if-block should be Definition 3: A virtual state is defined as a tuple vs = merged at the exit of the second if-block. (es, CV ) where For function f 3, call to g3 goes before the second if- - es ∈ ES is the corresponding execution state. block. As is discussed for f 1 and f 2, we can make a similar - CV ⊆ V × Const is the set of variables with concrete conclusion that states forked in the first if-block should be values together with their values. merged immediately to avoid repeated efforts reasoning about When two execution states are merged to be one, we create call to g3, but later ”restored” to two separate paths with virtual state for each sub-state respectively and map them concrete value for f lag along each. Common methods try to the produced state. Every virtual state tracks its exclusive to compare advantages and disadvantages to reveal a static concrete variables. Our formalism then constructs transitions decision on state merging, and our approach, in another way, between execution states and virtual states, resulting in a makes it possible to fork a merged state as if a historical directed acyclic graph. An execution state contains data of path merging action were not taken. In f 3, paths are made to join predicate and symbolic store, and is used to actually execute after line 19 and fork actively before line 21 with f lag restored statement. It appears as a node in the graph only if it never to concrete along each forked path. In this way, the benefit experiences merging or we are not concerned with its historical of state merging is maximized while its negative impact on merging actions anymore. Otherwise, its corresponding virtual concrete execution is avoid as far as possible. states are used instead to make up the graph. Definition 4 describes our extended execution graph formally. B. Merge and Fork States with Extended Execution Graph Definition 4: An extended execution graph is a directed The motivating examples figure out that an optimal merging acyclic graph G = (S, T, ψ, ∆) where strategy may promote contrast decisions for different pieces of - S ⊆ ES ∪ V S is the set of states. code. Active state forking together with state merging makes it - T ⊆ S × S denotes transitions between states. Copyright © 2018 for this paper by its authors. 17 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) - ψ : ES → 2V S maps a execution state to the set of its influence any concrete variable and hence, will never invoke corresponding virtual states. active forking. - ∆ : V S → F(Σ, Const) maps a virtual state to its newly Forking. For an execution state es in G, passive forking by added conditions compared to its predecessor’s path predicate. conditional branch is the same as in traditional DSE. Other- While execution states hold essential data and drive execu- wise, forking action is applied to every sub-state vs ∈ ψ(es) tion to explore paths, virtual states simulate the structure of and ∆ is updated with branch decision accordingly for forked symbolic execution tree and preserve merging history. If there productions. is no further operation on virtual state, our extended execution In the case of active forking aimed at eliminating historical graph would degenerate into a tree. However, virtual states merging actions, the work flow gets more complicated: it is are merged and swapped for execution states when updates required to partitioning sub-states, constructing path predicates on variables efface complexity introduced by merging. In the and updating expressions. remaining part of this section, we explain in detail how to The execution state es to be forked encodes multiple sub- perform state merging and forking with our extended execution states which are represented by virtual states in ψ(es). Active graph. forking is invoked because some variables are turned from Merging. When trying to merge es1 = (l, P1 , M1 ) and concrete to symbolic by merging actions. Thus, the first step es2 = (l, P2 , M2 ) into es3 = (l, P1 ∨ P2 , M ) where M (v) = is to identify concerned variables. Every virtual state vs ∈ IT E(P1 , M1 (v), M2 (v)) for each v ∈ V , we first create ψ(es) where these variables are concrete is separated to be a virtual states and then update the graph G = (S, T, ψ, ∆). forked state. For sake of efficiency, states with same concrete If es1 appears as a node in G (in other word es1 ∈ S and values for these variables can be joined. In this way, all sub- ψ(es1 ) = ∅), we create a virtual state directly to represent states of es are partitioned into several groups and each group its corresponding sub-state encoded by es3 : vs = (es3 , CV ). corresponds to a potential forked state. CV denotes exclusive concrete variables of this sub-state, so For each sub-state vs, its path predicate can be calculated it is made up of variables concrete in es1 but not in es2 . That by backtracking to its ancestor execution state and combin- is, CV = χ(M1 ) \ χ(M2 ) where χ(M ) = {(v, M (v)|M (v) ∈ ing with newly added conditions denoted by ∆ along the F(Const)}. path. The results are used to construct path predicate for If es1 encodes multiple sub-states so that es1 ∈ / S or each potential forked state and decide IT E selections of ψ(es1 ) 6= ∅, we create a virtual state for every sub-state in variables expressions. The original state es is then forked ψ(es1 ) and add up CV of sub-state and exclusive concrete into several execution states as if some historical merging variables of es1 to assign to newly created one. actions were eliminated. Besides, it is remarkable in real Creating virtual states for es2 is in the same way, except that case that optimization of expressions and constraints based exclusive concrete variables of es2 is calculated as χ(M2 ) \ on data structure, constant propagation, query cache and other χ(M1 ). practical techniques is adopted to cut down efforts and boost All these generated virtual states are then added to G, speed of calculation in state forking. together with corresponding transitions. ψ(es3 ) is set to be all these new states. ∆ of them can be simply set to empty IV. DYNAMIC M ERGE - FORK A NALYSIS since there is no change between their and their predecessors’ Active state forking makes it possible to raise dynamic path predicates. merging strategies, and next task is to identify ”good” merging Update of Assignment. Updating an execution state es in G actions. In this section, we will explain how to identify can be done immediately. However, if es ∈ / S, set of concrete possibilities of concrete execution and how to make decisions variables of each virtual state vs = (es, CV ) ∈ ψ(es) requires on merging by analyzing code piece by piece. update. Those not concrete or not exclusively concrete any- more should be removed from concrete variables, while those A. Identify Concrete Values and Concrete Execution becoming exclusively concrete should be added to concrete State merging avoids repeated efforts in analysis but may variables. burden underlying solvers as trade-off. Considering examples E.g. for assignment v := e, (v, cv ) ∈ CV is removed if e in Fig.2, the ideal solution would be that we can selectively is not concrete, or it is but calculation does not involve any merge states so that 1) variables turned symbolic are used to concrete variables in CV . Otherwise v and its value is added calculate future if-conditions as little as possible; 2) merging to CV if there is no (v, cv ) ∈ CV , and e is concrete and actions with no influence on future if-conditions are as many depends on certain variables and values in CV . as possible. The former prevents state merging from being When updating virtual states, ones might point to the obstacle to concrete execution while the latter indicates greater same execution state and possess equal concrete variables benefit from merging. and values. These states can be merged in the graph since A bottom-up estimation identifying variables frequently distinguishing such sub-states makes no sense to merging and used is presented in [14]. The basic idea is tracking variables forking. If all virtual states of an execution state are merged, used in if-conditions along use-def chain. Our method also we can swap the production for the execution state itself, pays attention to use in if-condition, but works in top-down which means that all its historical merging actions do not order instead because analysis will be break down in different Copyright © 2018 for this paper by its authors. 18 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) pieces of code. A list of variables with concrete values is 100 maintained and updated when executing. Thus, we can identify possible concrete values and in turn, find out chances of 10 concrete execution. Path exploration ratio (%) 1 B. Local Merge-fork Analysis Active state forking makes it possible to redraw decisions 0.1 on merging actions, so we can cut a program into pieces 0.01 and analyze code piece by piece. The division is randomly made under these rules: difference in scale between pieces echo 0.001 is limited within a proper bound; cut points are located at shuf seq exit of if-blocks, end of inlined functions, or terminators of 0.0001 sequential blocks. And a piece is divided further if it contains Locations greatly different parts, e.g. variables used in one part are totally disjoint with ones in another. Nevertheless, too detailed Fig. 3: Ratio of explored paths for 3 representative programs. division achieves little increase in performance while goes time-consuming, so we limit the times of further division. After code is divided into pieces, we can make decision for path exploration with no approximation, and path feasibility is states whether they should be merged or not in a piece. There checked at every conditional branch. State forking is invoked are still situations where different parts of code have opposite for each non-determined branch, and states at the same control votes for a merging action in the same piece. To resolve the location are never merged. KLEE guides exploration with a issue, we propose another two types of decision: merge until number of optional search strategies including random search, a location and merge after a location. The former performs traditional breadth/depth first search, strategy prioritizing vis- merging temporally and invokes analysis again later, while the iting previously unexplored code, and combination of various latter make a delay to merging. Because of these choices of strategies. merging decision, analysis of how to combine paths is invoked KLEE has built in models of execution state and symbolic at junction node in control flow structure, at the beginning of execution tree. We extend them to implement our active each piece, and at locations specified by ”merge until” and state forking method. Thanks to implementation of expres- ”merge after” decisions. sions and optimization of constraints in KLEE, it is easy Hence, our merge-fork analysis determines locally how to to track updates of path predicates and we could perform handle each merging opportunities within a piece of code. For state forking by directly imposing conditions of dismissed the simplified language used in our discussion, this process can merging action on path predicates and symbolic store. To be implemented by counting occurrences of if-condition and in support dynamic merge-fork analysis, we make a call in state turn estimating number of queries needed. Besides, unresolved selection (selectState) to an LLVM Pass constructing con- function call and code block difficult to reason can also be crete dependence graph and estimating merging effects locally. counted to influence strategy of merging and forking. When If merging is suggested, search strategy is temporally fixed decided to merge two states, path exploration is temporally to continuously extending paths to target merging location. taken over from searching heuristics and involving states are Otherwise, states are explored according to original search prioritized to visit. strategies of KLEE. The original work of KLEE presents an experiment on GNU V. E XPERIMENTAL E VALUATION Coreutils programs which are widely used UNIX command- In this section we illustrate some experimental evaluation on line utilities. There are 101 files in total adding up to over our proposition. We build a prototype to implement our merge- 70,000 lines of code. We test these programs using our fork techniques (Section V-A), and run it with Coreutils pro- prototype and collect information from output files. grams to analyze performance in two scenarios: exhaustively exploring all feasible paths (Section V-B) and incomplete B. Speedup Path Exploration exploration within certain time budget (Section V-C). Besides, Our merge-fork framework is aimed at identifying valuable we also make a comparison with previous work of Kuznetsov merging opportunities and conducting state merging, so we are et al. [14] (Section V-D). first concerned with the influence of our method on number of explored paths. A. Prototype A direct comparison is measuring number of explored paths To evaluate the effect of our merge-fork framework, we with different time spent. However, the number is hard to implement active state forking and dynamic merge-fork anal- figure out since there are paths explored partially if exhaustive ysis in a prototype on the foundation of the famous symbolic exploration is not completed. We therefore choose to calculate execution engine KLEE. It takes LLVM bitcode compiled from the number of paths at a specific location instead of with some source code as input. The original executor of KLEE performs time spent during the execution. The size of symbolic inputs Copyright © 2018 for this paper by its authors. 19 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) 3600 100 Completion time of merge-fork prototype (s) 80 500 Decrease in queries (%) 60 50 40 20 5 5 50 500 3600 0 Completion time of plain KLEE (s) Files Fig. 4: Compare completion time with plain KLEE for ex- Fig. 7: Decrease in queries comparing with [14]. haustive exploration tasks. efficiently joins branches and decrease paths. It is remarkable 15 that paths getting through a certain location might enclose a large gap in time. Statement coverage increase (%) 10 Fig.3 shows the growth of path exploration ratio correspond- ing to program locations for 3 representative files. The final ratio is obtained at the exit of code. It can be seen that the 5 effect in decreasing paths varies among the programs. While a large part of paths are merged in echo, state merging makes 0 little influence on seq. The trend of growth indicates structure of control flow and involvement of active state forking. Since -5 merging and forking work simultaneously, change in path numbers appear smooth. The target of decreasing paths is to free underlying solvers’ -10 Files burden and achieve faster execution by omitting duplicated analysis efforts. Hence, we run Coreutils under symbolic Fig. 5: Compare statement coverage with plain KLEE for inputs with growing size and compare completion time of plain uncompleted exploration tasks. KLEE and our prototype. In order to compare with [14] easily, we draw a similar graph Fig.4 showing the result within 1 hour 40 time limit. The black dots correspond to experiment instances and the gray disks indicate the relative size of the symbolic inputs used in each instance. The result presents that in the 20 majority of cases, our prototype achieves positive speedup over KLEE. The rightmost dots represent those exceed time limit Speedup (%) in KLEE but can be solved completely in our framework. 0 C. Exploration Guided by Statement Coverage We now look into the situation where tasks can not be -20 finished completely. We choose a proper size of symbolic inputs to keep the engine busy and measure statement coverage in limited time. Plain KLEE uses coverage-oriented search -40 strategy to guide path exploration towards uncovered locations. Files Our merge-fork framework would interrupt the process and Fig. 6: Speedup over efficient state merging of [14]. temporally drive exploration towards merging states. Fig.5 shows its influence on statement coverage. While state merging saves analysis efforts, there are more paths can be visited in is properly set to make it possible for exhaustive exploration, some cases, which leads to increase in coverage. However, so the accurate number of feasible paths of the program is other cases suffer from change in search order and in turn known. Then we run our prototype and count paths at given coverage is decreased. Data in Fig.5 confirms that estimation control locations. In this way we can reveal how state merging of merging opportunities and fast-forwarding some states do Copyright © 2018 for this paper by its authors. 20 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) not cause any obvious troubles to the original exploration goal [6] J. Liang, M. Wang, Y. Chen, Y. Jiang, and R. Zhang, “Fuzz testing in of searching heuristics. practice: Obstacles and solutions,” in 2018 IEEE 25th International Con- ference on Software Analysis, Evolution and Reengineering (SANER). D. Compare to Existed Work IEEE, 2018, pp. 562–566. [7] C. Wang, Y. Jiang, X. Zhao, X. Song, M. Gu, and J. Sun, “Weak-assert: While our proposition is on the foundation of the previous a weakness-oriented assertion recommendation toolkit for program anal- work of Kuznetsov et al. [14], we make some comparison ysis,” in Proceedings of the 40th International Conference on Software Engineering: Companion Proceeedings. ACM, 2018, pp. 69–72. with their experiment results. All the data is obtained from [8] J. GLiang, M. Wang, Y. Chen, and Y. Jiang, “Pafl: Extend fuzzing their official website http://cloud9.epfl.ch. Since experimental optimizations of single mode to industrial parallel mode,” 2018. environment and KLEE version may vary, all the data used in [9] J. Guo, Y. Jiang, Y. Zhao, Q. Chen, and J. Sun, “Dlfuzz: Differential fuzzing testing of deep learning systems,” 2018. our comparison is increase/decrease over our respective KLEE [10] J. Gao, X. Yang, Y. Fu, Y. Jiang, and J. Sun, “Vulseeker: a semantic baseline. learning based vulnerability seeker for cross-platform binary,” in Pro- First we compare the completion time under the same size of ceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 2018, pp. 896–899. symbolic inputs. Fig.6 illustrates the result. We use completion [11] Y. Jiang, H. Song, Y. Yang, H. Liu, M. Gu, Y. Guan, J. Sun, and time of plain KLEE to normalize data of both tools and then L. Sha, “Dependable model-driven development of cps: From stateflow calculate the effect of speedup. Over 80% of the tasks get simulation to verified implementation,” ACM Transactions on Cyber- Physical Systems, vol. 3, no. 1, p. 12, 2018. faster with our merge-fork method and average speedup is [12] C. Cadar and K. Sen, “Symbolic execution for software testing: three around 10%. Most instances with negative speedup are finished decades later,” Communications of the ACM, vol. 56, no. 2, pp. 82–90, in a rather short time (101 magnitude in seconds). 2013. We also collect queries generated during the execution. [13] C. Cadar, P. Godefroid, S. Khurshid, C. S. Păsăreanu, K. Sen, N. Till- mann, and W. Visser, “Symbolic execution for software testing in prac- Fig.7 shows the comparison. The number of queries is also tice: preliminary assessment,” in Proceedings of the 33rd International normalized against data from plain KLEE. It confirms the Conference on Software Engineering. ACM, 2011, pp. 1066–1071. effect of active state forking since some queries are omitted [14] V. Kuznetsov, J. Kinder, S. Bucur, and G. Candea, “Efficient state merging in symbolic execution,” Acm Sigplan Notices, vol. 47, no. 6, on separate paths by concrete execution. The improvement in pp. 193–204, 2012. performance is achieved through decreasing paths via state [15] Q. Yi, Z. Yang, S. Guo, C. Wang, J. Liu, and C. Zhao, “Eliminating path merging and invoking concrete execution by restoring merged redundancy via postconditioned symbolic execution,” IEEE Transactions on Software Engineering, 2017. paths when needed. [16] T. Xie, N. Tillmann, J. de Halleux, and W. Schulte, “Fitness-guided path exploration in dynamic symbolic execution,” in Dependable Systems VI. C ONCLUSIONS & Networks, 2009. DSN’09. IEEE/IFIP International Conference on. In symbolic execution, state merging decreases the number IEEE, 2009, pp. 359–368. [17] J. Burnim and K. Sen, “Heuristics for scalable dynamic test generation,” of paths but eliminates the benefit of concrete execution in in Automated Software Engineering, 2008. ASE 2008. 23rd IEEE/ACM contrast. To resolve the conflict and improve the performance, International Conference on. IEEE, 2008, pp. 443–446. we proposed a merge-fork framework to enable a bi-directional [18] H. Wang, T. Liu, X. Guan, C. Shen, Q. Zheng, and Z. Yang, “De- pendence guided symbolic execution,” IEEE Transactions on Software switch for states to be merged and forked. State merging and Engineering, vol. 43, no. 3, pp. 252–271, 2017. active state forking can make state and path combinations fit [19] T. Chen, X.-s. Zhang, S.-z. Guo, H.-y. Li, and Y. Wu, “State of the different code dynamically. This in turn makes it possible art: Dynamic symbolic execution for automated test generation,” Future Generation Computer Systems, vol. 29, no. 7, pp. 1758–1773, 2013. to divide code into pieces and dynamically draw decision [20] T. Hansen, P. Schachte, and H. Søndergaard, “State joining and splitting on state merging piece by piece. Experiment on a prototype for the symbolic execution of binaries,” in International Workshop on trivially displays the potential of our approach. While explo- Runtime Verification. Springer, 2009, pp. 76–92. [21] V. Ganesh and D. L. Dill, “A decision procedure for bit-vectors and ar- sive development in dynamic symbolic execution and search- rays,” in Proceedings of the 19th International Conference on Computer based techniques has a huge impact on classical state merging Aided Verification, ser. CAV’07, 2007, pp. 519–531. methodology, it shows promise combining our approach with [22] C. Cadar, D. Dunbar, D. R. Engler et al., “Klee: Unassisted and auto- matic generation of high-coverage tests for complex systems programs.” these techniques and making state merging an efficient method in OSDI, vol. 8, 2008, pp. 209–224. to solve practical issues in symbolic execution. [23] C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler, “Exe: automatically generating inputs of death,” ACM Transactions on R EFERENCES Information and System Security (TISSEC), vol. 12, no. 2, p. 10, 2008. [1] L. A. Clarke, “A system to generate test data and symbolically execute [24] P. Godefroid, M. Y. Levin, D. A. Molnar et al., “Automated whitebox programs,” IEEE Transactions on software engineering, no. 3, pp. 215– fuzz testing.” in NDSS, vol. 8, 2008, pp. 151–166. 222, 1976. [25] S. Khurshid, C. S. Păsăreanu, and W. Visser, “Generalized symbolic [2] J. C. King, “Symbolic execution and program testing,” Communications execution for model checking and testing,” in International Conference of the ACM, vol. 19, no. 7, pp. 385–394, 1976. on Tools and Algorithms for the Construction and Analysis of Systems. [3] P. Godefroid, N. Klarlund, and K. Sen, “Dart: directed automated Springer, 2003, pp. 553–568. random testing,” in ACM Sigplan Notices, vol. 40, no. 6. ACM, 2005, [26] E. M. Clarke, D. Kroening, and F. Lerda, “A tool for checking ansi- pp. 213–223. c programs,” tools and algorithms for construction and analysis of [4] V. Chipounov, V. Kuznetsov, and G. Candea, “S2e: A platform for in- systems, vol. 2988, pp. 168–176, 2004. vivo multi-path analysis of software systems,” ACM SIGPLAN Notices, [27] T. Avgerinos, A. Rebert, S. K. Cha, and D. Brumley, “Enhancing vol. 46, no. 3, pp. 265–278, 2011. symbolic execution with veritesting,” in Proceedings of the 36th In- [5] M. Wang, J. Liang, Y. Chen, Y. Jiang, X. Jiao, H. Liu, X. Zhao, and ternational Conference on Software Engineering. ACM, 2014, pp. J. Sun, “Safl: increasing and accelerating testing coverage with symbolic 1083–1094. execution and guided fuzzing,” in Proceedings of the 40th International Conference on Software Engineering: Companion Proceeedings. ACM, 2018, pp. 61–64. Copyright © 2018 for this paper by its authors. 21