Program Proceedings

International Workshop on Design and Implementation of Formal Tools and Systems (DIFTS 2013)
Portland, OR, October 19, 2013
Co-located with FMCAD and MEMOCODE

Preface

The second DIFTS (Design and Implementation of Formal Tools and Systems) workshop was held in Portland, Oregon on October 19, 2013, co-located with the Formal Methods in Computer-Aided Design conference (FMCAD) and Formal Methods and Models for Co-design (MEMOCODE). The workshop emphasized insightful experiences in tool and system design. Its goal is to provide a forum for sharing challenges and original, groundbreaking solutions. The workshop provided an opportunity for discussing engineering aspects and the various design decisions required to put formal tools and systems into practical use. It took a broad view of the formal tools/systems area, and solicited contributions from hardware and software domains such as decision procedures, verification, testing, validation, diagnosis, debugging, and synthesis.

The workshop received 10 original submissions, of which 3 were chosen in the tool category and 3 in the system category. There were also three invited talks: the first by Rance Cleaveland, Reactive Systems Inc., USA, on "Approximate Formal Verification using Model-based Testing"; the second by Masahiro Fujita, University of Tokyo, on "Diagnosis and Correction of Buggy Hardware/Software with Formal Approaches"; and the third by Dhiraj Goswami, Synopsys Inc., on "Stimulus Generation, Enhancements and Debug in Constraint Random Verification".

First of all, we thank FMCAD's steering committee for their continual support. We also thank FMCAD chairs Sandip Ray and Barbara Jobstmann, and MEMOCODE chairs Marly Roncken and Fei Xie, for a seamless organization. We also thank Joe Leslie-Hurd for his help with local arrangements. We thank Boğaziçi University, Turkey for hosting the DIFTS website. We sincerely thank the program committee members and sub-reviewers for selecting the papers and providing candid review feedback to the authors. Last but not least, we thank all the authors for contributing to the workshop, and all the participants.

Malay K. Ganai and Alper Sen
Program Chairs, DIFTS 2013

DIFTS13 Program Committee

General Program Chairs
Malay Ganai, NEC Labs America, USA
Alper Sen, Bogazici University, Turkey

Program Committee
Armin Biere, Johannes Kepler University, Austria
Gianpiero Cabodi, Politecnico di Torino, Italy
Franco Fummi, University of Verona, Italy
Malay Ganai, NEC Labs America, USA
Daniel Grosse, University of Bremen, Germany
William Hung, Synopsys Inc, USA
Daniel Kroening, Oxford University, UK
Alper Sen, Bogazici University, Turkey
Ofer Strichman, Technion - Israel Institute of Technology, Israel
Chao Wang, Virginia Tech, USA

Additional Reviewers
Gogul Balakrishnan, Hassan Eldib, Alexander Horn, Alexander Ivrii, Markus Kusano, Hoang Le, Peter Schrammel, Carsten Sinz, Marcelo Sousa, Andre Suelflow

Table of Contents

Approximate Formal Verification using Model-based Testing ................ 1
  Rance Cleaveland (Invited speaker)
Diagnosis and Correction of Buggy Hardware/Software with Formal Approaches ................ 2
  Masahiro Fujita (Invited speaker)
Stimulus Generation, Enhancements and Debug in Constraint Random Verification ................ 3
  Dhiraj Goswami (Invited speaker)
A Fast Reparameterization Procedure ................ 4
  Niklas Een and Alan Mishchenko
LEC: Learning-Driven Data-path Equivalence Checking ................ 9
  Jiang Long, Robert Brayton and Michael Case
Trading-off Incrementality and Dynamic Restart of Multiple Solvers in IC3 ................ 19
  Marco Palena, Gianpiero Cabodi and Alan Mishchenko
Lemmas on Demand for Lambdas ................ 28
  Mathias Preiner, Aina Niemetz and Armin Biere
CHIMP: a Tool for Assertion-Based Dynamic Verification of SystemC Models ................ 38
  Sonali Dutta, Moshe Y. Vardi and Deian Tabakov
Abstraction-Based Livelock/Deadlock Checking for Hardware Verification ................ 46
  In-Ho Moon and Kevin Harer

Approximate Formal Verification using Model-based Testing

Rance Cleaveland
University of Maryland, USA

Abstract: In model-based testing, (semi-)formal models of systems are used to drive the derivation of test cases to be applied to the system-under-test (SUT). The technology has long been a part of traditional hardware-design workflows, and it is beginning to find application in embedded-software development processes as well. In automotive and land-vehicle control-system design in particular, models in languages such as MATLAB® / Simulink® / Stateflow® are used to drive the testing of the software used to control vehicle behavior, with tools like Reactis®, developed by a team including the speaker, providing automated test-case generation support for this endeavor. This talk will discuss how test-case generation capabilities may also be used to help verify that models meet formal specifications of their behavior. The method we advocate, Instrumentation-Based Verification (IBV), involves the formalization of behavior specifications as models that are used to instrument the model to be verified, and the use of coverage testing of the instrumented model to search for specification violations. The presentation will discuss the foundations of IBV, the test-generation approach and other features in Reactis® that are used to support IBV, and the results of several case studies involving the use of the method.

Diagnosis and Correction of Buggy Hardware/Software with Formal Approaches

Masahiro Fujita
University of Tokyo, Japan

Abstract: There has been intensive research on debugging hardware as well as software. Some of it is ad hoc and based on simple heuristics, but other work utilizes formal methods and is mathematically modeled. In this talk, we first review various proposals on debugging from a historical viewpoint, and then summarize the state of the art in terms of both diagnosis and automatic correction of designs. In particular, we show various approaches with SAT-based formulations of the diagnosis and correction problems. We also discuss them in relation to manufacturing test techniques. That is, if the design errors are within pre-determined types and/or areas, there can be very efficient formal methods for verification, diagnosis, and correction that use small numbers of test vectors. In the last part of the talk, future perspectives, including post-silicon issues, are discussed.
Stimulus Generation, Enhancements and Debug in Constraint Random Verification

Dhiraj Goswami
Synopsys, USA

Abstract: Verification consumes 60-80% of the resources and effort of an IC design team. Functional verification, positioned at the earliest stage of the IC design flow, determines the customers' ability to find bugs quickly and thereby their time-to-results (TTR) and cost-of-results (COR). Consequently, functional verification has been the focus of the EDA industry for the last several decades. Constrained random simulation methodologies have become increasingly popular for functional verification of complex designs, as an alternative to directed-test based simulation. In a constrained random simulation methodology, random vectors are generated to satisfy certain operating constraints of the design. These constraints are usually specified as part of a testbench program (using industry-standard testbench languages, like SystemVerilog from Synopsys, e from Cadence, OpenVera, etc.). The testbench automation (TBA) tool is then expected to generate random solutions for the specified random variables, such that all the specified constraints over these random variables are satisfied. These random solutions are then used to generate valid random stimulus for the Design Under Verification (DUV). This stimulus is simulated using industry-standard simulation tools, like VCS from Synopsys, NC-Verilog from Cadence, etc. The results of simulation are then typically examined within the testbench program to monitor functional coverage, which gives a measure of confidence in the verification quality and completeness. In this talk we will review the challenges of stimulus and configuration generation using a constrained random verification methodology. We will also explore why state-of-the-art debug solutions are important to handle complexity and improve the quality of stimulus.

A Fast Reparameterization Procedure

Niklas Een, Alan Mishchenko
{een,alanmi}@eecs.berkeley.edu
Berkeley Verification and Synthesis Research Center
EECS Department, University of California, Berkeley, USA

Abstract. Reparameterization, also known as range-preserving logic synthesis, replaces a logic cone by another logic cone, which has fewer inputs while producing the same output combinations as the original cone. It is expected that a smaller circuit leads to a shorter verification time. This paper describes an approach to reparameterization which is faster, but not as general, as previous work. The new procedure is particularly well-suited for circuits derived by localization abstraction.

1 Introduction

The use of reparameterization as a circuit transformation in the verification flow was pioneered by Baumgartner et al. in [1].
In their work, new positions for the primary inputs (PIs) are determined by finding a minimum cut between the current PI positions and the next-state variables (flop inputs). BDDs are then used to compute the range (or image) on the cut. Finally, a new logic cone with the same range (but fewer PIs) is synthesized from the BDDs and grafted onto the original design in place of the current logic cone. This is a very powerful transformation, but it has potential drawbacks: (i) the BDDs may blow up and exhaust the memory, (ii) the extracted circuit may be larger than the logic it replaces, and (iii) the runtime overhead may be too high.

In contrast, the proposed approach is based on greedy local transformations, capturing only a subset of optimization opportunities. However, memory consumption is modest, runtimes are very low, and the resulting design is always smaller than, or of the same size as, the original design. It is shown experimentally that the proposed method leads to sizable reductions when applied to circuits produced by localization abstraction [5].

2 Fast Reparameterization

The fast reparameterization algorithm is based on the following observation: if a node dominates(1) a set of PIs, and those PIs are sufficient to force both a zero and a one at that node, regardless of the values given to the other PIs and state variables, then that node can be replaced by a new primary input, while the unused logic cone driving the original node can be removed. The old primary inputs dominated by the given node are also removed by this procedure.

(1) A node n dominates another node m iff every path from m to a primary output goes through node n.

Example. Suppose a design contains inputs x1, x2, and a gate Xor(x1, x2); and that furthermore, x1 has no other fanouts besides this Xor-gate. Then, no matter which value x2 takes, both a zero and a one can be forced at the output of the Xor by setting x1 appropriately, and thus the Xor-gate can, for verification purposes, be replaced by a primary input.

The proposed method to find similar situations starts by computing all dominators of the netlist graph; then, for each candidate node dominating at least one PI, the following quantification problem is solved: "for each assignment to the non-dominated gates, does there exist a pair of assignments to the dominated PIs that results in a zero and a one at the candidate node?". More formally, assuming that x represents non-dominated gates ("external inputs") and yi represents dominated PIs ("internal inputs"), the following must always be true for the node's function φ:

    ∀x ∃y0, y1 . ¬φ(x, y0) ∧ φ(x, y1)

Important features of this approach are:

(i) It is circuit based, while early work on reparameterization was based on transition relations [4].
(ii) In its simplest form, the proposed restructuring replaces some internal nodes by new primary inputs and removes dangling logic.
(iii) The analysis is completely combinational: no information on the reachable state-space is used.
(iv) If the property is disproved after reparameterization, it is straightforward to remap the resulting counterexample to depend on the original primary inputs.

It is important to realize that by analyzing and applying reductions in topological order from PIs to POs, regions amenable to reparameterization are gradually reduced to contain fewer gates and PIs. By this process, repeatedly applying local transformations can lead to a substantial global reduction. In the current implementation, the above formula is evaluated by exhaustive simulation of the logic cone rooted in the given node, while the cone is limited to 8 inputs. Limiting the scope to cones with 8 inputs and simulating 256 bit patterns (or eight 32-bit words) seems to be enough to saturate the reduction achievable on the benchmarks where the method is applicable.
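With a cone this small, the quantified condition can be checked by direct enumeration. The following C++ sketch is our illustration of that check, not the paper's implementation (which operates on 256-bit simulation words); the cone is abstracted as a callback:

    // Illustrative sketch (not the authors' code): check the condition
    //     forall x . exists y0, y1 . !phi(x, y0) && phi(x, y1)
    // by exhaustive simulation of a cone with at most 8 inputs in total.
    #include <cstdint>
    #include <functional>

    // 'phi' evaluates the cone's output; 'nExt'/'nInt' are the number of
    // external (non-dominated) and internal (dominated-PI) inputs.
    bool replaceableByFreeInput(
            const std::function<bool(uint32_t ext, uint32_t in)>& phi,
            int nExt, int nInt)
    {
        for (uint32_t x = 0; x < (1u << nExt); x++) {
            bool seen0 = false, seen1 = false;
            for (uint32_t y = 0; y < (1u << nInt); y++)
                (phi(x, y) ? seen1 : seen0) = true;
            if (!seen0 || !seen1)
                return false;   // some external assignment pins the node
        }
        return true;            // node can be substituted by a new PI
    }

For the Xor example above, phi(x, y) = x ^ y with nExt = nInt = 1 makes the function return true, so the Xor-gate is replaceable.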
Some typical reductions are shown in Figure 1. The graphs should be understood as sub-circuits of a netlist being reparameterized. The exclamation marks denote "internal" PIs dominated by the top node, and hence under our control; the question marks denote gates with fanouts outside the displayed logic cone, for which no assumption on their values can be made. The full algorithm is described in Figure 2.

Counterexample reconstruction. There are several ways that a trace on the reduced netlist can be lifted to the original netlist. For instance, the removed logic between the new PIs and the old PIs can be stored in a separate netlist. The trace on the reduced netlist can then be projected onto the original PIs by rerunning the simulation used to produce the reparameterized circuit, and for each candidate picking an assignment that gives the correct value. But even simpler, one can just put the original netlist into a SAT solver, assert the values from the trace onto the appropriate variables, and call the SAT solver to complete the assignment. In practice, this seems to always work well.

Figure 1. Example of subgraphs that can be reduced. Here "!" denotes nodes we can control (a PI dominated by the output node); "?" denotes nodes that can assume any value beyond our control. In these examples, the output node "[cand]" can be forced to both a zero and a one by choosing the right values for the "!", regardless of the values of the "?". In such cases, the output node is replaced by a new PI. [The original figure shows three example And-gate subgraphs; only the caption is reproduced here.]

Fast Reparameterization
− Compute dominators. For a DAG with bounded in-degree (such as an And-Inverter Graph), this is a linear operation in the number of nodes (see Figure 3).
− For each PI, add all its dominators to the set of "candidates".
− For each candidate c, in topological order from inputs to outputs:
  - Compute the set D of nodes dominated by c (Figure 4).
  - Denote the PIs in D "internal" inputs.
  - Denote any node outside D, but being a direct fanin of a node inside D, an "external" input (may be any gate type).
  - Simulate all possible assignments to the internal and external inputs. If for all external assignments there exists an internal assignment that gives a 0 at c, and another internal assignment that gives a 1 at c, then substitute c by a new primary input.

Figure 2. Steps of the reparameterization algorithm.

Compute Dominators
− Initialize all POs to dominate themselves.
− Traverse the netlist in reverse topological order (from POs to PIs), and mark the children of each node as being dominated by the same dominator as the node itself, unless the child has already been assigned a dominator.
− For already marked children, compute the "meet" of the two dominators, i.e. find the first common dominator. If there is no common dominator, mark the node as dominating itself.

Figure 3. Review of the "finding direct dominators" algorithm for DAGs. For a more complete treatment of this topic, see [8], or for a more precise description, the source code of "computeDominators()" in "ZZ/Netlist/StdLib.cc" of ABC-ZZ [11].

Compute Dominated Area

    area  = {w_dom}          -- initialize set of gates to the dominator
    count = [0, 0, ..., 0]   -- count is a map "gate → integer"
    for w ∈ area:            -- also visits elements added below
        for v ∈ faninsOf(w):
            count[v]++
            if count[v] == num_of_fanouts[v]:
                area = area ∪ {v}

Figure 4. Find nodes used only by gate "w_dom". This set is sometimes called the maximum fanout-free cone (MFFC). The outer for-loop over the elements of area is meant to visit all elements added by the inner for-loop. For this procedure, flops are treated as sinks, i.e. having no fanins.
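A direct C++ rendering of Figure 4 could look as follows. This is a sketch under the assumption of a simple adjacency-list netlist representation; the explicit worklist implements the "outer loop visits added elements" convention:

    // Sketch of the MFFC computation of Figure 4. 'fanins[g]' lists the
    // fanins of gate g (empty for PIs and flops, which act as sinks) and
    // 'nFanouts[g]' is g's total fanout count in the netlist.
    #include <unordered_map>
    #include <unordered_set>
    #include <vector>

    using Gate = int;

    std::unordered_set<Gate> dominatedArea(
            Gate wDom,
            const std::vector<std::vector<Gate>>& fanins,
            const std::vector<int>& nFanouts)
    {
        std::unordered_set<Gate> area = {wDom};
        std::vector<Gate> worklist = {wDom};
        std::unordered_map<Gate, int> count;
        while (!worklist.empty()) {
            Gate w = worklist.back(); worklist.pop_back();
            for (Gate v : fanins[w])
                // v joins the area once all of its fanouts lie inside it
                if (++count[v] == nFanouts[v] && area.insert(v).second)
                    worklist.push_back(v);
        }
        return area;   // the MFFC of wDom
    }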
3 Improvements

The algorithm described in the previous section replaces internal nodes with inputs, and thus only removes logic. If we are prepared to forgo this admittedly nice property and occasionally add a bit of new logic, then nodes that are not completely controllable by their dominated inputs can still be reparameterized by the following method: for a node with function φ(x, y), where x are the external inputs and y the internal inputs, compute the following two functions:

    φ0(x) ≡ ∀y. ¬φ(x, y)
    φ1(x) ≡ ∀y. φ(x, y)

Using these two functions, φ can be resynthesized using a single new input ynew by the expression:

    ¬φ0(x) ∧ (φ1(x) ∨ ynew)

In other words, if two or more inputs are dominated by the node φ, a reduction in the number of inputs is guaranteed. Depending on the shape of the original logic, and how well new logic for φ0 and φ1 is synthesized, the number of logic gates may either increase or decrease. In our implementation, logic for φ0 and φ1 is created by the fast irredundant sum-of-products ("isop") procedure proposed by Shin-ichi Minato in [10]. We greedily apply this extended method to all nodes with two or more dominated PIs, even if it leads to a blow-up in logic size. To counter such cases, fast logic synthesis can be applied after the reparameterization. Obviously, there are many ways to refine this scheme.
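To see why the resynthesized expression is range-preserving: when φ0(x) holds, the node is stuck at 0 for every y and the new expression is 0 for both values of ynew; when φ1(x) holds, it is stuck at 1; otherwise both values are reachable and the expression reduces to ynew. The following small C++ self-check is our illustration (not part of the paper) and verifies this range equivalence exhaustively for any cone small enough to enumerate:

    // Illustrative self-check: for every external assignment x, the set of
    // values the node can take over the dominated inputs y must equal the
    // set the rewritten cone  !phi0(x) && (phi1(x) || yNew)  can take over
    // the single new input yNew.
    #include <cassert>
    #include <cstdint>
    #include <functional>

    void checkRangePreserved(
            const std::function<bool(uint32_t, uint32_t)>& phi,
            int nExt, int nInt)
    {
        for (uint32_t x = 0; x < (1u << nExt); x++) {
            bool phi0 = true, phi1 = true;   // forall y . !phi / forall y . phi
            bool can0 = false, can1 = false; // values reachable over y
            for (uint32_t y = 0; y < (1u << nInt); y++) {
                bool v = phi(x, y);
                phi0 = phi0 && !v;
                phi1 = phi1 && v;
                (v ? can1 : can0) = true;
            }
            bool new0 = false, new1 = false; // values reachable over yNew
            for (int yNew = 0; yNew <= 1; yNew++)
                ((!phi0 && (phi1 || yNew != 0)) ? new1 : new0) = true;
            assert(new0 == can0 && new1 == can1);
        }
    }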
4 Future Work

Another possible improvement to the method is extending the reparameterization algorithm to work for multi-output cones. As an example, consider a two-output cone where the outputs can be forced to all four combinations {00, 01, 10, 11} by choosing appropriate values for the dominated inputs. In such a case, the cone can be replaced by two free inputs. If some of the four combinations at the outputs are impossible under conditions expressed in terms of non-controllable signals, a logic cone can be constructed to characterize these conditions and reduce the number of PIs by adding logic, similar to the case of a single-output cone.

5 Experiments

As part of the experimental evaluation, all benchmarks from the single-property track of the Hardware Model Checking Competition 2012 were considered. Localization abstraction [5] was applied with a timeout of one hour to each benchmark, and the resulting models meeting the following criteria were kept:

– At least half of the flops were removed by abstraction.
– The abstraction was accurate (no spurious counterexamples).
– At least one of the verification engines could prove the property within one hour.

The sizes of the benchmarks selected in this way are listed in Table 1.

    Design      #And       #PI     #FF            Depth
    6s102       6,594      72      1,121 → 56     23
    6s121       1,636      99      419 → 110      19
    6s132       1,216      94      139 → 113      7
    6s144       41,862     480     3,337 → 236    18
    6s150       5,448      146     1,044 → 323    103
    6s159       1,469      13      252 → 18       4
    6s164       1,095      91      198 → 93       16
    6s189       36,851     479     2,434 → 259    18
    6s194       12,049     532     2,389 → 198    45
    6s30        1,043,139  32,994  1,195 → 128    32
    6s43        7,408      30      965 → 310      25
    6s50        16,700     1,570   3,107 → 207    52
    6s51        16,701     1,570   3,107 → 209    65
    bob05       18,043     224     2,404 → 146    106
    bob1u05cu   32,063     224     4,377 → 146    106

Table 1. Sizes of original designs and abstract models (abstraction phase). Column #FF shows how many flip-flops were turned into unconstrained inputs by localization abstraction. The removed FFs show up as PIs in the other tables. The last column shows the BMC depth used by the abstraction engine (see [5] for more details).

All these models were given to the reparameterization engine, both in weak mode and strong mode, the latter using the improvements described in Section 3. The reparameterized models were also post-processed with a quick simplification method called "shrink", which is part of the ABC package [7]. The longest runtime for weak reparameterization was 16 ms, for strong reparameterization 28 ms, and for the simplification phase 50 ms.(2) Reductions are listed in Table 2.

For comparison, Table 2 also includes the results of running an industrial implementation of the BDD-based algorithm of [1]. Because runtimes are significantly longer with this algorithm, they are given their own column. These results were given to us by IBM and, according to their statement, "are not tweaked as much as they could be".

All benchmarks were given to three engines: Property Directed Reachability [2, 6], BDD-based reachability [3], and Interpolation-based Model Checking [9]. The complete table of results is given in Table 3. A slice of this table, showing only results for PDR, with and without (strong) reparameterization, is given in Table 4 together with a scatter plot.

Analysis. Firstly, we see a speedup of 100x-1000x over previous work in the runtime of the reparameterization algorithm itself, with comparable quality of results for the application under consideration (models resulting from localization abstraction). This means the algorithm can always be applied without the need for careful orchestration. Secondly, we see an average speedup of 2.5x in verification times when applying reparameterization in conjunction with PDR, which is also the best overall engine on these examples. For two benchmarks, 6s121 and 6s150, BDD reachability does substantially better than PDR, and for the latter (where runtimes are meaningful) the speedup due to reparameterization is greater than 3x. Furthermore, for BDD reachability one can see that on several occasions (6s30 in particular), reparameterization is completely crucial for performance. Finally, interpolation-based model checking (IMC) seems to be largely unaffected by reparameterization.

(2) Benchmarks from HWMCC'12 are quite small. For comparison: running reparameterization on a 7 million gate design from one of our industrial collaborators took 4.1 s.
                No Reparam.      Weak Rep.        Strong Rep.     BDD Reparam.
    Design      #And     #PI     #And     #PI     #And    #PI     #And     #PI   Runtime
    6s102        6,594   1,137    1,247    331    1,188   267      1,283   285   71.91 s
    6s121        1,636     408      627    120      559    66        732    41    0.27 s
    6s132        1,216     120    1,108     62    1,102    55      2,731    36    0.80 s
    6s144       41,862   3,580   10,172  1,038    9,494   812     13,259   889   60.50 s
    6s150        5,448     867    3,062    506    2,213    89      2,231    46    1.81 s
    6s159        1,469     247      116     20      114    19        256    13    0.02 s
    6s164        1,077     196      661    109      499    41      3,413    40   11.61 s
    6s189       36,851   2,654   10,033  1,004    9,552   794     12,051   814   63.11 s
    6s194       12,049   2,723    1,366    184    1,348   166      1,612   121   10.37 s
    6s30       102,535  34,061    1,508    307    1,184   205        603    50    0.74 s
    6s43         7,408     685    3,451    304    3,218   202     17,691   101    2.07 s
    6s50        16,700   4,470    1,841    350    1,652   270      4,319   752   46.57 s
    6s51        16,701   4,468    1,828    350    1,639   268      1,255    62   16.34 s
    bob05       18,043   2,358    1,618    187    1,388    52      1,586    43    0.31 s
    bob1u05cu   32,063   4,455    1,618    187    1,388    52      1,586    43    0.33 s

Table 2. Effect of reparameterization on the abstracted models. "Weak" reparameterization refers to the basic method described in Section 2; "Strong" additionally includes the improvements discussed in Section 3. Runtimes for these are omitted, as the average CPU time was 4-8 ms (and the longest 28 ms). However, BDD-based reparameterization is not as scalable as the method presented in this paper, and its runtimes (typically 100x-1000x longer) are listed for reference.

                    PDR                      BDD reach.                  IMC
    Design      NoRep.   Weak   Strong   NoRep.   Weak    Strong   NoRep.   Weak    Strong
    6s102          1.7    0.4      0.4    121.2    90.2    204.3       –    2619.2      –
    6s121         64.0   12.3      5.4      0.5     0.3      0.4     10.2      5.8    36.0
    6s132          7.8    8.1      7.9   2327.3  1375.5       –     201.4    384.9   267.9
    6s144         10.3   12.4      9.6       –       –        –    1177.5    942.1  1413.0
    6s150           –   2323.7      –     539.3   143.6    189.1       –        –       –
    6s159          0.0    0.0      0.0      0.1     0.1      0.1      0.1      0.1     0.1
    6s164          7.0   34.1      6.8       –     33.6      0.4      4.4      8.5     3.0
    6s189         11.9    7.8      8.6       –       –        –     738.0    396.5   366.0
    6s194          4.7    4.6      3.7    307.5    42.5    742.2    798.9   1042.8  1103.3
    6s30          15.9   45.3     12.3       –     17.6     49.2       –        –       –
    6s43          13.6   11.7      9.1       –   1191.1   1390.4       –        –       –
    6s50          14.7   10.8      5.4    941.6   241.0   1680.8   1795.6    820.0  2348.9
    6s51          18.7    7.1      4.1    303.8   147.6   1891.2   1806.8    954.8  1737.8
    bob05          0.3    0.3      0.3   2450.5  2371.4   1253.2     30.3     26.3    25.1
    bob1u05cu      0.3    0.2      0.3       –   2157.6   1088.3     30.3     24.0    24.8

Table 3. Full table of results. Each design after abstraction is given to three engines: Property Directed Reachability (PDR), BDD-based reachability (BDD), and Interpolation-based Model Checking (IMC). Each engine is run on three versions of the model, with no/weak/strong reparameterization applied. A dash represents a timeout after one hour.

    Design      NoRep.    Rep.
    6s102         1.66    0.40
    6s121        64.00    5.43
    6s132         7.82    7.90
    6s144        10.34    9.56
    6s159         0.04    0.03
    6s164         6.97    6.83
    6s189        11.86    8.59
    6s194         4.71    3.72
    6s30         15.92   12.29
    6s43         13.63    9.15
    6s50         14.66    5.44
    6s51         18.72    4.08
    bob05         0.33    0.26
    bob1u05cu     0.33    0.26

Table 4. Runtime improvements for PDR. Column "NoRep." shows runtimes (in seconds) for proving the property of each benchmark using PDR after abstraction, but without reparameterization; column "Rep." shows runtimes after reparameterization. The average speedup is 2.5x. [The original accompanies this table with a log-scale scatter plot of these runtime pairs, "PDR on abstract model" vs. "PDR on reparameterized abstract model"; only the data is reproduced here.]

References

[1] J. Baumgartner and H. Mony. Maximal Input Reduction of Sequential Netlists via Synergistic Reparameterization and Localization Strategies. In Proc. of CHARME, pages 222–237, 2005.
[2] A. Bradley. IC3: SAT-Based Model Checking Without Unrolling. In Proc. of VMCAI, 2011.
[3] R. E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, vol. C-35, no. 8, Aug. 1986.
[4] P. Chauhan, E. Clarke, and D. Kroening. A SAT-Based Algorithm for Reparameterization in Symbolic Simulation. In Proc. of DAC, 2004.
[5] N. Een, A. Mishchenko, and N. Amla. A Single-Instance Incremental SAT Formulation of Proof- and Counterexample-Based Abstraction. In Proc. of FMCAD, 2010.
[6] N. Een, A. Mishchenko, and R. Brayton. Efficient Implementation of Property Directed Reachability. In Proc. of FMCAD, 2011.
[7] Berkeley Logic Synthesis Group. ABC: A System for Sequential Synthesis and Verification. http://www.eecs.berkeley.edu/~alanmi/abc/, v00127p.
[8] T. Lengauer and R. E. Tarjan. A Fast Algorithm for Finding Dominators in a Flowgraph. ACM Transactions on Programming Languages and Systems, vol. 1, no. 1, pages 121–141, July 1979.
[9] K. L. McMillan. Interpolation and SAT-based Model Checking. In Proc. of CAV, 2003.
[10] S. Minato. Fast Generation of Irredundant Sum-Of-Products Forms from Binary Decision Diagrams. In Proc. of SASIMI, 1992.
[11] Berkeley Verification and Synthesis Research Center. ABC-ZZ: A C++ framework for verification & synthesis. https://bitbucket.org/niklaseen/abc-zz.

LEC: Learning-Driven Data-path Equivalence Checking

Jiang Long*, Robert K. Brayton*, Michael Case†
* EECS Department, UC Berkeley, {jlong, brayton}@eecs.berkeley.edu
† Calypto Design Systems, mcase@calypto.com

Abstract—In the LEC system, we employ a learning-driven approach for solving combinational data-path equivalence checking problems. The data-path logic is specified using Boolean and word-level operators in VHDL/Verilog. The targeted application area is C-to-RTL equivalence checking problems found in an industrial setting. These are difficult because of the algebraic transformations done on the data-path logic for highly optimized implementations. Without high-level knowledge, existing techniques in bit-level equivalence checking and QF_BV SMT solving are unable to solve these problems effectively. It is crucial to reverse engineer such transformations to bring more similarity to the two sides of the logic. However, it is difficult to extract algebraic logic embedded in a cloud of Boolean and word-level arithmetic operators. To address this, LEC uses a compositional proof methodology and analysis beyond the bit and word level, incorporating algebraic reasoning through polynomial reconstruction. LEC's open architecture allows new solver techniques to be integrated progressively. It builds sub-model trees, recursively transforming the sub-problems to simplify and expose the actual bottleneck arithmetic logic. In addition to rewriting rules that normalize the arithmetic operators, LEC supports conditional rewriting, where the application of a rule depends on the existence of invariants in the design itself. LEC utilizes both functional and structural information of the data-path logic to recognize and reconstruct algebraic transformations. A case-study illustrates the steps used to extract the arithmetic embedded in a data-path design as a linear sum of signed integers, and shows the procedures that collaboratively led to a successful compositional proof.

I. INTRODUCTION

With the increasing popularity of high-level design methodologies, there is renewed interest in data-path equivalence checking [3][13][18][20]. In such an application, a design prototype is first implemented and validated in C/C++, and then used as the golden specification. A corresponding Verilog/VHDL design is implemented either manually or automatically through a high-level synthesis tool [2][4][15]. In both cases, a miter logic for equivalence checking is formed to prove the correctness of the generated RTL model by comparing it against the original C/C++ implementation.

The data-path logic targeted in this paper is specified using Verilog/VHDL. The bit and word-level operators in Verilog/VHDL have the same semantic expressiveness as the SMT QF_BV theory [5]. Table I gives a one-to-one correspondence between Verilog and QF_BV unsigned operators. Signed arithmetic operators are also supported. The complexity of such an equivalence problem is NP-complete. However, at the other extreme, the complexity becomes O(1) in the size of the network if the two designs are structurally the same. An NP-complete problem can be tackled by using SAT solvers as a general procedure. To counter the capacity limitation of SAT solving, it is crucial to reduce the complexity by identifying internal match points and by conducting transformations to bring more structural similarity to the two sides of the miter logic.

                Verilog operators    SMT QF_BV operators
    Boolean     &&, ||, !, ⊕, mux    and, or, not, xor, ite
    bit-wise    &, |, ~, ⊕, mux      bvand, bvor, bvnot, bvxor, bvite
    arithmetic  +, −, ∗, /, %        bvadd, bvsub, bvmul, bvdiv, bvmod
    extract     []                   extract
    concat      {}                   concat
    comparator  <, >, ≤, ≥           bvugt, bvult, bvuge, bvule
    shifter     <<, >>               bvshl, bvshr

TABLE I. Supported operators (unsigned).
A The differences between the two data-path logics under case-study illustrates the steps used to extract the arithmetic equivalence checking are introduced by various arithmetic embedded in a data-path design as a linear sum of signed transformations for timing, area and power optimizations. integers, and shows the procedures that collaboratively led to a successful compositional proof. These optimizations are domain specific and can be very spe- cialized towards a particular data-path design and underlying technology. They have the following characteristics: I. I NTRODUCTION • The two sides of the miter logic are architecturally With the increasing popularity of high-level design method- different and have no internal match points. ologies there is renewed interest in data-path equivalence • Many expensive operators such as adders and multipliers checking [3][13][18][20]. In such an application, a design are converted to cheaper but more complex implemen- prototype is first implemented and validated in C/C++, and tations and the order of computations are changed. It is then used as the golden specification. A corresponding Ver- not a scalable solution to rely on SAT solving on the ilog/VHDL design is implemented either manually or au- bit-blasted model. tomatically through high-level synthesis tool [2][4][15]. In • The parts of the transformed portion are embedded in both cases, a miter logic for equivalence checking is formed a cloud of bit and word level operators. Algebraic ex- to prove the correctness of the generated RTL model by traction [8][26] of arithmetic logic based on structural comparing it against the original C/C++ implementation. patterns is generally too restrictive to handle real-world The data-path logic targeted in this paper is specified post-optimization data-path logic. using Verilog/VHDL. The bit and word-level operators in • Word-level rewriting uses local transformation. Without Verilog/VHDL have the same semantic expressiveness as SMT high-level information, local rewriting is not able to make QF BV theory[5]. Table I gives a one-to-one correspondence the two sides of the miter logic structurally more similar. between Verilog and QF BV unsigned operators. Signed arith- Lacking high-level knowledge of the data-path logic, the metic operators are also supported. The complexity of such an equivalence problems can be very difficult for gate-level equiv- 9 alence checking and general QF BV SMT solvers. Strategi- Original miter cally, LEC views the bottleneck of such problems as having new miter target been introduced by high-level optimizations and employs a Verific Parser Frontend collaborative approach to isolate, recognize and reconstruct the high-level transformations to simplify the miter model by VeriABC bringing in more structural similarities. SSA Network B. Contributions Word−level Bit −level Learning−based The LEC system incorporates compositional proof strate- Simulator AIG Transformations gies, uses rewriting to normalize arithmetic operators, and ABC Transformed conducts analysis beyond bit and word level. The collaborating solvers RTL procedures help to expose the actual bottleneck in a proof of equivalence. The novel aspects of this system are: SAT UNSAT 1) It uses global algebraic reasoning through polynomial Fig. 1. Overall tool flow reconstruction. 
In the case-study, it uses the functional information of the design to reverse engineer the arithmetic expression as a linear sum, and also uses a structural skeleton of the original data-path to achieve the equivalence proof.

2) It supports conditional rewriting and proves the required invariants as pre-conditions.
3) It uses recursive transformations that target making both sides of the miter logic structurally more similar, and hence more amenable to bit-level SAT sweeping.
4) It has an open architecture, allowing new solver techniques to be integrated progressively.

Through a case study, we demonstrate the steps that were used to reconstruct the arithmetic embedded in a data-path design as a linear sum of signed integers, as well as all the procedures that compositionally led to a successful equivalence proof. The experimental results demonstrate the effectiveness of these collaborating procedures.

C. Overview

The overall tool flow is described in Section II. Learning techniques and system integration are presented in Sections III and IV. A case study is presented in Section V. Experimental results are presented in Section VI, followed by a comparison with related work and the conclusion.

II. TOOL FLOW

LEC takes Verilog/VHDL as the input language for the data-path logic under comparison. Internally, a miter network, as in Figure 2(a), is constructed comparing combinational logic functions F and G. Figure 1 illustrates the overall tool flow.

Fig. 1. Overall tool flow. [The original figure shows the original miter entering the Verific Parser Frontend, VeriABC producing the SSA network, from which a word-level simulator, a bit-level AIG, and learning-based transformations (producing transformed RTL and new miter targets) are derived, with ABC solvers returning SAT or UNSAT; only this summary is reproduced here.]

First, the Verific RTL parser front-end [6] is used to compile the input RTL into the Verific Netlist Database. VeriABC [23] processes the Verific netlist, flattens the hierarchy, and produces an intermediate DAG representation in static single assignment (SSA) form, consisting of Boolean and word-level operators as shown in Table I. Except for the hierarchical information, the SSA is a close-to-verbatim representation of the original RTL description. From the SSA, a bit-blasting procedure generates a corresponding bit-level network as an AIG (And-Inverter Graph). Word-level simulation models can be created at the SSA level.
ABC [1] equivalence checking solvers are integrated as external solvers.

LEC first tries to solve the miter directly, using random simulation on the word-level simulator or ABC [1]'s equivalence checking procedure dcec, which is a re-implementation of iprove [24]. If the miter is unresolved, LEC applies transformations to the SSA and produces sub-models in Verilog miter format, to which LEC can be applied recursively. The overall system integration is described in Section IV.

III. LEARNING TECHNIQUES

In this section, we present the techniques implemented in LEC. Even though some are simple and intuitive, they are powerful when integrated together, as demonstrated by the experimental results. All techniques are essential, because LEC may not achieve a final proof if any one is omitted. Their interactions are illustrated in the case-study in Section V.

Fig. 2. Miter network: (a) the miter network F = G over inputs x̄; (b) the structurally hashed network with red, blue, and purple (kept) regions; (c) the abstraction, where the cut signals are replaced by abstract inputs x̄′ and the original inputs x̄ become redundant. [Only the caption of the original figure is reproduced here.]

A. Structural information

An SSA netlist is a DAG of bit and word-level operators annotated with bit-width and sign information. In the tool flow, both Verific and VeriABC perform simple structural hashing at the SSA level, merging common sub-expressions. After merging, the miter logic is divided into three colored regions using cone-of-influence (COI) relations, as in Figure 2(b):

• Red: the node is in the COI of F only.
• Blue: the node is in the COI of G only.
• Purple: the node is in the COI of both sides of the miter, i.e. common logic.

The purple region is the portion of the miter logic that has been proved equivalent already, while the red and blue regions are the unresolved ones. LEC makes progress by reducing the red/blue regions and increasing the purple region. The common logic constrains the red and blue regions, and may itself be abstracted (see Section III-E) to reduce redundancy and possibly expose the real bottleneck in a proof.

B. Simulation model

Two word-level simulators are generated from the SSA network. One is a native interpreted model. The other uses the open-source Verilator [29] for compiled simulation. From the SSA network, LEC automatically generates C++ code for pseudo-random input drivers and for monitoring design behavior. Verilator compiles the Verilog miter logic, links in the generated C++ code, and produces a simulator as a standalone executable. Efficient and effective simulation is crucial in our current flow for capturing potential constants and potential internal equivalent points at the SSA level. Simulation is also used to reduce common logic in the abstraction computation procedure.

C. Bit-level model

As shown in Figure 1, an AIG is created from the SSA network by bit-blasting. LEC calls ABC [1]'s SAT sweeping procedure dcec to perform direct solving at the bit level. Using the AIG model, the native SAT solver embedded in LEC can be used to obtain a formal proof for a particular query. Typical queries are for extracting constant nodes, proving suspected equivalent pairs of points, or conducting particular learning for rewriting. Book-keeping information between the SSA nodes and the AIG nodes allows queries to be constructed at the word level and verified at the bit level. The result is then used to simplify the SSA network.

D. Constants and Potential Equivalent Points (PEPs)

At the word level, candidates for constants and PEPs are identified through simulation, and SAT queries are posed. Each such SAT query is constructed and checked at the bit level. SAT solving is configured at a low-effort level (run for a few seconds) for these types of queries. Proven constants and PEPs are used immediately to simplify the SSA network, leading to a new sub-model of less complexity. LEC then continues to process the sub-model. In the presence of unproven PEPs, LEC can choose one as the next miter target, normally the smallest in terms of the number of nodes in its COI. The proof progresses as constants and PEPs are identified and used to simplify the miter model.

E. Abstraction

As illustrated in Figure 2(c), in computing an abstraction, LEC computes a cut in the purple region (common logic) and removes the logic between the cut and the inputs. An abstract model is formed by replacing the cut signals with free inputs x̄′. If this abstracted miter is UNSAT, then the original miter is UNSAT. In our current implementation, LEC traverses the SSA network in topological order from the inputs. As each node is tentatively replaced with new PIs, simulation is used to validate the replacement. If successful, the node is omitted and replaced with the new PIs, and the next node is processed similarly.

A successful abstraction step removes irrelevant logic and exposes a smaller unresolved region of the miter logic, allowing LEC to continue with other procedures. In addition, as seen in the experimental results, the reduction of common logic can significantly reduce the complexity of downstream SAT solving, e.g. when common multipliers are removed from the miter logic. An unsuccessful abstraction, when the abstract miter becomes SAT, indicates the existence of a rare event not captured during random simulations. Often, this gives hints for selecting case-splitting candidates.
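The following C++ sketch outlines the abstraction loop just described. The helper names are hypothetical (the paper does not publish LEC's implementation); only the control structure and the soundness direction are taken from the text:

    // Sketch of the abstraction loop (hypothetical helper functions):
    // traverse the common logic in topological order, tentatively replace
    // each node with fresh free inputs, and keep the replacement only if
    // random simulation finds no counterexample on the abstract miter.
    #include <vector>

    struct Ssa { /* SSA miter network (details omitted) */ };
    std::vector<int> commonNodesInTopoOrder(const Ssa&);
    Ssa replaceByFreeInputs(const Ssa&, int node);   // cut at 'node'
    bool simulationFindsCex(const Ssa&, int nPatterns);

    Ssa abstractCommonLogic(Ssa cur)
    {
        for (int node : commonNodesInTopoOrder(cur)) {
            Ssa cand = replaceByFreeInputs(cur, node);
            // Sound direction: UNSAT on the abstraction implies UNSAT on
            // the original; a SAT answer may be spurious, so candidates
            // that simulation can falsify are rejected here.
            if (!simulationFindsCex(cand, 1024))
                cur = cand;
        }
        return cur;
    }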
F. Rewriting

Similar to [20], word-level rewriting transforms an SSA network into a structurally different but functionally equivalent one. Through rewriting, certain equivalence checking problems can become much simpler. In our experience, a multiplier is often a source of difficulty in data-path equivalence checking. If two multipliers from opposite sides of the miter are matched exactly, LEC can simplify the miter through structural hashing and treat them as common logic. This is most effective when combined with the abstraction procedure, as the common multiplier can then be totally removed.

In LEC, a few rules are hard-coded through pattern matching applied to the SSA network. The goal is to process multiplications so that they can be matched exactly. This rewriting is implementation specific; for illustration purposes, we list a few rewriting rules in Table II using Verilog notation and the semantics of the operators.

        Before                       After
    1   a ∗ b                        b ∗ a
    2   mux(cond, d0, d1) ∗ c        mux(cond, d0 ∗ c, d1 ∗ c)
    3   mux(cond, d0, d1)[m:n]       mux(cond, d0[m:n], d1[m:n])
    4   a[m:0] >> n                  {(m−n)'b0, a[m:n]}
    5   (a[m:0] >> n)[m−n:0]         a[m:n]
    6   {a, b[n−1:0]} ∗ c            a ∗ c << n + b[n−1:0] ∗ c

TABLE II. Rewriting rules.

The first rule is the normalization of multiplier operands. If a multiplier uses a partial product generator and a compressor tree, switching the operands of the multiplication becomes a very hard SAT problem, because at the bit level the implementation is not symmetrical. It is almost imperative to apply this rule whenever possible. The second and third rules use the distributive laws of multiplication and multiplexing. Rules 4 and 5 remove the shift operator >> when it is used with extract and concat, because it is hard for a multiplication to be restructured through the >> operator. Rule 6 distributes multiplication through the concat of two bit-vectors using +. It uses the fact that the concatenation {a, b[n−1:0]} is equivalent to a ∗ 2^n + b[n−1:0].

The following is a more complex rule that distributes + over the extract operator. The right-hand side is corrected with a third term, which is the carry bit from adding the lower n bits of a and b:

    (a + b)[m:n] = a[m:n] + b[m:n] + (a[n−1:0] + b[n−1:0])[n]     (1)

Repeatedly applying the above rules, LEC transforms the SSA network and keeps only the ∗ and + operators, enhancing the possibility of multipliers being matched. Note that rule (1) and Rules 4-6 in Table II are correct for unsigned operators. Currently, for signed operators, due to sign extension and the two's complement representation of the operands, we have not implemented a good set of rewriting rules.
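Rule (1) can be spot-checked exhaustively on small bit-widths. The following standalone C++ program is our illustration, not part of LEC; it verifies the rule for 8-bit operands with m = 6, n = 3, modeling Verilog's modulo semantics with explicit masking:

    // Exhaustive check of rewrite rule (1) on small bit-vectors:
    //   (a+b)[m:n] == a[m:n] + b[m:n] + (a[n-1:0] + b[n-1:0])[n]
    #include <cassert>
    #include <cstdint>

    static uint32_t bits(uint32_t v, int hi, int lo)   // v[hi:lo]
    {
        return (v >> lo) & ((1u << (hi - lo + 1)) - 1);
    }

    int main()
    {
        const int W = 8, m = 6, n = 3;
        for (uint32_t a = 0; a < (1u << W); a++)
            for (uint32_t b = 0; b < (1u << W); b++) {
                uint32_t lhs   = bits(a + b, m, n);
                uint32_t carry = bits(bits(a, n-1, 0) + bits(b, n-1, 0), n, n);
                uint32_t rhs   = (bits(a, m, n) + bits(b, m, n) + carry)
                               & ((1u << (m - n + 1)) - 1);
                assert(lhs == rhs);            // rule (1) holds
            }
        return 0;
    }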
Note that the above rule (1) and Rule 4-6 in Table II are correct for Reasoning at the word-level, rewriting rules are based on unsigned operators. Currently, for signed operators, due to the arithmetic properties of the corresponding operators such sign extension and the two’s complement representation of the as the commutative law of integer multiplication. However, operands, we have not implemented a good set of rewriting rewriting applies only local transformations and does not have rules. a global view. In situations when the miter logic is constructed 1) Conditional rewriting: The following equation from arithmetic optimization at the polynomial level, local rewriting is not able to bring similarity into the miter for (a  c) ∗ b = (a ∗ b)  c (2) further simplification. In such a situation, LEC tries to recon- struct the polynomial of the whole miter model to establish reduces the bit-width of a multiplier on the left hand side to equivalence through arithmetic or algebraic equivalences and a smaller one on the right. It is correct if a, b, c are integers then use high level transformations to prove the equivalence but incorrect in Verilog semantics, which uses modulo integer of the original miter. arithmetic. However, if the following is true within the miter As a generic procedure, LEC follows four steps to prove a model in modulo integer semantics miter network F (x̄) = G(x̄) where F and G are the top level ( (a  c)  c) == a (3) signals being compared, and x̄ is the vector of input variables (bit-vectors): then equation (2) is valid. In such a situation, LEC identifies the pattern on the left hand side of (2) in the SSA network 1) Conjecture (possibly by design knowledge) about the and executes a SAT query concerning (3) using the AIG model algebraic domain of the polynomial, e.g. signed vs. through bit-level solvers. The transformation to the left hand unsigned integer, modulo integer arithmetic, the order side of (2) is carried out only if the query is proven to be an of the polynomial etc. These conjectures set up the invariant. Such a transformation may produce an exact match framework and semantics for polynomial reconstruction of a∗b afterwards, which can be crucial for achieving the final as illustrated in the case-study of Section V. proof. 2) Determine a polynomial f and create a logic network F 0 such that the following can be proved formally. G. Case-split F 0 implements f (5) Case-splitting on a binary signal, cofactors the original miter F 0 = F (6) model into two sub-models. The miter is proven if both sub- models are proven, or falsified if any sub-model is falsified. Al- How f is constructed is domain and test-case dependent. though exponential in nature, if many signals are chosen, case- In the case-study of Section V, we use simulation splitting can simplify the underlying bit-level SAT solving patterns to probe for the coefficients of a linear function. significantly. For example, it is difficult to prove the following 3) Determine a polynomial g and create a logic network miter structure directly through bit-blasting and SAT solving G0 such that the following can be proved formally. at the AIG level G0 implements g (7) (x + y) ∗ (x + y) == x ∗ x + 2 ∗ x ∗ y + y ∗ y (4) miter G0 = G (8) where x is a 32-bit integer and y a single binary signal. 4) Establish the following equivalence formally at the al- However, it can be proven easily if case-splitting is done on gebraic level. y = 0 and y = 1. 
After constant propagation, the bit-level solver can prove both sub-models easily. f =g (9) The current case-splitting mechanism supports cofactoring on an input bit or input bit-vector. In verifying the test cases The combination of Items 2, 3, and 4 establishes the equiva- experienced so far, the case splits are conducted on a bit, a lence proof of the original miter model F = G. In constructing bit-vector equal to zero or not, or on the lsb or msb of a F 0 and G0 , we try to make them as structurally similar to F bit-vector equals to zero or not. A heuristic procedure can be and G as possible. Details are given in Section V. 12 Sub-model Return IV. S YSTEM I NTEGRATION simulator SAT The above learning techniques are integrated in LEC as a set AIG SAT/UNSAT of logically independent procedures. Each procedure produces simplified SAT/UNSAT one or more sub-models, illustrated as a tree in Figure 3. abstraction UNSAT The root node is the current targeted Verilog miter model. rewrite SAT/UNSAT case-split SAT/UNSAT It has eight children. The simulator and AIG models are linear construction SAT/UNSAT the ones described in Figure 1. The simplif ied sub-model is PEP SIMPLIFY generated by constant propagation and merging proven PEPs. TABLE III The abstraction and rewrite sub-models are created by the S UB MODEL RETURN VALUE abstraction and rewrite procedures in the previous section. The case-split sub-model consists of a set of sub-models, corresponding to the cofactoring variables selected. In the is disjunctive, which includes the root and all the leaf nodes current implementation, the user needs to input the set of in bold font. The case-split and linear construction nodes signals to case-split on; eventually they will be selected by are conjunctive; a P EP node is disjunctive. The semantics, heuristics. The linear-construction node has two sub-models shown in the following tables, are used to resolve the proof which will be explained in detail in Section V. When PEPs result of the parent model from its immediate sub-models. To are identified through simulation, a P EP node is create with complete the calculus, we introduced two values: CON and the set of unproven-PEPs as sub-models. BOT, where CON stands for an internal conflict indicating a potential LEC software bug and BOT is the bottom of the Verilog Miter Model simulator value lattice and acts like an uninitialized value. AIG simplified k SAT UNS UNK SMP CON BOT abstraction SAT SAT CON SAT SAT CON SAT rewrite UNS CON UNS UNS UNS CON UNS case-split case 0 UNK SAT UNS UNK SMP CON UNK case ... 1 SMP SAT UNS SMP SMP CON SMP case n CON CON CON CON CON CON CON linear-construction case F BOT SAT UNS UNK SMP CON BOT case G PEP pep 0 TABLE IV pep 1 D ISJUNCTIONS OF MODELS ... pep m & SAT UNS UNK SMP CON BOT SAT SAT SAT SAT n/a CON SAT LEC UNS SAT UNS UNK n/a CON UNS Fig. 3. Branching sub-model tree UNK SAT UNK UNK n/a CON UNK SMP n/a n/a n/a n/a n/a n/a Two nodes in the sub-model tree are terminal. One is the CON CON CON CON n/a CON CON simulator model which can falsify the miter through random BOT SAT UNS UNK n/a CON BOT simulation. The other is the AIG model where ABC’s bit-level TABLE V dcec procedure is applied. The rest of the leaf models (in bold C ONJUNCTION OF MODELS font) are generated as Verilog miter models, which have the same format as the root node. 
LEC procedures can be applied Tables IV and V are the truth tables for the disjunction and recursively to these leaf nodes to extend the sub-model trees to conjunction semantics of the return values, in which UNS, simpler ones. The LEC proof process progresses by expanding UNK, SMP stand for UNSAT, UNKNOWN, and SIMPLIFY. the sub-model tree. A sub-model is resolved as SAT or UNSAT Assuming a bug free situation, at a disjunctive node, if either from its sub-models’ proof results. SAT or UNSAT is returned from a sub-model, this is the final Since there are no logical dependencies between sibling proof result for the parent. In conjunction, the parent must wait sub-models, any branch can be chosen to continue the proof until all sub-models are resolved as UNSAT before deciding process. Sibling sub-models can be forked in parallel from that its result is UNSAT, while any SAT sub-model implies a parent process. A node in the sub-model tree determines the current model is SAT. A P EP node returns SIMPLIFY its proof result from its children. Table III gives the possible to its parent if one of its sub-models, say pepi , is proven return values from the first level sub-models. SIMPLIFY is UNSAT. In this case, the parent model can apply another round returned by a P EP node to its parent model when at least of simplification to obtain a new simplif ied sub-model by one of its sub-models, pepi , is proven UNSAT, notifying the merging the node pair in the just-proved pepi . The proof log in parent node to simplify further with the newly proved pepi . Figure VI is a sample sub-model tree where only the branches Depending on the logical relationships between a parent and that contributed to the final proof are shown. Indentation its immediate sub-models, a node is either disjunctive or con- indicates the parent-child relationship. Recursively, the proof junctive in semantics. In Figure 3, a Verilog miter model node result of the top level target is evaluated as UNSAT. 13 { "case split": { three numbers on the right are the node counts in the red, "case_0": "UNSAT by AIG" blue and purple regions (common logic) of the SSA network "case_1": { as distinguished in Figure 2(a). Only those sub-models that "simplified": { contributed to the final proof are shown in the figure. Others "abstraction": { are ignored. As seen in Figure 5, the case-split procedure is "case split": { "case_00": "UNSAT by AIG", 1.original model : 366 332 776 "case_01": "UNSAT by AIG", 2. case-split "case_10": "UNSAT by AIG", 3. case_0 : 366 331 844 "case_11": "UNSAT by AIG" 4. AIG : UNSAT }, 6. case_1 : 366 332 776 }, 7. simplified : 344 289 675 }, 8. abstraction : 344 289 29 }, 9. case-split }, 10. case_0 : 344 289 31 } 11. AIG : UNSAT ------------------------------------- 12. case_1 : 344 289 31 Miter proof result: [Resolved: UNSAT] 13. simplified : 343 288 27 ------------------------------------- 14. PEP Fig. 4. Illustration of proof log 15. pep_0 : 335 280 27 16. linear construction 17. case_F Using this sub-model tree infrastructure, any new proce- 18. AIG : UNSAT dures discovered in the future can be plugged into the system 19. case_G easily. Also, the system is fully parallelizable in that siblings 20. AIG : UNSAT 21. simplified : 10 10 305 can be executed at the same time. The proof process can be 22. AIG : UNSAT retrieved from the expanded sub-model tree. Fig. 5. Sub-model proof tree V. C ASE S TUDY The design in this case-study is an industrial example taken applied twice, at lines 2 and 9. 
Both models have a single-bit input port, which was selected for cofactoring. ABC [1] immediately proved the first cofactored case, case_0 (lines 3 and 10), using the AIG model at lines 4 and 11. The timeout for the dcec run was set to two seconds. Abstraction was applied at line 8, significantly reducing the common logic from 675 to 29 SSA nodes and effectively removing all the comparator logic. We tried abstraction on the original model without the case-split procedure, and it failed to produce any result. The case-split at line 2 removed enough Boolean logic and eliminated enough corner cases that the abstraction procedure was able to produce an abstract model successfully.

Model 15 is the smallest unproved PEP from model 13. It is proved using the linear construction procedure at line 16, which we describe in detail in Section V-A. Model 21 is the simplified model of model 13 after merging the just-proved pep_0. After simplification, most of the logic in model 21 became common logic through structural hashing, leaving only 10 nodes in each of the blue and red regions. Model 21 was proved quickly by ABC, which concludes the proof of the original miter. In this case, the linear-construction procedure was crucial in attaining the proof. However, the case-split, simplification, abstraction, and PEP models are also very important, because they collaborate in removing Boolean, mux, and comparator logic, etc., keeping only the part of the original miter logic which constitutes a linear function. Only at this point can a proof by the linear construction procedure succeed.

A. Linear construction

For model 15 in Figure 5, the SSA network contains many +, −, << and >> operators, along with extract and concat operators,
The input ports consist of twenty-five 7-bit or 12-bit wide ports. The miter is comparing two 15-bit wide output ports. At this point, simplification and abstraction cannot simplify the model further. Also, there are no good candidates for case-splitting. The local rewriting rules cannot be applied effectively without some global information to help converge the two sides of the miter logic. High-level information must be extracted and applied to prove this miter model.

After the linear construction procedure, LEC finds that the miter logic implements the following linear sum in the signed integer domain using two's complement representation:

-16*x0 + 2*x1 + 2*x2 + 2*x3 + 2*x4 + 2*x5
+ 2*x6 + 2*x7 + 2*x7 + 2*x8 + 2*x9
+ 2*x10 + x11 + x12 + 2*x13 + 2*x14 + 2*x15
+ 2*x16 + 2*x17 + 2*x18 + 2*x19 - 2*x20 + 2*x21
+ 2*x22 + 2*x23 + 2*x24 + 14

One side of the miter implements the above sum as a plain linear adder chain (Figure 6(a)); the other side is a highly optimized implementation using a balanced binary tree structure (Figure 6(b)) and optimization tricks which we do not fully understand.

Fig. 6. Addition implementation: (a) linear adder chain; (b) balanced adder tree

This is a hard problem for bit-level engines because there are no internal match points to utilize. Therefore, LEC resorts to a high-level method to establish equivalence at the polynomial level. The following are the detailed steps for this specific case.

1) The conjecture: Assume the miter logic is F(x̄) = G(x̄) as in Figure 2(a). LEC conjectures the following for the arithmetic domain:
• Signed integer arithmetic. The numbers are in two's complement representation.
• F(x̄) and G(x̄) are implementing linear sums f and g of the forms

f(x̄) = Σ_i a_i · x_i + b      (10)
g(x̄) = Σ_i a'_i · x_i + b'    (11)

2) Determining the coefficients of f and g and proving f = g algebraically: Given the data-path logic F(x̄) and the linear sum formula (10), it takes n + 1 simulation patterns on the n input variables to compute the coefficients:

b = F(0, 0, ..., 0)
a_0 = F(1, 0, ..., 0) - b
a_1 = F(0, 1, ..., 0) - b
...
a_{n-1} = F(0, 0, ..., 1) - b

Another round of random simulation on both the logic and the polynomial can be done to increase the likelihood of the conjecture. The same is repeated for G(x̄) to obtain g(x̄). In integer arithmetic, f is equal to g if and only if the coefficients match exactly for each term:

f = g  ⟺  ∀i. a_i = a'_i and b = b'    (12)

So checking f = g is trivial in this case. In other algebraic domains, domain-specific reasoning may have to be applied to derive algebraic equivalence, e.g. in [28].
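The coefficient-probing step can be illustrated with a small sketch. Under the stated conjecture, evaluating the network on the all-zero pattern and on each unit pattern recovers b and the a_i, and a follow-up round of random patterns cross-checks the conjecture. The function eval_network, standing in for simulation of the data-path logic, is hypothetical.

import random

def conjecture_linear_sum(eval_network, n, num_checks=1000):
    """Probe a data-path function with n inputs and conjecture a linear sum
    f(x) = sum(a[i] * x[i]) + b, as in equations (10)-(12).
    eval_network(inputs) -> signed integer result (hypothetical simulator)."""
    zeros = [0] * n
    b = eval_network(zeros)
    a = []
    for i in range(n):
        unit = zeros[:]
        unit[i] = 1
        a.append(eval_network(unit) - b)  # a_i = F(0,..,1,..,0) - b
    # Random simulation to increase confidence in the conjecture.
    for _ in range(num_checks):
        x = [random.randrange(0, 128) for _ in range(n)]  # e.g. 7-bit ports
        if eval_network(x) != sum(ai * xi for ai, xi in zip(a, x)) + b:
            return None  # conjecture refuted
    return a, b

def coefficients_equal(fa, fb):
    # Equation (12): in integer arithmetic, f = g iff all coefficients match.
    return fa is not None and fb is not None and fa == fb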
3) Synthesizing implementations F'/G' for f (f = g), structurally similar to F/G: We want to find a Verilog implementation F'(x̄) of f such that 1) F' implements f, and 2) F' is structurally similar to F. To do this, all nodes in the SSA network with arithmetic operators +, - are marked, and edges connecting single bits are removed. A reduced graph is then created from the marked nodes in the remaining graph, maintaining the input/output relations between marked nodes. This graph is a skeleton of the implementation structure of F. Each of its nodes is annotated with a conjectured linear sum, computed in the same way as in the above steps. The root node F is annotated with f, and internal nodes are annotated with linear sums f_s, f_t, etc. For illustration purposes, Figure 7(a) shows such an annotated reduced graph for a node w.

Fig. 7. Annotated reduced graph: (a) annotated reduced graph, with node annotations s = f_s(x̄), t = f_t(x̄), w = f_w(x̄); (b) substituted annotation, with w = c_s·s + c_t·t + f_st(x̄)

For an arbitrary node w in the reduced graph with inputs from nodes s and t, from the annotation we have the following:

s = f_s(x̄)
t = f_t(x̄)
w = f_w(x̄)

We would like to substitute f_w with the variables s and t, such that w is a function of s and t, in order to follow the structure of the skeleton reduced graph. Because all the functions are linear sums, we can compute, using algebraic division, two constants c_s and c_t such that the following holds:

w = c_s · s + c_t · t + f_st(x̄)

where c_s is the quotient of f_w/f_s and c_t = (f_w - c_s · f_s)/f_t, while f_st is the remainder of the previous division. The substitution is conducted in one topological traversal from the inputs to the miter output. After substitution, the annotated reduced graph is essentially a multi-level implementation of the above linear sum. Because the linear sum annotated at each node in the reduced graph is only a conjecture, it may not be exactly the same function as in the original miter logic. However, in re-implementing f using this structure, certain similarities are still captured through the construction process.

This multi-level circuit is implemented by traversing the substituted reduced graph and creating a corresponding Verilog file for F'. It is generated by allocating a bit-vector at each internal node, with its bit-width equal to that of the output port of F. The same can be done for G to obtain G'. The reason why LEC goes through so much trouble to obtain separate RTL implementations for F' and G' (even though they both implement f) is that we need to prove F = F' and G = G' separately next. Without the structural similarities created through this procedure, proving equivalence with an arbitrary F' and G' would be as difficult as proving the original F = G. Generally, only with these extra similarities injected can the miter model be simplified enough to allow SAT sweeping to succeed for F = F' and G = G'.
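The division step admits a compact realization for linear sums represented as coefficient vectors. The paper does not spell out the division rule, so the pivot-based choice of c_s below is our assumption; the decomposition w = c_s·f_s + c_t·f_t + f_st is what the construction requires.

def divide_linear(fw, fs):
    """One pivot-based realization of 'quotient of fw/fs' for linear sums.
    A linear sum is (coeffs, const), with coeffs a dict var -> int.
    The pivot rule is our assumption; the paper leaves it unspecified."""
    cw, _ = fw
    cs, _ = fs
    # Pick any variable in the support of fs as the pivot.
    pivot = next((v for v, c in cs.items() if c != 0), None)
    if pivot is None or cw.get(pivot, 0) == 0:
        return 0
    return cw.get(pivot, 0) // cs[pivot]

def scale(f, k):
    coeffs, const = f
    return ({v: k * c for v, c in coeffs.items()}, k * const)

def subtract(f, g):
    coeffs = dict(f[0])
    for v, c in g[0].items():
        coeffs[v] = coeffs.get(v, 0) - c
    return (coeffs, f[1] - g[1])

def substitute(fw, fs, ft):
    """Compute c_s, c_t and the remainder f_st such that
    f_w = c_s*f_s + c_t*f_t + f_st, following the construction in the text."""
    c_s = divide_linear(fw, fs)
    rem = subtract(fw, scale(fs, c_s))    # f_w - c_s*f_s
    c_t = divide_linear(rem, ft)
    f_st = subtract(rem, scale(ft, c_t))  # remainder of the division
    return c_s, c_t, f_st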
4) Proving F = F' and G = G': We construct two miter models F = F' and G = G' as case_F and case_G in Figure 3, and apply LEC separately to each. By construction, each miter should be simpler than the original F = G because of the increased structural similarity between the two sides of the miter. Another round of LEC might reduce this miter logic, if not prove it directly through bit-level solvers. In the present case, F = F' was proven instantly, because F is a simple linear adder chain and so is F'. Proving G = G' takes more time using ABC's dcec, because G is a highly optimized implementation of f and the multi-level implementation from the annotated reduced graph only captures part of the similarity. But the injected similarity was sufficient to reduce the complexity to within dcec's capacity.

5) Proving that F'/G' implement f/g: To complete the proof, we still have the proof obligation that F' and G' implement f and g respectively. By construction from the reduced graph, the generated Verilog is a verbatim translation of the multi-level form of f. However, we need to bridge the gap between Verilog's bit-vector arithmetic and the integer arithmetic of the linear sum. To do so, we created SVA assertions to check that every Verilog statement captures the integer value in full, without losing precision due to underflows or overflows.

c[n:0] = a[n-1:0] + b[n-1:0];
assert (a[n-1] & b[n-1] |-> c[n])
assert (!a[n-1] & !b[n-1] |-> !c[n])

c[n:0] = a[n:0] / b[m:0];
assert (a[n:0] == (c[n:0] * b[m:0])[n:0])

c[m:0] = {a[n:0]}[m:0];
assert (a[n:0] == {(n-m)*{c[m-1]}, c[m:0]})

The first two sets of assertions ensure there is no overflow of the signed integer addition and no non-zero remainder of the division. The third ensures that extraction does not change the value in two's complement representation. The SVA checkers are formally verified separately using the VeriABC [23] flow, which in turn uses ABC's model checker.
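The intent of the first pair of assertions can be checked independently of any tool: when both sign bits agree, the (n+1)-bit sum of two n-bit two's complement operands captures the exact integer value, and its top bit is forced accordingly. The brute-force sketch below (our illustration; the width is arbitrary) confirms both sign-bit implications for a small width.

def check_add_assertions(n=6):
    """Exhaustively check, for n-bit two's complement operands a and b,
    the implications asserted for c[n:0] = a[n-1:0] + b[n-1:0]:
    a[n-1] & b[n-1] -> c[n]  and  !a[n-1] & !b[n-1] -> !c[n]."""
    for a in range(2 ** n):
        for b in range(2 ** n):
            sa = (a >> (n - 1)) & 1
            sb = (b >> (n - 1)) & 1
            va = a - (sa << n)                    # signed value of a
            vb = b - (sb << n)                    # signed value of b
            c = (va + vb) & ((1 << (n + 1)) - 1)  # (n+1)-bit sum
            top = (c >> n) & 1
            if sa and sb:
                assert top == 1
            if not sa and not sb:
                assert top == 0

check_add_assertions()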
From the above procedures, we established the following:

f(x̄) = g(x̄)                   (13)
F'(x̄) implements f(x̄)         (14)
G'(x̄) implements g(x̄)         (15)
F = F'                         (16)
G = G'                         (17)

Altogether they establish the proof for F = G.

Combining the above procedures into a single run, LEC took about 10 minutes on an i7 processor to complete the full proof. Roughly 80% of the time is spent on compiling and running random simulation; the rest is used for SAT solving. In Table VI, we compare this run-time against Boolector [11], Z3 [16] and ABC's iprove [24] solver, all run on the same server. It is clear that LEC expedites the proof significantly by using the knowledge of the linear sum formulation inside the miter logic.

Miter    Boolector  Z3        iprove    LEC
model 1  time-out   time-out  time-out  10 min

TABLE VI: Comparison with other solvers (time-out of 24 hours)

In summary, polynomial reconstruction was a key technique to prove the underlying miter problem. The case-study illustrates the major steps and the proof obligations encountered during the process. The actual techniques used would differ for different domains of the underlying arithmetic. Each algebraic domain would require special-purpose heuristics and automated proof procedures to guarantee the correctness of the reconstructions and transformations used in the method. The goal of the linear construction procedure was to inject increased structural similarity by using global algebraic transformations.

VI. EXPERIMENTAL RESULTS

Table VII shows the experimental results comparing Boolector [11], Z3 [16] and iprove [24], using a 24-hour time-out limit on a 2.6 GHz Intel Xeon processor. These models are generated directly using SLEC [3] for checking C-to-RTL equivalence, or extracted as sub-targets from PEPs. The first column is the miter design name. The second column is the number of lines of Verilog for the miter model specification. Run-time or time-out results are reported for each solver in columns 3 to 6. Although the miter models are not big in terms of lines of Verilog, they are quite challenging for Boolector, Z3 and iprove. The run-time of LEC is the total CPU time, including Verilog compilation. It was expected that iprove would not prove any of them, because it works on the bit-blasted model without any of the high-level information that the other solvers have.

Design     Lines  Boolector  Z3        iprove    LEC
mul_64_64  125    20 sec     200 sec   time-out  10 sec
d1         24     time-out   time-out  time-out  15 sec
d2         507    time-out   time-out  time-out  2 min
d3         191    time-out   time-out  time-out  15 min
d4         473    time-out   time-out  time-out  60 sec
d5_pep_0   674    time-out   9 hour    time-out  4 min

TABLE VII: Benchmark comparison (time-out of 24 hours)

The miter mul_64_64 compares a 64x64 multiplier with an implementation using four 32x32 multipliers, as follows:

{a_H, a_L} * {b_H, b_L} = (a_H * b_H) << 64 + (a_H * b_L + a_L * b_H) << 32 + a_L * b_L

where a_H, a_L, b_H, b_L are the 32-bit slices of the original 64-bit bit-vectors. Both Boolector and Z3 are able to prove it. LEC proves it by first utilizing rewriting rules to transform the 64x64 multiplier into four 32x32 multipliers, matching the four on the RHS of the miter. As they are matched exactly, they become common logic in the miter model. LEC then produces an abstraction and obtains a reduced model with all the multipliers removed: the outputs of the multipliers become free inputs in the abstract model. The abstract model is then proven instantly by ABC's dcec on the AIG model.
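The decomposition identity behind mul_64_64 is easy to validate outside the tool; the sketch below checks it on random values (Python's unbounded integers keep the 128-bit result exact, and the slicing mirrors the formula above).

import random

def mul_64_64_decomposed(a, b):
    # 32-bit slices of the 64-bit operands.
    mask32 = (1 << 32) - 1
    a_h, a_l = a >> 32, a & mask32
    b_h, b_l = b >> 32, b & mask32
    # (aH*bH) << 64 + (aH*bL + aL*bH) << 32 + aL*bL
    return ((a_h * b_h) << 64) + ((a_h * b_l + a_l * b_h) << 32) + a_l * b_l

for _ in range(10000):
    a = random.getrandbits(64)
    b = random.getrandbits(64)
    assert mul_64_64_decomposed(a, b) == a * b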
The miter d1, extracted from a PEP sub-model, is a demonstration of rewrite rule 6 in Table II using 32-bit multiplication. As both Boolector and Z3 fail to prove equivalence within the time limit, they likely do not have this rewriting rule implemented.

To prove d2, LEC conducts conditional rewriting using rule (2), by first statically proving an invariant in the form of (3). After the transformation, the multipliers are matched exactly on both sides of the miter and are removed in the subsequent abstract model. The final miter model is proved instantly by ABC on the bit-level AIG.

The miter model d3 has part of its logic similar to mul_64_64 embedded inside. LEC proves d3 by first applying rewriting rules repeatedly until no more rewriting is possible. Then, LEC computes a reduced model through abstraction. In the reduced model, LEC conducts a case-split on a one-bit input. The case-0 AIG model is proven instantly, while case-1 is proven in about 10 minutes by ABC.

The miter d4 is proven by first conducting a case-split on two bit-vector inputs: cofactoring on whether each bit-vector equals zero or not. Three of the four cofactored cases are proven instantly. The one unresolved case goes through a round of simplification and abstraction. On the sub-model thus obtained, three one-bit inputs are identified and cofactored through case-split procedures. LEC proves all eight cases quickly, within a few seconds.

Miter d5 is extracted from model 15 in Figure 5, which contains the purely linear sum miter described in the case-study section. For this simpler miter, Z3 is able to prove it in 9 hours, while both iprove and Boolector time out. This shows that LEC's transformations through collaborating procedures successfully reduce the surrounding logic in the original model, which was preventing Z3 from proving it within 24 hours.

The above experiments demonstrate the effectiveness of LEC's collaborating procedures of simplification, rewriting, case-splitting and abstraction. The LEC architecture allows these procedures to be applied recursively through a sub-model tree: the model obtained by one procedure introduces new opportunities for applying other procedures in the next iteration. As exemplified by miter d4, the initial case-split gives rise to new opportunities for simplification, as new constants are introduced by cofactoring. A new round of abstraction is then able to remove enough common logic and expose three one-bit inputs as case-split candidates in the reduced model, which in turn gives rise to another case-split transformation that leads to the final proof. None of this is possible without the transformations being applied in sequence.

VII. COMPARISON WITH RELATED WORK

In bit-level equivalence-checking procedures [24][25], simulation, SAT-sweeping, AIG rewriting and internal equivalence identification are all relevant to data-path equivalence-checking. In LEC, these types of procedures are conducted at the word level. Word-level rewriting is difficult if only a bit-level model is available. For example, with no knowledge of the boundary of a multiplier, normalizing its operands is impractical at the bit level. Although the abstraction and case-split techniques in LEC can in theory be applied at the bit level, they are not used there due to the difficulty of computing an abstraction boundary or of finding good cofactoring candidates.

SMT solving is relevant because a data-path is a subset of the QF_BV theory. Methods such as [7][11][16][14][17][19] are state-of-the-art QF_BV solvers. These employ different implementations of word-level techniques in rewriting, abstraction, case-splitting and simplification, and interleave Boolean and word-level reasoning via a generalized DPLL framework or through abstraction refinements of various forms. Hector [20] is closest to LEC in terms of technology and targeted application domains, and has a rich set of word-level rewriting rules along with some theorem prover [7] procedures to validate every rewrite applied. Hector also has an orchestration of a set of bit-level solvers using SAT and BDD engines, employed once the bit-level miter model is constructed. Strategically, LEC relies less on the capacity of the SAT solver; instead, it builds a compositional proof infrastructure and employs iterative transformations to finally obtain a proof through sub-model trees. The goal of the LEC learning procedures is to reverse engineer the embedded high-level algebraic transformations and bring more similarity between the two sides of the miter model.

The techniques in [26][31][33] also try to reconstruct an algebraic model from the underlying logic, but they employ a bottom-up approach and their primitive element is a half-adder.
The method in [8] simplifies the algebraic construction by solving an integer linear programming problem. The limitation of these approaches is that they rely on the structural pattern of the underlying logic to reconstruct the algebraic model. On the other hand, the linear construction case-study in Section V-A constructs the polynomial through probing with simulation patterns. This is more general, as it uses only the functional information of the data-path logic. For different domains, other techniques may well be more applicable, such as the bottom-up approach. The use of vanishing polynomials and Grobner bases in [27][28], to prove equivalence between polynomials in the modulo integer domain, can be utilized once a polynomial form is reconstructed in LEC. In many data-path miter models, such a polynomial in a certain domain or theory is likely embedded in other control and data-path logic, and direct application of algebraic techniques is often not practical. Thus the collaborating procedures in LEC are designed to bridge this gap and isolate such polynomials, so that these high-level theories can then be applied.

In conducting consistency checking between C and Verilog RTL, the work in [21] focuses on how to process a C program to generate formal models. The tool relies on the SMT solvers [11][16][14] as back-end solving engines.

In terms of tool architecture, [9][10][22] all employ a sophisticated set of transformations to simplify the target model during verification. These are done at the bit level. The LEC infrastructure allows future extensions to take advantage of multi-core parallelization, as demonstrated in [30]. [12][32] use dedicated data structures to represent the proof obligations, while LEC relies on the sub-model tree to track the compositional proof strategy used at each node.

VIII. CONCLUSION

In LEC, we built a system of collaborating procedures for data-path equivalence-checking problems found in an industrial setting. The strategy is to utilize Boolean-level solvers, conduct the transformations at the word level, and synthesize internal similarities by lifting the reasoning to the algebraic level. Using a real industrial case-study, we demonstrated the applicability of the sub-model tree infrastructure for integrating a compositional proof methodology using LEC.

REFERENCES

[1] ABC - a system for sequential synthesis and verification. Berkeley Verification and Synthesis Research Center, http://www.bvsrc.org.
[2] Calypto Catapult Design Product. http://www.calypto.com.
[3] Calypto SLEC. http://www.calypto.com.
[4] Forte Design Systems. http://www.forteds.com.
[5] SMT-LIB. http://www.smt-lib.org.
[6] Verific Design Automation. http://www.verific.com.
[7] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanović, T. King, A. Reynolds, and C. Tinelli. CVC4. In Computer Aided Verification, pages 171-177. Springer, 2011.
[8] M. A. Basith, T. Ahmad, A. Rossi, and M. Ciesielski. Algebraic approach to arithmetic design verification. In Formal Methods in Computer-Aided Design (FMCAD), 2011, pages 67-71. IEEE, 2011.
[9] J. Baumgartner, H. Mony, V. Paruthi, R. Kanzelman, and G. Janssen. Scalable sequential equivalence checking across arbitrary design transformations. In International Conference on Computer Design (ICCD 2006), pages 259-266. IEEE, 2007.
[10] R. Brayton and A. Mishchenko. ABC: An academic industrial-strength verification tool. In Computer Aided Verification, pages 24-40. Springer, 2010.
[11] R. Brummayer and A. Biere. Boolector: An efficient SMT solver for bit-vectors and arrays. In Tools and Algorithms for the Construction and Analysis of Systems, pages 174-177. Springer, 2009.
[12] M. L. Case, A. Mishchenko, and R. K. Brayton. Automated extraction of inductive invariants to aid model checking. In Formal Methods in Computer Aided Design (FMCAD'07), pages 165-172. IEEE, 2007.
[13] P. Chauhan, D. Goyal, G. Hasteer, A. Mathur, and N. Sharma. Non-cycle-accurate sequential equivalence checking. In Proceedings of the 46th Annual Design Automation Conference, pages 460-465. ACM, 2009.
[14] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani. The MathSAT5 SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, pages 93-107. Springer, 2013.
[15] J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang. High-level synthesis for FPGAs: From prototyping to deployment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(4):473-491, 2011.
[16] L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, pages 337-340. Springer, 2008.
[17] B. Dutertre and L. De Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf, 2:2, 2006.
[18] M. Fujita. Equivalence checking between behavioral and RTL descriptions with virtual controllers and datapaths. ACM Transactions on Design Automation of Electronic Systems (TODAES), 10(4):610-626, 2005.
[19] S. Jha, R. Limaye, and S. A. Seshia. Beaver: Engineering an efficient SMT solver for bit-vector arithmetic. In Computer Aided Verification, pages 668-674. Springer, 2009.
[20] A. Koelbl, R. Jacoby, H. Jain, and C. Pixley. Solver technology for system-level to RTL equivalence checking. In Design, Automation & Test in Europe Conference & Exhibition (DATE'09), pages 196-201. IEEE, 2009.
[21] D. Kroening, E. Clarke, and K. Yorav. Behavioral consistency of C and Verilog programs using bounded model checking. In Proceedings of DAC 2003, pages 368-371. ACM Press, 2003.
[22] A. Kuehlmann and J. Baumgartner. Transformation-based verification using generalized retiming. In Computer Aided Verification, pages 104-117. Springer, 2001.
[23] J. Long, S. Ray, B. Sterin, A. Mishchenko, and R. Brayton. Enhancing ABC for LTL stabilization verification of SystemVerilog/VHDL models. Ganesh Gopalakrishnan, University of Utah, USA, page 38, 2011.
[24] A. Mishchenko, S. Chatterjee, R. Brayton, and N. Een. Improvements to combinational equivalence checking. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD '06), pages 836-843, 2006.
[25] V. Paruthi and A. Kuehlmann. Equivalence checking combining a structural SAT-solver, BDDs, and simulation. In Proceedings of the 2000 International Conference on Computer Design, pages 459-464, 2000.
[26] E. Pavlenko, M. Wedler, D. Stoffel, W. Kunz, A. Dreyer, F. Seelisch, and G. Greuel. STABLE: A new QF-BV SMT solver for hard verification problems combining Boolean reasoning with computer algebra. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011, pages 1-6. IEEE, 2011.
[27] N. Shekhar, P. Kalla, and F. Enescu. Equivalence verification of polynomial datapaths using ideal membership testing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(7):1320-1330, 2007.
[28] N. Shekhar, P. Kalla, F. Enescu, and S. Gopalakrishnan. Equivalence verification of polynomial datapaths with fixed-size bit-vectors using finite ring algebra. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD-2005), pages 291-296, 2005.
[29] W. Snyder, P. Wasson, and D. Galbi. Verilator: Convert Verilog code to C++/SystemC, 2012.
[30] B. Sterin, N. Een, A. Mishchenko, and R. Brayton. The benefit of concurrency in model checking. IWLS, 2011.
[31] D. Stoffel and W. Kunz. Equivalence checking of arithmetic circuits on the arithmetic bit level. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(5):586-597, 2004.
[32] D. Wang and J. Levitt. Automatic assume guarantee analysis for assertion-based formal verification. In Proceedings of the 2005 Asia and South Pacific Design Automation Conference, pages 561-566. ACM, 2005.
[33] M. Wedler, D. Stoffel, R. Brinkmann, and W. Kunz. A normalization method for arithmetic data-path verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(11):1909-1922, 2007.

Trading-off incrementality and dynamic restart of multiple solvers in IC3

G. Cabodi (*), A. Mishchenko (**), M. Palena (*)
(*) Dip. di Automatica ed Informatica, Politecnico di Torino - Torino, Italy
(**) Dept. of EECS, University of California, Berkeley, CA, USA
Abstract—This paper addresses the problem of SAT solver performance in IC3, one of the major recent breakthroughs in Model Checking algorithms. Unlike other Bounded and Unbounded Model Checking algorithms, IC3 is characterized by numerous SAT solver queries on small sets of problem clauses. Besides algorithmic issues, this scenario poses serious performance challenges for SAT solver configuration and tuning. As is well known in other application fields, finding a good compromise between learning and overhead is key to performance. We address solver cleanup and restart heuristics, as well as clause database minimality based on on-demand clause loading: transition relation clauses are loaded into solvers based on structural dependency and phase analysis. We also compare different solutions for multiple specialized solvers, and we provide an experimental evaluation on benchmarks from the HWMCC suite. Though not finding a clear winner, the work outlines several potential improvements for a portfolio-based verification tool with multiple engines and tunings.

I. INTRODUCTION

IC3 [1] is a SAT-based invariant verification algorithm for bit-level Unbounded Model Checking (UMC). Since its introduction, IC3 has immediately generated strong interest, and it is now considered one of the major recent breakthroughs in Model Checking. IC3 proved to be impressively effective at solving industrial verification problems. Our experience with the algorithm shows that IC3 is the single invariant verification algorithm capable of solving the largest number of instances among the benchmarks of the last editions of the Hardware Model Checking Competition (HWMCC). (This work was supported in part by SRC Contracts No. 2012-TJ-2328 and No. 2265.001.)

A. Motivations

IC3 heavily relies on SAT solvers to drive several parts of the verification algorithm: a typical run of IC3 is characterized by a huge number of SAT queries. As stated by Bradley in [2], the queries posed by IC3 to SAT solvers differ significantly in character from those posed by other SAT-based invariant verification algorithms (such as Bounded Model Checking [3], k-induction [4][5] or interpolation [6]). Most notably, SAT queries posed by IC3 do not involve unrolling of the transition relation for more than one step, and are thus characterized by small-sized formulas.

IC3 can be thought of as composed of two different layers: at the top level, the algorithm itself drives the verification process by constantly refining a set of over-approximations of the forward reachable states with new inductive clauses; at the bottom level, a SAT solving framework is exploited by the top-level algorithm to respond to queries about the system. As shown in [7], these two layers can be separated by means of a clean interface.

The performance of IC3 turns out to be both highly sensitive to the various internal behaviours of SAT solvers, and strictly dependent on the way the top-level algorithm is integrated with the underlying SAT solving framework. The peculiar characteristics exposed by the SAT queries of IC3 can thus be exploited to improve the overall performance of the algorithm in two different manners:

1) Tuning the internal behaviours of the particular SAT solver employed to better fit IC3's needs.
2) Defining better strategies to manage the SAT solving work required by IC3.

In this paper we address the second issue, proposing and comparing different implementation strategies for handling SAT queries in IC3. The aim of this paper is to identify the most efficient way to manage SAT solving in IC3. To achieve this goal, we experimentally compare a number of different implementation strategies over a selected set of benchmarks from the recent HWMCC. The experimental work has been done by two different research groups, on two different state-of-the-art verification tools, ABC [8] and PdTRAV [9], which share similar architectural and algorithmic choices in their implementations of IC3.

The focus of this paper is neither on the IC3 algorithm itself nor on the internal details of the SAT solving procedures employed, but rather on the implementation details of the integration between IC3 and the underlying SAT solving framework.
B. Contributions

The main contributions of this paper are:
• A characterization of the SAT queries posed by IC3.
• Novel approaches to solver allocation, loading and cleanup in IC3.
• An experimental evaluation of performance using two verification tools.

C. Outline

Section II introduces the notation used and gives some background on IC3. Section III presents a systematic characterization of the SAT solving work required by IC3. Section IV introduces the problem of efficiently handling the SAT queries posed by IC3. Commonly used as well as novel approaches to the allocation, loading and cleaning up of SAT solvers in IC3 are discussed in Sections V, VI and VII, respectively. Experimental data comparing these approaches are presented in Section VIII. Finally, Section IX draws some conclusions and gives summarizing remarks.

II. BACKGROUND

A. Notation

Definition 1. A transition system is a triple S = ⟨M, I, T⟩, where M is a set of Boolean variables called the state variables of the system, I(M) is a Boolean predicate over M representing the set of initial states of the system, and T(M, M′) is a predicate over M, M′ representing the transition relation of the system.

Definition 2. A state of the system is represented by a complete assignment s to the state variables M. A set of states of the system is represented by a Boolean predicate over M. Given a Boolean predicate F over M, a complete assignment s that satisfies F (i.e. s |= F) represents a state contained in F and is called an F-state. Primed state variables M′ are used to represent future states and, accordingly, a Boolean predicate over M′ represents a set of future states.

Definition 3. A Boolean predicate F is said to be stronger than another Boolean predicate G if F → G, i.e. every F-state is also a G-state.

Definition 4. A literal is a Boolean variable or the negation of a Boolean variable. A disjunction of literals is called a clause, while a conjunction of literals is called a cube. A formula is said to be in Conjunctive Normal Form (CNF) if it is a conjunction of clauses.
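For concreteness, the toy encoding below shows one way Definitions 1-4 map onto data: state variables are positive integers, primed variables are offset copies, and predicates, clauses and cubes are all literal lists. The representation and the 2-bit counter example are our own illustration, not part of the tools described in this paper.

# A 2-bit up-counter as a transition system S = <M, I, T> in CNF.
# Variables: x1 = 1 (low bit), x2 = 2; primed copies: x1' = 3, x2' = 4.
N = 2
PRIME = N  # variable v primed is v + PRIME

I = [[-1], [-2]]  # initial state: x1 = 0, x2 = 0
# Transition relation: x1' <-> ~x1 and x2' <-> (x2 xor x1).
T = [
    [1, 3], [-1, -3],          # x1' <-> ~x1
    [-4, 1, 2], [-4, -1, -2],  # x2' -> (x2 xor x1)
    [4, -1, 2], [4, 1, -2],    # (x2 xor x1) -> x2'
]
# A cube (conjunction) and a clause (disjunction) are both literal
# lists; the context in which they are used decides which is meant.
bad_cube = [1, 2]          # the single state x1 = 1, x2 = 1
safety_clause = [-1, -2]   # P = ~(x1 & x2)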
Definition 5. Given a transition system S = ⟨M, I, T⟩, if an assignment s, t′ satisfies the transition relation T (i.e. if s, t′ |= T), then s is said to be a predecessor of t, and t is said to be a successor of s. A sequence of states s0, s1, ..., sn is said to be a path in S if every pair of adjacent states si, si+1, 0 ≤ i < n, satisfies the transition relation (i.e. si, s′i+1 |= T).

Definition 6. Given a transition system S = ⟨M, I, T⟩, a state s is said to be reachable in S if there exists a path s0, s1, ..., s such that s0 is an initial state (i.e. s0 |= I). We denote by Rn(S) the set of states that are reachable in S in at most n steps, and by R(S) the overall set of states that are reachable in S. Note that R(S) = ∪_{i≥0} Ri(S).

Definition 7. Given a transition system S = ⟨M, I, T⟩ and a Boolean predicate P over M (called a safety property), the invariant verification problem is the problem of determining whether P holds for every reachable state in S: ∀s ∈ R(S) : s |= P. An algorithm used to solve the invariant verification problem is called an invariant verification algorithm.

Definition 8. Given a transition system S = ⟨M, I, T⟩, a Boolean predicate F over M is called an inductive invariant for S if the following two conditions hold:
• Base case: I → F
• Inductive case: F ∧ T → F′
A Boolean predicate F over M is called an inductive invariant for S relative to another Boolean predicate G if the following two conditions hold:
• Base case: I → F
• Relative inductive case: G ∧ F ∧ T → F′

Lemma 1. Given a transition system S = ⟨M, I, T⟩, an inductive invariant F for S is an over-approximation of the set of reachable states R(S).

Definition 9. Given a transition system S = ⟨M, I, T⟩ and a Boolean predicate P over M, an inductive strengthening of P for S is an inductive invariant F for S such that F is stronger than P.

Lemma 2. Given a transition system S = ⟨M, I, T⟩ and a Boolean predicate P over M, if an inductive strengthening of P for S can be found, then the property P holds for every reachable state of S. The invariant verification problem can thus be seen as the problem of finding an inductive strengthening of P for S.

B. IC3

Given a transition system S = ⟨M, I, T⟩ and a safety property P over M, IC3 aims to find an inductive strengthening of P for S. For this purpose, it maintains a sequence of formulas F_k = F0, F1, ..., Fk such that, for every 0 ≤ i < k, Fi is an over-approximation of the set of states reachable in at most i steps in S. Each of these over-approximations is called a time frame and is represented by a set of clauses, denoted clauses(Fi). The sequence of time frames F_k is called a trace and is maintained by IC3 in such a way that the following conditions hold throughout the algorithm:

(1) F0 = I
(2) Fi → Fi+1, for all 0 ≤ i < k
(3) Fi ∧ T → F′i+1, for all 0 ≤ i < k
(4) Fi → P, for all 0 ≤ i < k

Condition (1) states that the first time frame of the trace is special and is simply the set of initial states of S. The remaining conditions state that, for every time frame Fi but the last one: (2) every Fi-state is also an Fi+1-state, (3) every successor of an Fi-state is an Fi+1-state, and (4) every Fi-state is safe. Condition (2) is maintained syntactically, by enforcing the condition (2′) clauses(Fi+1) ⊆ clauses(Fi).

Lemma 3. Let S = ⟨M, I, T⟩ be a transition system and F_k = F0, F1, ..., Fk a sequence of Boolean formulas over M, and let conditions (1-3) hold for F_k. Then each Fi, with 0 ≤ i < k, is an over-approximation of the set of states reachable within i steps in S.
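Condition (2′) suggests a direct representation of the trace. In the sketch below (our illustration, not the tools' actual data structure), adding a blocking clause to frames 1..j keeps the syntactic containment clauses(Fi+1) ⊆ clauses(Fi) by construction, which is exactly how Algorithm 2 later uses it (line 11).

class Trace:
    """Time frames F_1..F_k as clause sets; clauses are frozensets of
    literals so that they can be stored in sets."""
    def __init__(self):
        self.frames = [set()]  # frames[0] stands for F_1; F_0 = I kept apart

    def new_frame(self):
        # F_{k+1} starts empty, i.e. it represents the whole state space.
        self.frames.append(set())

    def add_clause(self, clause, j):
        # Adding c to F_1..F_j preserves (2'): a clause present in a later
        # frame is, by construction, present in every earlier one.
        for i in range(j):
            self.frames[i].add(clause)

    def formula(self, i):
        # F_i is the conjunction of its clauses.
        return self.frames[i - 1]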
s rather than t. It is showed in [7] that extending bad states into bad cubes before blocking them dramatically improves The main procedure of IC3 is described in Algorithm 1 and IC3 performance. is composed of two nested iterations. Major iterations (lines 3- Once a bad cube s is found, it is blocked in Fk calling 16) try to prove that P is satisfied up to k steps in S, for the BlockCube(s, Q, Fk ) procedure (line 6). This procedure increasing values of k. To prove so, in minor iterations (lines 4- is described in Algorithm 2. 9), IC3 refines the trace Fk computed so far, by adding new relative inductive clauses to some of its time frames. The algorithm iterates until either (i) an inductive strengthening Input: s: bad cube in Fk ; Q: priority queue; Fk : trace of the property is produced (line 4), or (ii) a counter-example Output: SUCCESS or FAIL(σ), with σ counter-example to the property is found (line 7). 1: add a proof-obligation (s, k) to the queue Q 2: while Q is not empty do Input: S = hM, I, T i ; P (M) 3: extract (s, j) with minimal j from Q Output: SUCCESS or FAIL(σ), with σ counter-example 4: if j > k or t 6|= Fj then continue; 1: k ← 0 5: if j = 0 then return FAIL(σ) 2: Fk ← I 6: if ∃t, v ′ : t, v ′ |= Fj−1 ∧ T ∧ ¬s ∧ s′ then 3: repeat 7: p ← Extend(t) 4: while ∃t : t |= Fk ∧ ¬P do 8: add (p, j − 1) and (s, j) to Q 5: s ← Extend(t) 9: else 6: if BlockCube(s, Q, Fk ) = FAIL(σ) then 10: c ← Generalize(j, s, Fk ) 7: return FAIL(σ) 11: Fi ← Fi ∪ c for 0 < i ≤ j 8: end if 12: add (j + 1, c) to Q 9: end while 13: end if 10: Fk+1 ← ∅ 14: end while 11: k ←k+1 15: return SUCCESS 12: Fk ← Propagate(Fk ) Algorithm 2. BlockCube(s, Q, Fk ) 13: if Fi = Fi+1 for some 0 ≤ i < k then 14: return SUCCESS Otherwise, if Q1 is UNSAT, every bad state of Fk has been 15: end if blocked so far, conditions (1-4) hold for k + 1 and IC3 can 16: until forever safely move to the next major iteration, trying to prove that Algorithm 1. IC3(S, P ) P is satisfied up to k + 1 steps. Before moving to the next iteration, a new empty time frame Fk+1 is created (line 10). At major iteration k, the algorithm has computed a trace Initially, clauses(Fk+1 ) = ∅ and such time frame represent Fk such that conditions (1-4) hold. From Lemma 4, it follows the entire state space, i.e. Fk+1 = Space(S). Note that that P is satisfied up to k − 1 steps in S. IC3 then tries to Space(S) is a valid over-approximation to the set of states prove that P is satisfied up to k steps as well. This is done by reachable within k + 1 steps in S. Then a phase called prop- enumerating Fk -states that violate P and then trying to block agation takes place (line 12). The procedure Propagate(Fk) them in Fk . (Algorithm 4), which is discussed later, handles that phase. During propagation, IC3 tries to refine every time frame Fi , Definition 10. Blocking a state (or, more generally, a cube) with i < 0 ≤ k, by checking if some clauses of one time s in a time frame Fk means proving s unreachable within k frame can be pushed forward to the following time frame. steps in S, and consequently refine Fk to exclude it. Possibly, propagation refines the outmost timeframe Fk so that To enumerate each state of Fk that violates P (line 4), the Fk ⊂ Space(S). Propagation can lead to two adjacent time algorithm poses SAT queries to the underlying SAT solving frames becoming equivalent. 
If that happens, the algorithm has found an inductive strengthening of P (equal to those time frames), so the property P holds for every reachable state of S and IC3 returns success (line 13).

The procedure BlockCube(s, Q, F_k) (Algorithm 2) is responsible for refining the trace F_k in order to block a bad cube s found in Fk. To preserve condition (3), prior to blocking a cube in a certain time frame, IC3 has to recursively block its predecessor states in the preceding time frames. To keep track of the states (or cubes) that must be blocked in certain time frames, IC3 uses the formalism of proof-obligations.

Definition 11. Given a cube s and a time frame Fj, a proof-obligation is a pair (s, j) formalizing the fact that s must be blocked in Fj.

Given a proof-obligation (s, j), the cube s can either represent a set of bad states or a set of states that can all reach a bad state in one or more transitions. The number j indicates the position in the trace where s must be proved unreachable, or else the property fails.

Definition 12. A proof-obligation (s, j) is said to be discharged when s becomes blocked in Fj.

IC3 maintains a priority queue Q of proof-obligations. During the blocking of a cube, proof-obligations (s, j) are extracted from Q and discharged for increasing values of j, ensuring that every predecessor of a bad cube s will be blocked in Fj (j < k) before s is blocked in Fk. In the BlockCube(s, Q, F_k) procedure, first a proof-obligation encoding the fact that s must be blocked in Fk is added to Q (line 1). Then proof-obligations are iteratively extracted from the queue and discharged (lines 2-14).
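The queue discipline just described can be captured in a few lines. The sketch below is an illustration (the names are ours), using a binary heap keyed on the frame index so that obligations are always extracted for increasing values of j.

import heapq

class ObligationQueue:
    """Priority queue of proof-obligations (s, j), extracted by minimal j."""
    def __init__(self):
        self.heap = []
        self.count = 0  # tie-breaker so that cubes never get compared

    def push(self, cube, j):
        heapq.heappush(self.heap, (j, self.count, cube))
        self.count += 1

    def pop(self):
        # Returns the pending obligation with the smallest frame index:
        # predecessors at frame j-1 are always handled before their
        # successors at frame j, as required to preserve condition (3).
        j, _, cube = heapq.heappop(self.heap)
        return cube, j

    def __bool__(self):
        return bool(self.heap)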
Before discharging the extracted proof-obligation (s, j), IC3 checks whether that proof-obligation still needs to be discharged. It is in fact possible for an enqueued proof-obligation to become discharged as a result of the discharging of some previously extracted proof-obligations. To perform this check, the following SAT query is posed (line 4):

SAT?(Fj ∧ s)    (Q2)

If the result of Q2 is SAT, then the cube s is still in Fj and (s, j) still needs to be discharged. Otherwise, s has already been blocked in Fj and the procedure can move on to the next iteration.

If the proof-obligation (s, j) still needs to be discharged, then IC3 checks whether the time frame identified by j is the initial time frame (line 5). If so, the states represented by s are initial states that can reach a violation of the property P. A counter-example σ to P can be constructed by going up the chain of proof-obligations that led to (s, 0). In that case, the procedure terminates with failure, returning the counter-example found.

To discharge a proof-obligation (s, j), i.e. to block a cube s in Fj, IC3 tries to derive a clause c such that c ⊆ ¬s and c is inductive relative to Fj−1. This is done by posing the following SAT query (line 6):

SAT?(Fj−1 ∧ ¬s ∧ T ∧ s′)    (Q3)

If Q3 is UNSAT (lines 10-12), then the clause ¬s is inductive relative to Fj−1 and can be used to refine Fj, ruling out s. To pursue a stronger refinement of Fj, the inductive clause found undergoes a process called inductive generalization (line 10) prior to being added to the time frames. Inductive generalization is carried out by the procedure Generalize(j, s, F_k) (Algorithm 3), which tries to minimize the number of literals in the clause c = ¬s while maintaining its inductiveness relative to Fj−1, in order to preserve condition (2). The resulting clause is added not only to Fj but also to every time frame Fi, 0 < i ≤ j (line 11). Doing so discharges the proof-obligation (s, j); in fact, this refinement rules out s from every Fi with 0 < i ≤ j. Since the sets Fi with i > j are larger than Fj, s may still be present in one of them, and (s, j + 1) may become a new proof-obligation. To address this issue, Algorithm 2 adds (s, j + 1) to the priority queue (line 12).

Otherwise, if Q3 is SAT (lines 7-8), a predecessor t of s in Fj−1 can be extracted from the satisfying assignment. To preserve condition (3), before blocking a cube s in a time frame Fj, every predecessor of s must be blocked in Fj−1. So the predecessor t is extended with ternary simulation (line 7) into the cube p, and then both proof-obligations (p, j − 1) and (s, j) are added to the queue (line 8).

Input: j: time frame index; s: cube such that ¬s is inductive relative to Fj−1; F_k: trace
Output: c: a sub-clause of ¬s
1: c ← ¬s
2: for all literals l in c do
3:   try ← the clause obtained by deleting l from c
4:   if ∄t, v′ : t, v′ |= Fj−1 ∧ try ∧ T ∧ ¬try′ then
5:     if ∄t : t |= I ∧ ¬try then
6:       c ← try
7:     end if
8:   end if
9: end for
10: return c
Algorithm 3. Generalize(j, s, F_k)

The Generalize(j, s, F_k) procedure (Algorithm 3) tries to remove literals from ¬s while keeping it relatively inductive to Fj−1. To do so, a clause c, initialized with ¬s (line 1), is used to represent the current minimal inductive clause. For every literal of c, the clause try, obtained by dropping that literal from c (line 3), is checked for inductiveness relative to Fj−1 by means of the following SAT query (line 4):

SAT?(Fj−1 ∧ try ∧ T ∧ ¬try′)    (Q4)

If Q4 is UNSAT, the inductive case for the reduced clause still holds. Since dropping literals from a relatively inductive clause can break both the inductive case and the base case, the latter must be explicitly checked too for the reduced clause try (line 5). This is done by posing the following SAT query:

SAT?(I ∧ ¬try)    (Q5)

If both the inductive case and the base case hold for the reduced clause try, the current minimal inductive clause c is updated to try (line 6).

The Propagate(F_k) procedure (Algorithm 4) handles the propagation phase. For every clause c of each time frame Fj, with 0 ≤ j < k − 1, the procedure checks whether c can be pushed forward to Fj+1 (line 3). To do so, it poses the following SAT query:

SAT?(Fj ∧ T ∧ ¬c′)    (Q6)

If the result of Q6 is UNSAT, then it is safe, with respect to condition (3), to push the clause c forward to Fj+1. Otherwise, c cannot be pushed, and the procedure iterates to the next clause.

Input: F_k: trace
Output: F_k: updated trace
1: for j = 0 to k − 1 do
2:   for all c ∈ Fj do
3:     if ∄t, v′ : t, v′ |= Fj ∧ T ∧ ¬c′ then
4:       Fj+1 ← Fj+1 ∪ {c}
5:     end if
6:   end for
7: end for
8: return F_k
Algorithm 4. Propagate(F_k)

III. SAT SOLVING IN IC3

The SAT queries posed by IC3 have some specific characteristics:
• Small-sized formulas: they employ no more than a single instance of the transition relation;
• Large number of calls: reasoning during the verification process is highly localized and takes place at the granularity of relatively small cubes;
• Separated contexts: each SAT query is relative to a single time frame;
• Related calls: subsequent calls may expose a certain correlation (for instance, inductive generalization calls take place on progressively reduced formulas).

We performed an analysis of the implementation of IC3 within the academic model checking suite PdTRAV, closely following the description of IC3 given in [7] (there called PDR: Property Directed Reachability). The experimental analysis led us to identify six different types of SAT queries that the algorithm poses during its execution. These queries are the ones already outlined in Section II-B; their types are reported in Table I.

Name  Query Type                Query
Q1    Target Intersection       SAT?(Fk ∧ ¬P)
Q2    Blocked Cube              SAT?(Fi ∧ s)
Q3    Relative Induction        SAT?(Fi ∧ ¬s ∧ T ∧ s′)
Q4    Inductive Generalization  SAT?(Fi ∧ c ∧ T ∧ ¬c′)
Q5    Base of Induction         SAT?(I ∧ ¬c)
Q6    Clause Propagation        SAT?(Fi ∧ T ∧ ¬c′)

Table I: SAT queries breakdown in IC3

For each of the queries identified, we measured the average number of calls and the average solving time. These statistics are reported in Table II. The results were collected by running PdTRAV's implementation of IC3 on the complete set of 310 single-property benchmarks of HWMCC'12, using time and memory limits of 900 s and 2 GB, respectively.

Query  Calls [Number]  [%]   Avg Time [ms]
Q1     483             0.1   81
Q2     27891           6.8   219
Q3     31172           7.6   334
Q4     142327          34.7  575
Q5     147248          35.9  112
Q6     61114           14.9  681

Table II: SAT query statistics in IC3: number of calls, percentage, and average time spent in different SAT queries during an IC3 run.

These statistics can be summarized as follows:
• SAT calls involved in inductive generalization are by far the most frequent ones. These are in fact the calls that appear at the finest granularity: in the worst-case scenario, one call is posed for every literal of every inductive clause found.
• Inductive generalization and propagation checks are the most expensive queries in terms of the average SAT solving time required.
• Target intersection calls are very infrequent and do not take much time to solve.
• Blocked cube and relative induction checks are close in number of calls and solving time.

C. Related works

In [2], Aaron Bradley outlined the opportunity for SAT and SMT researchers to directly address the problem of improving IC3's performance by exploiting the peculiar character of the SAT queries it poses. A description of the IC3 algorithm specifically addressing implementation issues is given in [7].

IV. HANDLING SAT SOLVING IN IC3

Subsequent SAT calls in IC3 are often applied to highly different formulas. In the general case, two subsequent calls can be posed in the context of different time frames, thus involving different sets of clauses. In addition, one of them may require the use of the transition relation while the other may not, and each query can involve the use of temporary clauses/cubes that are needed only to respond to that particular query (e.g. the candidate inductive clause used to check relative inductiveness during cube blocking or inductive generalization).

In the general case, whenever a new query is posed by IC3 to the underlying SAT solving framework, the formula currently loaded in the solver must be modified to accommodate the new query. For this reason, IC3 requires the use of SAT solvers that expose an incremental SAT interface. An incremental SAT interface for IC3 must support the following features:
• Pushing and popping clauses to and from the formula.
• Specifying literal assumptions.

Many state-of-the-art SAT solvers, like MiniSAT [10], feature an incremental interface capable of pushing new clauses into the formula and allowing literal assumptions. Removing clauses from the current formula is a more difficult task, since one has to remove not only the single clause specified, but also every learned clause that has been derived from it. Although solvers such as zChaff [11] directly support clause removal, the majority of state-of-the-art SAT solvers feature an interface like that of MiniSAT, in which clause removal can only be simulated. This is done through the use of literal assumptions and the introduction of auxiliary variables known as activation variables, as described in [12]. In such solvers, clauses are not actually removed from the formula but only made redundant for the purpose of determining the satisfiability of the rest of the formula. Since such clauses are not removed from the formula, they still participate in Boolean Constraint Propagation (BCP) and thus degrade the overall SAT solving performance. In order to minimize this degradation, each solver employed by IC3 must be periodically cleaned up, i.e. emptied and re-loaded with only the relevant clauses. In this work we assume the use of a SAT solver exposing an incremental interface similar to the one of MiniSAT.

Once an efficient incremental SAT solving tool has been chosen, any implementation of IC3 must face the problem of deciding how to integrate the top-level algorithm with the underlying SAT solving layer. This problem can be divided into the following three sub-problems:
• SAT solver allocation: decide how many different SAT solvers to employ and how to distribute the workload among them.
• SAT solver loading: decide which clauses must be loaded into each solver to make it capable of responding correctly to the SAT queries posed by IC3.
• SAT solver cleanup: decide when and how often the algorithm must clean up each solver, in order to avoid performance degradation.
V. SAT SOLVER ALLOCATION

In this work we assume the use of multiple SAT solvers, one for each time frame. Using multiple solvers, we observed that performance is highly related to:
• Solver cleanup frequency: cleaning up a solver means removing incrementally loaded problem clauses and learned clauses;
• Clause loading strategy: on-demand loading of transition relation clauses based on topological dependency.

A. Specialized solvers

From the statistical results reported in Table II, it is easy to see that inductive generalization and clause propagation queries are by far the most expensive ones in terms of average SAT solving time. Inductive generalization queries, in addition to being expensive, are also the most frequent type of query posed.

The reason why inductive generalization calls are so expensive may be that, during the inductive generalization of a clause, at every iteration a slightly changing formula is queried for a satisfying assignment in increasingly larger subspaces. Given two subsequent queries in inductive generalization, in fact, their formulas can differ only in one literal of the present-state clause try and one literal of the next-state cube ¬try′. As the subspace becomes larger, the solving times for those queries increase.

The average expensiveness of clause propagation calls can be intuitively motivated by noticing that they are posed once for every clause of every time frame. The innermost time frames are the ones with the largest number of clauses, and thus require the largest number of propagation calls. Unfortunately, given the high number of clauses in those time frames, the CNF formulas upon which such calls act are highly constrained and usually harder to solve. So, during propagation there are in general more hard queries than simple ones, making the average SAT solving time for those queries high.

In an attempt to reduce the burden on each time frame's SAT solver, we have experimented with the use of specialized solvers for handling such queries. For every time frame, a second solver is instantiated and used to respond to a particular type of query (Q4 or Q6). Table III summarizes the results of this experiment.

VI. SOLVER LOADING

To minimize the size of the formula to be loaded into each solver, a common approach is to load, for every SAT call that queries the dynamic behavior of the system, only the necessary part of the transition relation.

It is easy to observe that every SAT call that uses the transition relation involves a constraint on the next-state variables of the system in the form of a cube c′. Such queries ask whether there is a state of the system, satisfying some constraints on the present state, which can reach in one transition the states represented by c′. Since c′ is a cube, the desired state must have a successor p such that p agrees with c′ on the value of every variable of c′. It is easy to see that the only present-state variables relevant to determining whether such a state exists are those in the structural support of the next-state variables of c′. It follows that the only portions of the transition relation required to answer such queries are the logic cones of the next-state variables of c′.

This loading strategy, known as lazy loading of the transition relation, is commonly employed in various implementations of IC3, such as those of PdTRAV and ABC. We observed on average a 50% reduction in the size of the CNF formula for the transition relation using lazy loading.

We noticed that, for these queries, the portions of the transition relation loaded can be further minimized by employing a CNF encoding technique called Plaisted-Greenbaum CNF encoding [13] (henceforth simply called PG encoding). The AIG representation of the transition relation, together with the assumptions specified by the next-state cube c′, can be viewed as a constrained Boolean circuit [14], i.e. a Boolean circuit in which some of the outputs are constrained to assume certain values. The PG encoding is a special encoding that can be applied in the translation of a constrained Boolean circuit into a CNF formula. Below we give an informal description of the PG encoding; for a complete description, refer to [13] or [14].

Given an AIG representation of a circuit, a CNF encoding first subdivides the circuit into a set of functional blocks, i.e. gates or groups of connected gates representing certain Boolean functions, and introduces a new CNF variable a for each of these blocks. For each functional block representing a function f over the input variables x1, x2, ..., xn, a set of clauses is derived by translating into CNF the formula

a ↔ f(x1, x2, ..., xn)    (1)

The final CNF formula is obtained as the conjunction of these sets of clauses. Different CNF encodings differ in the way the gates are grouped together to form functional blocks and in the way these blocks are translated into clauses. The idea of PG encoding is to start from a base CNF encoding, and then to use both the output assumptions and the topological information of the circuit to get rid of some of the derived clauses, while still producing an equi-satisfiable CNF formula. Based on the output constraints and the number of negations found on the paths from a gate to an output node, it can be shown that, for some gates of the circuit, an equi-satisfiable encoding can be produced by translating only one of the two sides of the bi-implication (1). The CNF formula produced by PG encoding is thus a subset of the one produced by the traditional encoding.

PG encoding proves to be effective in reducing the size of the loaded formulas, but it is not certain whether it is more efficient for SAT solving, since it may have worse propagation behaviour [15]. We observed a 10-20% reduction in the size of the CNF formula for the transition relation.
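Lazy loading and PG encoding combine naturally in one traversal of the AIG: collect only the cones of the next-state variables in c′ and, for each visited node, emit only the side of bi-implication (1) that its polarity requires. The sketch below is our simplified reconstruction for two-input AND nodes, not the tools' implementation.

def load_cones_pg(ands, roots, add_clause):
    """Lazy, polarity-aware loading of transition relation cones.
    ands: AND-node id -> (lit_a, lit_b); literals are signed ints, and a
    variable absent from 'ands' is a primary input or state variable.
    roots: the literals of the next-state cube c' constraining the query.
    add_clause: callback receiving each emitted CNF clause."""
    need = {}  # var -> set of polarities required (True = positive)

    def mark(lit):
        var, pol = abs(lit), lit > 0
        if pol in need.setdefault(var, set()):
            return
        need[var].add(pol)
        if var in ands:
            a, b = ands[var]
            # Children polarity follows the edge: it flips on negated edges.
            for child in (a, b):
                mark(child if pol else -child)

    for r in roots:
        mark(r)

    for var, pols in need.items():
        if var not in ands:
            continue  # leaf: no defining clauses to load
        a, b = ands[var]
        if True in pols:   # keep v -> a and v -> b (one side of (1))
            add_clause([-var, a])
            add_clause([-var, b])
        if False in pols:  # keep (a and b) -> v (the other side)
            add_clause([var, -a, -b])

A node reached in both polarities receives the full Tseitin encoding, so the emitted formula remains equi-satisfiable with the constrained circuit.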
these blocks. For each functional block representing a function For SAT calls in IC3, notice that there are two sources of f on the input variables x1 , x2 , . . . , xn , a set of clauses is irrelevant clauses loaded in a solver: derived translating into CNF the formula: 1) Deactivated clauses loaded for previous inductive checks a ↔ f (x1 , x2 , . . . , xn ) (1) (Q3 and Q4 queries). 2) Portions of logic cones loaded for previous calls query- The final CNF formula is obtained by the conjunction of these ing the dynamic behavour of the system. sets of clauses. Different CNF encodings differ in the way the gates are grouped together to form functional blocks and in Cleanup heuristics commonly used, such as the ones used the way these blocks are translated into clauses. The idea of in baseline versions of ABC and PdTRAV, typically take into PG encoding is to start from a base CNF encoding, and then account only the number of deactivated clauses in the solver to use both output assumptions and topological information of compute an estimate of the number of irrelevant clauses. We the circuit to get rid of some of the derived clauses, while investigated the use of a new heuristic measure taking into still producing an equi-satisfiable CNF formula. Based on the account the second source of irrelevant clauses, i.e. irrelevant output constraints and the number of negations that can be portions of previously loaded cones. found in every path from a gate to an output node, it can be Every time a new query requires to load a new portion of the shown that, for some gates of the circuit, an equi-satisfiable transition relation, to compute a measure of the number of the encoding can be produced by only translating one of the two irrelevant transition relation’s clauses, the following quantities sides of the bi-implication (1). The CNF formula produced are computed (assuming that c′ is the cube constraining the by PG encoding will be a subset of the one produced by the next state variables for that query): traditional encoding. • A: the number of transition relation clauses already PG encoding proves to be effective in reducing the size loaded into the solver (before loading the logic cones of loaded formulas, but it is not certain whether it is more required by the current query); efficient for SAT solving, since it may have worst propagation ′ • S(c ): the size (number of clauses) of the logic cones behaviour [15]. required for solving the current query; We observed a 10-20% reduction in the size of the CNF ′ • L(c ): the number of clauses in the required logic cones to formula for the transition relation. be loaded into the solver (equal to the size of the required VII. S OLVERS CLEAN UP logic cone minus the number of clauses that such cone shares with previously loaded cones); A natural question arises regarding how frequently and at what conditions SAT solvers cleanups should be scheduled. A measure of the number of irrelevant transition relation’s Cleaning up a SAT solver, in fact, introduces a certain over- clauses loaded for c′ , denoted by u(c′ ), can be computed as head. This is because: follows: • Relevant clauses for the current query must be reloaded. u(c′ ) = A − (S(c′ ) − L(c′ )) (2) • Relevant inferences previously derived must be derived again from those clauses. 
VIII. EXPERIMENTAL RESULTS

We have compared different cleanup and clause loading heuristics in both PdTRAV and ABC. In this section, we briefly summarize the obtained results.

A. PG encoding

A first set of experiments was run on the full set of 310 HWMCC'12 benchmarks [16], with a lower timeout of 900 seconds, in order to evaluate the use of PG encoding. Results were controversial. A run in PdTRAV, with a 900 seconds timeout, showed a reduction in the number of solved instances from 79 to 63 (3 of which previously unsolved). The percentage reduction of loaded transition relation clauses was 21.32%. A similar run on ABC showed a more stable comparison, from 80 to 79 solved instances (3 of which previously unsolved). So a first conclusion is that PG encoding was not able to win over the standard encoding, suggesting that it can indeed suffer from worse propagation behaviour. Nonetheless, it was very interesting to observe that the overall number of solved problems grew from 79 to 82 in PdTRAV and from 80 to 85 in ABC.

The different results between the two tools in this experimentation could be partially due to the different lightweight preprocessing done by default within them. We started a more detailed comparison, in order to better understand the partially controversial results.

B. Experiments with PdTRAV

We then focused on a subset of 70 selected circuits, for which preprocessing was done off-line, so that the tools were run on the same problem instances. In the following tables, the P0 rows always represent the default setting of PdTRAV. We report the number of solved instances (out of 70) and the average execution time (time limit 900 seconds). The column labeled New, when present, shows the number of instances not solved by P0.

Table III: Tests on specialized solvers.

    Configuration    Solved [#]  New [#]  Avg Time [s]
    P0 (baseline)    64          -        137.00
    P1 (Q4 spec.)    66          4        144.18
    P2 (Q6 spec.)    60          3        134.25

Table III shows two different implementations with specialized solvers (so two solvers per time frame): in the first one (P1) the second solver handles generalization calls, while in the second one (P2) it handles propagation calls. Overall, we see a small improvement by P1, with two more instances solved w.r.t. P0, including 4 instances not solved by P0.

The next two tables show an evaluation of different solver cleanup heuristics. Let a be the number of activation variables in the solver, |vars| the total number of variables in the solver, |clauses| the total number of clauses in the solver, u(c′) the heuristic measure discussed in Section VII, and W(x, n) a sliding window containing the values of x for the last n SAT calls. The heuristics compared are the following:
• H1: a > 300;
• H2: a > (1/2)|vars|;
• H3: avg(W(u(c′)/|clauses|, 1000)) > 0.5;
• H4: H2 || H3.
Heuristic H1 is the cleanup heuristic used in the baseline versions of both PdTRAV and ABC. Its static threshold of 300 activation literals was determined experimentally. Heuristic H2 cleans up each solver as soon as half of its variables are used for activation literals; it can be seen as a first simple attempt to adopt a dynamic cleanup threshold. Heuristic H3 is the first heuristic proposed to take into account the second source of irrelevant clauses described in Section VII, i.e., irrelevant portions of previously loaded cones. In H3 a solver is cleaned up as soon as the percentage of irrelevant transition relation clauses loaded, averaged over a window of the last 1000 calls, reaches 50%. Heuristic H4 takes into account both sources of irrelevant clauses, combining H2 and H3.
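Written as predicates over (assumed) solver statistics, the four heuristics compare as follows; the struct and field names are ours, only the thresholds come from the text, and avg_irrelevant is the windowed average from the earlier sketch.

    #include <stdbool.h>

    typedef struct {
        int    activation_lits;   /* a                                    */
        int    num_vars;          /* |vars|                               */
        double avg_irrelevant;    /* avg of u(c')/|clauses|, last 1000    */
    } SolverStats;

    static bool h1(const SolverStats *s) { return s->activation_lits > 300; }
    static bool h2(const SolverStats *s) { return s->activation_lits > s->num_vars / 2; }
    static bool h3(const SolverStats *s) { return s->avg_irrelevant > 0.5; }
    static bool h4(const SolverStats *s) { return h2(s) || h3(s); }  /* H2 || H3 */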
Table IV: Tests on clean up heuristics.

    Configuration  Solved [#]  New [#]  Avg Time [s]
    P0 (H1)        64          -        137.00
    P3 (H2)        60          1        122.19
    P4 (H3)        44          -        116.28
    P5 (H4)        62          3        125.94

Table IV shows a comparison among H1 (row P0), H2, H3, and H4, in rows P3, P4, and P5, respectively. Neither PG encoding nor specialized solvers were used. Heuristic H1, which employs a static threshold, was able to solve the largest number of instances. Among the dynamic threshold heuristics, both H2 and H3 take into account a single source of irrelevant loaded clauses, respectively the deactivated clauses in H2 and the unused portions of the transition relation in H3. The data clearly indicate that H3 has worse performance. This suggests that deactivated clauses play a bigger role in degrading SAT solvers' performance than irrelevant transition relation clauses do; taking into account only the latter source of irrelevant clauses is not sufficient. It can be noticed that heuristic H4, which takes into account both sources, outperforms both H2 and H3. This shows that considering irrelevant clauses arising from previously loaded cones, in addition to deactivated clauses, can be beneficial. In addition, Table IV shows that the dynamic heuristics were able to solve some instances that cannot be solved by the static heuristic H1, and vice versa. In terms of the overall number of solved instances, the static heuristic H1 outperforms our best dynamic heuristic H4. This can be due to the fact that the threshold parameter of H1 results from extensive experimentation, while a narrower experimentation was performed to determine the parameters of H4 (window size and percentage thresholds).

We then focused on H4 and benchmarked it in different setups. Results are shown in Table V. Here PG encoding was used in configurations P6 and P8, a single solver per time frame in P6, and an additional specialized solver for generalization in P7 and P8.

Table V: Tests on mixed strategies for cleanup heuristic H4.

    Configuration     Solved [#]  New [#]  Avg Time [s]
    P0 (baseline)     64          -        137.00
    P6 (PG)           59          3        208.85
    P7 (Q4 spec)      58          1        111.59
    P8 (PG+Q4 spec)   50          1        178.56
We see that the specialized solver configuration appears to perform worse with PG encoding. Also, adding a specialized solver for generalization to the dynamic heuristic H4 does not seem to be as effective as it is when using the static heuristic H1. This can be due to the fact that irrelevant clauses arising from generalization calls are not taken into account when scheduling the cleanup of the main solver, which, in turn, is cleaned up less frequently.

C. Experiments with ABC

The 70 selected circuits were also benchmarked with ABC, with the same pre-processing used in PdTRAV. Table VI reports in row A0 the default setting of ABC. Row A1 shows the variant with PG encoding, and row A2 shows a run without dynamic TR clause loading. Row A3, finally, shows a different period for solver cleanup (1000 variables instead of 300).

Table VI: Tests on ABC with different strategies.

    Configuration  Solved [#]  New [#]  Avg Time [s]
    A0             64          -        138.66
    A1             63          1        152.18
    A2             63          2        158.75
    A3             64          -        138.45

Overall, the results show little variance among the different settings, which could suggest a lower sensitivity of ABC to different tunings. Nonetheless, a further experimentation with ABC on the full set of 310 benchmarks (with a 300 seconds time limit) showed a 14% improvement in the number of solved problems (from 71 to 81), which indicates a potential improvement for a portfolio-based tool, able to concurrently exploit multiple settings.

IX. CONCLUSIONS

The paper shows a detailed analysis and characterization of the SAT queries posed by IC3. We also discuss new ideas for solver allocation/loading/restarting. The experimental evaluation done on two different state-of-the-art academic tools shows lights and shadows, as no breakthrough or clear winner emerges from the new ideas.

PG encoding showed to be less effective than expected. This is probably because the benefits introduced in terms of loaded formula size are overwhelmed by the supposedly worse propagation behaviour of the resulting formula. The use of specialized solvers seems to be effective when a static cleanup heuristic is used, less effective when combined with PG encoding or a dynamic heuristic. Our experiments showed that, when a dynamic cleanup heuristic is used, IC3's performance can be improved by taking into account both deactivated clauses and irrelevant portions of previously loaded cones. Even if a parameter configuration for H4 able to outperform the currently used, well-rounded static heuristic H1 has not been found yet, we believe that a more extensive experimentation could lead to better results. Nonetheless, the results are more interesting if we consider them from the standpoint of a portfolio-based tool, since the overall coverage (by the union of all setups) is definitely higher.

So we believe that more effort in implementation, experimentation, and detailed analysis of case studies needs to be spent. We also deem that this work contributes to the discussion of new developments in the research related to IC3.
REFERENCES
[1] A. R. Bradley, "SAT-based model checking without unrolling," in VMCAI, Austin, Texas, Jan. 2011, pp. 70-87.
[2] A. R. Bradley, "Understanding IC3," in SAT, ser. Lecture Notes in Computer Science, A. Cimatti and R. Sebastiani, Eds., vol. 7317. Springer, 2012, pp. 1-14.
[3] A. Biere, A. Cimatti, E. M. Clarke, M. Fujita, and Y. Zhu, "Symbolic Model Checking using SAT procedures instead of BDDs," in Proc. 36th Design Automation Conf. New Orleans, Louisiana: IEEE Computer Society, Jun. 1999, pp. 317-320.
[4] M. Sheeran, S. Singh, and G. Stålmarck, "Checking Safety Properties Using Induction and a SAT Solver," in Proc. Formal Methods in Computer-Aided Design, ser. LNCS, W. A. Hunt and S. D. Johnson, Eds., vol. 1954. Austin, Texas, USA: Springer, Nov. 2000, pp. 108-125.
[5] P. Bjesse and K. Claessen, "SAT-Based Verification without State Space Traversal," in Proc. Formal Methods in Computer-Aided Design, ser. LNCS, vol. 1954. Austin, TX, USA: Springer, 2000.
[6] K. L. McMillan, "Interpolation and SAT-Based Model Checking," in Proc. Computer Aided Verification, ser. LNCS, vol. 2725. Boulder, CO, USA: Springer, 2003, pp. 1-13.
[7] N. Eén, A. Mishchenko, and R. K. Brayton, "Efficient implementation of property directed reachability," in FMCAD, 2011, pp. 125-134.
[8] A. Mishchenko, "ABC: A System for Sequential Synthesis and Verification, http://www.eecs.berkeley.edu/~alanmi/abc/," 2005.
[9] G. Cabodi, S. Nocco, and S. Quer, "Benchmarking a model checker for algorithmic improvements and tuning for performance," Formal Methods in System Design, vol. 39, no. 2, pp. 205-227, 2011.
[10] N. Eén and N. Sörensson, "The MiniSat SAT Solver, http://minisat.se," Apr. 2009.
[11] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, "Chaff: Engineering an Efficient SAT Solver," in Proc. 38th Design Automation Conf. Las Vegas, Nevada: IEEE Computer Society, Jun. 2001.
[12] N. Eén and N. Sörensson, "Temporal induction by incremental SAT solving," Electr. Notes Theor. Comput. Sci., vol. 89, no. 4, pp. 543-560, 2003.
[13] D. A. Plaisted and S. Greenbaum, "A structure-preserving clause form translation," J. Symb. Comput., vol. 2, no. 3, pp. 293-304, 1986.
[14] M. Järvisalo, A. Biere, and M. Heule, "Simulating circuit-level simplifications on CNF," J. Autom. Reasoning, vol. 49, no. 4, pp. 583-619, 2012.
[15] N. Eén, "Practical SAT: a tutorial on applied satisfiability solving," Slides of invited talk at FMCAD, 2007.
[16] A. Biere and T. Jussila, "The Model Checking Competition Web Page, http://fmv.jku.at/hwmcc."

Lemmas on Demand for Lambdas

Mathias Preiner, Aina Niemetz, and Armin Biere
Institute for Formal Models and Verification
Johannes Kepler University, Linz, Austria

Abstract—We generalize the lemmas on demand decision procedure for array logic as implemented in Boolector to handle non-recursive and non-extensional lambda terms. We focus on the implementation aspects of our new approach and discuss the involved algorithms and optimizations in more detail. Further, we show how arrays, array operations and SMT-LIB v2 macros are represented as lambda terms and lazily handled with lemmas on demand. We provide experimental results that demonstrate the effect of native lambda support within an SMT solver and give an outlook on future work.

(This work was funded by the Austrian Science Fund (FWF) under NFN Grant S11408-N23 (RiSE).)

I. INTRODUCTION

The theory of arrays as axiomatized by McCarthy [14] enables us to reason about memory (components) in software and hardware verification, and is particularly important in the context of deciding satisfiability of first order formulas w.r.t. first order theories, also known as Satisfiability Modulo Theories (SMT). However, it is restricted to array operations on single array indices and lacks support for efficiently modeling operations such as memory initialization and parallel updates (memset and memcpy in the standard C library).

In 2002, Seshia et al. [4] introduced an approach to overcome these limitations by using restricted λ-terms to model array expressions (such as memset and memcpy), ordered data structures and partially interpreted functions within the SMT solver UCLID [17]. The SMT solver UCLID employs an eager SMT solving approach and therefore eliminates all λ-terms through β-reduction, which replaces each argument variable with the corresponding argument term as a preliminary rewriting step. Other SMT solvers that employ a lazy SMT solving approach and natively support λ-terms, such as CVC4 [1] or Yices [8], also treat them eagerly, similarly to UCLID, and eliminate all occurrences of λ-terms by substituting them with their instantiated function body (cf. C-style macros). Eagerly eliminating λ-terms via β-reduction, however, may result in an exponential blow-up in the size of the formula [17].

Recently, an extension of the theory of arrays was proposed [10], which uses λ-terms similarly to UCLID. This extension provides support for modeling memset, memcpy and loop summarizations. However, it does not make use of native support of λ-terms provided by an SMT solver. Instead, it reduces instances in the theory of arrays with λ-terms to a theory combination supported by solvers such as Boolector [3] (without native support for λ-terms), CVC4, STP [12], and Z3 [6].

In this paper, we generalize the decision procedure for the theory of arrays with bit vectors as introduced in [3] to lazily handle non-recursive and non-extensional λ-terms. We show how arrays, array operations and SMT-LIB v2 macros are represented in Boolector as λ-terms and introduce a lemmas on demand procedure for lazily handling λ-terms in Boolector in detail. We summarize an experimental evaluation, compare our results to solvers with SMT-LIB v2 macro support (CVC4, MathSAT [5], SONOLAR [13] and Z3) and, finally, give an outlook on future work.
II. PRELIMINARIES

We assume the usual notions and terminology of first order logic and are mainly interested in many-sorted languages, where bit vectors of different bit width correspond to different sorts and array sorts correspond to a mapping (τi ⇒ τe) from index sort τi to element sort τe. Our approach is focused primarily on the quantifier-free first order theories of fixed size bit vectors, arrays and equality with uninterpreted functions, but is not restricted to those.

We call 0-arity function symbols constant symbols, and a, b, i, j, and e denote constants, where a and b are used for array constants, i and j for array indices, and e for an array value. For each bit vector of size n, the equality =n is interpreted as the identity relation over bit vectors of size n. We further interpret the if-then-else bit vector term iten as ite(⊤, t, e) =n t and ite(⊥, t, e) =n e. As a notational convention, the subscript might be omitted in the following. We identify read and write as basic operations on array elements, where read(a, i) denotes the value of array a at index i, and write(a, i, e) denotes the modified array a overwritten at position i with value e. The theory of arrays (without extensionality) is axiomatized by the following axioms, originally introduced by McCarthy in [14]:

    i = j → read(a, i) = read(a, j)                  (A1)
    i = j → read(write(a, i, e), j) = e              (A2)
    i ≠ j → read(write(a, i, e), j) = read(a, j)     (A3)
The array congruence axiom A1 asserts that accessing array a at two equal indices i and j produces the same element. The read-over-write axioms A2 and A3 ensure a basic characteristic of arrays: A2 asserts that accessing a modification to an array a at the index it has most recently been updated at (i) produces the value it has been updated with (e). A3 captures the case when a modification to an array a is accessed at an index other than the one it has most recently been updated at (j), which produces the unchanged value of the original array a at position j. Note that we assume that all variables a, i, j and e in axioms A1, A2 and A3 are universally quantified.

From the theory of equality with uninterpreted functions we primarily focus on the following axiom:

    ∀x̄, ȳ. (x1 = y1 ∧ ... ∧ xn = yn) → f(x̄) = f(ȳ)    (EUF)

The function congruence axiom (EUF) asserts that a function evaluates to the same value for the same argument values.

We only consider a non-recursive λ-calculus, assuming the usual notation and terminology, including the notions of function application, currying and β-reduction. In general, we denote a λ-term λx as λx.t(x), where x is a variable bound by λx and t(x) is a term in which x may or may not occur. We interpret t(x) as defining the scope of bound variable x. Without loss of generality, the number of bound variables per λ-term is restricted to exactly one. Functions with more than one parameter are transformed into a chain of nested λ-terms by means of currying (e.g. f(x, y) := x + y is rewritten as λx.λy.x + y). As a notational convention, we will use λx̄ as a shorthand for λx0 ... λxk.t(x0, ..., xk) for k ≥ 0. We identify the function application as an explicit operation on λ-terms and interpret it as instantiating a bound variable (all bound variables) of a λ-term (a curried λ-chain). We interpret β-reduction as a form of function application, where all formal parameter variables (bound variables) are substituted with their actual parameter terms. We will use λx̄[x0\a0, ..., xn\an] to indicate β-reduction of a λ-term λx̄, where the formal parameters x0, ..., xn are substituted with the actual argument terms a0, ..., an.

III. λ-TERMS IN BOOLECTOR

In contrast to λ-term handling in other SMT solvers such as e.g. UCLID or CVC4, where λ-terms are eagerly eliminated, in Boolector we provide a lazy λ-term handling with lemmas on demand. We generalized the lemmas on demand decision procedure for the extensional theory of arrays introduced in [3] to handle lemmas on demand for λ-terms as follows.

In order to provide a uniform handling of arrays and λ-terms within Boolector, we generalized all arrays (and array operations) to λ-terms (and operations on λ-terms) by representing array variables as uninterpreted functions (UF), read operations as function applications, and write and if-then-else operations on arrays as λ-terms. We further interpret macros (as provided by the command define-fun in the SMT-LIB v2 format) as (curried) λ-terms. Note that in contrast to [3], our implementation currently does not support extensionality (equality) over arrays (λ-terms).

We represent an array as exactly one λ-term with exactly one bound variable (parameter) and define its representation as λj.t(j). Given an array of sort (τi ⇒ τe) and its λ-term representation λj.t(j), we require that bound variable j is of index sort τi and term t(j) is of element sort τe. Term t(j) is not required to contain j, and if it does not contain j, it represents a constant λ-term (e.g. λj.0). In contrast to SMT-LIB v2 macros, it is not required to represent arrays with curried λ-chains, as arrays are accessed at one single index at a time (cf. read and write operations on arrays).

We treat array variables as UF with exactly one argument and represent them as fa for array variable a. We interpret read operations as function applications on either UF or λ-terms, with read index i as argument, and represent them as read(a, i) ≡ fa(i) and read(λj.t(j), i) ≡ (λj.t(j))(i), respectively. We interpret write operations as λ-terms modeling the result of the write operation on array a at index i with value e, and represent them as write(a, i, e) ≡ λj.ite(i = j, e, fa(j)).
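As a minimal sketch, the write representation above can be constructed directly with the lambda interface that recent Boolector releases expose in their C API; the exact interface of the 2013 implementation described here may differ, so the calls below should be read as an assumption, not as the paper's code.

    #include "boolector.h"

    /* write(a, i, e)  as  λj . ite(i = j, e, f_a(j)) */
    BoolectorNode *write_as_lambda(Btor *btor, BoolectorNode *f_a,
                                   BoolectorNode *i, BoolectorNode *e,
                                   BoolectorSort index_sort)
    {
        BoolectorNode *j    = boolector_param(btor, index_sort, "j");
        BoolectorNode *a_j  = boolector_apply(btor, &j, 1, f_a); /* f_a(j) */
        BoolectorNode *eq   = boolector_eq(btor, i, j);
        BoolectorNode *body = boolector_cond(btor, eq, e, a_j);
        return boolector_fun(btor, &j, 1, body);                 /* λj.body */
    }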
A function application on a λ-term λw representing a write operation yields value e if j is equal to the modified index i, and the unmodified value fa(j) otherwise. Note that applying β-reduction to a λ-term λw yields the same behaviour described by array axioms A2 and A3. Consider a function application λw(k), where k represents the position to be read from. If k = i (A2), β-reduction yields the written value e, whereas if k ≠ i (A3), β-reduction returns the unmodified value of array a at position k, represented by fa(k). Hence, these axioms do not need to be explicitly checked during consistency checking. This is in essence the approach to handle arrays taken by UCLID [17].

We interpret if-then-else operations on arrays a and b as λ-terms, and represent them as ite(c, a, b) ≡ λj.ite(c, fa(j), fb(j)). Condition c yields either function application fa(j) or fb(j), which represent the values of arrays a and b at index j, respectively.

In addition to the base array operations introduced above, λ-terms enable us to succinctly model array operations like e.g. memcpy and memset from the standard C library, which we previously were not able to express efficiently by means of read, write and ite operations on arrays. In particular, both memcpy and memset could only be represented by a fixed sequence of read and write operations within a constant index range, such as copying exactly 5 words etc. It was not possible to express a variable range, e.g. copying n words, where n is a symbolic (bit vector) variable. With λ-terms, however, we do not require a sequence of array operations, as it usually suffices to model a parallel array operation by means of exactly one λ-term. Further, the index range does not have to be fixed and can therefore be within a variable range. This type of high level modeling turned out to be useful for applications in software model checking [10]. See also [17] for more examples.

For instance, memset with signature memset(a, i, n, e), which sets each element of array a within the range [i, i + n[ to value e, can be represented as λj.ite(i ≤ j ∧ j < i + n, e, fa(j)). Note that n can be symbolic and does not have to be a constant. In the same way, memcpy with signature memcpy(a, b, i, k, n), which copies all elements of array a within the range [i, i + n[ to array b, starting from index k, is represented as λj.ite(k ≤ j ∧ j < k + n, fa(i + j − k), fb(j)). As a special case of memset, we represent array initialization operations, where all elements of an array are initialized with some (constant or symbolic) value e, as λj.e.
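A sketch of the memset representation with a symbolic length n, under the same assumed Boolector C API as above (overflow of i + n is ignored here for brevity):

    #include "boolector.h"

    /* memset(a, i, n, e)  as  λj . ite(i <= j ∧ j < i + n, e, f_a(j)) */
    BoolectorNode *memset_as_lambda(Btor *btor, BoolectorNode *f_a,
                                    BoolectorNode *i, BoolectorNode *n,
                                    BoolectorNode *e, BoolectorSort index_sort)
    {
        BoolectorNode *j     = boolector_param(btor, index_sort, "j");
        BoolectorNode *hi    = boolector_add(btor, i, n);      /* i + n   */
        BoolectorNode *lo_ok = boolector_ulte(btor, i, j);     /* i <= j  */
        BoolectorNode *hi_ok = boolector_ult(btor, j, hi);     /* j < i+n */
        BoolectorNode *in    = boolector_and(btor, lo_ok, hi_ok);
        BoolectorNode *a_j   = boolector_apply(btor, &j, 1, f_a);
        BoolectorNode *body  = boolector_cond(btor, in, e, a_j);
        return boolector_fun(btor, &j, 1, body);
    }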
Introducing λ-terms does not only enable us to model arrays and array operations, but further provides support for arbitrary functions (macros) by means of currying, with the following restrictions: (1) functions may not be recursive, and (2) arguments to functions may not be functions. The first restriction enables keeping the implementation of λ-term handling in Boolector as simple as possible, whereas the second restriction limits λ-term handling in Boolector to non-higher-order functions. Relaxing these restrictions would make the considered λ-calculus Turing-complete and in general render the decision problem undecidable. As future work it might be interesting to consider some relaxations.

In contrast to treating SMT-LIB v2 macros as C-style macros, i.e., substituting every function application with the instantiated function body, in Boolector we directly translate SMT-LIB v2 macros into λ-terms, which are then handled lazily via lemmas on demand. Formulas are represented as directed acyclic graphs (DAG) of bit vector and array expressions. Further, in this paper, we propose to treat arrays and array operations as λ-terms and operations on λ-terms, which results in an expression graph with no expressions of sort array (τi ⇒ τe). Instead, we introduce the following four additional expression types of sort bit vector:
• a param expression serves as a placeholder variable for a variable bound by a λ-term;
• a lambda expression binds exactly one param expression, which may occur in a bit vector expression that represents the body of the λ-term;
• an args expression is a list of function arguments;
• an apply expression represents a function application that applies arguments args to a lambda expression.

Example 1: Consider ψ1 ≡ f(i) = f(j) ∧ i ≠ j with functions f(x) := ite(x < 0, g(x), x) and g(y) := −y, as depicted in Fig. 1. Both functions are represented as λ-terms, where function g(y) returns the negation of y and is used in function f(x), which computes the absolute value of x. Dotted nodes indicate parameterized expressions, i.e., expressions that depend on param expressions, and serve as templates that are instantiated as soon as β-reduction is applied.

Fig. 1: DAG representation of formula ψ1.
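For concreteness, the two macros of Example 1 could be built as λ-terms as follows, again with the assumed Boolector C API (error handling and reference counting omitted):

    #include "boolector.h"

    /* g(y) := -y  and  f(x) := ite(x < 0, g(x), x) */
    void build_example1(Btor *btor, BoolectorSort bv,
                        BoolectorNode **f_out, BoolectorNode **g_out)
    {
        BoolectorNode *y = boolector_param(btor, bv, "y");
        BoolectorNode *g = boolector_fun(btor, &y, 1,
                                         boolector_neg(btor, y));

        BoolectorNode *x    = boolector_param(btor, bv, "x");
        BoolectorNode *zero = boolector_zero(btor, bv);
        BoolectorNode *neg  = boolector_slt(btor, x, zero);    /* x < 0 */
        BoolectorNode *g_x  = boolector_apply(btor, &x, 1, g); /* g(x)  */
        BoolectorNode *body = boolector_cond(btor, neg, g_x, x);

        *g_out = g;
        *f_out = boolector_fun(btor, &x, 1, body);
    }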
In order to lazily evaluate λ-terms in Boolector we implemented two β-reduction approaches, which we discuss in the next section in more detail.

IV. β-REDUCTION

In this section we discuss how concepts from the λ-calculus have been adapted and implemented in our SMT solver Boolector. We focus on reduction algorithms for the non-recursive λ-calculus, which is rather atypical for the (vast) literature on the λ-calculus. In the context of Boolector, we distinguish between full and partial β-reduction. They mainly differ in their application and in the depth up to which λ-terms are expanded. In essence, given a function application λx̄(a0, ..., an), partial β-reduction reduces only the topmost λ-term λx̄, whereas full β-reduction reduces λx̄ and every λ-term in the scope of λx̄.

Full β-reduction of a function application on a λ-term λx̄ consists of a series of β-reductions, where λ-term λx̄ and every λ-term λȳ within the scope of λx̄ are instantiated, substituting all formal parameters with actual parameter terms. Since we do not allow partial function applications, full β-reduction is guaranteed to yield a term which is free of λ-terms. Given a formula with λ-terms, we usually employ full β-reduction in order to eliminate all λ-terms, by substituting every function application with the term obtained by applying full β-reduction to that function application. In the worst case, full β-reduction results in an exponential blow-up. In practice, however, it is often beneficial to employ full β-reduction, since it usually leads to significant simplifications through rewriting. In Boolector, we incorporate this method as an optional rewriting step. We will use λx̄[x0\a0, ..., xn\an]f as a shorthand for applying full β-reduction to λx̄ with arguments a0, ..., an.

Partial β-reduction of a λ-term λx̄, on the other hand, essentially works in the same way as what is referred to as β-reduction in the λ-calculus. Given a function application λx̄(a0, ..., an), partial β-reduction substitutes the formal parameters x0, ..., xn with the actual argument terms a0, ..., an without applying β-reduction to other λ-terms within the scope of λx̄. This has the effect that λ-terms are expanded function-wise, which we require for consistency checking. In the following, we use λx̄[x0\a0, ..., xn\an]p to denote the application of partial β-reduction to λx̄ with arguments a0, ..., an.

A. Full β-reduction

Given a function application λx̄(a0, ..., an) and a DAG representation of λx̄, full β-reduction of λx̄ consecutively substitutes formal parameters with actual argument terms while traversing and rebuilding the DAG in depth-first-search (DFS) post-order as follows.
1) Initially, we instantiate λx̄ by assigning arguments a0, ..., an to the formal parameters x0, ..., xn.
2) While traversing down, for any λ-term λȳ in the scope of λx̄, we need special handling for each function application λȳ(b0, ..., bm) as follows.
   a) Visit arguments b0, ..., bm first, and obtain rebuilt arguments b0', ..., bm'.
   b) Assign rebuilt arguments b0', ..., bm' to λȳ and apply β-reduction to λȳ(b0', ..., bm').
   This ensures a bottom-up construction of the β-reduced DAG (see step 3), since all arguments b0', ..., bm' passed to a λ-term λȳ are β-reduced and rebuilt prior to applying β-reduction to λȳ.
3) During up-traversal of the DAG we rebuild all visited expressions bottom-up and require special handling for the following expressions:
   • param: substitute param expression yi with its current instantiation bi';
   • apply: substitute expression λȳ(b0, ..., bm) with λȳ[y0\b0', ..., ym\bm']f.
We further employ the following optimizations to improve the performance of the full β-reduction algorithm.
• Skip expressions that do not need rebuilding. Given an expression e within the scope of a λ-term λx̄: if e is not parameterized and does not contain any λ-term, e does not depend on the arguments passed to λx̄ and may therefore be skipped.
• λ-scope caching. We cache rebuilt expressions in a λ-scope to prevent rebuilding parameterized expressions several times.
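The following toy C sketch shows only the substitution scheme of full β-reduction on a simplified, single-argument expression node; it omits the λ-scope cache, hash-consing and skip optimization that a real implementation such as Boolector's needs, and all types and helpers are hypothetical.

    typedef enum { PARAM, LAMBDA, APPLY, OP } Kind;

    typedef struct Node {
        Kind kind;
        int  nkids;
        struct Node *kid[2];  /* LAMBDA: {param, body}; APPLY: {fun, arg} */
        struct Node *inst;    /* PARAM only: current instantiation        */
    } Node;

    /* Creates (or hash-conses) an operator node over new children. */
    extern Node *rebuild_op(const Node *templ, Node *k0, Node *k1);

    Node *beta_full(Node *n)
    {
        switch (n->kind) {
        case PARAM:                        /* rule "param"                */
            return n->inst;
        case APPLY: {                      /* reduce the argument first,  */
            Node *arg = beta_full(n->kid[1]);
            Node *lam = n->kid[0];
            lam->kid[0]->inst = arg;       /* then instantiate the λ      */
            return beta_full(lam->kid[1]); /* and reduce its body         */
        }
        case LAMBDA:                       /* cannot remain under a fully */
            return n;                      /* applied root (no partial    */
                                           /* applications allowed)       */
        default: {                         /* ordinary operator: rebuild  */
            Node *k0 = n->nkids > 0 ? beta_full(n->kid[0]) : 0;
            Node *k1 = n->nkids > 1 ? beta_full(n->kid[1]) : 0;
            return rebuild_op(n, k0, k1);
        }
        }
    }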
var l var k ite var j Example 2: Given a formula ψ2 ≡ f (i, j) = f (k, l) 3 1 2 and two functions g(x) := ite(x = i, e, 2 ∗ x) and mul eq var e 2 f (x, y) := ite(y < x, g(x), g(y)) as depicted in Fig. 2a. 2 1 1 Applying full β -reduction to formula ψ2 yields formula param const var i ψ20 as illustrated in Fig. 2b. Function application f (i, j) has been reduced to ite( j ≥ i ∧ i 6= j, 2 ∗ j, e) and f (k, l) Fig. 3: Partial β -reduction of formula ψ2 . to ite(l < k, ite(k = i, e, 2 ∗ k), ite(l = i, e, 2 ∗ l)). • if-then-else: substitute expression ite(c,t1 ,t2 ) with t1 if c = >, and t2 otherwise B. Partial β -reduction For partial β -reduction, we have to modify the first of Given a function application λx̄ (a0 , . . . , an ) and a the two optimizations introduced for full β -reduction. DAG representation of λx̄ . The scope of a partial β - • Skip expressions that do not need rebuilding reduction β p (λx̄ ) is defined as the sub-DAG obtained Given an expression e in the scope of partial β - by cutting off all λ -terms in the scope of λx̄ . Assume reduction β p (λx̄ ). If e is not parameterized, in the that we have an assignment for arguments a0 , . . . , an , context of partial β -reduction, e is not dependent and for all non-parameterized expressions in the scope on arguments passed to λx̄ and may be skipped. of β p (λx̄ ). The partial β -reduction algorithm substi- Example 3: Consider ψ2 from Ex. 2. Applying partial tutes param expressions x0 , . . . , xn with a0 , . . . , an and β -reduction to ψ2 yields the formula depicted in Fig. 3, rebuilds λx̄ . Similar to full β -reduction, we perform a where function application f (i, j) has been reduced to DFS post-order traversal of the DAG as follows. ite( j < i, e, g( j)) and f (k, l) to ite(l < k, g(k), g(l)). 1) Initially, we instantiate λx̄ by assigning arguments V. D ECISION P ROCEDURE a0 , . . . , an to the formal parameters x0 , . . . , xn . 2) While traversing down the DAG, we require special The idea of lemmas on demand goes back to [7] handling for the following expressions: and actually represents one extreme variant of the lazy • function applications λȳ (b0 , . . . , bm ) SMT approach [16]. Around the same time, a related technique was developed in the context of bounded a) Visit arguments b0 , . . . , bm , obtain rebuilt ar- model checking [9], which lazily encodes all-different guments b00 , . . . , b0m . constraints over bit vectors (see also [2]). In constraint b) Do not assign rebuilt arguments b00 , . . . , b0m to programming the related technique of lazy clause gen- λȳ and stop down-traversal at λȳ . eration [15] is effective too. • ite(c,t1 ,t2 ) In this section, we introduce lemmas on demand for Since we have an assignment for all non- non-recursive λ -terms based on the algorithm intro- parameterized expressions within the scope of duced in [3]. A top-level view of our lemmas on demand β p (λx̄ ), we are able to evaluate condition c. Based decision procedure for λ -terms (DPλ ) is illustrated in on that we either select t1 or t2 to further traverse Fig. 4 and proceeds as follows. Given a formula φ , down (the other branch is omitted). DPλ uses a bit vector skeleton of the preprocessed 3) During up-traversal of the DAG we rebuild all formula π as formula abstraction αλ (π). 
In each iteration, an underlying decision procedure DPB determines the satisfiability of the formula abstraction refined by formula refinement ξ, i.e., in DPB we eagerly encode the refined formula abstraction Γ to SAT and determine its satisfiability by means of a SAT solver. As Γ is an over-approximation of φ, we immediately conclude unsatisfiable if Γ is unsatisfiable. If Γ is satisfiable, we have to check if the current satisfying assignment σ (as provided by procedure DPB) is consistent w.r.t. the preprocessed formula π. If σ is consistent, i.e., if it can be extended to a valid satisfying assignment for the preprocessed formula π, we immediately conclude satisfiable. Otherwise, assignment σ is spurious, consistentλ(π, σ) identifies a violation of the function congruence axiom EUF, and we generate a symbolic lemma lemmaλ(π, σ), which is added to formula refinement ξ in its abstracted form αλ(lemmaλ(π, σ)).

    procedure DPλ(φ)
      π ← preprocess(φ)
      ξ ← ⊤
      loop
        Γ ← αλ(π) ∧ ξ
        (r, σ) ← DPB(Γ)
        if r = unsatisfiable return unsatisfiable
        if consistentλ(π, σ) return satisfiable
        ξ ← ξ ∧ αλ(lemmaλ(π, σ))

Fig. 4: Lemmas on demand for λ-terms DPλ.

Note that in φ, in contrast to the decision procedure introduced in [3], all array variables and array operations in the original input have been abstracted away and replaced by corresponding λ-terms and operations on λ-terms. Hence, various integral components of the original procedure (αλ, consistentλ, lemmaλ) have been adapted to handle λ-terms as follows.

VI. FORMULA ABSTRACTION

In this section, we introduce a partial formula abstraction function αλ as a generalization of the abstraction approach presented in [3]. Analogous to [3], we replace function applications by fresh bit vector variables and generate a bit vector skeleton as formula abstraction. Given π as the preprocessed input formula φ, our abstraction function αλ traverses down the DAG structure starting from the roots and generates an over-approximation of π as follows.
1) Each bit vector variable and symbolic constant is mapped to itself.
2) Each function application λx̄(a0, ..., an) is mapped to a fresh bit vector variable.
3) Each bit vector term t(y0, ..., ym) is mapped to t(αλ(y0), ..., αλ(ym)).
Note that by introducing fresh variables for function applications, we essentially cut off λ-terms and UF and therefore obtain a pure bit vector skeleton, which is subsequently eagerly encoded to SAT.

Example 4: Consider formula ψ1 from Ex. 1, which has two roots. The abstraction function αλ performs a consecutive down-traversal of the DAG from both roots. The resulting abstraction is a mapping of all bit vector terms encountered during traversal, according to rules 1-3 above. For function applications (e.g. applyi) fresh bit vector variables (e.g. αλ(applyi)) are introduced, and the remaining sub-DAGs are thereby cut off. The resulting abstraction αλ(ψ1) is given in Fig. 5.

Fig. 5: Formula abstraction αλ(ψ1).
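Rules 1-3 amount to a memoized DAG traversal. A compact sketch, with an opaque node type and hypothetical helper functions, could look as follows.

    typedef struct Node Node;            /* opaque expression node       */

    extern int   is_apply(const Node *n);
    extern int   is_var_or_const(const Node *n);
    extern int   arity(const Node *n);
    extern Node *kid(const Node *n, int k);
    extern Node *fresh_bv_var(const Node *like); /* same width as `like` */
    extern Node *rebuild(const Node *n, Node **kids);
    extern Node *cache_get(const Node *n);       /* memo per DAG node    */
    extern void  cache_put(const Node *n, Node *abs);

    Node *abstract(const Node *n)        /* α_λ(n) */
    {
        Node *hit = cache_get(n);
        if (hit) return hit;

        Node *res;
        if (is_var_or_const(n))          /* rule 1: map to itself        */
            res = (Node *)n;
        else if (is_apply(n))            /* rule 2: cut off the λ/UF     */
            res = fresh_bv_var(n);
        else {                           /* rule 3: rebuild over the     */
            Node *kids[3];               /* abstracted children          */
            for (int k = 0; k < arity(n); k++)
                kids[k] = abstract(kid(n, k));
            res = rebuild(n, kids);
        }
        cache_put(n, res);
        return res;
    }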
If we replace function applications by fresh bit vector consistentλ does not find any conflict, we immediately variables and generate a bit vector skeleton as for- conclude that formula π is satisfiable. However, if mula abstraction. Given π as the preprocessed input current assignment σ is spurious w.r.t. preprocessed formula φ , our abstraction function αλ traverses down formula π, either axiom EUF is violated or partial β - the DAG structure starting from the roots, and generates reduction yields a conflicting value for some function an over-approximation of π as follows. application in π. In both cases, we generate a lemma 1) Each bit vector variable and symbolic constant is as formula refinement. In the following we will equally mapped to itself. use function symbols f , g, and h for UF symbols and 2) Each function application λx̄ (a0 , . . . , an ) is mapped λ -terms. to a fresh bit vector variable. In order to check axiom EUF, for each λ -term and UF 3) Each bit vector term t(y0 , . . . , ym ) is mapped to symbol we maintain a hash table ρ, which maps λ - t(αλ (y0 ), . . . , αλ (ym )). terms and UF symbols to function applications. We Note that by introducing fresh variables for function check consistency w.r.t. π by applying the following applications, we essentially cut off λ -terms and UF rules. and therefore yield a pure bit vector skeleton, which I: For each f (ā), if ā is not parameterized, is subsequently eagerly encoded to SAT. add f (ā) to ρ( f ) 33 C: For any pair s := g(ā), t := h(b̄) ∈ ρ( f ) check p r o c e d u r e consistentλ (π, σ ) n ^ S ← nonparam_apps ( π ) σ (αλ (ai )) = σ (αλ (bi )) → σ (αλ (s)) = σ (αλ (t)) w h i l e S 6= 0/ i=0 (g, f (a0 , . . . , an )) ← pop ( S ) encode ( f (a0 , . . . , an ) ) B: For any s := λȳ (a0 , . . . , an ) ∈ ρ(λx̄ ) with /∗ check r u l e C ∗/ t := λx̄ [x0 \a0 , . . . , xn \an ]p , i f n o t congruent ( g, f (a0 , . . . , an ) ) return ⊥ check rule P, if P fails, check eval(t) = σ (αλ (s)) add ( f (a0 , . . . , an ), ρ(g) ) P: For any s := λȳ (a0 , . . . , an ) ∈ ρ(λx̄ ) with i f is_UF ( g ) c o n t i n u e encode ( g ) t := g(b0 , . . . , bm ) = λx̄ [x0 \a0 , . . . , xn \an ]p , /∗ check r u l e B ∗/ n ^ t ← g[x0 \a0 , . . . , xn \an ]p if n = m ∧ ai = bi , propagate s to ρ(g) i f assigned ( t ) i f σ (t) 6= σ (αλ ( f (a0 , . . . , an ))) i=0 return ⊥ e l i f t = h(a0 , . . . , an ) / ∗ c h e c k r u l e P ∗ / Given a λ -term (UF symbol) f and a correspond- push ( S, (h, f (a0 , . . . , an )) ) ing hash table ρ( f ). Rule I, the initialization rule, continue else initializes ρ( f ) with all non-parameterized function apps ← f resh apps(t) applications on f . Rule C corresponds to the function f o r a ∈ apps encode ( a ) congruence axiom and is applied whenever we add i f eval ( t ) 6= σ (αλ ( f (a0 , . . . , an ))) a function application g(a0 , . . . , an ) to ρ( f ). Rule B return ⊥ f o r h(b0 , . . . , bm ) ∈ apps is a consistency check w.r.t. the current assignment push ( S, (h, h(b0 , . . . , bm )) ) σ , i.e., for every function application s in ρ( f ), we return > check if the assignment of σ (αλ (s)) corresponds to Fig. 6: Procedure consistentλ in pseudo-code. the assignment evaluated by the partially β -reduced term λx̄ [x0 \a0 , . . . , xn \an ]p . Finally, rule P represents a loop, we check rules C and B for each tuple as follows. crucial optimization of consistentλ , as it avoids unnec- First we check if f (a0 , . . . , an ) violates the function con- essary conflicts while checking B. 
If P applies, both function applications s and t have the same arguments. As function application s ∈ ρ(λx̄), rule C implies that s = λx̄(a0, ..., an). Therefore, function applications s and t must produce the same function value, as t := λx̄[x0\a0, ..., xn\an]p = λȳ[x0\a0, ..., xn\an]p, i.e., function application t must be equal to the result of applying partial β-reduction to function application s. Assume we encode t and add it to the formula. If DPB guesses an assignment s.t. σ(αλ(t)) ≠ σ(αλ(s)) holds, we have a conflict and need to add a lemma. However, this conflict is unnecessary, as we know from the start that both function applications must map to the same function value in order to be consistent. We avoid this conflict by propagating s to ρ(g).

    procedure consistentλ(π, σ)
      S ← nonparam_apps(π)
      while S ≠ ∅
        (g, f(a0, ..., an)) ← pop(S)
        encode(f(a0, ..., an))
        /* check rule C */
        if not congruent(g, f(a0, ..., an)) return ⊥
        add(f(a0, ..., an), ρ(g))
        if is_UF(g) continue
        encode(g)
        /* check rule B */
        t ← g[x0\a0, ..., xn\an]p
        if assigned(t)
          if σ(t) ≠ σ(αλ(f(a0, ..., an))) return ⊥
        elif t = h(a0, ..., an)   /* check rule P */
          push(S, (h, f(a0, ..., an)))
          continue
        else
          apps ← fresh_apps(t)
          for a ∈ apps: encode(a)
          if eval(t) ≠ σ(αλ(f(a0, ..., an))) return ⊥
          for h(b0, ..., bm) ∈ apps
            push(S, (h, h(b0, ..., bm)))
      return ⊤

Fig. 6: Procedure consistentλ in pseudo-code.

Figure 6 illustrates our consistency checking algorithm consistentλ, which takes the preprocessed input formula π and a current assignment σ as arguments and proceeds as follows. First, we initialize stack S with all non-parameterized function applications in formula π (cf. nonparam_apps(π)) and order them top-down, according to their appearance in the DAG representation of π. The top-most function application then represents the top of stack S, which consists of tuples (g, f(a0, ..., an)), where f and g are initially equal and f(a0, ..., an) denotes the function application propagated to function g. In the main consistency checking loop, we check rules C and B for each tuple as follows. First, we check if f(a0, ..., an) violates the function congruence axiom EUF w.r.t. function g, and return ⊥ if this is the case. Note that for checking rule C we require an assignment for the arguments a0, ..., an; hence we encode them on-the-fly. If rule C is not violated and function f is an uninterpreted function, we continue with the next tuple on stack S. However, if f is a λ-term, we still need to check rule B, i.e., we need to check if the assignment σ(αλ(f(a0, ..., an))) is consistent with the value produced by g[x0\a0, ..., xn\an]p. Therefore, we first encode all non-parameterized expressions in the scope of partial β-reduction βp(g) (cf. encode(g)) before applying partial β-reduction with arguments a0, ..., an, which yields term t. If term t has an assignment, we can immediately check whether it differs from assignment σ(αλ(f(a0, ..., an))), and return ⊥ if this is the case. However, if term t does not have an assignment, which is the case when t has been instantiated from a parameterized expression, we have to compute the value of term t. Note that we could also encode term t to get an assignment σ(t), but this might add a considerable amount of superfluous clauses to the SAT solver. Before computing a value for t, we check if rule P applies and propagate f(a0, ..., an) to h if applicable. Otherwise, we need to compute a value for t, check whether t contains any function applications that were instantiated and not yet encoded (cf. fresh_apps(t)), and encode them if necessary.
We apply partial β -reduction and obtain term function applications have been checked successfully t := λx [x/i]p = λy (i) (since σ (i) < 0) for which rule P and no inconsistencies have been found. is applicable. We propagate λx (i) to λy , check if λx (i) is consistent w.r.t. λy , apply partial β -reduction, obtain A. Lemma generation t := λy [y/i]p = −i and find an inconsistency according Following [3], we introduce a lemma generation to rule B: σ (ai ) 6= σ (−i) but we obtained σ (ai ) = σ (−i). We generate lemma i < 0 → ai = −i. Assume procedure lemmaλ , which generates a symbolic lemma that in the next iteration DBP returns a new satisfying whenever our consistency checker detects an inconsis- assignment σ such that σ (i) 6= σ ( j), σ (ai ) = σ (a j ), tency. Depending on whether rule C or B was violated, σ (i) < 0, σ (ai ) = σ (−i) and σ ( j) > σ (−i). We first we generate a symbolic lemma as follows. Assume check consistency for λx (i), which is consistent due to that rule C was violated by function applications s := the lemma we previously generated. Next, we check g(a0 , . . . , an ), t := h(b0 , . . . , bn ) ∈ ρ( f ). We first collect rule C for λx ( j), which is not violated since σ (i) 6= all conditions that lead to the conflict as follows. σ ( j), and continue with checking rule B. We apply partial β -reduction and obtain term t := λx [x/ j]p = j 1) Find the shortest possible propagation path ps (pt ) (since σ ( j) > σ (−i) and σ (i) < 0) and find an incon- from function application s (t) to function f . sistency as σ (ai ) = σ (−i), σ (ai ) = σ (a j ) and σ ( j) > 2) Collect all ite conditions cs0 , . . . , csj (ct0 , . . . , ctl ) on σ (−i), but σ (a j ) = σ ( j). We then generate lemma path ps (pt ) that were > under given assignment σ . j > 0 → a j = j. 3) Collect all ite conditions cs0 , . . . , csk (ct0 , . . . , ctm ) on path ps (pt ) that were ⊥ under given assignment σ . VIII. E XPERIMENTS We generate the following (in general symbolic) We applied our lemmas on demand approach for lemma: λ -terms on three different benchmark categories: (1) j ^ k ^ l ^ m ^ n ^ crafted, (2) SMT’12, and (3) application. For the crafted csi ∧ ¬csi ∧ cti ∧ ¬cti ∧ ai = bi → s = t category, we generated benchmarks using SMT-LIB v2 i=0 i=0 i=0 i=0 i=0 macros, where the instances of the first benchmark set Assume that rule B was violated by a function (macro blow-up) tend to blow up in formula size if application s := λȳ (a0 , . . . , an ) ∈ ρ(λx̄ ). We obtained SMT-LIB v2 macros are treated as C-style macros. t := λx̄ [x0 \a0 , . . . , xn \an ]p and collect all conditions that The benchmark sets fisher-yates SAT and fisher-yates lead to the conflict as follows. UNSAT encode an incorrect and correct but naive im- 1) Collect ite conditions cs0 , . . . , csj and cs0 , . . . , csk for s plementation of the Fisher-Yates shuffle algorithm [11], as in steps 1-3 above. where the instances of the fisher-yates SAT also tend to blow up in the size of the formula if SMT-LIB 2) Collect all ite conditions ct0 , . . . , ctl that evaluated to v2 macros are treated as C-style macros. The SMT’12 > under current assignment σ when partially β - category consists of all non-extensional QF AUFBV reducing λx̄ to obtain t. benchmarks used in the SMT competition 2012. For 3) Collect all ite conditions ct0 , . . . , ctm that evaluated the application category, we considered the instantia- to ⊥ under current assignment σ when partially β - tion benchmarks1 generated with LLBMC as presented reducing λx̄ to obtain t. 
in [10]. The authors also kindly provided the same We generate the following (in general symbolic) benchmark family using λ -terms as arrays, which is lemma: j k l m denoted as lambda. ^ ^ ^ ^ csi ∧ ¬csi ∧ cti ∧ ¬cti → s = t We performed all experiments on 2.83GHz Intel Core i=0 i=0 i=0 i=0 2 Quad machines with 8GB of memory running Ubuntu Example 5: Consider formula ψ1 and its prepro- 12.04.2 setting a memory limit of 7GB and a time limit cessed formula abstraction αλ (ψ1 ) from Ex. 1. For the for the crafted and the SMT’12 benchmarks of 1200 sake of better readability, we will use λx and λy to seconds. For the application benchmarks, as in [10] denote functions f and g, and further use ai and a j as a shorthand for αλ (applyi ) and αλ (applyj ). Assume 1 http://llbmc.org/files/downloads/vstte-2013.tgz 35 Time Space Time Space Solver Solved TO MO Solver Solved TO MO [103 s] [GB] [103 s] [GB] Boolector 100 0 0 24.2 9.4 Boolector 139 10 0 19.9 14.8 SMT’12 macro blow-up Boolectornop 100 0 0 18.2 8.4 Boolectornop 134 15 0 26.3 14.5 Boolectorβ 28 49 23 91.5 160.0 Boolectorβ 137 11 1 21.5 22.7 CVC4 21 0 79 95.7 551.6 Boolectorsc12 140 9 0 15.9 10.3 MathSAT 51 2 47 64.6 395.0 SONOLAR 26 74 0 90.2 1.7 TABLE II: Results SMT’12 benchmark. Z3 21 0 79 95.0 552.2 Boolector 7 10 1 14.0 7.5 mark set is SONOLAR. However, it is not clear how Boolectornop 4 13 1 17.3 7.0 fisher-yates Boolectorβ 6 1 11 15.0 76.4 SONOLAR handles SMT-LIB v2 macros. Surprisingly, SAT CVC4 5 1 12 15.7 83.6 on these benchmarks Boolectornop performs better than MathSAT 6 10 2 14.7 17.3 Boolector with optimization rule P, which needs fur- SONOLAR 3 14 1 18.1 6.9 Z3 6 12 0 14.8 0.2 ther investigation. On the fisher-yates SAT benchmarks Boolector 5 13 1 17.4 7.1 Boolector not only solves the most instances, but re- Boolectornop 4 14 1 18.2 6.9 fisher-yates quires 107 seconds for the first 6 instances, for which UNSAT Boolectorβ 9 0 10 12.1 72.0 CVC4 3 4 12 19.2 82.1 Boolectorβ , MathSAT and Z3 need more than 300 MathSAT 6 11 2 15.9 14.7 seconds each. Boolectornop does not perform as well SONOLAR 3 15 1 19.2 6.8 Z3 10 9 0 11.2 2.2 as Boolector due to the fact that on these benchmarks optimization rule P is heavily applied. In fact, on these TABLE I: Results crafted benchmark. benchmarks, rule P applies to approx. 90% of all prop- we used a time limit of 60 seconds. We evaluated agated function applications on average. On the fisher- four different versions of Boolector: (1) our lemmas yates UNSAT benchmarks Z3 and Boolectorβ solve the on demand for λ -terms approach DPλ (Boolector), most instances, whereas Boolector and Boolectornop do (2) DPλ without optimization rule P (Boolectornop ), not perform so well. This is mostly due to the fact (3) DPλ with full β -reduction (Boolectorβ ), and (4) that these benchmarks can be simplified significantly the version submitted to the SMT competition 2012 when macros are eagerly eliminated, whereas partial (Boolectorsc12 ). For comparison we used the following β -reduction does not yield as much simplifications. SMT solvers: CVC4 1.2, MathSAT 5.2.6, SONOLAR We measured overhead of β -reduction in Boolector on 2013-05-15, STP 1673 (svn revision), and Z3 4.3.1. these benchmarks and it turned out that for the macro Note that we limited the set of solvers to those which blow-up and fisher-yates UNSAT instances the overhead currently support SMT-LIB v2 macros and the theory is negligible (max. 3% of total run time), whereas for of fixed-size bit vectors. 
As a consequence, we did not compare our approach to UCLID (no bit vector support) and Yices, which both have native λ-term support but lack support for the SMT-LIB v2 standard.

Table I summarizes the results of the crafted benchmark category. On the macro blow-up benchmarks, Boolector and Boolectornop benefit from lazy λ-term handling and thus outperform all those solvers that try to eagerly eliminate SMT-LIB v2 macros, with very high memory consumption as a result. The only solver not having memory problems on this benchmark set is SONOLAR. However, it is not clear how SONOLAR handles SMT-LIB v2 macros. Surprisingly, on these benchmarks Boolectornop performs better than Boolector with optimization rule P, which needs further investigation. On the fisher-yates SAT benchmarks, Boolector not only solves the most instances, but also requires only 107 seconds for the first 6 instances, for which Boolectorβ, MathSAT and Z3 need more than 300 seconds each. Boolectornop does not perform as well as Boolector, due to the fact that on these benchmarks optimization rule P is heavily applied. In fact, on these benchmarks, rule P applies to approx. 90% of all propagated function applications on average. On the fisher-yates UNSAT benchmarks, Z3 and Boolectorβ solve the most instances, whereas Boolector and Boolectornop do not perform so well. This is mostly due to the fact that these benchmarks can be simplified significantly when macros are eagerly eliminated, whereas partial β-reduction does not yield as many simplifications. We measured the overhead of β-reduction in Boolector on these benchmarks, and it turned out that for the macro blow-up and fisher-yates UNSAT instances the overhead is negligible (max. 3% of total run time), whereas for the fisher-yates SAT instances β-reduction requires over 50% of total run time.

Table II summarizes the results of running all four Boolector versions on the SMT'12 benchmark set. We compared our three approaches Boolector, Boolectornop, and Boolectorβ to Boolectorsc12, which won the QF_AUFBV track of the SMT competition 2012. In comparison to Boolectorβ, Boolector solves 5 unique instances, whereas Boolectorβ solves 3 unique instances. In comparison to Boolectorsc12, both solvers combined solve 2 unique instances. Overall, on the SMT'12 benchmarks Boolectorsc12 still outperforms the other approaches. However, our results still look promising, since none of the approaches Boolector, Boolectornop and Boolectorβ is heavily optimized yet. On these benchmarks, the overhead of β-reduction in Boolector is around 7% of the total run time.

Finally, Table III summarizes the results of the application category. We used the benchmarks obtained from the instantiation-based reduction approach presented in [10] (instantiation benchmarks) and compared our new approaches to STP, the same version of the solver that outperformed all other solvers on these benchmarks in the experimental evaluation of [10].
TABLE III: Results application benchmarks.

    instantiation:
    Solver         Solved  TO  MO  Time [s]  Space [MB]
    Boolector      37      8   0   576       235
    Boolectornop   35      10  0   673       196
    Boolectorβ     44      1   0   138       961
    Boolectorsc12  39      6   0   535       308
    STP            44      1   0   141       3814

    lambda:
    Boolector      37      8   0   594       236
    Boolectornop   35      10  0   709       166
    Boolectorβ     45      0   0   52        676
    Boolectorsc12  -       -   -   -         -
    STP            -       -   -   -         -

On the instantiation benchmarks, Boolectorβ and STP solve the same number of instances in roughly the same time. However, Boolectorβ requires less memory for solving those instances. Boolector, Boolectornop and Boolectorsc12 did not perform so well on these benchmarks because, in contrast to Boolectorβ and STP, they do not eagerly eliminate read operations, which is beneficial on these benchmarks. The lambda benchmarks consist of the same problems as instantiation, using λ-terms for representing arrays. On these benchmarks, Boolectorβ clearly outperforms Boolector and Boolectornop and solves all 45 instances within a fraction of the time. Boolectorsc12 and STP do not support λ-terms as arrays and were therefore not able to participate on this benchmark set. By exploiting the native λ-term support for arrays in Boolectorβ, we achieve even better results than on the instantiation benchmarks. Note that on the lambda (instantiation) benchmarks, the overhead in Boolectorβ for applying full β-reduction was around 15% (less than 2%) of the total run time.

Benchmarks, binaries of Boolector and all log files of our experiments can be found at http://fmv.jku.at/difts-rev-13/lloddifts13.tar.gz.

IX. CONCLUSION

In this paper, we introduced a new decision procedure for handling non-recursive and non-extensional λ-terms as a generalization of the array decision procedure presented in [3]. We showed how arrays, array operations and SMT-LIB v2 macros are represented in Boolector and evaluated our new approach on 3 different benchmark categories: crafted, SMT'12 and application. The crafted category showed the benefit of lazily handling SMT-LIB v2 macros, where eager macro elimination tends to blow up the formula in size. We further compared our new implementation to the version of Boolector that won the QF_AUFBV track of the SMT competition 2012. With the application benchmarks, we demonstrated the potential of native λ-term support within an SMT solver. Our experiments look promising, even though we employ a rather naive implementation of β-reduction in Boolector and do not incorporate any λ-term specific rewriting rules except full β-reduction.

In future work we will address the performance bottleneck of the β-reduction implementation and will add λ-term specific rewriting rules. We will analyze the impact of various β-reduction strategies on our lemmas on demand procedure and will further add support for extensionality over λ-terms. Finally, with the recent and ongoing discussion within the SMT-LIB community about adding support for recursive functions, we consider extending our approach to recursive λ-terms.

X. ACKNOWLEDGEMENTS

We would like to thank Stephan Falke, Florian Merz and Carsten Sinz for sharing benchmarks, and Bruno Dutertre for explaining the implementation and limits of lambdas in SMT solvers, and more specifically in Yices.

REFERENCES
[1] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds, and C. Tinelli. CVC4. In CAV, volume 6806 of LNCS, pages 171-177. Springer, 2011.
[2] A. Biere and R. Brummayer. Consistency Checking of All Different Constraints over Bit-Vectors within a SAT Solver. In FMCAD, pages 1-4. IEEE, 2008.
[3] R. Brummayer and A. Biere. Lemmas on Demand for the Extensional Theory of Arrays. JSAT, 6(1-3):165-201, 2009.
[4] R. E. Bryant, S. K. Lahiri, and S. A. Seshia. Modeling and Verifying Systems Using a Logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions. In CAV, volume 2404 of LNCS, pages 78-92. Springer, 2002.
[5] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani. The MathSAT5 SMT Solver. In TACAS, volume 7795 of LNCS, pages 93-107. Springer, 2013.
[6] L. De Moura and N. Bjørner. Z3: an efficient SMT solver. In Proc. ETAPS'08, pages 337-340, 2008.
[7] L. M. de Moura, H. Rueß, and M. Sorea. Lazy Theorem Proving for Bounded Model Checking over Infinite Domains. In CADE, volume 2392 of LNCS. Springer, 2002.
[8] B. Dutertre and L. de Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf, Aug. 2006.
[9] N. Eén and N. Sörensson. Temporal induction by incremental SAT solving. ENTCS, 89(4):543-560, 2003.
[10] S. Falke, F. Merz, and C. Sinz. Extending the Theory of Arrays: memset, memcpy, and Beyond. In Proc. VSTTE'13.
[11] R. Fisher and F. Yates. Statistical tables for biological, agricultural and medical research. Oliver and Boyd, 1953.
[12] V. Ganesh and D. L. Dill. A Decision Procedure for Bit-Vectors and Arrays. In Proc. CAV'07. Springer-Verlag, 2007.
[13] F. Lapschies, J. Peleska, E. Gorbachuk, and T. Mangels. SONOLAR SMT-Solver. System Desc. SMT-COMP'12.
In future work, we will address the performance bottleneck of the β-reduction implementation and add λ-term-specific rewriting rules. We will analyze the impact of various β-reduction strategies on our lemmas-on-demand procedure and further add support for extensionality over λ-terms. Finally, with the recent and ongoing discussion within the SMT-LIB community about adding support for recursive functions, we consider extending our approach to recursive λ-terms.

X. ACKNOWLEDGEMENTS

We would like to thank Stephan Falke, Florian Merz and Carsten Sinz for sharing benchmarks, and Bruno Dutertre for explaining the implementation and limits of lambdas in SMT solvers, and more specifically in Yices.

REFERENCES

[1] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds, and C. Tinelli. CVC4. In CAV, volume 6806 of LNCS, pages 171-177. Springer, 2011.
[2] A. Biere and R. Brummayer. Consistency Checking of All Different Constraints over Bit-Vectors within a SAT Solver. In FMCAD, pages 1-4. IEEE, 2008.
[3] R. Brummayer and A. Biere. Lemmas on Demand for the Extensional Theory of Arrays. JSAT, 6(1-3):165-201, 2009.
[4] R. E. Bryant, S. K. Lahiri, and S. A. Seshia. Modeling and Verifying Systems Using a Logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions. In CAV, volume 2404 of LNCS, pages 78-92. Springer, 2002.
[5] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani. The MathSAT5 SMT Solver. In TACAS, volume 7795 of LNCS, pages 93-107. Springer, 2013.
[6] L. De Moura and N. Bjørner. Z3: an efficient SMT solver. In Proc. ETAPS'08, pages 337-340, 2008.
[7] L. M. de Moura, H. Rueß, and M. Sorea. Lazy Theorem Proving for Bounded Model Checking over Infinite Domains. In CADE, volume 2392 of LNCS. Springer, 2002.
[8] B. Dutertre and L. de Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf, Aug. 2006.
[9] N. Eén and N. Sörensson. Temporal induction by incremental SAT solving. ENTCS, 89(4):543-560, 2003.
[10] S. Falke, F. Merz, and C. Sinz. Extending the Theory of Arrays: memset, memcpy, and Beyond. In Proc. VSTTE'13.
[11] R. Fisher and F. Yates. Statistical tables for biological, agricultural and medical research. Oliver and Boyd, 1953.
[12] V. Ganesh and D. L. Dill. A Decision Procedure for Bit-Vectors and Arrays. In Proc. CAV'07. Springer-Verlag, 2007.
[13] F. Lapschies, J. Peleska, E. Gorbachuk, and T. Mangels. SONOLAR SMT-Solver. System description, SMT-COMP'12. http://smtcomp.sourceforge.net/2012/reports/sonolar.pdf, 2012.
[14] J. McCarthy. Towards a Mathematical Science of Computation. In IFIP Congress, pages 21-28, 1962.
[15] O. Ohrimenko, P. J. Stuckey, and M. Codish. Propagation via lazy clause generation. Constraints, 14(3):357-391, 2009.
[16] R. Sebastiani. Lazy Satisfiability Modulo Theories. JSAT, 3(3-4):141-224, 2007.
[17] S. A. Seshia. Adaptive Eager Boolean Encoding for Arithmetic Reasoning in Verification. PhD thesis, CMU, 2005.

CHIMP: a Tool for Assertion-Based Dynamic Verification of SystemC Models

Sonali Dutta (Rice University, Sonali.Dutta@rice.edu), Deian Tabakov (Schlumberger, deian@dtabakov.com), Moshe Y. Vardi (Rice University, vardi@cs.rice.edu)

Work supported in part by NSF grants CNS 1049862 and CCF-1139011, by NSF Expeditions in Computing project "ExCAPE: Expeditions in Computer Augmented Program Engineering", by BSF grant 9800096, and by a gift from Intel.

Abstract—CHIMP is a tool for assertion-based dynamic verification of SystemC models. The features of CHIMP include automatic generation of monitors from temporal assertions, automatic instrumentation of the model-under-verification (MUV), and three-way communication among the MUV, the generated monitors, and the SystemC simulation kernel during the monitored execution of the instrumented MUV. Empirical results show that CHIMP puts minimal runtime overhead on the monitored execution of the MUV. A newly added path in CHIMP results in a significant (over 75%) reduction of average monitor generation and compilation time. The average size of the monitors is reduced by over 60%, without increasing runtime overhead.
I. INTRODUCTION

SystemC (IEEE Standard 1666-2005) has emerged as a de facto standard for modeling of hardware/software systems [4], supporting different levels of abstraction, iterative model refinement, and execution of the model during each design stage. SystemC is implemented as a C++ library with macros and base classes for modeling processes, modules, channels, signals, ports, and the like, while an event-driven simulation kernel allows efficient simulation of concurrent models. Thus, on one hand, SystemC leverages the natural object-oriented encapsulation, data hiding, and well-defined inheritance mechanism of C++, and, on the other hand, it allows modeling and efficient simulation of hardware/software designs via its simulation kernel and predefined hardware-specific macros and classes. The SystemC code is available as open source, including a single-core reference simulation kernel, referred to as the OSCI kernel; see http://www.accellera.org/downloads/standards/systemc.

The growing popularity of SystemC has motivated research aimed at assertion-based dynamic verification (ABDV) of SystemC models [9]. ABDV involves two steps: generating run-time monitors from input assertions [8], and executing the model-under-verification (MUV) while running the monitors along with the model. The monitors observe the execution of the MUV and report whether the observed behavior is consistent with the specified behavior. For a discussion of related work in ABDV of SystemC, see [7].

CHIMP, available as an open-source tool (www.sourceforge.net/projects/chimp-rice), implements the monitoring framework for temporal SystemC properties described in [9]; see the discussion below. CHIMP consists of (1) off-the-shelf components, (2) modified components, and (3) an original component. CHIMP has two off-the-shelf components: (1) spot-1.1.1 [3], a C++ library used for LTL-to-Büchi conversion, and (2) AspectC++-1.1 [6], a C++ aspect compiler that is used for instrumentation of the MUV. CHIMP has two modified components: (1) a patched version of the OSCI kernel-2.2 to facilitate communication between the kernel and the monitors [9], and (2) an extension of Automaton-1.11 [5], a Java tool used for determinization and minimization of finite automata, with the added ability to read automata descriptions from a file. Finally, the original component is MONASGEN, a C++ tool for automatic generation of monitors from assertions [8] and for automatic generation of an aspect-advice file for instrumentation [11]. Fig. 1 shows the five components of CHIMP as described above. The component Automaton-1.11 is dotted because a newly added, improved path in CHIMP removes the dependency of CHIMP on Automaton-1.11 (explained in Section V).

[Fig. 1. CHIMP components]

CHIMP takes the MUV and a set of temporal assertions about the behavior of that MUV as inputs, and outputs "FAILED" or "NOT FAILED" for each assertion. CHIMP performs white-box validation of user code and black-box validation of library code. (If a user wishes to do white-box validation of library code, it can be accomplished by treating the library code as part of the user code.) The two main features of CHIMP are: (1) CHIMP generates C++ monitors, one for each assertion to be verified, and (2) CHIMP automatically instruments the user model [11] to expose the user model's states and syntax to the monitors. CHIMP puts nominal overhead on the runtime of the MUV, supports a rich set of assertions, and can handle a wide array of abstractions, from statement level to system level.

A recently added path in CHIMP from LTL to monitor bypasses the old performance bottleneck, Automaton-1.11, and improves the monitor generation and compilation time by a significant amount. It also reduces the size of the generated monitors notably. This entirely removes the dependency of CHIMP on Automaton-1.11.

The theoretical and algorithmic foundations of CHIMP were described in [8]-[11]. In this paper we describe the architecture, usage, and recent evolution of CHIMP. The rest of the paper is organized as follows. Section II describes the syntax and semantics of assertions. Section III presents an overall picture of the usage, implementation and performance of CHIMP. Section IV describes the C++ monitors generated by CHIMP. Section V describes the new, improved path in CHIMP and its performance. Finally, Section VI presents a summary and future work.
II. ASSERTIONS: SYNTAX AND SEMANTICS

CHIMP accepts input assertions defined using the temporal specification framework for SystemC described in [10], where a temporal assertion consists of a linear temporal formula accompanied by a Boolean expression serving as a clock. This supports assertions written at different levels of abstraction with different temporal resolutions. The framework of [10] proposes a set of SystemC-specific Boolean variables that refer to SystemC's software features and its simulation semantics; see the examples below. Input assertions in CHIMP are of the form "⟨LTL formula⟩@⟨clock expression⟩", where the LTL formula expresses a temporal property and the clock expression denotes when CHIMP should sample the execution trace of the MUV.

Several of the additional Boolean variables proposed in [10] refer to the simulation phases. According to SystemC's formal semantics, there are 18 predefined kernel phases, and CHIMP has Boolean variables that enable referring to these phases. For example, MON_DELTA_CYCLE_END denotes the end of each delta cycle and MON_THREAD_SUSPEND denotes the suspension moment of each thread. By using these variables as clock expressions, we can sample the execution trace at different temporal resolutions. (By default, CHIMP samples at each kernel phase.) Other Boolean variables refer to SystemC events, which are key elements of SystemC's event-driven simulation semantics. CHIMP is also able to sample the execution at various key locations, e.g., function calls and function returns. This gives the user the flexibility to write assertions at different levels of abstraction, from the level of individual C++ statements to the level of SystemC kernel phases.

The Boolean primitives supported by CHIMP are summarized below; see [10] and [11] for further details.

Function primitives: Let f() be a C++ function in the user or library code. The Boolean primitives f:call and f:return refer to locations in the source code that contain the function call, and to locations immediately after the function call, respectively. The primitives f:entry and f:exit refer to the locations immediately before the first executable statement and immediately after the last executable statement in f(), respectively. If f() is a library function, then the entry and exit primitives are not supported (black-box verification model for libraries).

Value primitives: If a function f() has k arguments, CHIMP defines variables f:1, ..., f:k, where the value and type of f:i are equal to the value and type of the i-th parameter of f() before executing the first statement in the definition of f(). CHIMP also defines the variable f:0, whose value and type are equal to the value and type of the object that f() returns. For example, if the function int divide(int dividend, int divisor) is defined in the MUV, then the formula G (divide:entry -> "divide:2 != 0") asserts that the divisor is nonzero whenever the divide function starts execution.

Phase primitives: The user can refer to the 18 predefined kernel states [10] in the assertions to specify when the state of the MUV should be sampled. For example, the assertion G ("p == 0") @ MON_DELTA_CYCLE_END requires the value of variable p to be zero at the end of every delta cycle.

Event primitives: For each SystemC event E, CHIMP provides a Boolean primitive E.notified that is true only when the OSCI kernel actually notifies E. For example, the assertion G (s.value_changed_event().notified) @ MON_UPDATE_PHASE_END says that the signal s changes value at the end of every update phase.
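To tie the assertion form to its runtime meaning, the following sketch shows the intended sampling discipline: the clock expression selects the trace positions at which the monitor takes a step. This is a conceptual illustration, not CHIMP code; the transition function hard-coded here is the DFW for G(a -> Xb) discussed in Section IV, with -1 as the reject indicator.

    #include <iostream>
    #include <vector>

    // Conceptual sketch (not CHIMP code): sample the trace only where
    // the clock expression holds, and drive the DFW for G(a -> Xb).
    struct Obs { bool a, b, clk; };

    int step(int state, bool a, bool b) {
        if (state == 0) return a ? 1 : 0;             // from state 0
        if (state == 1) { if (b) return a ? 1 : 0; }  // from state 1, b required
        return -1;                                    // no transition: reject
    }

    bool monitor(const std::vector<Obs>& trace) {
        int state = 0;                                // initial DFW state
        for (const Obs& o : trace) {
            if (!o.clk) continue;                     // sample only at clock points
            state = step(state, o.a, o.b);
            if (state < 0) return false;              // "FAILED"
        }
        return true;                                  // "NOT FAILED"
    }

    int main() {
        // a holds, then b fails at the next sample: the property fails.
        std::cout << std::boolalpha
                  << monitor({{true, false, true}, {false, false, true}}) << "\n";
    }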
III. USAGE, IMPLEMENTATION AND PERFORMANCE

Running CHIMP consists of three steps. (1) First, the user writes a configuration file containing all assertions to be verified, as well as other necessary information. The user can also provide inputs through command-line switches. For each LTL assertion in the configuration file, MONASGEN first generates a nondeterministic Büchi automaton on words (NBW) using SPOT, then converts that NBW to a minimal deterministic finite automaton on words (DFW), using the Automaton-1.11 tool for determinization and minimization. Then MONASGEN generates the C++ monitors from the DFW, one monitor for each assertion (Fig. 2). (2) MONASGEN produces an aspect-advice file that is then used by AspectC++ to generate the instrumented MUV (Fig. 3). (3) Finally, the monitors and the instrumented MUV are compiled together and linked to the patched OSCI kernel, and the code is run to execute the simulation with inputs provided by the user. The inputs can be generated using any standard stimuli-generation technique. For every assertion, CHIMP produces output indicating whether the assertion held or failed for that particular input (Fig. 4).

[Fig. 2. Monitor generation flow]
[Fig. 3. MUV instrumentation flow]
[Fig. 4. Running instrumented MUV with monitors using patched OSCI kernel]

For experimental evaluation we used a SystemC model with about 3,000 LOC, implementing a system for reserving and purchasing airplane tickets. The users of the system submit requests, and the system uses a randomly generated flight database to find a direct flight or a sequence of up to three connecting flights. Those are returned to the user for approval, payment and booking. This model is intended to run forever. It is inspired by actual request/grant subsystems currently used in hardware design.

We used a patched version of the 2.2.0 OSCI kernel and compiled it using the default settings in the Makefile. The empirical results below were measured on a Pentium 4, 3.20 GHz CPU, 1 GB RAM machine running GNU Linux. To assess the runtime overhead imposed by the monitors, we measured the effect of running with different assertions and with an increasing number of assertions [9]. In each case we first ran the model without monitors and user-code instrumentation to establish the baseline, and then ran several simulations with instrumentation and an increasing number of copies of the same monitor. The results report runtime overhead per monitor as a percentage of the baseline. (For these experiments, monitors were constructed manually.)

The first property we checked is a safety property asserting that whenever the event new_requests_nonfull is notified, the corresponding queue new_planning_requests must have space for at least one request:

    G "new_planning_requests.size() < capacity"
        @ new_requests_nonfull.notified                         (1)

The second property says that the system must propagate each request through each channel (or through each module) within 5 cycles of the slow clock. This property is a conjunction of 16 bounded-liveness assertions similar to the one shown here:

    ...
    // Propagate through module within 5 clock ticks
    ALWAYS (io_module_receive_transaction($1) ->
        ( within [5 slow_clock.pos()]
          io_module_send_to_master($2) & ($1 == $2) )
    ) AND ...                                                   (2)

[Fig. 5. ARS: Overhead per monitor for Properties (1) and (2); overhead per monitor (% of baseline) vs. number of monitors, for Property 1 (safety), Property 2 (liveness), and both properties.]

Fig. 5 presents the runtime overhead of monitoring the above two properties as a percentage of the baseline. Checking Property 2 is relatively expensive because it requires a lot of communication from the model to the monitor.
It is shown in [1] that finite-state monitors have less runtime overhead than transaction-based monitors generated by tools like Synopsys VCS.

We also evaluated the cost of instrumentation separately by simulating one million SystemC clock cycles with a focus on the overhead of instrumentation [11]. The average wall-clock execution time of the system over 10 runs without instrumentation was 33 seconds. We call this "baseline execution". Fig. 6 shows the cost of the instrumentation per monitor call, as a percentage of the baseline execution. The data suggest that there is a fixed cost of the instrumentation which, when amortized over more and more calls, leads to a lower average cost. The average cost per call stabilizes after 300,000 calls, and is less than 0.5 × 10^-4 %.

[Fig. 6. Instrumentation overhead per monitor call as a percentage of the baseline run time. The Y axis is ((instrumentation overhead − baseline)/(baseline × number of monitors)) × 100%.]

Another testbench we used is an Adder model that implements the squaring function by repeated increment by 1. It uses all three kinds of event notifications. It is scalable, as it spawns a SystemC thread for each addition in the squaring function. We monitored several properties of the Adder model using CHIMP.

To study the effect of monitor encoding on runtime overhead, Tabakov and Vardi [8] describe 33 different monitor encodings, and show that the front_det_switch encoding with state minimization and no alphabet minimization is the best in terms of runtime overhead. The downside is that this flow suffers from a rather slow monitor generation and compilation time. This led us to develop a new flow for the tool, as described below.

IV. CHIMP MONITORS

In CHIMP, the LTL formula in every assertion ⟨LTL formula⟩ @ ⟨clock expression⟩ is converted into a C++ monitor class. Each C++ monitor has a step() function that implements the transition function of the DFW generated from the LTL formula (as described in Fig. 9). This step() function is called at every sampling point defined by the clock expression. The monitor.h and monitor.cc files, generated by MONASGEN, contain one monitor class for every assertion and a class called local_observer that is responsible for invoking the callback function, which invokes the step() function of the appropriate monitor class at the right sampling point during the monitored simulation. Different encodings have been explored for writing the monitor's step() function. For the new flow of CHIMP (described below), Fig. 13 shows that the front_det_ifelse encoding is the best among all of them in terms of runtime overhead.

In the front_det_ifelse encoding, the C++ monitor produced is deterministic in terms of transitions: from one state, either one or no transition is possible. If no transition is possible from a state, the monitor rejects and outputs "FAILED". Else, the monitor keeps executing until the end of simulation and outputs "NOT FAILED". In the front_det_ifelse encoding, each state of the monitor is encoded as an integer, from 0 up to the total number of states. The step() function of the monitor uses an outer if-elseif statement block to determine the next state. The possible transitions from each state are encoded using an inner if-elseif block, the condition statement being the guard on a transition.
As a running example, we show how the monitor step() function is encoded for an assertion G(a → Xb) @ clk. Here clk is some Boolean expression specifying when the step() function needs to be called during the simulation. This assertion asserts that, always, if a is true, then in the next clock cycle b has to be true. The DFW generated by CHIMP for this assertion is shown in Fig. 7.

[Fig. 7. The DFW generated by MONASGEN from the LTL formula G(a → Xb). All the states are accepting. The DFW rejects bad prefixes by no available transition. State 0 is the initial state.]

Listing 1 shows the step() function of the C++ monitor generated by CHIMP for the assertion G(a → Xb) @ clk. The variables current_state and next_state are member variables of the monitor class and store the current and next automaton state, respectively. If next_state is −1 after the execution of the step() function, no transition can be made from the current state. In this case, the monitor outputs "FAILED". The initial state 0 of the monitor is assigned to next_state inside the constructor of the monitor class, as shown in Listing 2; the first call to step() then copies it into current_state.

Listing 1. The step() function of the monitor for G(a → Xb)

    /**
     * Simulate a step of the monitor.
     */
    void monitor0::step() {
      // If the property has not failed yet
      if (status == NOT_FAILED) {
        // number of steps executed so far
        num_steps++;
        // assign next state to current state
        current_state = next_state;
        // make next state invalid
        next_state = -1;
        if (current_state == 0) {
          if (!(a))            { next_state = 0; }
          else if ((a))        { next_state = 1; }
        } // if (current_state == 0)
        else if (current_state == 1) {
          if ((b) && !(a))     { next_state = 0; }
          else if ((a) && (b)) { next_state = 1; }
        } // if (current_state == 1)
        // FAILED if no transition possible
        bool not_stuck = (next_state != -1);
        if (!not_stuck) {
          property_failed();
        }
      } // if (status == NOT_FAILED)
    } // step()

Listing 2. The constructor of the monitor for G(a → Xb)

    /**
     * The constructor
     */
    monitor0::monitor0(...) : sc_core::mon_prototype() {
      next_state = 0;       // initial state id
      current_state = -1;
      status = NOT_FAILED;
      num_steps = 0;
      ...
    }

The sampling points can be kernel phases, e.g., MON_DELTA_CYCLE_END, or event notifications, e.g., E.notified (E is an event). In such cases, the OSCI kernel needs to communicate with the local_observer at the right time (when a delta cycle ends or when event E is notified) to call the step() function of the monitor. This communication is done by the patch applied to the OSCI kernel. The patch contains a class called mon_observer, which communicates with the local_observer class on behalf of the OSCI kernel.

V. EVOLUTION OF CHIMP

Fig. 8 shows the old path of generating a C++ monitor from an LTL property in CHIMP. First, MONASGEN uses SPOT to convert the LTL formula to a nondeterministic Büchi automaton on words (NBW). Then MONASGEN prunes the NBW to remove all states that do not lead to any accepting state, and generates the corresponding nondeterministic finite automaton on words (NFW); see [2]. MONASGEN then uses the Automaton tool to determinize and minimize the NFW into a minimal deterministic finite automaton on words (DFW). Finally, MONASGEN converts the DFW to a C++ monitor.

[Fig. 8. Old LTL to monitor path in CHIMP]
[Fig. 9. New LTL to monitor path in CHIMP]

The main bottleneck in this flow was the determinization and minimization of the NFW using the Automaton tool, as it consumes 90% of the total monitor generation and compilation time. Also, the Automaton tool may generate multiple edges between two states, resulting in quite large monitors. The newest version of CHIMP introduces a new path, shown in Fig. 9, which bypasses the Automaton tool completely and uses only SPOT. So the component Automaton-1.11 in Fig. 1 is not needed by CHIMP anymore.
This new path leverages the new functionality of SPOT-1.1.1 to convert an LTL formula to a minimal DFW that explicitly rejects all bad prefixes.

After replacing the Automaton tool by SPOT to convert the NBW to a minimal DFW, the improvement in compilation time and monitor size is evident. The new flow results in a 75.93% improvement in monitor generation and compilation time. To evaluate the performance of the revised CHIMP, we use the same set of 162 pattern formulas and 1200 random formulas as mentioned in [8]. The scatter plot in Fig. 10 compares the monitor generation and compilation time of the new flow vs the old flow. Most of the points are above the line with slope 1, which indicates that for most of the monitors, the generation and compilation time of the old CHIMP is higher than that of the new CHIMP. Fig. 10-Fig. 14 are all scatter plots (see http://en.wikipedia.org/wiki/Scatter_plot) and are interpreted in the same way.

[Fig. 10. Comparison of monitor generation and compilation time of the old CHIMP vs new CHIMP]

The new flow also merges multiple edges between two states into one edge guarded by the disjunction of the guards on all edges between the states. In this way, the average reduction in monitor size in bytes is 61.27%. Fig. 11 compares the size in bytes of the monitors generated by the new flow vs the old flow.

[Fig. 11. Comparison of monitor size (bytes) generated by the old CHIMP vs new CHIMP]
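The edge-merging step can be pictured as follows. This is an illustrative sketch (guards kept as strings, states as integers), not MONASGEN's actual representation.

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // Sketch of the edge-merging step described above: parallel
    // automaton edges between the same pair of states are collapsed
    // into one edge whose guard is the disjunction of the guards.
    struct Edge { int src, dst; std::string guard; };

    std::vector<Edge> mergeParallelEdges(const std::vector<Edge>& edges) {
        std::map<std::pair<int, int>, std::string> merged;
        for (const Edge& e : edges) {
            std::string& g = merged[{e.src, e.dst}];
            g = g.empty() ? e.guard : "(" + g + ") || (" + e.guard + ")";
        }
        std::vector<Edge> out;
        for (const auto& [key, guard] : merged)
            out.push_back({key.first, key.second, guard});
        return out;
    }

    int main() {
        // Two parallel edges 0 -> 1 become one disjunctively guarded edge.
        for (const Edge& e :
             mergeParallelEdges({{0, 1, "a && b"}, {0, 1, "!a"}, {1, 0, "b"}}))
            std::cout << e.src << " -> " << e.dst << " [" << e.guard << "]\n";
    }

Fewer edges mean fewer guard tests emitted into the generated step() function, which is where the reported reduction in monitor size comes from.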
Since the main focus of CHIMP has always been minimizing runtime overhead, we need to ensure that the new flow does not incur more runtime overhead than the old flow as a cost of the reduced monitor generation and compilation time. So we ran the same set of 162 pattern formulas and 1200 random formulas as mentioned above to compare the runtime overhead incurred by the new flow vs the old flow. Fig. 12 shows that the runtime overhead of CHIMP using the new flow has in fact been reduced compared to the old flow. The reduction is 7.97% on average.

[Fig. 12. Comparison of runtime overhead incurred by old CHIMP vs new CHIMP]

As CHIMP has evolved to follow a different and more efficient path and the monitors have been reduced in size, it was not clear a priori which monitor encoding is the best. We conducted the same experiment as in [8] with the same set of formulas, 162 pattern formulas and 1200 random formulas, to identify the best encodings in terms of both runtime overhead and monitor generation and compilation time. We identified two new best encodings. Fig. 13 shows that the new best encoding in terms of runtime overhead is now front_det_ifelse. Fig. 14 shows that the new best encoding in terms of monitor generation and compilation time is back_ass_alpha. CHIMP now provides two configurations for monitor generation: best_runtime, which has minimum runtime overhead, and best_compiletime, which has minimum monitor generation and compilation time. Since the bigger concern is usually runtime overhead, best_runtime is the default configuration, given that its monitor generation and compilation time is very close to that of best_compiletime.

[Fig. 13. Comparison of runtime overhead of the front_det_ifelse encoding vs all other encodings]
[Fig. 14. Comparison of monitor generation and compilation time overhead of the back_ass_alpha encoding vs all other encodings]

VI. CONCLUSION

We present CHIMP, an assertion-based dynamic verification tool for SystemC models, and show that it puts minimal overhead on the runtime of the MUV. We show how the new path in CHIMP results in a significant reduction of monitor generation and compilation time and monitor size, as well as runtime overhead. In the future, we plan to look at the possibility of verifying parametric properties, for example G(send(id) → F(receive(id))), where one can pass a parameter (here id) to the variables in the LTL formula of the assertion. This property means that always, if the message with ID id is sent, it should eventually be received. Also, at present the user needs to know the system architecture to declare an assertion. We would like to make CHIMP work with assertions that are declared in the elaboration phase.

REFERENCES

[1] R. Armoni, D. Korchemny, A. Tiemeyer, M. Vardi, and Y. Zbar. Deterministic dynamic monitors for linear-time assertions. In Proc. Workshop on Formal Approaches to Testing and Runtime Verification, volume 4262 of LNCS. Springer, 2006.
[2] M. d'Amorim and G. Roşu. Efficient monitoring of ω-languages. In Proc. 17th International Conference on Computer Aided Verification, pages 364-378, 2005.
[3] A. Duret-Lutz and D. Poitrenaud. SPOT: An extensible model checking library using transition-based generalized Büchi automata. In Modeling, Analysis, and Simulation of Computer Systems, pages 76-83, 2004.
[4] C. Helmstetter, F. Maraninchi, L. Maillet-Contoz, and M. Moy. Automatic generation of schedulings for improving the test coverage of Systems-on-a-Chip. In FMCAD '06, pages 171-178. IEEE Computer Society, 2006.
[5] A. Møller. dk.brics.automaton. http://www.brics.dk/automaton/, 2004.
[6] O. Spinczyk, A. Gal, and W. Schröder-Preikschat. AspectC++: an aspect-oriented extension to the C++ programming language. In CRPIT '02: Proc. 40th International Conference on Tools Pacific, pages 53-60. Australian Computer Society, 2002.
[7] D. Tabakov. Dynamic Assertion-Based Verification for SystemC. PhD thesis, Rice University, Houston, 2010.
[8] D. Tabakov, K. Rozier, and M. Y. Vardi. Optimized temporal monitors for SystemC. Formal Methods in System Design, 41(3):236-268, 2012.
[9] D. Tabakov and M. Vardi. Monitoring temporal SystemC properties. In Proc. 8th Int'l Conf. on Formal Methods and Models for Codesign, pages 123-132. IEEE, July 2010.
[10] D. Tabakov, M. Vardi, G. Kamhi, and E. Singerman. A temporal language for SystemC. In FMCAD '08, pages 1-9. IEEE Press, 2008.
[11] D. Tabakov and M. Y. Vardi. Automatic aspectization of SystemC. In Proc. 2012 Workshop on Modularity in Systems Software (MISS '12), pages 9-14. ACM, 2012.
Abstraction-Based Livelock/Deadlock Checking for Hardware Verification

In-Ho Moon and Kevin Harer, Synopsys Inc. {mooni, kevinh}@synopsys.com

ABSTRACT

Livelock/deadlock is a well-known and important problem in both hardware and software systems. In hardware verification, a livelock is a situation where the state of a design changes within only a smaller subset of the states reachable from the initial states of the design. Deadlock is a special case in which there is only one state in a livelock. However, livelock/deadlock checking has never been actively used in hardware verification in practice, mainly due to the complexity of the computation, which involves finding strongly connected components.

This paper presents a practical abstraction-based livelock/deadlock checking algorithm for hardware verification. The proposed livelock/deadlock checking works on FSMs rather than the whole design. For each FSM, we make an abstract machine of manageable size from the cone of influence of the FSM. Once a livelock is found on an abstract machine, the livelock is justified on the concrete machine with trace concretization. Experimental results show that the proposed abstraction-based livelock checking finds real livelock errors in industrial designs.

1. INTRODUCTION

Livelock/deadlock is a well-known and important problem in both hardware and software systems. In hardware verification, a livelock is a situation where the state of a design changes within only a subset of the states reachable from the initial states of the design. In a state transition graph, a livelock is a set of states from which there is no path to any other state that is reachable from the initial state. Since deadlock is a special case in which there is only one state in a livelock, deadlock checking can be done by livelock checking; thus, "livelock" implies both livelock and deadlock in this paper. However, livelock checking (which is different from liveness checking; the difference is explained in Section 2.3) has never been actively used in hardware verification in practice, mainly due to the complexity of the computation, which involves finding SCCs (strongly connected components). Thus, livelock checking has long been on hardware designers' wish lists.

There have been many approaches to finding SCCs [13, 27, 28, 3, 20, 12]. Among these, Xie and Beerel proposed a symbolic method for finding terminal SCCs (in short, TSCCs) using BDDs (Binary Decision Diagrams [4]) in [27]. In a state transition graph, a TSCC is an SCC that does not have any outgoing edges to any state outside the SCC. Thus, a TSCC becomes a livelock group when the TSCC has incoming edges into the SCC in the state transition graph representing a hardware design. However, even though the method in [27] improves on its predecessor [13] among symbolic approaches, it is still infeasible to apply it to industrial designs, simply due to the capacity problem of symbolic methods. In general, any BDD-based method can handle only up to several hundred latches without abstraction or approximation techniques, whereas there can be millions of latches in industrial designs.

In this paper, we first present a BDD-based algorithm for finding TSCCs that improves on [27] in the following aspects. First, the initial state is taken into account in finding TSCCs; in particular, the improved algorithm handles multiple initial states efficiently. Secondly, unreachable TSCCs are distinguished from reachable TSCCs, which are more interesting to designers. Thirdly, we provide a more intuitive state classification into main, transient, and livelock groups, as opposed to the transient and recurrence classes in [27]. In our classification, the set of transient states is further classified into main and transient groups, and the recurrence class in [27] maps onto our livelock group. The main group is an SCC that contains the initial state. The transient group is the set of states that belong to neither the main nor a livelock group. There is one or zero main group in a design per initial state.

This paper also presents a practical approach for checking livelock using abstraction techniques. The proposed livelock checking works on FSMs (Finite State Machines; either automatically extracted [26] or any set of sequential elements that is user-specified) rather than the whole design.
For each FSM, we make an abstract machine (by localization reduction [15]) of manageable size, using the improved BDD method, from the COI (cone of influence) of the FSM. Once a livelock is found on an abstract machine, the livelock is justified on the concrete machine with trace concretization using SAT (satisfiability [9]) and simulation. When there is no livelock on the abstract machine, there is no guarantee that there is no livelock on the concrete machine; however, the bigger the abstraction, the more confidence we have that no livelock exists on the concrete machine. The key benefit of this abstraction-based livelock checking is that it enables finding real livelock groups that cannot be found by tackling the whole design directly.

Once an FSM is given, its COI is first computed. Then, an abstract machine is computed by finding N influential latches from the COI. Influential latches are the latches that are likely related to the FSM. N, the number of latches in the abstract machine, is either pre-defined or user-defined, or gradually increased; in general, N is up to a few hundred latches. Influential latches are computed mainly by approximate state decomposition [6]. However, in many cases, the size of the COI is too big even for approximate state decomposition. Thus, a structural abstraction using connectivity and sequential depth is applied beforehand; this structural abstraction reduces the COI to a size manageable by approximate state decomposition.
There is another important hardware property, called toggle deadlock. A state variable has a toggle deadlock if the state variable initially toggles and then becomes a constant after a certain number of transitions. Notice that this is not a constant variable, since it initially toggles. Toggle deadlock may or may not show up in simulation, depending on the input stimuli; therefore, toggle deadlock is also an important property to check with formal approaches.

Experimental results show that the proposed abstraction-based approach finds real livelock and toggle deadlock errors in industrial designs. The contributions of this paper are in three aspects:

• Improved algorithm for livelock checking. The proposed algorithm improves the existing algorithm [27] in many aspects, such as providing a new state classification with initial states, handling multiple initial states, refining the search space efficiently with care states, early termination, and trimming out transient states.

• Abstraction-based livelock checking. This paper presents the theory and an implementation of abstraction-based livelock checking that handles large designs in practice.

• Toggle deadlock checking. To the best of our knowledge, this paper presents the first method to solve the toggle deadlock problem.

The remainder of this paper is organized as follows. Section 2 briefly recapitulates finding SCCs and TSCCs, and describes related work. Section 3 describes our improved algorithm for finding TSCCs. Section 4 explains how livelock is checked on an FSM using abstraction. Section 5 describes how to check toggle deadlocks. Experimental results are presented and discussed in Section 6. We conclude with Section 7.

2. PRELIMINARIES

2.1 Finding SCCs

Given a graph G = (V, E), where G is a finite transition system (a Kripke structure [8]), V is a finite set of states, and E ⊆ V × V is the set of edges, a strongly connected component (SCC) is a maximal set of states U ⊆ V such that for every pair (u, v) ∈ U, u and v are reachable from each other; that is, u is reachable from v and v is reachable from u [27, 28].

Finding SCCs has a variety of applications in formal verification, such as Büchi emptiness [11], LTL model checking [25], CTL model checking with fairness constraints [10], liveness checking [16, 1], and so on.
The traditional approach to finding SCCs is to use Tarjan's method [23]. This method manipulates the states of the graph explicitly; even though it runs in time linear in the size of the graph, the size of the graph grows exponentially with the number of state variables. To overcome this state explosion problem of explicit algorithms, there have been many publications on symbolic algorithms. Ravi et al. [20] provided a taxonomy of these symbolic algorithms: one class is SCC-hull algorithms (which do not enumerate SCCs) [11, 14, 24], and the other is SCC enumeration algorithms [28, 3, 12, 20]. The details are in [20].

2.2 Finding TSCCs

Even though the TSCCs are a subset of the SCCs in the states of a design, the algorithms for finding TSCCs can be significantly optimized, since not all SCCs are of interest. This section recapitulates the work on finding TSCCs by Xie and Beerel [27]. Their algorithm classifies all states into either the recurrence or the transient class: the recurrence class is the set of TSCCs, and the rest belongs to the transient class. Let S be the set of states. With i, j ∈ S, i → j denotes that there is at least one path from i to j. Definition 1 defines the forward set and backward set of a state.

Definition 1. The forward set of state i ∈ S, denoted by F(i), is the set of states that have a path from i; that is, F(i) = {j ∈ S | i → j}. Similarly, the backward set of state i, denoted by B(i), is the set of states that have a path to i; that is, B(i) = {j ∈ S | j → i}.

Lemma 1. Let i, j ∈ S. If j ∈ F(i), then F(j) ⊆ F(i). Similarly, if j ∈ B(i), then B(j) ⊆ B(i).

Theorem 1. A state i ∈ S is recurrent if and only if F(i) ⊆ B(i). In other words, i is transient if and only if F(i) ⊄ B(i).

Theorem 2. If state i ∈ S is transient, then the states in B(i) are all transient. If state i is recurrent, on the other hand, the states in F(i) are all recurrent. In the latter case, the set F(i) is a recurrence class, and the set B(i)\F(i) (if not empty) contains only transient states.

Lemma 1 and Theorems 1 and 2 are from [27]. Lemma 1 gives a subset relation between two forward sets, as well as between two backward sets, when j is in F(i) or B(i), respectively. Theorems 1 and 2 show how a state is determined to belong to either the recurrence or the transient class. Based on Theorems 1 and 2, all TSCCs can be found by performing forward and backward reachability iteratively. The detailed algorithm can be found in [27]; our improved algorithm is described in Section 3.2, with comparisons to the original algorithm.
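The criterion of Theorem 1 is easy to see on an explicit graph. The algorithms in this paper operate symbolically on BDDs; the sketch below replays the same criterion with plain sets on a four-state example (the seed state is included in its own forward and backward sets here, a simplification that does not affect the verdicts in this example).

    #include <algorithm>
    #include <iostream>
    #include <set>
    #include <vector>

    // Explicit-state illustration of Theorem 1: state i is recurrent,
    // i.e. lies in a terminal SCC, iff F(i) is a subset of B(i).
    using Graph = std::vector<std::vector<int>>;

    std::set<int> reach(const Graph& g, int s) {   // F(s) via BFS/DFS
        std::set<int> seen{s};
        std::vector<int> stack{s};
        while (!stack.empty()) {
            int u = stack.back(); stack.pop_back();
            for (int v : g[u])
                if (seen.insert(v).second) stack.push_back(v);
        }
        return seen;
    }

    Graph reverse(const Graph& g) {
        Graph r(g.size());
        for (int u = 0; u < (int)g.size(); ++u)
            for (int v : g[u]) r[v].push_back(u);
        return r;
    }

    int main() {
        // 0 -> 1 -> 2 <-> 3: {2,3} is a TSCC; 0 and 1 are transient.
        Graph g{{1}, {2}, {3}, {2}};
        Graph rg = reverse(g);
        for (int i = 0; i < 4; ++i) {
            std::set<int> F = reach(g, i), B = reach(rg, i);
            bool recurrent =
                std::includes(B.begin(), B.end(), F.begin(), F.end());
            std::cout << "state " << i
                      << (recurrent ? ": recurrent\n" : ": transient\n");
        }
    }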
The toggle ⟨Q(x, w, y), I(x)⟩, where Q : B 2n+p → B is 1 if and only if deadlock checking in this paper finds transients signals in there is a transition from the state encoded by x to the state exact reachable states. encoded by y under the input encoded by w. I : B n → B is 1 if the state encoded by x is an initial state. Q(x, w, y) is 3. IMPROVED LIVELOCK CHECKING called transition relation. The sets x, y, and w are called the 3.1 State Classification present state, next state, and input variables, respectively. The state classification in [27] consists of one transient The procedure ComputeF orwardSet in Figure 2 is a mod- class and one or more recurrence classes. However, in hard- ified version of the procedure f orward set in [27] in order ware verification, initial states are given to verify the hard- to compute forward set of a given state s only within the ware behavior only in reachable state space. One problem of given care states careSet in the procedure and to perform the state classification in [27] is that there is no distinction early termination when stop is not ZERO (empty BDD). ⇓ between reachable TSCCs and unreachable TSCCs from the represents a restrict operator [7] that is used to minimize initial states. Also, the reachable TSCCs may vary depend- the transition relation with respect to careSet in Line 2. ing on initial states. The minimized transition relation is denoted by Q̃. In Line We propose a new state classification that is shown in Fig- 7, y ← x represents that y variables are replaced by x vari- ure 1, assuming that there is one single initial state. Han- ables by BDD substitution. Early termination is another dling multiple initial states is explained in Section 3.3. big difference from f orward set in [27] and is used in Fig- Definition 2. STSCC is a sink TSCC that has incoming ure 3. This is to bail out computing forward set as soon as edges from any states outside the TSCC. any newly reached state intersects with the states in stop We first define sink TSCC (in short, STSCC) in Defini- as in Line 11. BddIteConstant is a BDD ITE(if-then-else) tion 2. The new state classification consists of main group, operation without creating a new BDD node. O is an array transient group, livelock groups (reachable STSCCs) and of states to store newly reached states at each iteration and unreachable TSCCs for a given initial state. The transient O is called onion rings. These onion rings are used later in class in [27] is further classified into main group or tran- Section 3.3. ComputeF orwardSet returns the forward set sient group. Main group is an SCC containing the initial F (s) and the onion rings O. state and there exists either one or no main group. The ComputeForwardSet(Q, careSet, s, stop) { recurrence classes in [27] are further classified into livelock 1 F (s) = ZERO; groups (reachable STSCCs) and unreachable TSCCs. When 2 Q̃(x, w, y) = Q(x, w, y) ⇓ careSet; there is no livelock, there exists only one SCC which is the 3 f rontier(x) = s; 4 Put s in O; main group. 5 while (f rontier(x) ̸= ZERO) { 6 image(y) = ∃x, w. Q̃(x, w, y) ∧ f rontier(x); Unreachable TSCC 7 image(x) = image(y)|y←x ; 8 F (s) = F (s) ∨ image(x); 9 f rontier = image(x) ∧ ¬F (s); k l m n 10 Put f rontier in O; 11 if (BddIteConstant(f rontier, stop, ZERO) != ZERO) j o 12 break; Reachable States 13 } 14 return (F (s), O); c g i } Figure 2: Computing forward set. 
ComputeBackwardSet is the dual procedure to ComputeForwardSet, except that it uses no stop set and does not compute the onion rings O.

Figure 3 shows the procedure for finding TSCCs within a given set of states S. The procedure FindTSCCs is a modified version of the procedure State_classification in [27]. The modified procedure utilizes the care states careSet, since S is not necessarily the whole state space. T is the set of transient states in S, and R is an array of TSCCs in S. PickOneState in Line 5 picks a random state from careSet as a seed state for finding a TSCC. In Line 7, early termination is used in computing the forward set F(s), by setting stop in Figure 2 to the negation of B(s). This is because, while we compute F(s) within B(s) for the state s, once any state outside B(s) is reachable from s, all states in B(s) are transient. Another big difference is the trimming of transient states in Lines 12 and 16. TrimTransient(Q, careSet, T, dir) trims the transient states out of the current care states in the given direction (dir), which is PREFIX, SUFFIX, or BOTH; PREFIX (SUFFIX) means trimming out the lasso prefix (suffix) states. This is the same technique used in finding SCCs [20]. Finally, FindTSCCs returns R (a set of TSCCs) and T (a set of transient states).

    FindTSCCs(Q, S) {
    1   R = { };
    2   T = ZERO;
    3   careSet = S;
    4   while (careSet ≠ ZERO) {
    5     s = PickOneState(careSet);
    6     B(s) = ComputeBackwardSet(Q, careSet, s);
    7     F(s) = ComputeForwardSet(Q, careSet, s, ¬B(s));
    8     if (F(s) ⊆ B(s)) {
    9       R = R ∪ F(s);
    10      T = T ∨ (B(s) ∧ ¬F(s));
    11      careSet = careSet ∧ ¬B(s);
    12      TrimTransient(Q, careSet, T, PREFIX);
    13    } else {
    14      T = T ∨ (s ∨ B(s));
    15      careSet = careSet ∧ ¬(s ∨ B(s));
    16      TrimTransient(Q, careSet, T, BOTH);
    17    }
    18  }
    19  return (R, T);
    }

Figure 3: Finding TSCCs.

Figure 4 shows the procedure that performs our new state classification. As explained in Section 3.1, we find the main group (M), the transient group (T), and the livelock groups (R) from the given initial state (s) within the given care states (S). FindLivelock starts by computing the forward set F(s) and the backward set B(s) in Lines 1 and 2. In Line 3, reached is the set of states reached from s in S. If F(s) ⊆ B(s) in Line 4, there is no livelock in S; in this case, F(s) becomes the main group, and both R and T are set to empty in Lines 5-7. If F(s) ⊄ B(s) in Line 8, there must exist at least one livelock group. In this case, M is computed by intersecting F(s) and B(s) in Line 9. careSet is set to a subset of F(s) in Line 10, and the lasso prefix states in careSet are trimmed out in Line 11. T_R denotes the set of transient states that are reachable from s; in Line 12, R and T_R are computed by calling FindTSCCs with careSet. If s ∉ M (meaning that the main group is empty), s is added to T_R in Lines 13-14. T_U denotes the set of transient states that are unreachable from s; it is computed in Line 15. T is the union of T_R and T_U, computed in Line 16.

    FindLivelock(Q, S, s) {
    1   (F(s), O) = ComputeForwardSet(Q, S, s, ZERO);
    2   B(s) = ComputeBackwardSet(Q, S, s);
    3   reached = F(s) ∨ s;
    4   if (F(s) ⊆ B(s)) {
    5     M = F(s);
    6     R = { };
    7     T = ZERO;
    8   } else {
    9     M = F(s) ∧ B(s);
    10    careSet = F(s) ∧ ¬(M ∨ s);
    11    TrimTransient(Q, careSet, T, PREFIX);
    12    (R, T_R) = FindTSCCs(Q, careSet);
    13    if (s ∉ M)
    14      T_R = T_R ∨ s;
    15    T_U = B(s) ∧ ¬M;
    16    T = T_R ∨ T_U;
    17  }
    18  return (M, R, T, reached, O);
    }

Figure 4: Finding livelock.
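The following sketch reproduces, in explicit-state form, the classification that FindLivelock computes: the main group as F(s) ∧ B(s), livelock groups as reachable terminal SCCs, and the other reachable states as transient. The small graph and the per-state (rather than per-SCC) reporting are simplifications for illustration.

    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <set>
    #include <vector>

    // Explicit-state sketch of the FindLivelock classification
    // (Figure 4). The symbolic version uses BDD image computations.
    using Graph = std::vector<std::vector<int>>;
    using Set = std::set<int>;

    Set reach(const Graph& g, int s) {
        Set seen{s};
        std::vector<int> st{s};
        while (!st.empty()) {
            int u = st.back(); st.pop_back();
            for (int v : g[u]) if (seen.insert(v).second) st.push_back(v);
        }
        return seen;
    }

    int main() {
        // SCC {0,1} is the main group; {3,4} is a reachable terminal
        // SCC acting as the livelock group; 2 is transient.
        Graph g{{1}, {0, 2}, {3}, {4}, {3}};
        Graph rg(g.size());
        for (int u = 0; u < (int)g.size(); ++u)
            for (int v : g[u]) rg[v].push_back(u);

        int init = 0;
        Set F = reach(g, init), B = reach(rg, init);
        Set mainGroup;                           // M = F(s) ∧ B(s)
        std::set_intersection(F.begin(), F.end(), B.begin(), B.end(),
                              std::inserter(mainGroup, mainGroup.end()));
        for (int v : F) {
            if (mainGroup.count(v)) continue;
            Set Fv = reach(g, v), Bv = reach(rg, v);
            bool terminal =
                std::includes(Bv.begin(), Bv.end(), Fv.begin(), Fv.end());
            std::cout << "state " << v
                      << (terminal ? ": livelock group\n" : ": transient\n");
        }
    }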
3.3 Multiple Initial States

A design can have multiple initial states when some of the state variables do not have concrete initial values. In the presence of multiple initial states, finding livelock groups has to be devised correctly to avoid false positives and redundant computations.

Figure 5 shows an example with multiple initial states. In this example, there are six states, S = {a, b, c, d, e, f}, and two SCCs, {a, b, c} and {d, e, f}; we can see that {d, e, f} is a TSCC. a and d are initial states, I = {a, d}, as shown with thick circles. Suppose that we compute livelock by calling FindLivelock(Q, S, I). Then we get F(I) = B(I) = M = {a, b, c, d, e, f} and R = {}, which is not correct, since there is a reachable TSCC. Now, let us try calling FindLivelock for each single initial state. First, for the initial state a, we get F(a) = {a, b, c, d, e, f} and B(a) = {a, b, c}. This gives us M_a = {a, b, c} and R_a = {d, e, f}: there is a livelock group R_a for the initial state a. Now, for the initial state d, F(d) = {d, e, f} and B(d) = {a, b, c, d, e, f}. This gives us M_d = {d, e, f}, R_d = {}, and T_U = {a, b, c}: there is no livelock group for the initial state d. Therefore, livelock checking has to be applied for each single initial state separately in the presence of multiple initial states.

[Figure 5: Multiple initial states. Two SCCs {a, b, c} and {d, e, f} with an edge from c to f; a and d are initial states.]

Theorem 3. When there are two initial states (i0 and i1), if i1 is included in the states reached from i0, the livelock groups from i1 are a subset of the livelock groups from i0.

Proof. Since i1 is included in the states reached from i0, i1 is in either the main, a transient, or a livelock group from i0. When i1 is in the main group, the same livelock groups as from i0 are obtained. When i1 is in the transient group, all or a subset of the livelock groups from i0 is obtained. When i1 is in one of the livelock groups, the livelock group including i1 becomes the main group from i1, and no livelock group exists from i1, since the other livelock groups from i0 become unreachable TSCCs from i1. In all three cases, no new livelock group is obtained from i1 compared to the ones from i0. Therefore, the livelock groups from i1 are a subset of the livelock groups from i0.

Theorem 3 says that when there is a large number of initial states, we can skip livelock checking for any initial state that is already in the forward set of another initial state. In Figure 5, livelock checking for the initial state d can be skipped because d ∈ F(a), assuming that a is used first. However, there is an order dependency on which initial state is used first: if d is used first, we still need to run livelock checking with a. In practice, the number of calls to FindLivelock is greatly reduced by Theorem 3 in the presence of multiple initial states.

Figure 6 shows the top-level procedure that checks livelock with multiple initial states. CheckLivelock takes the transition relation (Q), a set of states (S), a set of initial states (I), and a concrete machine (C) as inputs; the use of C is explained in Section 4. CheckLivelock first finds livelock groups in the reachable states in Lines 1-17, and then finds TSCCs in the unreachable states in Lines 18-23. The while loop (Lines 6-17) performs livelock checking for a current initial state s until all initial states are covered, with iteration index k. For this, remaining is initially set to I in Line 3 and updated by eliminating the newly reached states reached_k from remaining in Line 13. reached is the set of states reached from all initial states; it is initially set to ZERO in Line 1 and updated by adding reached_k, the states reached from s, in Line 12. Then the next initial state is chosen from remaining in Line 15. T_U is the union of the unreachable transient states from each initial state; it is initially set to ZERO in Line 2 and updated by adding the unreachable states of T_k in Line 14. For the current initial state s, FindLivelock is called in Line 7. |R_k| denotes the number of livelock groups in R_k in Line 8. For each R_k^j, a trace trace_k^j is generated in Line 9, and the livelock is reported with the trace in Line 10. Generating traces is explained in Section 4.2, and reporting livelock is explained in Section 4.3.

    CheckLivelock(Q, S, I, C) {
    1   reached = ZERO;
    2   T_U = ZERO;
    3   remaining = I;
    4   k = 0;
    5   s = PickOneState(I);
    6   while (s ≠ ZERO) {
    7     (M_k, R_k, T_k, reached_k, O_k) = FindLivelock(Q, S, s);
    8     for (j = 0; j < |R_k|; j++) {
    9       trace_k^j = GenerateTrace(C, R_k^j, s, O_k);
    10      ReportLivelock(s, M_k, R_k^j, T_k, trace_k^j);
    11    }
    12    reached = reached ∨ reached_k;
    13    remaining = remaining ∧ ¬reached_k;
    14    T_U = T_U ∨ (T_k ∧ ¬reached_k);
    15    s = PickOneState(remaining);
    16    k++;
    17  }
    18  careSet = ¬(reached ∨ T_U);
    19  if (careSet ≠ ZERO) {
    20    R_k = FindTSCCs(Q, careSet);
    21    for (j = 0; j < |R_k|; j++)
    22      ReportUnreachLivelock(R_k^j);
    23  }
    }

Figure 6: Checking livelock.

Once all reachable livelock groups are found, we next find the unreachable TSCCs. We set the care states careSet to the negation of all states visited so far in Line 18, and then call FindTSCCs with careSet in Line 20. If there is any unreachable TSCC, it is reported in Line 22.
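A minimal sketch of the Theorem 3 optimization: initial states already covered by the forward set of an earlier initial state are skipped. The graph mirrors Figure 5; FindLivelock itself is elided and only the skipping logic is shown.

    #include <iostream>
    #include <set>
    #include <vector>

    // Sketch of the Theorem 3 skip in CheckLivelock (Figure 6): an
    // initial state contained in the states reached from an earlier
    // initial state contributes no new livelock groups.
    using Graph = std::vector<std::vector<int>>;
    using Set = std::set<int>;

    Set reach(const Graph& g, int s) {
        Set seen{s};
        std::vector<int> st{s};
        while (!st.empty()) {
            int u = st.back(); st.pop_back();
            for (int v : g[u]) if (seen.insert(v).second) st.push_back(v);
        }
        return seen;
    }

    int main() {
        // Figure 5's shape: SCCs {0,1,2} and {3,4,5}, edge 2 -> 5,
        // initial states 0 and 3.
        Graph g{{1}, {2}, {0, 5}, {4}, {5}, {3}};
        Set reached;
        for (int init : {0, 3}) {
            if (reached.count(init)) {      // Theorem 3: already covered
                std::cout << "skip initial " << init << "\n";
                continue;
            }
            Set F = reach(g, init);
            reached.insert(F.begin(), F.end());
            std::cout << "run FindLivelock from " << init
                      << " (|F| = " << F.size() << ")\n";
        }
    }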
The use of C on the concrete machine. is explained in Section 4. CheckLivelock first finds live- Figure 7 shows how an abstract machine is obtained from lock groups in the reachable states in Line 1-17 and then it the COI of an FSM. Suppose an FSM that has two state finds TSCCs in the unreachable states in Line 18-23. The variables f and g. Then, we compute the COI of the FSM. while loop (Line 6-17) performs livelock checking for a cur- Suppose that there are state variables {a, b, c, d, e} in the rent initial state s until all initial states are covered with COI of the FSM. The size of the abstract machine is pre- iteration index k. For this, remaining is initially set to defined and let us suppose that the size is N . Then, a set I in Line 3 and updated by eliminating the newly reached of influential latches from the COI is computed from the states reachedk from remaining in Line 13. reached is the FSM variables. The minimum abstract machine is the FSM reached states from all initial states. reached is initially set itself and the maximum abstract machine is the concrete to ZERO in Line 1 and updated by adding reachedk that is machine. In this example, N =4 and we get the abstract the reached states from s in Line 12. Then, the next initial machine {f, g, d, e}. state is chosen from remaining in Line 15. TU is the union of unreachable transient states from each initial state. TU is Abstract Machine initially set to ZERO in Line 2 and updated by adding the COI of FSM FSM unreachable states of Tk in Line 14. For the current initial a d f state s, F indLivelock is called in Line 7. |Rk | represents b e g the number of livelock groups in Rk in Line 8. For each c Rkj , a trace tracejk is generated in Line 9 and the livelock Concrete Machine is reported with the trace in Line 10. Generating trace is explained in Section 4.2 and reporting livelock is explained Figure 7: Abstract machine. in Section 4.3. Theorem 4. If any state in a livelock group on an ab- CheckLivelock(Q, S, I, C) { stract machine is reachable from the initial state on the con- 1 reached = ZERO; 2 TU = ZERO; crete machine, the livelock exists on the concrete machine. 3 remaining = I; Proof. Since the abstraction is an over-approximation, 4 k = 0; the set of all transitions on the abstract machine is a superset 5 s = PickOneState(I); 6 while (s ̸= ZERO) { of the set of all transitions on the concrete machine. Since 7 (Mk , Rk , Tk , reachedk , Ok ) = FindLivelock(Q, S, s); there is no path from any state in the livelock group to any 8 for (j = 0; j < |Rk |; j++) { state in the main group on the abstract machine, there is 9 tracejk = GenerateTrace(C, Rk j , s, Ok ); still no path from any projected states of the livelock group 10 ReportLivelock(s, Mk , Rk , Tk , tracejk ); j 11 } on the concrete machine to any projected states of the main 12 reached = reached ∨ reachedk ; group on the concrete machine. Now, suppose that the live- 13 remaining = remaining ∧ ¬reachedk ; lock does not exist on the concrete machine. In order for the 14 TU = TU ∨ (Tk ∧ ¬reachedk ); 15 s = PickOneState(remaining); livelock not to exist on the concrete machine, the only con- 16 k++; dition is that there is no path from the projected main group 17 } to the projected livelock group on the concrete machine. In 18 careSet = ¬(reached ∨ TU ); other words, the projected livelock group has to be unreach- 19 if (careSet ̸= ZERO) { 20 Rk = FindTSCCs(Q, careSet); able from the initial state. 
4.1 Causality Checking

Let V be the set of state variables in an abstract machine for livelock checking. Suppose that R(V) is the set of reached states in the abstract machine and L(V) is a livelock group containing a TSCC. Also, suppose that v is a state variable in V. We are interested in whether v contributes to the livelock, as in Definition 4. This is called variable causality.

Definition 4. When a livelock exists in the abstract machine, a variable v in V contributes to the livelock if the livelock disappears when v is eliminated from the abstract machine; in other words, there is no livelock in the abstract machine composed of the variables V\v.

Equation 1 shows the condition for the existence of a livelock:

    L(V) ⊂ R(V)                                       (1)

Now, let R̃ be the quantified reached states and L̃ be the quantified livelock states with respect to a state variable v, as shown in Equations 2 and 3:

    R̃(V\v) = ∃v . R(V)                                (2)
    L̃(V\v) = ∃v . L(V)                                (3)

Then, whether the variable v contributes to the livelock is determined by Equation 4; Theorem 5 says that if Equation 4 holds, v contributes to the livelock:

    L̃(V\v) = R̃(V\v)                                   (4)

Theorem 5. When a livelock group is found on an abstract machine (L(V) ⊂ R(V)), if L̃(V\v) = R̃(V\v) holds for a variable v, then v contributes to the livelock.

Proof. Let M1 be the machine consisting of V, and suppose that a livelock group exists in M1. Let M2 be the machine consisting of V\v, obtained by eliminating v from M1. Also, let T1 (T2) be the set of transitions in M1 (M2), respectively. Since M2 is an over-approximation of M1, M2 has more transitions than M1 (T1 ⊂ T2). Let Td be the difference between T1 and T2. If any transition in Td makes a path from a state in the livelock to a state in the main group in M2, the livelock group merges into the main group and both groups become a single SCC, yielding L̃(V\v) = R̃(V\v). Thus, M2 becomes a machine without the livelock. This means that v is a necessary variable for the livelock in M1. Therefore, if L̃(V\v) = R̃(V\v), v contributes to the livelock.

This causality checking can also be applied to a set of variables, especially the FSM variables, in order to report whether the livelocks are related to the FSM. Let F be the set of variables in an FSM and C be the set of variables in the COI of the FSM. Suppose that R(F, C) is the set of reached states in the abstract machine and L(F, C) is a livelock group containing a TSCC. The quantified reached states and the quantified livelock states with respect to the FSM variables are computed in Equations 5 and 6, respectively:

    R̃(C) = ∃F . R(F, C)                               (5)
    L̃(C) = ∃F . L(F, C)                               (6)

Then, Equation 7 shows the causality check with the FSM variables, which determines whether the FSM variables contribute to the livelock:

    L̃(C) = R̃(C)                                       (7)
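With states encoded as bit vectors, existentially quantifying a state variable amounts to dropping that variable's bit, which makes the causality check of Equations 2-4 easy to demonstrate explicitly. The sketch below is illustrative; the paper's implementation quantifies BDDs instead.

    #include <iostream>
    #include <set>

    // Explicit-state sketch of the causality check: quantifying out
    // state bit v drops it from every state. Variable v contributes to
    // the livelock when the quantified livelock states equal the
    // quantified reached states (Equation 4/7).
    using Set = std::set<unsigned>;

    Set quantify(const Set& states, unsigned bit) {
        Set out;
        unsigned low = (1u << bit) - 1;                // bits below v
        for (unsigned s : states)                      // drop bit v, repack
            out.insert((s & low) | ((s >> (bit + 1)) << bit));
        return out;
    }

    int main() {
        // Two state bits (v1 v0). Reached R = {00,01,10,11}; livelock
        // L = {10,11}, i.e. exactly "v1 == 1". Quantifying v0 keeps the
        // livelock distinguishable; quantifying v1 collapses L onto R.
        Set R{0b00, 0b01, 0b10, 0b11}, L{0b10, 0b11};
        for (unsigned v = 0; v < 2; ++v) {
            bool contributes = quantify(L, v) == quantify(R, v);
            std::cout << "v" << v
                      << (contributes ? " contributes\n" : " does not contribute\n");
        }
    }

As expected, only v1, the bit that actually defines the livelock region, is reported as contributing.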
4.1 Causality Checking

Let V be the set of state variables in an abstract machine for livelock checking. Suppose that R(V) is the set of reached states in the abstract machine and L(V) is a livelock group containing a TSCC. Also, suppose that v is a state variable in V. We are interested in whether v contributes to the livelock, as in Definition 4. This is called variable causality.

Definition 4. When a livelock exists in the abstract machine, a variable v in V contributes to the livelock if the livelock disappears by eliminating v from the abstract machine. In other words, there is no livelock in another abstract machine that is composed of the variables V\v.

Equation 1 shows a condition for the existence of livelock.

  L(V) ⊂ R(V)   (1)

Now, let R̃ be the quantified reached states and L̃ be the quantified livelock states with respect to a state variable v, as shown in Equations 2 and 3.

  R̃(V\v) = ∃v . R(V)   (2)
  L̃(V\v) = ∃v . L(V)   (3)

Then, Equation 4 is used to check whether the variable v contributes to the livelock. Theorem 5 says that if Equation 4 holds, v contributes to the livelock.

  L̃(V\v) = R̃(V\v)   (4)

Theorem 5. When a livelock group is found on an abstract machine (L(V) ⊂ R(V)), if L̃(V\v) = R̃(V\v) holds for a variable v, the variable v contributes to the livelock.

Proof. Let M1 be the machine consisting of V, and suppose that a livelock group exists in M1. Let M2 be the machine consisting of V\v, obtained by eliminating v from M1. Also, let T1 (T2) be the set of transitions in M1 (M2), respectively. Since M2 is an over-approximated machine from M1, M2 has more transitions than M1 (T1 ⊂ T2). Let Td be the difference between T1 and T2. If there is any transition in Td that makes a path from any state in the livelock group to any state in the main group in M2, the livelock group merges into the main group and both groups become a single SCC, yielding L̃(V\v) = R̃(V\v). Thus, M2 becomes a machine without the livelock. This means that v is a necessary variable to have the livelock in M1. Therefore, if L̃(V\v) = R̃(V\v), v contributes to the livelock.

This causality checking can also be applied to a set of variables, especially the FSM variables, in order to report whether the livelocks are related to the FSM. Let F be the set of variables in an FSM and C be the set of variables in the COI of the FSM. Suppose that R(F, C) is the set of reached states in the abstract machine and L(F, C) is a livelock group containing a TSCC. The quantified reached states and the quantified livelock states are computed in Equations 5 and 6 with respect to the FSM variables, respectively.

  R̃(C) = ∃F . R(F, C)   (5)
  L̃(C) = ∃F . L(F, C)   (6)

Then, Equation 7 shows the causality check with the FSM variables, determining whether the FSM variables contribute to the livelock.

  L̃(C) = R̃(C)   (7)
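In an explicit-state reading, the existential quantification in Equations 2, 3, 5, and 6 is just projection: drop the quantified coordinates from every state and compare the resulting sets. A minimal sketch, with states as dictionaries from variable names to values, and an equality test mirroring Equations 4 and 7:

def project_out(states, vars_to_drop):
    # Existential quantification as projection: drop the given coordinates.
    return {frozenset((k, val) for k, val in s.items() if k not in vars_to_drop)
            for s in states}

def contributes(R, L, vars_to_drop):
    # Per Equations 4 and 7: the quantified variables contribute to the
    # livelock if the projected livelock states coincide with the projected
    # reached states.
    return project_out(L, vars_to_drop) == project_out(R, vars_to_drop)

Here contributes(R, L, {v}) realizes Equation 4 for a single variable v, and contributes(R, L, F) realizes Equation 7 for the set of FSM variables F.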
4.2 Trace Concretization

Once a livelock group is found on an abstract machine, we need to justify whether the livelock group is reachable on the concrete machine. This can be done by the following three steps. The first step is to pick a target state in the livelock group. The target state is chosen randomly from the livelock group, but is one of the closest states to the initial states, by using the onion rings O_k in Figure 6. The second step is to generate an abstract trace. Starting from the target state, an abstract trace can be computed by applying BDD-based pre-image computation iteratively until the initial state is reached. The third step is to generate a concrete trace by forming a BMC (Bounded Model Checking [2]) problem from the abstract trace, in order to see whether the livelock group is reachable on the concrete machine. An efficient approach for concretization was proposed in [18].
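The second step needs only one pre-image per onion ring, since each ring holds the states first reached at that depth. A sketch of this backward walk, assuming a pre_image(Q, states) operator is available as a stand-in for the BDD-based pre-image computation:

def abstract_trace(Q, target, onion_rings, pre_image):
    # onion_rings[i] is the set of states first reached at BFS depth i
    # (the O_k recorded by FindLivelock in Figure 6).
    depth = next(i for i, ring in enumerate(onion_rings) if target in ring)
    trace = [target]
    current = {target}
    for i in range(depth - 1, -1, -1):
        # Predecessors of the current state that lie one ring closer to the
        # initial state; nonempty by construction of the onion rings.
        current = pre_image(Q, current) & onion_rings[i]
        state = next(iter(current))
        trace.append(state)
        current = {state}
    trace.reverse()                      # initial state first, target last
    return trace

The resulting abstract trace then seeds the BMC problem of the third step.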
4.3 Reporting Livelock

Once a concrete trace is generated for a livelock group, the livelock is real on the concrete machine. We report the livelock group with the state classification mentioned in Section 3.1. A livelock group is reported with its initial state, the main group, the transient group, and the unreachable states, in terms of the number of states and the percentage of each group on the abstract machine.

By looking at the transient and livelock groups, we can see what fraction of the state space is in the problematic zone. A good design is expected to have only one main group per initial state, without any transient or livelock groups, unless the design has an intended reset sequence to a normal mode.

5. TOGGLE DEADLOCK CHECKING

There is another important design property, called toggle deadlock, that is related to livelock. A livelock may occur over multiple state variables of a design, whereas a toggle deadlock may occur on a single state variable. A state variable has a toggle deadlock if the variable initially toggles, but gets stuck at a constant value after a certain number of cycles.

Figure 8 shows an example of toggle deadlock. There are two state variables {a, b} and four states {s0, s1, s2, s3}. Provided that s0 is the initial state, the main group is {s0, s1} and the livelock group is {s2, s3}. Once the state transition reaches s2, a state in the livelock group, the value of b gets stuck at 1, whereas a still toggles. Thus, we say that b has a toggle deadlock.

[Figure 8: Toggle deadlock. Four states s0 (a=0, b=0), s1 (a=1, b=0), s2 (a=1, b=1), and s3 (a=0, b=1); the main group is {s0, s1} and the livelock group is {s2, s3}.]

Theorem 6. If there is no STSCC in a design, there is no toggle deadlock on any variable.

Proof. To be in toggle deadlock, a variable must toggle at some cycle and hold its value forever from that cycle. No STSCC implies that there is only the main group in the design. If a variable appears constant in the main group, the variable is a constant. However, the main group does not have any prefix behavior, so it is not possible for the variable to have toggled before entering the main group. Therefore, no STSCC implies no toggle deadlock.

Theorem 6 shows that toggle deadlock occurs only in the presence of a livelock. It is also possible that there is no toggle deadlock on a design that has a livelock. Thus, toggle deadlock on a state variable can be computed in two steps. First, we find STSCCs on an abstract machine built from the state variable; the abstract machine is made in the same way as in livelock checking on an FSM. Second, we evaluate the value of the state variable in the livelock group, if a livelock exists.

Figure 9 shows the procedure that checks toggle deadlock on a given state variable t. CheckToggleDeadlock takes a transition relation (Q), a set of states (S), a set of initial states (I), a concrete machine (C), and the state variable (t) as procedure inputs. CheckToggleDeadlock is similar to CheckLivelock in Figure 6. For each reachable livelock group R_k^j in Line 6, TestToggleDeadlock checks whether the value of t toggles in the livelock group and returns dlk and c in Line 7. dlk represents whether the state variable is in toggle deadlock, and c is the constant value (0 or 1) in the case of toggle deadlock.

CheckToggleDeadlock(Q, S, I, C, t) {
 1   remaining = I;
 2   k = 0;
 3   s = PickOneState(I);
 4   while (s ≠ ZERO) {
 5     (R_k, reached_k, O_k) = FindLivelock(Q, S, s);
 6     for (j = 0; j < |R_k|; j++) {
 7       (dlk, c) = TestToggleDeadlock(R_k^j, t);
 8       if (dlk) {
 9         trace_k^j = GenerateTrace(C, R_k^j, s, O_k);
10         ReportToggleDeadlock(s, R_k^j, trace_k^j, c);
11       }
12     }
13     remaining = remaining ∧ ¬reached_k;
14     s = PickOneState(remaining);
15     k++;
16   }
}

Figure 9: Checking toggle deadlock.
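TestToggleDeadlock itself only has to decide whether t can still change value once the livelock group is entered. A minimal explicit-state sketch, with states as dictionaries from variable names to 0/1; whether t actually toggled before the group is entered is visible on the generated trace and is not re-checked here:

def test_toggle_deadlock(group, t):
    # t is stuck if it holds a single value in every state of the livelock group.
    values = {state[t] for state in group}
    if len(values) == 1:
        return True, values.pop()        # dlk = True, c = the stuck value
    return False, None                   # t still toggles inside the group

On the livelock group {s2, s3} of Figure 8, this sketch reports (True, 1) for b and (False, None) for a.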
6. EXPERIMENTAL RESULTS

We have implemented the proposed livelock checking and toggle deadlock checking algorithms. Table 1 shows our experimental results on livelock and toggle deadlock checking, generated on a 1.4 GHz Intel processor machine with 4 GB of memory running Red Hat Linux.

                                               New1                     New2                     TraceGen
Design   L      I     F   T  COI     N    Llk    Dlk   Time     Mem    Ops      Time     Mem    Ops     Time  Len
D1       1163   1330  7   -  632     30   0      -     16:20    107.1  44       15:31    107.9  41      -     -
                                     60   0      -     41:11    136.7  66       41:31    137.7  66      -     -
                                     90   0      -     2:30:33  224.1  81       2:24:50  224.3  81      -     -
                                     120  -      -     time-out (> 24h)         time-out (> 24h)        -     -
D2       385    352   25  -  68      30   0      -     0:01     21.0   12       0:01     21.1   12      -     -
                                     60   0      -     0:40     38.7   28       0:40     38.7   28      -     -
                                     68   12032  -     4:47:52  95.7   386439   0:53:58  96.5   70329   -     -
D3-F1    32541  912   2   -  28941   30   1      -     0:49     140.4  54       0:49     140.5  55      6:37  66
D3-F2    32541  912   4   -  28930   4    1      -     0:09     129.1  9        0:09     129.1  12      1:54  14
D3-F2    32541  912   -   4  28930   30   -      1     1:31     132.8  168      1:31     132.8  172     2:11  14

Table 1: Experimental results.

The first column lists the design names. The next five columns present the statistics on the designs: the number of latches (L), the number of inputs (I), the number of latches in the FSM (F), the number of toggle signals to check (T), and the number of latches in the COI of either the FSM or a toggle signal (COI). The next three columns show the results on livelock and toggle deadlock checking: column N shows how many latches were in the abstract machine, column Llk shows how many livelock groups were found, and column Dlk shows how many toggle deadlocks were found. The next six columns compare the performance of the two methods (New1 and New2) in terms of time (Time), memory (Mem), and the number of image/pre-image computations (Ops). New1 is the proposed method without the trimming technique, whereas New2 is the proposed method with the trimming technique. The times are in the form hh:mm:ss and the memory consumption is in MB. The final two columns (TraceGen) show the results of trace generation on the concrete machine for the livelock or toggle deadlock found by New2: Time shows the time spent for trace generation and Len shows the trace length.

We have chosen 3 industrial designs (D1, D2, and D3). For each design, we have run livelock or toggle deadlock checking on several sizes of abstract machines, in multiples of 30 latches. We set the maximum run time to 24 CPU hours.

In D1, there is one automatically extracted FSM. The FSM consists of 7 latches and has 632 latches in its COI. We can see that the run time increases exponentially with the size of the abstract machine. On this design, the livelock checking became infeasible when N = 120.

In D2, there is also one FSM, which was user-specified. The FSM consists of 25 latches and has only 68 latches in its COI. This design has a livelock group. However, the livelock was not detected when N = 30 or N = 60; it was detected only when all the latches in the COI were included in the abstract machine. In other words, the abstract machine is the concrete machine from the FSM point of view. Since the livelock was found on the concrete machine, trace concretization is not required: the abstract trace of Section 4.2 is already a concrete trace.

D2 is the only design in the table showing a significant performance difference between New1 and New2. This is because this design has many transient states as well as many livelock groups. In this case, the trimming technique reduced the number of image/pre-image operations from 386K to 70K (a 5.5X reduction), which gave a big speed-up from almost 5 hours to under 1 hour (5X). This shows that the trimming technique helps performance when there are many transient states. When there are no transient states, the trimming technique becomes pure overhead, as shown in D3; however, the overhead is almost negligible in the experiment.

In D3, there are two FSMs (F1 and F2). F1 is composed of 2 latches, and a livelock was found with N = 30 within 49 seconds. The livelock was justified by trace concretization, which took 397 seconds, and the trace length was 66. F2 is composed of 4 latches, and a livelock was found with N = 4 (the FSM itself) in 9 seconds. This livelock was also justified by trace concretization, which took 114 seconds, and the trace length was 14. We have also tried the toggle deadlock checking on F2, separately from the livelock checking. A toggle deadlock was found in 90 seconds and the concrete trace was generated in 131 seconds. D3 shows the value of abstraction-based livelock and toggle deadlock checking.

Table 2 shows a comparison of finding all SCCs with four algorithms (XB [28], Lockstep [3], Skeleton [12], IXB [20]) on the design D2 from Table 1. In this design, the number of recurrent states is 2.07e8 and the number of transient states is 1.2e6, which is only 0.6% of all states. However, it turned out that handling these transient states efficiently is the key factor in the performance. One main difference between XB and IXB is that IXB trims out those transient states as much as possible. This trimming technique makes the IXB method outperform the others on this design: it is faster (more than 15X) with fewer image operations (more than 10X) than the other methods. This explains why New2 outperformed New1 on D2 in Table 1. Table 2 also shows why livelock checking is done by finding TSCCs instead of SCCs: finding all livelock groups took 54 minutes, whereas finding all SCCs took 100 minutes (2X), even with IXB.

Method    Time      Memory  Ops      SCCs   States
XB        84:07:56  98.2    1013333  19458  2.08e8
Lockstep  45:55:53  237.3   2590724  19458  2.08e8
Skeleton  26:01:54  266.5   2609008  19458  2.08e8
IXB       1:39:47   92.5    102990   19458  2.08e8

Table 2: Finding all SCCs in D2.
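The trimming technique that distinguishes New2 from New1 removes states that cannot lie on a cycle before the (T)SCC computation. The following is a minimal sketch of this common trimming scheme, offered only as an illustration: the paper does not spell out its exact trimming in this section, and image and pre_image are hypothetical stand-ins for the symbolic operators on the transition relation Q.

def trim(Q, states, image, pre_image):
    # Iteratively peel off states with no predecessor or no successor inside
    # the set; such states are transient and cannot be part of any (T)SCC.
    changed = True
    while changed:
        with_pred = image(Q, states) & states      # has a predecessor in the set
        with_succ = pre_image(Q, states) & states  # has a successor in the set
        trimmed = with_pred & with_succ
        changed = trimmed != states
        states = trimmed
    # Every survivor has both a predecessor and a successor within the set.
    return states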
Once we find a livelock 260–265, June 1993. [18] K. Nanshi and F. Somenzi. Constraints in one-to-many or toggle deadlock, we justify the livelock or toggle deadlock concretization for abstraction refinement. In Proceedings of the on the concrete machine by concretizing the abstract trace Design Automation Conference, pages 569–574, 2009. on the concrete machine. [19] S. Qadeer, R. K. Brayton, V. Singhal, and C. Pixley. Latch Even though the proposed approach does not prove the redundancy removal without global reset. In Proceedings of the International Conference on Computer Design, pages non-existence of livelock or toggle deadlock on a design un- 432–439, 1996. less the design is small enough to handle, this approach finds [20] K. Ravi, R. Bloem, and F. Somenzi. A comparative study of livelocks or toggle deadlocks on the design if there exists. symbolic algorithms for the computation of fair cycles. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods To the best of our knowledge, it is the first approach to in Computer Aided Design, pages 143–160. Springer-Verlag, use the abstraction-based livelock checking and also the first Nov. 2000. LNCS 1954. approach for checking toggle deadlock. The experimental re- [21] V. Singhal. Design replacements for sequential circuits. Ph.D. sults showed that the abstraction-based approach finds live- dissertation, University of California at Berkeley, 1996. [22] V. Singhal, C. Pixley, A. Aziz, and R. K. Brayton. Theory of lock errors on the real designs. safe replacements for sequential circuits. IEEE Transactions As future work, we are interested in improving the con- on Computer-Aided Design, 20(2):249–265, Feb. 2001. cretization, finding more accurate influential latches, and [23] R. Tarjan. Depth first search and linear graph algorithms. optimizing the computations with multiple FSMs or toggle SIAM Journal of Computing, 1:146–160, 1972. [24] H. J. Touati, R. K. Brayton, and R. P. Kurshan. Testing signals by considering the overlaps in their COIs. language containment for ω-automata using BDD’s. Information and Computation, 118(1):101–109, Apr. 1995. 8. REFERENCES [25] M. Y. Vardi and P. Wolper. An automata-theoretic approach to [1] A. Biere, C. Artho, and V. Schuppan. Liveness checking as automatic program verification. In Proceedings of the First safety checking. In International Workshop in Formal Symposium on Logic in Computer Science, pages 322–331, Methods for Industrial Critical Systems, pages 160–177, 2002. Cambridge, UK, June 1986. [2] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model [26] T.-H. Wang and T. Edsall. Practical FSM analysis for verilog. checking without BDDs. In Fifth International Conference on In IVC-VIUF ’98: Proceedings of the International Verilog Tools and Algorithms for Construction and Analysis of HDL Conference and VHDL International Users Forum, Systems (TACAS’99), pages 193–207, Amsterdam, The pages 52–58, 1998. Netherlands, Mar. 1999. LNCS 1579. [27] A. Xie and P. A. Beeral. Efficient state classification of [3] R. Bloem, H. Gabow, and F. Somenzi. An algorithm for finite-state markov chains. IEEE Transactions on strongly connected component analysis in n log n symbolic Computer-Aided Design, 17(12):1334–1339, Dec. 1998. steps. In Formal Methods in Computer Aided Design, pages [28] A. Xie and P. A. Beeral. Implicit enumeration of strongly 37–54, 2000. connected components and an application to formal [4] R. Bryant. Graph-based algorithms for boolean function verification. 
DIFTS13 Keyword Index

abstraction 4
abstraction-based 46
assertion 38
Boolector 28
Data-path equivalence checking 9
deadlock checking 46
dynamic verification 38
IC3 19
Lambda 28
Lemmas on Demand 28
livelock checking 46
Model Checking 19
modelchecking 4
Polynomial equivalence checking 9
QF BV SMT solving 9
reparameterization 4
SAT solver 19
SMT 28
synthesis 4
systemC 38
verification 4

DIFTS13 Author Index

Biere, Armin 28
Brayton, Robert 9
Cabodi, Gianpiero 19
Case, Michael 9
Cleaveland, Rance 1
Dutta, Sonali 38
Een, Niklas 4
Fujita, Masahiro 2
Goswami, Dhiraj 3
Harer, Kevin 46
Long, Jiang 9
Mishchenko, Alan 4, 19
Moon, In-Ho 46
Niemetz, Aina 28
Palena, Marco 19
Preiner, Mathias 28
Tabakov, Deian 38
Vardi, Moshe Y. 38