         Program Proceedings

  International Workshop on Design and
Implementation of Formal Tools and Systems




              Portland, OR
            October 19, 2013
Co-located with FMCAD and MEMOCODE
Preface
The second DIFTS (Design and Implementation of Formal Tools and Systems) workshop was held in Portland, Oregon on October 19, 2013, co-located with the Formal Methods in Computer-Aided Design conference (FMCAD) and Formal Methods and Models for Co-design (MEMOCODE). The workshop emphasized insightful experiences in tool and system design. Its goal was to provide a forum for sharing original challenges and solutions with groundbreaking results, and it provided an opportunity for discussing the engineering aspects and the various design decisions required to put such formal tools and systems into practical use. It took a broad view of the formal tools/systems area and solicited contributions from hardware and software domains such as decision procedures, verification, testing, validation, diagnosis, debugging, and synthesis.
The workshop received 10 original submissions, of which three were chosen in the tool category and three in the system category. There were also three invited talks: the first was given by Rance Cleaveland (Reactive Systems Inc., USA) on “Approximate Formal Verification using Model-based Testing”; the second by Masahiro Fujita (University of Tokyo) on “Diagnosis and Correction of Buggy Hardware/Software with Formal Approaches”; and the third by Dhiraj Goswami (Synopsys Inc.) on “Stimulus Generation, Enhancement and Debug in Constraint Random Verification.”
First of all, we thank FMCAD’s steering committee for their continual support. We also thank the FMCAD chairs, Sandip Ray and Barbara Jobstmann, and the MEMOCODE chairs, Marly Roncken and Fei Xie, for a seamless organization, and Joe Leslie-Hurd for his help with local arrangements. We thank Boğaziçi University, Turkey for hosting the DIFTS website. We sincerely thank the program committee members and sub-reviewers for selecting the papers and providing candid review feedback to the authors. Last but not least, we thank all the authors for contributing to the workshop, and all the participants.

Malay K. Ganai and Alper Sen
Program Chairs
DIFTS 2013






General Program Chairs

 Malay Ganai        NEC Labs America, USA
 Alper Sen          Boğaziçi University, Turkey


Program Committee

 Armin Biere        Johannes Kepler University, Austria
 Gianpiero Cabodi   Politecnico di Torino, Italy
 Franco Fummi       University of Verona, Italy
 Malay Ganai        NEC Labs America, USA
 Daniel Grosse      University of Bremen, Germany
 William Hung       Synopsys Inc, USA
 Daniel Kroening    Oxford University, UK
 Alper Sen          Boğaziçi University, Turkey
 Ofer Strichman     Technion - Israel Institute of Technology, Israel
 Chao Wang          Virginia Tech, USA






Additional Reviewers

                       Balakrishnan, Gogul
                       Eldib, Hassan
                       Horn, Alexander
                       Ivrii, Alexander
                       Kusano, Markus
                       Le, Hoang
                       Schrammel, Peter
                       Sinz, Carsten
                       Sousa, Marcelo
                       Suelflow, Andre






Table of Contents
Approximate Formal Verification using Model-based Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                         1
 Rance Cleaveland (Invited speaker)
Diagnosis and Correction of Buggy Hardware/Software with Formal Approaches . . . . . . . . . .                                                2
  Masahiro Fujita (Invited speaker)
Stimulus generation, enhancement and debug in constraint random verification . . . . . . . . . . .                                            3
  Dhiraj Goswami (Invited speaker)
A Fast Reparameterization Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
  Niklas Een and Alan Mishchenko
LEC: Learning driven data-path equivalence checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                   9
  Jiang Long, Robert Brayton and Michael Case
Trading-off Incrementality and Dynamic Restart of Multiple Solvers in IC3 . . . . . . . . . . . . . . 19
  Marco Palena, Gianpiero Cabodi and Alan Mishchenko
Lemmas on Demand for Lambdas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
  Mathias Preiner, Aina Niemetz and Armin Biere
CHIMP: a Tool for Assertion-Based Dynamic Verification of SystemC Models . . . . . . . . . . . . 38
 Sonali Dutta, Moshe Y. Vardi and Deian Tabakov
Abstraction-Based Livelock/Deadlock Checking for Hardware Verification . . . . . . . . . . . . . . . . 46
 In-Ho Moon and Kevin Harer




  Approximate Formal Verification using Model-based Testing
                                       Rance Cleaveland
                                  University of Maryland, USA

Abstract: In model-based testing, (semi-)formal models of systems are used to drive the derivation of test cases to be applied to the system under test (SUT). The technology has long been part of traditional hardware-design workflows, and it is beginning to find application in embedded-software development processes as well. In automotive and land-vehicle control-system design in particular, models in languages such as MATLAB®/Simulink®/Stateflow® are used to drive the testing of the software used to control vehicle behavior, with tools like Reactis®, developed by a team including the speaker, providing automated test-case generation support for this endeavor. This talk will discuss how test-case generation capabilities may also be used to help verify that models meet formal specifications of their behavior. The method we advocate, Instrumentation-Based Verification (IBV), involves the formalization of behavior specifications as models that are used to instrument the model to be verified, and the use of coverage testing of the instrumented model to search for specification violations. The presentation will discuss the foundations of IBV, the test-generation approach and other features in Reactis® that are used to support IBV, and the results of several case studies involving the use of these methods.
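
As a toy illustration of the IBV idea, here is a minimal Python sketch (ours, not Reactis®, which operates on Simulink®/Stateflow® models): a formalized requirement becomes a monitor, the model is instrumented by evaluating the monitor after every step, and coverage-style random tests search for violations. The plant model and the speed bound are made-up examples.

    import random

    def model_step(speed, throttle):
        # Toy discrete-time plant model (hypothetical): speed responds to
        # throttle and saturates at 120.
        return min(120.0, max(0.0, speed + 0.5 * throttle - 0.1))

    def monitor(speed):
        # Formalized requirement (made up): speed must never exceed 100.
        return speed <= 100.0

    def instrumented_run(steps, seed):
        # "Instrument" the model by checking the monitor after every step,
        # mirroring how IBV attaches specification models to the design model.
        random.seed(seed)
        speed = 0.0
        for t in range(steps):
            speed = model_step(speed, random.uniform(-1.0, 1.0))
            if not monitor(speed):
                return ("violation", t)
        return ("ok", steps)

    # Coverage-style search: many short randomized tests instead of one long one.
    print([instrumented_run(100, seed) for seed in range(5)])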




   Diagnosis and Correction of Buggy Hardware/Software with
                      Formal Approaches
                                        Masahiro Fujita
                                   University of Tokyo, Japan

Abstract: There has been intensive research on debugging hardware as well as software. Some approaches are ad hoc and based on simple heuristics, while others utilize formal methods and are mathematically modeled. In this talk, we first review various proposals on debugging from a historical viewpoint, and then summarize the state of the art in terms of both diagnosis and automatic correction of designs. In particular, we show various approaches based on SAT formulations of the diagnosis and correction problems. We also discuss them in relation to manufacturing test techniques: if the design errors are within pre-determined types and/or areas, there can be very efficient methods for formal verification, diagnosis, and correction that use small numbers of test vectors. In the last part of the talk, future perspectives including post-silicon issues are discussed.




  Stimulus Generation, Enhancements and Debug in Constraint
                     Random Verification
                                         Dhiraj Goswami
                                         Synopsys, USA

Abstract: Verification occupies 60-80% of the total resources and effort of an IC design team. Functional verification, performed at the earliest stage of the IC design flow, determines the customers' ability to find bugs quickly, and thereby their time-to-results (TTR) and cost-of-results (COR). Consequently, functional verification has been the focus of the EDA industry for the last several decades.

Constrained random simulation methodologies have become increasingly popular for functional
verification of complex designs, as an alternative to directed-test based simulation. In a
constrained random simulation methodology, random vectors are generated to satisfy certain
operating constraints of the design. These constraints are usually specified as part of a testbench
program (using industry-standard testbench languages, like SystemVerilog from Synopsys, e
from Cadence, OpenVera, etc.). The testbench automation tool (TBA) is then expected to
generate random solutions for specified random variables, such that all the specified constraints
over these random variables are satisfied. These random solutions are then used to generate valid
random stimulus for the Design Under Verification (DUV). This stimulus is simulated using
industry-standard simulation tools, like VCS from Synopsys, NC-Verilog from Cadence, etc.
The results of simulation are then typically examined within the testbench program to monitor functional coverage, which gives a measure of confidence in the verification quality and completeness.
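
To make the flow concrete, the following minimal Python sketch uses the Z3 SMT solver in place of a commercial testbench automation (TBA) tool; the variables and constraints (word-aligned address, bounded length) are invented for illustration.

    # Z3 stands in for a commercial testbench automation (TBA) tool; the
    # constraints are made-up examples of "operating constraints".
    from z3 import BitVec, Solver, ULT, Or, sat

    addr, length = BitVec('addr', 16), BitVec('len', 8)
    s = Solver()
    s.add(addr & 0x3 == 0)                # operating constraint: aligned address
    s.add(ULT(length, 64), length != 0)   # operating constraint: 1 <= len < 64

    stimulus = []
    while len(stimulus) < 5 and s.check() == sat:
        m = s.model()
        stimulus.append((m[addr].as_long(), m[length].as_long()))
        # Block the previous solution so the next vector differs -- a crude
        # stand-in for the randomization a real constraint solver performs.
        s.add(Or(addr != m[addr], length != m[length]))
    print(stimulus)   # five distinct random solutions usable as DUV stimulus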

In this talk we will review the challenges of stimulus and configuration generation in a constrained random verification methodology. We will also explore why state-of-the-art debug solutions are important to handle complexity and improve the quality of stimulus.




                                A Fast Reparameterization Procedure

                                               Niklas Een, Alan Mishchenko
                                                 {een,alanmi}@eecs.berkeley.edu

                                        Berkeley Verification and Synthesis Research Center
                                                         EECS Department
                                              University of California, Berkeley, USA.


Abstract. Reparameterization, also known as range-preserving logic synthesis, replaces a logic cone by another logic cone which has fewer inputs while producing the same output combinations as the original cone. It is expected that a smaller circuit leads to a shorter verification time. This paper describes an approach to reparameterization which is faster, but not as general, as previous work. The new procedure is particularly well-suited for circuits derived by localization abstraction.
1  Introduction

The use of reparameterization as a circuit transformation in the verification flow was pioneered by Baumgartner et al. in [1]. In their work, new positions for the primary inputs (PIs) are determined by finding a minimum cut between the current PI positions and the next-state variables (flop inputs). BDDs are then used to compute the range (or image) on the cut. Finally, a new logic cone with the same range (but fewer PIs) is synthesized from the BDDs and grafted onto the original design in place of the current logic cone. This is a very powerful transformation, but it has potential drawbacks: (i) the BDDs may blow up and exhaust the memory, (ii) the extracted circuit may be larger than the logic it replaces, and (iii) the runtime overhead may be too high.

In contrast, the proposed approach is based on greedy local transformations, capturing only a subset of the optimization opportunities. However, memory consumption is modest, runtimes are very low, and the resulting design is always smaller than, or of the same size as, the original design. It is shown experimentally that the proposed method leads to sizeable reductions when applied to circuits produced by localization abstraction [5].
2  Fast Reparameterization

The fast reparameterization algorithm is based on the following observation: if a node dominates¹ a set of PIs, and those PIs are sufficient to force both a zero and a one at that node, regardless of the values given to the other PIs and state variables, then that node can be replaced by a new primary input, while the unused logic cone driving the original node can be removed. The old primary inputs dominated by the given node are also removed by this procedure.

¹ A node n dominates another node m iff every path from m to a primary output goes through node n.

Example. Suppose a design contains inputs x1, x2, and a gate Xor(x1, x2); and that, furthermore, x1 has no other fanouts besides this Xor-gate. Then, no matter which value x2 takes, both a zero and a one can be forced at the output of the Xor by setting x1 appropriately, and thus the Xor-gate can, for verification purposes, be replaced by a primary input.

The proposed method for finding such situations starts by computing all dominators of the netlist graph; then, for each candidate node dominating at least one PI, the following quantification problem is solved: “for each assignment to the non-dominated gates, does there exist a pair of assignments to the dominated PIs that results in a zero and a one at the candidate node?” More formally, assuming that x represents non-dominated gates (“external inputs”) and yi represents dominated PIs (“internal inputs”), the following is always true for the node’s function φ:

    ∀x ∃y0, y1 . ¬φ(x, y0) ∧ φ(x, y1)

Important features of this approach are:

(i) It is circuit based, while early work on reparameterization was based on transition relations [4].
(ii) In its simplest form, the proposed restructuring replaces some internal nodes by new primary inputs and removes dangling logic.
(iii) The analysis is completely combinational: no information on the reachable state-space is used.
(iv) If the property is disproved after reparameterization, it is straightforward to remap the resulting counterexample to depend on the original primary inputs.

It is important to realize that, by analyzing and applying reductions in topological order from PIs to POs, regions amenable to reparameterization are gradually reduced to contain fewer gates and PIs. By this process, the result of repeatedly applying local transformations can lead to a substantial global reduction. In the current implementation, the above formula is evaluated by exhaustive simulation of the logic cone rooted in the given node, where the cone is limited to 8 inputs.
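
The check above is easy to prototype. The following Python sketch is ours, for illustration only (the actual implementation works on And-Inverter-Graphs and simulates 256 bit patterns); it enumerates all assignments to a small cone’s external and internal inputs and decides whether the candidate node can be forced to both values:

    from itertools import product

    def controllable(phi, n_ext, n_int):
        # Check "forall x, exists y0, y1: not phi(x, y0) and phi(x, y1)" by
        # exhaustive simulation; phi maps (external, internal) bit-tuples to
        # a Boolean.
        for x in product([0, 1], repeat=n_ext):
            values = {phi(x, y) for y in product([0, 1], repeat=n_int)}
            if values != {False, True}:  # some external assignment pins the node
                return False
        return True

    # The Xor example from the text: x1 is dominated (internal), x2 is external.
    xor_gate = lambda x, y: bool(x[0] ^ y[0])
    print(controllable(xor_gate, n_ext=1, n_int=1))  # True: replaceable by a PI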
Limiting the scope to cones with 8 inputs and simulating 256 bit patterns (or eight 32-bit words) seems to be enough to saturate the reduction achievable on the benchmarks where the method is applicable.

Some typical reductions are shown in Figure 1. The graphs should be understood as sub-circuits of a netlist being reparameterized. The exclamation marks denote “internal” PIs dominated by the top node, and hence under our control; the question marks denote gates with fanouts outside the displayed logic cone, for which no assumption on their values can be made. The full algorithm is described in Figure 2.

[Figure 1 shows two And-gate subgraphs rooted in a candidate node “[cand]”, with leaves marked “!” and “?”.]

Figure 1. Example of subgraphs that can be reduced. Here “!” denotes nodes we can control (a PI dominated by the output node); “?” denotes nodes that can assume any value beyond our control. In the above examples, the output node “[cand]” can be forced to both a zero and a one by choosing the right values for the “!”, regardless of the values of the “?”. In such cases, the output node is replaced by a new PI.

Fast Reparameterization
− Compute dominators. For a DAG with bounded in-degree (such as an And-Inverter-Graph), this is a linear operation in the number of nodes (see Figure 3).
− For each PI, add all its dominators to the set of “candidates”.
− For each candidate c, in topological order from inputs to outputs:
  - Compute the set D of nodes dominated by c (Figure 4).
  - Denote the PIs in D “internal” inputs.
  - Denote any node outside D that is a direct fanin of a node inside D an “external” input (it may be any gate type).
  - Simulate all possible assignments to the internal and external inputs. If for every external assignment there exists an internal assignment that gives a 0 at c, and another internal assignment that gives a 1 at c, then substitute c by a new primary input.

Figure 2. Steps of the reparameterization algorithm.

Compute Dominators
− Initialize all POs to dominate themselves.
− Traverse the netlist in reverse topological order (from POs to PIs), and mark the children of each node as being dominated by the same dominator as the node itself, unless the child has already been assigned a dominator.
− For already marked children, compute the “meet” of the two dominators, i.e. find the first common dominator. If there is no common dominator, mark the node as dominating itself.

Figure 3. Review of the “finding direct dominators” algorithm for DAGs. For a more complete treatment of this topic, see [8], or for a more precise description, the source code of “computeDominators()” in “ZZ/Netlist/StdLib.cc” of ABC-ZZ [11].

Compute Dominated Area
    area = {w_dom}              – initialize the set of gates to the dominator
    count = [0, 0, ..., 0]      – count is a map “gate → integer”
    for w ∈ area:
        for v ∈ faninsOf(w):
            count[v]++
            if count[v] == num_of_fanouts[v]:
                area = area ∪ {v}

Figure 4. Find nodes used only by gate “w_dom”. This set is sometimes called the maximum fanout-free cone (MFFC). The outer for-loop over the elements of area is meant to visit all elements added by the inner for-loop. For this procedure, flops are treated as sinks, i.e. having no fanins.
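
The two helper procedures of Figures 3 and 4 can be summarized in Python as follows. This is our sketch, not the ABC-ZZ source: the netlist is assumed to be given as an explicit topological order plus fanin/fanout maps, and the dominator pass uses a standard meet-based chain intersection in the spirit of [8]:

    def compute_dominators(topo, fanouts, is_po):
        # Single-dominator pass per Figure 3 (sketch). `topo` lists nodes in
        # topological order (PIs first, POs last). Returns a map from each
        # node to its direct dominator (the node itself when none exists).
        order = {n: i for i, n in enumerate(topo)}
        dom = {}

        def meet(a, b):
            # First common dominator of a and b, walking dominator chains
            # toward the POs; None if the chains never meet.
            while a != b:
                if order[a] > order[b]:
                    a, b = b, a
                if dom[a] == a:      # chain ended (PO or self-dominating node)
                    return None
                a = dom[a]
            return a

        for n in reversed(topo):     # reverse topological order: POs first
            if is_po(n):
                dom[n] = n           # POs dominate themselves
                continue
            d = None
            for f in fanouts[n]:     # every path to a PO passes some fanout
                d = f if d is None else meet(d, f)
                if d is None:
                    break
            dom[n] = d if d is not None else n
        return dom

    def dominated_area(w_dom, fanins, n_fanouts):
        # MFFC computation per Figure 4: the nodes used only by `w_dom`.
        area, count = [w_dom], {}
        for w in area:               # `area` grows while being traversed
            for v in fanins(w):
                count[v] = count.get(v, 0) + 1
                if count[v] == n_fanouts(v):
                    area.append(v)
        return set(area)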
Counterexample reconstruction. There are several ways that a trace on the reduced netlist can be lifted to the original netlist. For instance, the removed logic between the new PIs and the old PIs can be stored in a separate netlist. The trace on the reduced netlist can then be projected onto the original PIs by rerunning the simulation used to produce the reparameterized circuit and, for each candidate, picking an assignment that gives the correct value. But even simpler, one can just put the original netlist into a SAT solver, assert the values from the trace onto the appropriate variables, and call the SAT solver to complete the assignment. In practice, this seems to always work well.
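
A toy sketch of this SAT-based lifting, using the Z3 solver for convenience (the Xor example from Section 2; variable names are illustrative): the trace value of the new PI is asserted on the original cone, and the solver completes the assignment of the removed PI x1.

    from z3 import Bools, Solver, Xor, sat

    x1, x2 = Bools('x1 x2')
    cand = Xor(x1, x2)        # original logic cone of the replaced node

    s = Solver()
    s.add(cand == True)       # value the reduced-netlist trace gives the new PI
    s.add(x2 == False)        # value of the surviving original PI in the trace
    assert s.check() == sat
    print(s.model())          # the solver completes x1 (here: x1 = True)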
3  Improvements

The algorithm described in the previous section replaces internal nodes with inputs, and thus only removes logic. If we are prepared to forgo this admittedly nice property and occasionally add a bit of new logic, then nodes that are not completely controllable by their dominated inputs can still be reparameterized by the following method: for a node with function φ(x, y), where x are external inputs and y are internal inputs, compute the following two functions:

    φ0(x) ≡ ∀y. ¬φ(x, y)
    φ1(x) ≡ ∀y. φ(x, y)

Using these two functions, φ can be resynthesized using a single input ynew by the expression:

    ¬φ0(x) ∧ (φ1(x) ∨ ynew)

In other words, if two or more inputs are dominated by the node φ, a reduction in the number of inputs is guaranteed. Depending on the shape of the original logic, and on how well new logic for φ0 and φ1 is synthesized, the number of logic gates may either increase or decrease. In our implementation, logic for φ0 and φ1 is created by the fast irredundant sum-of-products (“isop”) procedure proposed by Shin-ichi Minato in [10]. We greedily apply this extended method to all nodes with two or more dominated PIs, even if it leads to a blow-up in logic size. To counter such cases, fast logic synthesis can be applied after the reparameterization. Obviously, there are many ways to refine this scheme.
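
The rewrite can be illustrated with a small Python sketch (ours; the actual implementation synthesizes φ0 and φ1 with Minato’s isop procedure rather than by enumeration):

    from itertools import product

    def resynthesize(phi, n_ext, n_int):
        # phi0(x) = forall y. not phi(x,y);  phi1(x) = forall y. phi(x,y),
        # both computed here by enumeration; the dominated inputs y are then
        # replaced by the single fresh input y_new.
        ext = list(product([0, 1], repeat=n_ext))
        ints = list(product([0, 1], repeat=n_int))
        phi0 = {x: all(not phi(x, y) for y in ints) for x in ext}
        phi1 = {x: all(phi(x, y) for y in ints) for x in ext}
        return lambda x, y_new: (not phi0[x]) and (phi1[x] or bool(y_new))

    # Sanity check on a 3-input And with two dominated inputs: for every
    # external assignment, the new cone produces exactly the set of output
    # values the old cone could produce.
    old = lambda x, y: bool(x[0] and y[0] and y[1])
    new = resynthesize(old, n_ext=1, n_int=2)
    for x in product([0, 1], repeat=1):
        assert ({old(x, y) for y in product([0, 1], repeat=2)}
                == {new(x, y) for y in [0, 1]})
    print("range preserved")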
4  Future Work

Another possible improvement to the method is extending the reparameterization algorithm to work for multi-output cones. As an example, consider a two-output cone where the outputs can be forced to all four combinations {00, 01, 10, 11} by choosing appropriate values for the dominated inputs. In such a case, the cone can be replaced by two free inputs. If some of the four combinations at the outputs are impossible under conditions expressed in terms of non-controllable signals, a logic cone can be constructed to characterize these conditions and reduce the number of PIs by adding logic, similar to the case of a single-output cone.

5  Experiments

As part of the experimental evaluation, all benchmarks from the single-property track of the Hardware Model Checking Competition 2012 were considered. Localization abstraction [5] was applied with a timeout of one hour to each benchmark, and the resulting models meeting the following criteria were kept:

– At least half of the flops were removed by abstraction.
– The abstraction was accurate (no spurious counterexamples).
– At least one of the verification engines could prove the property within one hour.

The sizes of the benchmarks selected in this way are listed in Table 1. All these models were given to the reparameterization engine, both in weak mode and in strong mode, the latter using the improvements described in section 3. The reparameterized models were also post-processed with a quick simplification method called “shrink”, which is part of the ABC package [7]. The longest runtime for weak reparameterization was 16 ms, for strong reparameterization 28 ms, and for the simplification phase 50 ms.² Reductions are listed in Table 2.

For comparison, Table 2 also includes the results of running an industrial implementation of the BDD-based algorithm of [1]. Because runtimes are significantly longer with this algorithm, they are given their own column. These results were given to us by IBM and, according to their statement, “are not tweaked as much as they could be”.

All benchmarks were given to three engines: Property Directed Reachability [2, 6], BDD-based reachability [3], and Interpolation-based Model Checking [9]. The complete table of results is given in Table 3. A slice of this table, showing only the results for PDR with and without (strong) reparameterization, is given in Table 4 together with a scatter plot.

Analysis. Firstly, we see a speedup of 100x-1000x over previous work in the runtime of the reparameterization algorithm itself, with comparable quality of results for the application under consideration (models resulting from localization abstraction). This means the algorithm can always be applied without the need for careful orchestration. Secondly, we see an average speedup of 2.5x in verification times when applying reparameterization in conjunction with PDR, which is also the best overall engine on these examples. For two benchmarks, 6s121 and 6s150, BDD reachability does substantially better than PDR, and for the latter (where runtimes are meaningful) the speedup due to reparameterization is greater than 3x. Furthermore, for BDD reachability one can see that on several occasions (6s30 in particular), reparameterization is completely crucial for performance. Finally, interpolation-based model checking (IMC) seems to be largely unaffected by reparameterization.

² Benchmarks from HWMCC’12 are quite small. For comparison: running reparameterization on a 7 million gate design from one of our industrial collaborators took 4.1 s.
Abstraction Phase

Design      #And       #PI      #FF (before → after)   Depth
6s102          6,594       72   1,121 →  56               23
6s121          1,636       99     419 → 110               19
6s132          1,216       94     139 → 113                7
6s144         41,862      480   3,337 → 236               18
6s150          5,448      146   1,044 → 323              103
6s159          1,469       13     252 →  18                4
6s164          1,095       91     198 →  93               16
6s189         36,851      479   2,434 → 259               18
6s194         12,049      532   2,389 → 198               45
6s30       1,043,139   32,994   1,195 → 128               32
6s43           7,408       30     965 → 310               25
6s50          16,700    1,570   3,107 → 207               52
6s51          16,701    1,570   3,107 → 209               65
bob05         18,043      224   2,404 → 146              106
bob1u05cu     32,063      224   4,377 → 146              106

Table 1. Sizes of original designs and abstract models. Column #FF shows how many flip-flops were turned into unconstrained inputs by localization abstraction. The removed FFs show up as PIs in the other tables. The last column shows the BMC depth used by the abstraction engine (see [5] for more details).
Reparameterization

            No Reparam.       Weak Rep.       Strong Rep.      BDD Reparam.
Design      #And     #PI     #And    #PI     #And    #PI     #And    #PI   Runtime
6s102       6,594   1,137    1,247    331    1,188    267    1,283    285   71.91 s
6s121       1,636     408      627    120      559     66      732     41    0.27 s
6s132       1,216     120    1,108     62    1,102     55    2,731     36    0.80 s
6s144      41,862   3,580   10,172  1,038    9,494    812   13,259    889   60.50 s
6s150       5,448     867    3,062    506    2,213     89    2,231     46    1.81 s
6s159       1,469     247      116     20      114     19      256     13    0.02 s
6s164       1,077     196      661    109      499     41    3,413     40   11.61 s
6s189      36,851   2,654   10,033  1,004    9,552    794   12,051    814   63.11 s
6s194      12,049   2,723    1,366    184    1,348    166    1,612    121   10.37 s
6s30      102,535  34,061    1,508    307    1,184    205      603     50    0.74 s
6s43        7,408     685    3,451    304    3,218    202   17,691    101    2.07 s
6s50       16,700   4,470    1,841    350    1,652    270    4,319    752   46.57 s
6s51       16,701   4,468    1,828    350    1,639    268    1,255     62   16.34 s
bob05      18,043   2,358    1,618    187    1,388     52    1,586     43    0.31 s
bob1u05cu  32,063   4,455    1,618    187    1,388     52    1,586     43    0.33 s

Table 2. Effect of reparameterization on the abstracted models. “Weak” reparameterization refers to the basic method described in section 2; “Strong” additionally includes the improvements discussed in section 3. Runtimes for these are omitted, as the average CPU time was 4-8 ms (and the longest 28 ms). However, BDD-based reparameterization is not as scalable as the method presented in this paper, and its runtimes (typically 100x-1000x longer) are listed for reference.

Verification Runtimes

                     PDR                     BDD reach.                  IMC
Design     NoRep.    Weak  Strong   NoRep.    Weak  Strong   NoRep.    Weak  Strong
6s102         1.7     0.4     0.4    121.2    90.2   204.3        –  2619.2       –
6s121        64.0    12.3     5.4      0.5     0.3     0.4     10.2     5.8    36.0
6s132         7.8     8.1     7.9   2327.3  1375.5       –    201.4   384.9   267.9
6s144        10.3    12.4     9.6        –       –       –   1177.5   942.1  1413.0
6s150           –  2323.7       –    539.3   143.6   189.1        –       –       –
6s159         0.0     0.0     0.0      0.1     0.1     0.1      0.1     0.1     0.1
6s164         7.0    34.1     6.8        –    33.6     0.4      4.4     8.5     3.0
6s189        11.9     7.8     8.6        –       –       –    738.0   396.5   366.0
6s194         4.7     4.6     3.7    307.5    42.5   742.2    798.9  1042.8  1103.3
6s30         15.9    45.3    12.3        –    17.6    49.2        –       –       –
6s43         13.6    11.7     9.1        –  1191.1  1390.4        –       –       –
6s50         14.7    10.8     5.4    941.6   241.0  1680.8   1795.6   820.0  2348.9
6s51         18.7     7.1     4.1    303.8   147.6  1891.2   1806.8   954.8  1737.8
bob05         0.3     0.3     0.3   2450.5  2371.4  1253.2     30.3    26.3    25.1
bob1u05cu     0.3     0.2     0.3        –  2157.6  1088.3     30.3    24.0    24.8

Table 3. Full table of results. Each design after abstraction is given to three engines: Property Directed Reachability (PDR), BDD-based reachability (BDD), and Interpolation-based Model Checking (IMC). Each engine is run on three versions of the model, with no/weak/strong reparameterization applied. A dash represents a timeout after one hour.


Design      NoRep.    Rep.
6s102         1.66    0.40
6s121        64.00    5.43
6s132         7.82    7.90
6s144        10.34    9.56
6s159         0.04    0.03
6s164         6.97    6.83
6s189        11.86    8.59
6s194         4.71    3.72
6s30         15.92   12.29
6s43         13.63    9.15
6s50         14.66    5.44
6s51         18.72    4.08
bob05         0.33    0.26
bob1u05cu     0.33    0.26

[Scatter plot: “PDR on abstract model” (x-axis) vs. “PDR on reparametrized abstract model” (y-axis), both on a log scale from 0.1 to 1000 seconds.]

Table 4. Runtime improvements for PDR. Column “NoRep.” shows runtimes (in seconds) for proving the property of each benchmark using PDR after abstraction but without reparameterization; column “Rep.” shows runtimes after reparameterization. The scatter plot places these runtime pairs on a log scale. The average speedup is 2.5x.


References

[1] J. Baumgartner and H. Mony. Maximal Input Reduction of Sequential Netlists via Synergistic Reparameterization and Localization Strategies. In Proc. of CHARME, pages 222–237, 2005.

[2] A. Bradley. IC3: SAT-Based Model Checking Without Unrolling. In Proc. of VMCAI, 2011.

[3] R. E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, vol. C-35, no. 8, Aug. 1986.

[4] P. Chauhan, E. Clarke, and D. Kroening. A SAT-Based Algorithm for Reparameterization in Symbolic Simulation. In Proc. of DAC, 2004.

[5] N. Een, A. Mishchenko, and N. Amla. A Single-Instance Incremental SAT Formulation of Proof- and Counterexample-Based Abstraction. In Proc. of FMCAD, 2010.

[6] N. Een, A. Mishchenko, and R. Brayton. Efficient Implementation of Property Directed Reachability. In Proc. of FMCAD, 2011.

[7] Berkeley Logic Synthesis Group. ABC: A System for Sequential Synthesis and Verification. http://www.eecs.berkeley.edu/~alanmi/abc/, v00127p.

[8] T. Lengauer and R. E. Tarjan. A Fast Algorithm for Finding Dominators in a Flowgraph. ACM Transactions on Programming Languages and Systems, vol. 1, no. 1, pages 121–141, July 1979.

[9] K. L. McMillan. Interpolation and SAT-based Model Checking. In Proc. of CAV, 2003.

[10] S. Minato. Fast Generation of Irredundant Sum-Of-Products Forms from Binary Decision Diagrams. In Proc. of SASIMI, 1992.

[11] Berkeley Verification and Synthesis Research Center. ABC-ZZ: A C++ framework for verification & synthesis. https://bitbucket.org/niklaseen/abc-zz.
        LEC: Learning-Driven Data-path Equivalence
                        Checking
                                       Jiang Long∗ , Robert K. Brayton∗ , Michael Case†
                                               ∗ EECS Department, UC-Berkeley

                                              {jlong, brayton}@eecs.berkeley.edu
                                                   † Calypto Design Systems

                                                     {mcase}@calypto.com



Abstract—In the LEC system, we employ a learning-driven approach for solving combinational data-path equivalence checking problems. The data-path logic is specified using Boolean and word-level operators in VHDL/Verilog. The targeted application area is C-to-RTL equivalence checking as found in an industrial setting. These problems are difficult because of the algebraic transformations applied to the data-path logic to obtain highly optimized implementations. Without high-level knowledge, existing techniques in bit-level equivalence checking and QF_BV SMT solving are unable to solve these problems effectively. It is crucial to reverse engineer such transformations to bring more similarity to the two sides of the logic. However, it is difficult to extract algebraic logic embedded in a cloud of Boolean and word-level arithmetic operators. To address this, LEC uses a compositional proof methodology and analysis beyond the bit and word level, incorporating algebraic reasoning through polynomial reconstruction. LEC’s open architecture allows new solver techniques to be integrated progressively. It builds sub-model trees, recursively transforming the sub-problems to simplify and expose the actual bottleneck arithmetic logic. In addition to rewriting rules that normalize the arithmetic operators, LEC supports conditional rewriting, where the application of a rule depends on the existence of invariants in the design itself. LEC utilizes both functional and structural information of the data-path logic to recognize and reconstruct algebraic transformations. A case study illustrates the steps used to extract the arithmetic embedded in a data-path design as a linear sum of signed integers, and shows the procedures that collaboratively led to a successful compositional proof.

I. INTRODUCTION

With the increasing popularity of high-level design methodologies there is renewed interest in data-path equivalence checking [3][13][18][20]. In such an application, a design prototype is first implemented and validated in C/C++, and then used as the golden specification. A corresponding Verilog/VHDL design is implemented either manually or automatically through a high-level synthesis tool [2][4][15]. In both cases, a miter logic for equivalence checking is formed to prove the correctness of the generated RTL model by comparing it against the original C/C++ implementation.

The data-path logic targeted in this paper is specified using Verilog/VHDL. The bit and word-level operators in Verilog/VHDL have the same semantic expressiveness as the SMT QF_BV theory [5]. Table I gives a one-to-one correspondence between Verilog and QF_BV unsigned operators. Signed arithmetic operators are also supported. The complexity of such an equivalence problem is NP-complete. However, at the other extreme, the complexity becomes O(1) in the size of the network if the two designs are structurally the same. An NP-complete problem can be tackled by using SAT solvers as a general procedure. To counter the capacity limitation of SAT solving, it is crucial to reduce the complexity by identifying internal match points and by conducting transformations to bring more structural similarity to the two sides of the miter logic.

              Verilog operators        SMT QF_BV operators
Boolean       &&, ||, !, ⊕, mux        and, or, not, xor, ite
bit-wise      &, |, ∼, ⊕, mux          bvand, bvor, bvnot, bvxor, bvite
arithmetic    +, −, ∗, /, %            bvadd, bvsub, bvmul, bvdiv, bvmod
extract       []                       extract
concat        {}                       concat
comparator    <, >, ≤, ≥               bvugt, bvult, bvuge, bvule
shifter       <<, >>                   bvshl, bvshr

TABLE I. SUPPORTED OPERATORS (UNSIGNED)

A. Motivation

The differences between the two data-path logics under equivalence checking are introduced by various arithmetic transformations for timing, area and power optimizations. These optimizations are domain specific and can be very specialized towards a particular data-path design and underlying technology. They have the following characteristics:
• The two sides of the miter logic are architecturally different and have no internal match points.
• Many expensive operators such as adders and multipliers are converted to cheaper but more complex implementations, and the order of computations is changed. It is not a scalable solution to rely on SAT solving on the bit-blasted model.
• The transformed portions are embedded in a cloud of bit and word-level operators. Algebraic extraction [8][26] of arithmetic logic based on structural patterns is generally too restrictive to handle real-world post-optimization data-path logic.
• Word-level rewriting uses local transformations. Without high-level information, local rewriting is not able to make the two sides of the miter logic structurally more similar.

Lacking high-level knowledge of the data-path logic, the equivalence problems can be very difficult for gate-level equivalence checking and general QF_BV SMT solvers.
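
As a concrete (toy) illustration of the miter construction over the QF_BV operators of Table I, the following Python sketch uses the Z3 SMT solver rather than LEC itself: a C-style specification a*9 is compared against a strength-reduced implementation (a<<3)+a; an unsat miter target means the two data-paths are equivalent. The expressions are made-up examples.

    from z3 import BitVec, Solver, unsat

    a = BitVec('a', 16)
    spec = a * 9              # C-style reference computation
    impl = (a << 3) + a       # strength-reduced, RTL-style implementation

    miter = Solver()
    miter.add(spec != impl)   # the miter target
    print(miter.check() == unsat)   # True: the two data-paths are equivalent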
alence checking and general QF BV SMT solvers. Strategi-                                Original
                                                                                         miter
cally, LEC views the bottleneck of such problems as having                                                                                          new miter target
been introduced by high-level optimizations and employs a                                          Verific Parser Frontend

collaborative approach to isolate, recognize and reconstruct
the high-level transformations to simplify the miter model by bringing in more structural similarities.

B. Contributions

   The LEC system incorporates compositional proof strategies, uses rewriting to normalize arithmetic operators, and conducts analysis beyond the bit and word level. The collaborating procedures help to expose the actual bottleneck in a proof of equivalence. The novel aspects of this system are:
   1) It uses global algebraic reasoning through polynomial reconstruction. In the case-study, it uses the functional information of the design to reverse engineer the arithmetic expression as a linear sum, and also uses a structural skeleton of the original data-path to achieve the equivalence proof.
   2) It supports conditional rewriting and proves the required invariants as pre-conditions.
   3) It uses recursive transformations that target making both sides of the miter logic structurally more similar and hence more amenable to bit-level SAT sweeping.
   4) It has an open architecture, allowing new solver techniques to be integrated progressively.
   Through a case study, we demonstrate the steps that were used to reconstruct the arithmetic embedded in a data-path design as a linear sum of signed integers, as well as all the procedures that compositionally led to a successful equivalence proof. The experimental results demonstrate the effectiveness of these collaborating procedures.

C. Overview

   The overall tool flow is described in Section II. Learning techniques and system integration are presented in Sections III and IV. A case study is presented in Section V. Experimental results are presented in Section VI, followed by a comparison with related work and a conclusion.

                         II. TOOL FLOW

   LEC takes Verilog/VHDL as the input language for the data-path logic under comparison. Internally, a miter network, as in Figure 2(a), is constructed comparing combinational logic functions F and G. Figure 1 illustrates the overall tool flow.

[Fig. 1. Overall tool flow: VeriABC compiles the RTL into an SSA network, from which a word-level simulator, a bit-level AIG, and learning-based transformations (producing transformed RTL) are derived; the ABC solvers return SAT or UNSAT.]

   First, the Verific RTL parser front-end[6] is used to compile the input RTL into the Verific Netlist Database. VeriABC[23] processes the Verific netlist, flattens the hierarchy and produces an intermediate DAG representation in static single assignment (SSA) form, consisting of Boolean and word-level operators as shown in Table I. Except for the hierarchical information, the SSA is a close-to-verbatim representation of the original RTL description. From the SSA, a bit-blasting procedure generates a corresponding bit-level network as an AIG (And-Inverter Graph). Word-level simulation models can be created at the SSA level. ABC[1] equivalence checking solvers are integrated as external solvers.
   LEC tries to solve the miter directly using random simulation on the word-level simulator or by ABC[1]'s equivalence checking procedure dcec, which is a re-implementation of iprove[24]. If unresolved, LEC applies transformations to the SSA and produces sub-models in Verilog miter format, to which LEC can be recursively applied. The overall system integration is described in Section IV.
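The recursive structure of this flow can be summarized as follows. This is a minimal sketch with hypothetical names; simulate, dcec and transforms stand in for LEC's internal procedures, and for brevity every sub-model is treated as equivalence-preserving, whereas the full calculus of Section IV distinguishes conjunctive and disjunctive sub-models.

    def solve(miter, simulate, dcec, transforms):
        """Hypothetical sketch of LEC's outer loop (cf. Figure 1)."""
        if simulate(miter) == "SAT":        # random simulation falsified the miter
            return "SAT"
        result = dcec(miter)                # direct, time-limited bit-level attempt
        if result in ("SAT", "UNSAT"):
            return result
        for transform in transforms:        # learning-based transformations
            for sub in transform(miter):    # each yields a simpler Verilog miter
                result = solve(sub, simulate, dcec, transforms)
                if result in ("SAT", "UNSAT"):
                    return result
        return "UNKNOWN"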
                   III. LEARNING TECHNIQUES

   In this section, we present the techniques implemented in LEC. Even though some are simple and intuitive, they are powerful when integrated together, as demonstrated in the experimental results. All techniques are essential because LEC may not achieve a final proof if any one of them is omitted. Their interactions are illustrated in the case-study in Section V.

[Fig. 2. Miter network: (a) the miter network F = G over inputs x̄; (b) the structurally hashed miter with red, blue and purple regions; (c) the abstraction, in which a cut through the common logic is replaced by abstract inputs x̄′ and the redundant logic below the cut is removed.]

A. Structural information

   An SSA netlist is a DAG of bit and word-level operators annotated with bit-width and sign information. In the tool flow, both Verific and VeriABC perform simple structural hashing at the SSA level, merging common sub-expressions. After merging, the miter logic is divided into three colored regions using cone of influence (COI) relations, as in Figure 2(b).
   • Red: if the node is in the COI of F only
   • Blue: if the node is in the COI of G only
   • Purple: if the node is in the COI of both sides of the miter, i.e., common logic
The purple region is the portion of the miter logic that has already been proved equivalent, while the red and blue regions are the unresolved ones. LEC makes progress by reducing the red/blue regions and increasing the purple region. The common logic constrains the logic of the red and blue regions, and it may be abstracted (see Section III-E) to reduce redundancy and possibly expose the real bottleneck in a proof.
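The coloring itself is a pair of reachability computations over the SSA DAG. The following sketch (our own illustration; the fanin map from each node to its operands is a hypothetical representation) captures the idea:

    def cone(root, fanins):
        """Cone of influence: all nodes in the transitive fanin of root."""
        seen, stack = set(), [root]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(fanins.get(n, ()))
        return seen

    def color_miter(F, G, fanins):
        cf, cg = cone(F, fanins), cone(G, fanins)
        return {n: "purple" if n in cf and n in cg    # shared (proven) logic
                else "red" if n in cf                 # unresolved, F side only
                else "blue"                           # unresolved, G side only
                for n in cf | cg}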
B. Simulation model

   Two word-level simulators are generated from the SSA network. One is a native interpreted model. The other uses the open-source Verilator[29] for compiled simulation. From the SSA network, LEC automatically generates C++ code for pseudo-random input drivers and for monitoring design behavior. Verilator compiles the Verilog miter logic, links in the generated C++ code, and produces a simulator as a standalone executable. Efficient and effective simulation is crucial in our current flow for capturing potential constants and potential internal equivalent points at the SSA level. Simulation is also used to reduce common logic in the abstraction computation procedure.

C. Bit-level model

   As shown in Figure 1, an AIG is created from the SSA network by bit-blasting. LEC calls ABC[1]'s SAT sweeping procedure dcec to perform direct solving at the bit level. Using the AIG model, the native SAT solver embedded in LEC can be used to obtain a formal proof for a particular query. Typical queries are for extracting constant nodes, proving suspected equivalent pairs of points, or conducting particular learning for rewriting. Book-keeping information between the SSA nodes and the AIG nodes allows queries to be constructed at the word level and verified at the bit level. The result is then used to simplify the SSA network.

D. Constants and Potential Equivalent Points (PEPs)

   At the word level, candidates for constants and PEPs are identified through simulation, and SAT queries are posed. Each such SAT query is constructed and checked at the bit level. SAT-solving is configured at a low-effort level (run for a few seconds) for these types of queries. Proven constants and PEPs are used immediately to simplify the SSA network, leading to a new sub-model of lower complexity. LEC then continues to process the sub-model. In the presence of unproven PEPs, LEC can choose one as the next miter target, normally the smallest in terms of the number of nodes in its COI. The proof progresses as constants and PEPs are identified and used to simplify the miter model.
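A common way to harvest such candidates is to bucket nodes by their simulation signatures. The sketch below (our own illustration) assumes values maps each SSA node to the tuple of word-level values it took over all random vectors, which is a hypothetical representation:

    from collections import defaultdict

    def candidates(values):
        """Group nodes with identical simulation signatures."""
        buckets = defaultdict(list)
        for node, sig in values.items():
            buckets[sig].append(node)
        consts = [ns for sig, ns in buckets.items() if len(set(sig)) == 1]
        peps = [ns for ns in buckets.values() if len(ns) > 1]
        return consts, peps   # candidates only: each needs a bit-level SAT proof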
E. Abstraction

   As illustrated in Figure 2(c), in computing an abstraction, LEC computes a cut in the purple region (common logic) and removes the logic between the cut and the inputs. An abstract model is formed by replacing the cut signals with free inputs x̄′. If this abstracted miter is UNSAT, then the original miter is UNSAT. In our current implementation, LEC traverses the SSA network in topological order from the inputs. As each node is tentatively replaced with new PIs, simulation is used to validate the replacement. If successful, the node is omitted and replaced with the new PIs, and the next node is processed similarly.
   A successful abstraction step removes irrelevant logic and exposes a smaller unresolved region of the miter logic, allowing LEC to continue using other procedures. In addition, as seen in the experimental results, the reduction of common logic can significantly reduce the complexity of downstream SAT-solving, e.g., when common multipliers are removed from the miter logic. An unsuccessful abstraction, when the abstract miter becomes SAT, indicates the existence of a rare event not captured during random simulations. Often, this gives hints for selecting case-splitting candidates.
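Procedurally, the cut computation is a greedy topological sweep. The sketch below uses hypothetical callbacks for the tentative replacement and its simulation-based validation, and is an illustration rather than LEC's actual code:

    def abstract(nodes_in_topo_order, in_purple, replace_with_pi, sims_still_agree):
        """Greedy cut computation for abstraction (Section III-E sketch)."""
        for n in nodes_in_topo_order:      # from the inputs toward the miter output
            if not in_purple(n):           # cut only through common logic
                continue
            undo = replace_with_pi(n)      # tentatively make n a free input
            if not sims_still_agree():     # random simulation found a mismatch
                undo()                     # keep n: it carries a needed constraint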
F. Rewriting

   Similar to [20], word-level rewriting transforms an SSA network into a structurally different but functionally equivalent one. Through rewriting, certain equivalence checking problems can become much simpler. In our experience, a multiplier is often a source of difficulty in data-path equivalence checking. If two multipliers from opposite sides of the miter are matched exactly, LEC can simplify the miter through structural hashing and treat them as common logic. This is most effective when combined with the abstraction procedure, as the common multiplier can then be totally removed.
   In LEC, a few rules are hard-coded through pattern matching applied to the SSA network. The goal is to process multiplications so that they can be matched exactly. This rewriting is implementation specific; for illustration purposes, we list a few rewriting rules in Table II using Verilog notation and the semantics of the operators.
   The first rule is the normalization of multiplier operands. If a multiplier uses a partial product generator and a compressor tree, switching the operands of the multiplication becomes a very hard SAT problem, because at the bit level the implementation is not symmetrical. It is almost imperative to apply this rule whenever possible. The second and third rules use the distributive laws of multiplication and multiplexing. Rules 4 and 5 remove the shift operator ≫ when it is used with extract and concat, because it is hard for multiplication to be restructured through the ≫ operator. Rule 6 distributes multiplication through the concat of two bit vectors using +. It uses the fact that the concatenation {a, b[n − 1 : 0]} is equivalent to a ∗ 2^n + b[n − 1 : 0].
   The following is a more complex rule that distributes + over the extract operator. The right hand side is corrected with a third term, which is the carry bit from adding the lower n bits of a and b.

      (a + b)[m : n] = a[m : n] + b[m : n] + (a[n − 1 : 0] + b[n − 1 : 0])[n]   (1)
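Rule (1) is easy to validate numerically. The following self-contained check (our own illustration, not part of LEC) exercises it for one choice of m and n, masking the right-hand side to the extract width to mirror Verilog's fixed-width addition:

    import random

    def extract(v, m, n):               # v[m:n] as an unsigned integer
        return (v >> n) & ((1 << (m - n + 1)) - 1)

    m, n = 11, 4
    for _ in range(10000):
        a, b = random.getrandbits(16), random.getrandbits(16)
        carry = (extract(a, n - 1, 0) + extract(b, n - 1, 0)) >> n
        lhs = extract(a + b, m, n)
        rhs = (extract(a, m, n) + extract(b, m, n) + carry) & ((1 << (m - n + 1)) - 1)
        assert lhs == rhs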
         Before                            After
  1      a ∗ b                             b ∗ a
  2      mux(cond, d0, d1) ∗ c             mux(cond, d0 ∗ c, d1 ∗ c)
  3      mux(cond, d0, d1)[m : n]          mux(cond, d0[m : n], d1[m : n])
  4      a[m : 0] ≫ n                      {n'b0, a[m : n]}
  5      (a[m : 0] ≫ n)[m − n : 0]         a[m : n]
  6      {a, b[n − 1 : 0]} ∗ c             (a ∗ c) ≪ n + b[n − 1 : 0] ∗ c
                             TABLE II
                          REWRITING RULES
Repeatedly applying the above rules, LEC transforms the SSA network and keeps only the ∗ and + operators, enhancing the possibility of multipliers being matched. Note that rule (1) above and Rules 4-6 in Table II are correct for unsigned operators. Currently, for signed operators, due to sign extension and the two's complement representation of the operands, we have not implemented a good set of rewriting rules.
   1) Conditional rewriting: The following equation

                    (a ≪ c) ∗ b = (a ∗ b) ≪ c                  (2)

reduces the bit-width of a multiplier on the left hand side to a smaller one on the right. It is correct if a, b, c are integers, but incorrect in Verilog semantics, which uses modulo integer arithmetic. However, if the following is true within the miter model in modulo integer semantics,

                      ((a ≪ c) ≫ c) == a                       (3)

then equation (2) is valid. In such a situation, LEC identifies the pattern on the left hand side of (2) in the SSA network and executes a SAT query concerning (3) using the AIG model through bit-level solvers. The rewrite to the right hand side of (2) is carried out only if the query is proven to be an invariant. Such a transformation may produce an exact match of a ∗ b afterwards, which can be crucial for achieving the final proof.

G. Case-split

   Case-splitting on a binary signal cofactors the original model into two sub-models. The miter is proven if both sub-models are proven, and falsified if any sub-model is falsified. Although exponential in nature if many signals are chosen, case-splitting can simplify the underlying bit-level SAT solving significantly. For example, it is difficult to prove the following miter structure directly through bit-blasting and SAT solving at the AIG level,

        (x + y) ∗ (x + y) == x ∗ x + 2 ∗ x ∗ y + y ∗ y         (4)

where x is a 32-bit integer and y a single binary signal. However, it can be proven easily if case-splitting is done on y = 0 and y = 1. After constant propagation, the bit-level solver can prove both sub-models easily.
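A quick spot-check of example (4) (our own illustration) shows why the cofactors are easy: fixing the one-bit input y turns each side into a form that constant propagation collapses. The assertions below merely sample the two cofactors; the actual proof of each cofactor is done by the bit-level solver:

    import random

    MASK = (1 << 32) - 1                        # 32-bit modular arithmetic

    def miter(x, y):
        lhs = ((x + y) * (x + y)) & MASK
        rhs = (x * x + 2 * x * y + y * y) & MASK
        return lhs == rhs

    for y in (0, 1):                            # the two cofactors of the split
        assert all(miter(random.getrandbits(32), y) for _ in range(1000))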
   The current case-splitting mechanism supports cofactoring on an input bit or an input bit-vector. In verifying the test cases encountered so far, the case splits are conducted on a single bit, on whether a bit-vector equals zero, or on whether the lsb or msb of a bit-vector equals zero. A heuristic procedure can be implemented to trace back from the sel port of a mux node through its Boolean fanins and choose the candidates that have the highest controllability.
   Another advantage of case-splitting is that the cofactored sub-models contain new candidates for constants and PEPs, which lead to other down-stream transformations not possible before. Case-splitting also reduces the amount of Boolean logic in the SSA network and exposes the data-path logic to high-level learning such as polynomial construction.

H. Polynomial construction

   Reasoning at the word level, rewriting rules are based on the arithmetic properties of the corresponding operators, such as the commutative law of integer multiplication. However, rewriting applies only local transformations and does not have a global view. In situations where the miter logic is constructed from arithmetic optimization at the polynomial level, local rewriting is not able to bring similarity into the miter for further simplification. In such a situation, LEC tries to reconstruct the polynomial of the whole miter model, establish equivalence through arithmetic or algebraic reasoning, and then use high-level transformations to prove the equivalence of the original miter.
   As a generic procedure, LEC follows four steps to prove a miter network F(x̄) = G(x̄), where F and G are the top-level signals being compared and x̄ is the vector of input variables (bit-vectors):
   1) Conjecture (possibly by design knowledge) the algebraic domain of the polynomial, e.g., signed vs. unsigned integer, modulo integer arithmetic, the order of the polynomial, etc. These conjectures set up the framework and semantics for polynomial reconstruction, as illustrated in the case-study of Section V.
   2) Determine a polynomial f and create a logic network F′ such that the following can be proved formally.

                        F′ implements f                    (5)
                         miter F′ = F                      (6)

      How f is constructed is domain and test-case dependent. In the case-study of Section V, we use simulation patterns to probe for the coefficients of a linear function.
   3) Determine a polynomial g and create a logic network G′ such that the following can be proved formally.

                        G′ implements g                    (7)
                         miter G′ = G                      (8)

   4) Establish the following equivalence formally at the algebraic level.

                             f = g                         (9)

The combination of Items 2, 3, and 4 establishes the equivalence proof of the original miter model F = G. In constructing F′ and G′, we try to make them as structurally similar to F and G as possible. Details are given in Section V.
                     IV. SYSTEM INTEGRATION

   The above learning techniques are integrated in LEC as a set of logically independent procedures. Each procedure produces one or more sub-models, illustrated as a tree in Figure 3. The root node is the current targeted Verilog miter model. It has eight children. The simulator and AIG models are the ones described in Figure 1. The simplified sub-model is generated by constant propagation and the merging of proven PEPs. The abstraction and rewrite sub-models are created by the abstraction and rewrite procedures of the previous section. The case-split sub-model consists of a set of sub-models corresponding to the cofactoring variables selected. In the current implementation, the user needs to input the set of signals to case-split on; eventually they will be selected by heuristics. The linear-construction node has two sub-models, which will be explained in detail in Section V. When PEPs are identified through simulation, a PEP node is created with the set of unproven PEPs as sub-models.

[Fig. 3. Branching sub-model tree: the root Verilog miter model branches into simulator, AIG, simplified, abstraction, rewrite, case-split (case 0, case 1, ..., case n), linear-construction (case F, case G), and PEP (pep 0, pep 1, ..., pep m) sub-models.]

   Two nodes in the sub-model tree are terminal. One is the simulator model, which can falsify the miter through random simulation. The other is the AIG model, where ABC's bit-level dcec procedure is applied. The rest of the leaf models (in bold font) are generated as Verilog miter models, which have the same format as the root node. LEC procedures can be applied recursively to these leaf nodes to extend the sub-model tree to simpler ones. The LEC proof process progresses by expanding the sub-model tree. A sub-model is resolved as SAT or UNSAT from its sub-models' proof results.
   Since there are no logical dependencies between sibling sub-models, any branch can be chosen to continue the proof process. Sibling sub-models can be forked in parallel from a parent process. A node in the sub-model tree determines its proof result from its children. Table III gives the possible return values from the first-level sub-models. SIMPLIFY is returned by a PEP node to its parent model when at least one of its sub-models, pep_i, is proven UNSAT, notifying the parent node to simplify further with the newly proved pep_i.

        Sub-model               Return
        simulator               SAT
        AIG                     SAT/UNSAT
        simplified              SAT/UNSAT
        abstraction             UNSAT
        rewrite                 SAT/UNSAT
        case-split              SAT/UNSAT
        linear construction     SAT/UNSAT
        PEP                     SIMPLIFY
                        TABLE III
                  SUB-MODEL RETURN VALUES

   Depending on the logical relationship between a parent and its immediate sub-models, a node is either disjunctive or conjunctive in semantics. In Figure 3, a Verilog miter model node is disjunctive, which includes the root and all the leaf nodes in bold font. The case-split and linear-construction nodes are conjunctive; a PEP node is disjunctive. The semantics, shown in the following tables, are used to resolve the proof result of the parent model from its immediate sub-models. To complete the calculus, we introduced two values, CON and BOT, where CON stands for an internal conflict indicating a potential LEC software bug, and BOT is the bottom of the value lattice and acts like an uninitialized value.

         ∥     SAT    UNS    UNK    SMP    CON    BOT
        SAT    SAT    CON    SAT    SAT    CON    SAT
        UNS    CON    UNS    UNS    UNS    CON    UNS
        UNK    SAT    UNS    UNK    SMP    CON    UNK
        SMP    SAT    UNS    SMP    SMP    CON    SMP
        CON    CON    CON    CON    CON    CON    CON
        BOT    SAT    UNS    UNK    SMP    CON    BOT
                         TABLE IV
                   DISJUNCTION OF MODELS

         &     SAT    UNS    UNK    SMP    CON    BOT
        SAT    SAT    SAT    SAT    n/a    CON    SAT
        UNS    SAT    UNS    UNK    n/a    CON    UNS
        UNK    SAT    UNK    UNK    n/a    CON    UNK
        SMP    n/a    n/a    n/a    n/a    n/a    n/a
        CON    CON    CON    CON    n/a    CON    CON
        BOT    SAT    UNS    UNK    n/a    CON    BOT
                         TABLE V
                   CONJUNCTION OF MODELS

   Tables IV and V are the truth tables for the disjunction and conjunction semantics of the return values, in which UNS, UNK, and SMP stand for UNSAT, UNKNOWN, and SIMPLIFY. Assuming a bug-free situation, at a disjunctive node, if either SAT or UNSAT is returned from a sub-model, this is the final proof result for the parent. In conjunction, the parent must wait until all sub-models are resolved as UNSAT before deciding that its result is UNSAT, while any SAT sub-model implies that the current model is SAT. A PEP node returns SIMPLIFY to its parent if one of its sub-models, say pep_i, is proven UNSAT. In this case, the parent model can apply another round of simplification to obtain a new simplified sub-model by merging the node pair in the just-proved pep_i. The proof log in Figure 4 is a sample sub-model tree where only the branches that contributed to the final proof are shown. Indentation indicates the parent-child relationship. Recursively, the proof result of the top-level target is evaluated as UNSAT.
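This calculus is small enough to encode directly. The following sketch is our own rendering of Tables IV and V with the same value names; it resolves a parent result from two child results:

    VALUES = ("SAT", "UNS", "UNK", "SMP", "CON", "BOT")

    def disj(p, q):
        """Table IV: disjunctive join of two sub-model results."""
        if "CON" in (p, q) or {p, q} == {"SAT", "UNS"}:
            return "CON"                        # conflict: potential software bug
        for v in ("SAT", "UNS", "SMP", "UNK"):  # precedence read off Table IV
            if v in (p, q):
                return v
        return "BOT"

    def conj(p, q):
        """Table V: conjunctive join of two sub-model results (SMP is n/a)."""
        if "CON" in (p, q):
            return "CON"
        if "SAT" in (p, q):
            return "SAT"                        # any SAT cofactor falsifies the miter
        for v in ("UNK", "UNS"):                # all must be UNS to conclude UNS
            if v in (p, q):
                return v
        return "BOT"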
{
    "case split": {
       "case_0": "UNSAT by AIG",
       "case_1": {
         "simplified": {
            "abstraction": {
               "case split": {
                 "case_00": "UNSAT by AIG",
                 "case_01": "UNSAT by AIG",
                 "case_10": "UNSAT by AIG",
                 "case_11": "UNSAT by AIG"
               },
            },
         },
       },
    },
}
-------------------------------------
Miter proof result: [Resolved: UNSAT]
-------------------------------------
Fig. 4. Illustration of proof log

   Using this sub-model tree infrastructure, any new procedures discovered in the future can be plugged into the system easily. Also, the system is fully parallelizable in that siblings can be executed at the same time. The proof process can be retrieved from the expanded sub-model tree.

                        V. CASE STUDY

   The design in this case-study is an industrial example taken from the image processing domain. We verify specification = implementation, where the "specification" is a manually specified high-level description of the design. The "implementation" is a machine-generated and highly optimized RTL implementation of the same design using [2]. The miter logic is obtained through SLEC[3]. Therefore, the miter problem is verifying that the high-level synthesis (HLS) tool did not modify the design behavior.
   This miter is sequential in nature, but here we examine a bounded model checking (BMC) problem which checks the correctness of the implementation at cycle N. This renders the problem combinational. This is industrially relevant because the sequential problem is too hard to solve in general, and even the BMC problem at cycle N becomes too difficult for industrial tools.
   The original design (specification) consists of 150 lines of C++. It went through the Calypto front-end[3] and was synthesized into a word-level netlist in Verilog. The generated miter model has 1090 lines of structural Verilog code with 36 input ports: 29 of which are 7 bits wide, 2 are 9 bits, 4 are 28 bits, and one is a single-bit wire. The miter is comparing two 28-bit values. We do not have knowledge of what the design does except through structural statistics: no multipliers, but many adders, subtractors, comparators, shifters, etc., together with Boolean logic. From a schematic produced from the Verilog, there seems to be a sorting network implemented using comparators, but we cannot tell anything further.
   Figure 5 illustrates the compositional proof produced by the LEC system by showing the sub-model tree created during the proof process. Indentations indicate parent/sub-model relations, and nodes are listed in the order they were created. The three numbers on the right are the node counts in the red, blue and purple regions (common logic) of the SSA network, as distinguished in Figure 2(a). Only those sub-models that contributed to the final proof are shown in the figure; the others are omitted.

1. original model            : 366 332 776
2.  case-split
3.    case_0                 : 366 331 844
4.      AIG                  : UNSAT
6.    case_1                 : 366 332 776
7.      simplified           : 344 289 675
8.        abstraction        : 344 289 29
9.          case-split
10.           case_0         : 344 289 31
11.             AIG          : UNSAT
12.           case_1         : 344 289 31
13.             simplified   : 343 288 27
14.               PEP
15.                 pep_0    : 335 280 27
16.                   linear construction
17.                     case_F
18.                       AIG : UNSAT
19.                     case_G
20.                       AIG : UNSAT
21.               simplified : 10 10 305
22.                 AIG      : UNSAT

Fig. 5. Sub-model proof tree

   As seen in Figure 5, the case-split procedure is applied twice, at lines 2 and 9. Both models have a single-bit input port, which was selected for cofactoring. ABC[1] immediately proved the first cofactored case, case_0 (lines 3 and 10), using the AIG model at lines 4 and 11. The time-out for the dcec run was set to two seconds. Abstraction was applied at line 8, significantly reducing the common logic from 675 to 29 SSA nodes and effectively removing all the comparator logic. We tried abstraction on the original model without the case-split procedure, and it failed to produce any result. The case-split at line 2 removed enough Boolean logic and eliminated some corner cases such that the abstraction procedure was able to produce an abstract model successfully.
   Model 15 is the smallest unproved PEP from model 13. It is proved using the linear construction procedure at line 16, which we shall describe in detail in Section V-A. Model 21 is the simplified model of model 13 after merging the just-proved pep_0. After simplification, most of the logic in model 21 became common logic through structural hashing, leaving only 10 nodes in each of the blue and red regions. Model 21 was proved quickly by ABC, which concludes the proof of the original miter. In this case, the linear-construction procedure was crucial in attaining the proof. However, the case-split, simplification, abstraction, and PEP models are also very important, because they collaborate in removing Boolean, mux and comparator logic while keeping only the part of the original miter logic which constitutes a linear function. Only at this point can a proof by the linear construction procedure succeed.

A. Linear construction

   For model 15 in Figure 5, the SSA network contains many +, −, ≪ and ≫ operators along with extract and concat
operators, but contains no Boolean operators or muxes. The input ports consist of twenty-five 7-bit or 12-bit wide ports. The miter is comparing two 15-bit wide output ports. At this point, simplification and abstraction cannot simplify the model further. Also, there are no good candidates for case-splitting. The local rewriting rules cannot be applied effectively without some global information to help converge the two sides of the miter logic. High-level information must be extracted and applied to prove this miter model.
   After the linear construction procedure, LEC finds that the miter logic implements the following linear sum in the signed integer domain using two's complement representation:

   −16 ∗ x0 + 2 ∗ x1 + 2 ∗ x2 + 2 ∗ x3 + 2 ∗ x4 + 2 ∗ x5 + 2 ∗ x6 + 2 ∗ x7
   + 2 ∗ x8 + 2 ∗ x9 + 2 ∗ x10 + x11 + x12 + 2 ∗ x13 + 2 ∗ x14 + 2 ∗ x15
   + 2 ∗ x16 + 2 ∗ x17 + 2 ∗ x18 + 2 ∗ x19 − 2 ∗ x20 + 2 ∗ x21 + 2 ∗ x22
   + 2 ∗ x23 + 2 ∗ x24 + 14

One side of the miter implements the above sum as a plain linear adder chain (Figure 6(a)); the other side is a highly optimized implementation using a balanced binary tree structure (Figure 6(b)) and optimization tricks which we do not fully understand. This is a hard problem for bit-level engines because there are no internal match points to utilize. Therefore, LEC resorts to a high-level method to establish equivalence at the polynomial level. The following are the detailed steps for this specific case.

[Fig. 6. Addition implementation: (a) linear adder chain; (b) balanced adder tree.]

   1) The conjecture: Assume the miter logic is F(x̄) = G(x̄), as in Figure 2(a). LEC conjectures the following for the arithmetic domain.
   • Signed integer arithmetic. The numbers are in two's complement representation.
   • F(x̄) and G(x̄) implement linear sums f and g of the forms

                      f(x̄) = Σ_i a_i · x_i + b          (10)
                      g(x̄) = Σ_i a′_i · x_i + b′        (11)

   2) Determining the coefficients of f, g and proving f = g algebraically: Given the data-path logic F(x̄) and the linear sum formula (10), it takes n + 1 simulation patterns on the n input variables to compute the coefficients:

                      b = F(0, 0, ..., 0)
                      a_0 = F(1, 0, ..., 0) − b
                      a_1 = F(0, 1, ..., 0) − b
                      ...
                      a_{n−1} = F(0, 0, ..., 1) − b

Another round of random simulation on both the logic and the polynomial can be done to increase confidence in the conjecture. The same is repeated for G(x̄) to obtain g(x̄).
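The probing step is straightforward to express in code. In this sketch (our own illustration), F is any callable evaluating the data-path logic on a vector of input words:

    def probe_linear(F, n):
        """Conjecture coefficients of f(x) = sum(a_i * x_i) + b from n+1 patterns."""
        b = F([0] * n)                    # constant term from the all-zero vector
        a = []
        for i in range(n):
            e_i = [0] * n
            e_i[i] = 1                    # unit vector for input i
            a.append(F(e_i) - b)          # a_i = F(e_i) - b
        return a, b

    # example: probing a toy 3-input function recovers (a, b) = ([2, -1, 4], 7)
    assert probe_linear(lambda x: 2*x[0] - x[1] + 4*x[2] + 7, 3) == ([2, -1, 4], 7)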
   In integer arithmetic, f is equal to g if and only if the coefficients match exactly for each term:

          f = g  ⟺  ∀i. a_i = a′_i  and  b = b′            (12)

So checking f = g is trivial in this case. In other algebraic domains, domain-specific reasoning may have to be applied to derive algebraic equivalence, e.g., as in [28].
   3) Synthesizing implementations F′/G′ for f (f = g), structurally similar to F/G: We want to find a Verilog implementation F′(x̄) of f such that
   1) F′ implements f
   2) F′ is structurally similar to F
To do this, all nodes in the SSA network with the arithmetic operators +, − are marked, and edges connecting single bits are removed. A reduced graph is then created from the marked nodes in the remaining graph, maintaining the input/output relations between the marked nodes. This graph is a skeleton of the implementation structure of F. We annotate each of its nodes with a conjectured linear sum, computed in the same way as in the above steps. The root node F is annotated with f, and the internal nodes are annotated with linear sums f_s, f_t, etc. For illustration purposes, Figure 7(a) shows such an annotated reduced graph. For an arbitrary node w in the reduced graph with inputs from nodes s and t, from the annotation we have the following:

                      s = f_s(x̄)
                      t = f_t(x̄)
                      w = f_w(x̄)

[Fig. 7. Annotated reduced graph: (a) the annotated reduced graph, with internal nodes carrying linear sums such as s = f_s(x̄), t = f_t(x̄), w = f_w(x̄); (b) the substituted annotation, in which w = c_s · s + c_t · t + f_st(x̄).]

We would like to substitute f_w with the variables s and t, such that w is a function of s and t, in order to follow the structure of the skeleton reduced graph. Because all the functions are linear sums, we can compute, using algebraic division, two constants c_s and c_t such that the following holds:

                  w = c_s · s + c_t · t + f_st(x̄)
Here c_s is the quotient of f_w/f_s and c_t = (f_w − c_s · f_s)/f_t, while f_st is the remainder of the latter division. The substitution is conducted in one topological traversal from the inputs to the miter output. After substitution, the annotated reduced graph is essentially a multi-level implementation of the above linear sum. Because the linear sum annotated at each node in the reduced graph is only a conjecture, it may not be exactly the same function as in the original miter logic. However, in re-implementing f using this structure, certain similarities are still captured through the construction process.
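One way to realize this "algebraic division" of linear sums is ordinary multivariate division with constant quotients. The sketch below (our own illustration; a linear sum is represented hypothetically as a coefficient dictionary plus a constant, whereas LEC's actual routine works on annotated SSA nodes) computes one division step, so that c_s, r = divide(f_w, f_s) followed by c_t, f_st = divide(r, f_t) reproduces the decomposition above:

    from fractions import Fraction

    def divide(fw, fs):
        """Return constant c and remainder r with fw = c * fs + r (linear sums)."""
        (cw, bw), (cs, bs) = fw, fs
        pivot = next(v for v, a in cs.items() if a)      # a leading variable of fs
        c = Fraction(cw.get(pivot, 0), cs[pivot])
        rem = {v: cw.get(v, 0) - c * cs.get(v, 0)
               for v in set(cw) | set(cs)}
        return c, (rem, bw - c * bs)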
   This multi-level circuit is implemented by traversing the substituted reduced graph and creating a corresponding Verilog file for F′. It is generated by allocating a bit-vector at each internal node, with bit-width equal to that of the output port of F. The same can be done for G to obtain G′. The reason why LEC goes through so much trouble to obtain separate RTL implementations for F′ and G′ (even though they both implement f) is that we need to prove F = F′ and G = G′ separately next. Without the structural similarities created through this procedure, proving equivalence with an arbitrary F′ and G′ would be as difficult as proving the original F = G. Generally, only with these extra similarities injected can the miter model be simplified enough to allow SAT sweeping to succeed for F = F′ and G = G′.
   4) Proving F = F′ and G = G′: We construct two miter models, F = F′ and G = G′, as case_F and case_G in Figure 3, and apply LEC separately to each. By construction, each miter should be simpler than the original F = G because of the increased structural similarity between the two sides of the miter. Another round of LEC might reduce this miter logic, if not prove it directly through bit-level solvers. In the present case, F = F′ was proven instantly, because F is a simple linear adder chain and so is F′. Proving G = G′ takes more time using ABC's dcec, because G is a highly optimized implementation of f and the multi-level implementation from the annotated reduced graph only captures part of the similarity. But the injected similarity was sufficient to reduce the complexity to within dcec's capacity.
   5) Proving that F′/G′ implements f/g: To complete the proof, we still have the proof obligation that F′ and G′ implement f and g, respectively. By construction from the reduced graph, the generated Verilog is a verbatim translation of the multi-level form of f. However, we need to bridge the gap between Verilog's bit-vector arithmetic and the integer arithmetic of the linear sum. To do so, we created SVA assertions to check that every Verilog statement captures the integer value in full, without losing precision due to underflows or overflows.

    c[n : 0] = a[n − 1 : 0] + b[n − 1 : 0];
        assert (a[n − 1] & b[n − 1] =⇒ c[n])
        assert (!a[n − 1] & !b[n − 1] =⇒ !c[n])

    c[n : 0] = a[n : 0] / b[m : 0];
        assert (a[n : 0] == (c[n : 0] ∗ b[m : 0])[n : 0])

    c[m : 0] = {a[n : 0]}[m : 0];
        assert (a[n : 0] == {(n − m){c[m]}, c[m : 0]})

The first two sets of assertions ensure that there is no overflow in a signed integer addition and no non-zero remainder in a division. The third one ensures that extraction does not change the value in two's complement representation. The SVA checkers are formally verified separately using the VeriABC[23] flow, which in turn uses ABC's model checker.
   From the above procedures, we established the following:

                      f(x̄) = g(x̄)                         (13)
                      F′(x̄) implements f(x̄)               (14)
                      G′(x̄) implements g(x̄)               (15)
                      F = F′                               (16)
                      G = G′                               (17)

Altogether, they establish the proof of F = G.
   Combining the above procedures into a single run, LEC took about 10 minutes on an i7 processor to complete the full proof. Roughly 80% of the time is spent on compiling and running random simulation; the rest is used for SAT solving. In Table VI, we also compare this run-time against Boolector[11], Z3[16] and ABC's iprove[24] solver, all run on the same server. It is clear that LEC expedites the proof significantly by using the knowledge of the linear sum formulation inside the miter logic.

        Miter      Boolector    Z3          iprove      LEC
        model 1    time-out     time-out    time-out    10 min
                            TABLE VI
          COMPARISON WITH OTHER SOLVERS (24-HOUR TIME-OUT)

   In summary, polynomial reconstruction was a key technique in proving the underlying miter problem. The case-study illustrates the major steps and the proof obligations encountered during the process. The actual techniques used for different domains of the underlying arithmetic would differ. Each algebraic domain would require special-purpose heuristics and automated proof procedures to guarantee the correctness of the reconstructions and transformations used in the method. The goal of the linear construction procedure was to inject increased structural similarity by using global algebraic transformations.

                  VI. EXPERIMENTAL RESULTS

   Table VII shows the experimental results comparing Boolector[11], Z3[16] and iprove[24] using a 24-hour time-out limit on a 2.6 GHz Intel Xeon processor. These models are generated directly using SLEC[3] for checking C-to-RTL equivalence, or extracted as sub-targets from PEPs. The first column is the miter design name. The second column is the number of lines of Verilog for the miter model specification. Run-time or time-out results are reported for each solver in columns 3 to 6. Although the miter models are not big in terms of lines of Verilog, they are quite challenging for Boolector, Z3 and iprove. The run-time of LEC is the total CPU time
including Verilog compilation. It was expected that iprove would not prove any of them, because it works on the bit-blasted model without any of the high-level information that the other solvers have.

        Design       Lines   Boolector   Z3         iprove     LEC
        mul_64_64    125     20 sec      200 sec    time-out   10 sec
        d1           24      time-out    time-out   time-out   15 sec
        d2           507     time-out    time-out   time-out   2 min
        d3           191     time-out    time-out   time-out   15 min
        d4           473     time-out    time-out   time-out   60 sec
        d5_pep_0     674     time-out    9 hours    time-out   4 min
                             TABLE VII
                BENCHMARK COMPARISON (24-HOUR TIME-OUT)

   The miter mul_64_64 compares a 64x64 multiplier with an implementation using four 32x32 multipliers, as follows:

   {a_H, a_L} ∗ {b_H, b_L} = ((a_H ∗ b_H) ≪ 64) + ((a_H ∗ b_L + a_L ∗ b_H) ≪ 32) + a_L ∗ b_L

where a_H, a_L, b_H, b_L are the 32-bit slices of the original 64-bit bit-vectors. Both Boolector and Z3 are able to prove it. LEC proves it by first utilizing rewriting rules to transform the 64x64 multiplier into four 32x32 multipliers, matching the other four on the RHS of the miter. As they are matched exactly, they become common logic in the miter model. LEC then produces an abstraction and obtains a reduced model with all the multipliers removed: the outputs of the multipliers become free inputs in the abstract model. The abstract model is then proven instantly by ABC's dcec on the AIG model.
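The decomposition identity itself is exact over the integers, which a short self-contained check (our own illustration, not part of LEC) confirms; masking both sides with 2^64 − 1 would model the corresponding fixed-width Verilog product:

    import random

    for _ in range(10000):
        a, b = random.getrandbits(64), random.getrandbits(64)
        aH, aL = a >> 32, a & 0xFFFFFFFF       # 32-bit slices
        bH, bL = b >> 32, b & 0xFFFFFFFF
        rhs = ((aH * bH) << 64) + ((aH * bL + aL * bH) << 32) + aL * bL
        assert a * b == rhs                    # exact in unbounded integers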
   The miter d1, extracted from a PEP sub-model, is a demonstration of rewrite rule 6 in Table II using 32-bit multiplication. As both Boolector and Z3 fail to prove equivalence within the time limit, they likely do not have this rewriting rule implemented.
   To prove d2, LEC conducts conditional rewriting using rule (2) by first statically proving an invariant of the form (3). After the transformation, the multipliers are matched exactly on both sides of the miter and are removed in the subsequent abstract model. The final miter model is proved instantly by ABC on the bit-level AIG.
   The miter model d3 has part of its logic similar to mul_64_64 embedded inside. LEC proves d3 by first applying rewriting rules repeatedly until no more rewriting is possible. Then, LEC computes a reduced model through abstraction. In the reduced model, LEC conducts a case-split on a one-bit input. The case-0 AIG model is proven instantly, while case-1 is proven in about 10 minutes by ABC.
   The miter d4 is proven by first conducting a case-split on two bit-vector inputs, cofactoring on whether each bit-vector equals zero or not. Three of the four cofactored cases are proven instantly. The unresolved one goes through a round of simplification and abstraction. On the resulting sub-model, three one-bit inputs are identified and cofactored through the case-split procedure. LEC proves all eight cases quickly, within a few seconds.
   Miter d5 is extracted from model 15 in Figure 5, which contains the purely linear sum miter described in the case-study section. For this simpler miter, Z3 is able to prove it in 9 hours, while both iprove and Boolector time out. This shows that LEC's transformations through collaborating procedures successfully reduce the surrounding logic in the original model, which was preventing Z3 from proving it within 24 hours.
   The above experiments demonstrate the effectiveness of LEC's collaborating procedures of simplification, rewriting, case-splitting and abstraction computation. The LEC architecture allows these procedures to be applied recursively through a sub-model tree: the model obtained by one procedure introduces new opportunities for applying other procedures in the next iteration. As exemplified by miter d4, the initial case-split gives rise to new opportunities for simplification, as new constants are introduced by cofactoring. Then a new round of abstraction is able to remove enough common logic and expose three one-bit inputs as case-split candidates in the reduced model, which in turn gives rise to another case-split transformation that leads to the final proof. None of this is possible without the transformations being applied in sequence.

            VII. COMPARISON WITH RELATED WORK

   In bit-level equivalence-checking procedures [24][25], simulation, SAT-sweeping, AIG rewriting and internal equivalence identification are all relevant to data-path equivalence checking. In LEC, these types of procedures are conducted at the word level. Word-level rewriting is difficult if only a bit-level model is available. For example, with no knowledge of the boundary of a multiplier, normalizing its operands is impractical at the bit level. Although the abstraction and case-split techniques in LEC could in theory be applied at the bit level, they are not used there due to the difficulty of computing an abstraction boundary or of finding good cofactoring candidates.
   SMT solving is relevant because data-path verification is a subset of the QF_BV theory. Methods such as [7][11][16][14][17][19] are state-of-the-art QF_BV solvers. These employ different implementations of word-level techniques in rewriting, abstraction, case-splitting, and simplification, and interleave Boolean and word-level reasoning via a generalized DPLL framework or through abstraction refinements of various forms. Hector[20] is closest to LEC in terms of technology and targeted application domains, and has a rich set of word-level rewriting rules along with some theorem-prover [7] procedures to validate every rewriting applied. Hector also has an orchestration of a set of bit-level solvers using SAT and BDD engines, employed once the bit-level miter model is constructed. Strategically, LEC relies less on the capacity of the SAT solver; instead, it builds a compositional proof infrastructure and employs iterative transformations to finally obtain a proof through sub-model trees. The goal of the LEC learning procedures is to reverse engineer the embedded high-level algebraic transformations and bring more similarity to the two sides of the miter model.
   The techniques in [26][31][33] also try to reconstruct an algebraic model from the underlying logic, but they employ a bottom-up approach and their primitive element is a half-adder.
The method in [8] simplifies the algebraic construction by solving an integer linear programming problem. The limitation of these approaches is that they rely on the structural pattern of the underlying logic to reconstruct the algebraic model. On the other hand, the linear construction case-study in Section V-A constructs the polynomial through probing with simulation patterns. This is more general as it uses only the functional information of the data-path logic. For different domains, other techniques may well be more applicable, such as the bottom-up approach. The use of vanishing polynomials and Gröbner bases in [27][28] to prove equivalence between polynomials in the modulo-integer domain can be utilized once a polynomial form is reconstructed in LEC. In many data-path miter models, such a polynomial in a certain domain or theory is likely embedded in other control and data-path logic. Direct application of algebraic techniques is often not practical. Thus the collaborating procedures in LEC are designed to bridge this gap and isolate such polynomials so that these high-level theories can then be applied.
   In conducting consistency checking between C and Verilog RTL, the work in [21] focuses on how to process a C program to generate formal models. The tool relies on SMT solvers [11][16][14] as the back-end solving engines.
   In terms of tool architecture, [9][10][22] all employ a sophisticated set of transformations to simplify the target model during verification. These are done at the bit level. The LEC infrastructure allows future extension to take advantage of multi-core parallelization as demonstrated in [30]. [12][32] use dedicated data structures to represent the proof obligations, while LEC relies on the sub-model tree to track the compositional proof strategy used at each node.

                          VIII. CONCLUSION

   In LEC, we build a system of collaborating procedures for data-path equivalence-checking problems found in an industrial setting. The strategy is to utilize Boolean-level solvers, conduct transformations at the word level, and synthesize internal similarities by lifting the reasoning to the algebraic level. Using a real industrial case-study, we demonstrated the applicability of the sub-model tree infrastructure for integrating a compositional proof methodology using LEC.

                           REFERENCES

 [1] ABC - a system for sequential synthesis and verification. Berkeley Verification and Synthesis Research Center, http://www.bvsrc.org.
 [2] Calypto® Catapult Design Product. http://www.calypto.com.
 [3] Calypto® SLEC. http://www.calypto.com.
 [4] Forte Design Systems. http://www.forteds.com.
 [5] SMT-LIB. http://www.smt-lib.org.
 [6] Verific Design Automation. http://www.verific.com.
 [7] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanović, T. King, A. Reynolds, and C. Tinelli. CVC4. In Computer Aided Verification, pages 171–177. Springer, 2011.
 [8] M. A. Basith, T. Ahmad, A. Rossi, and M. Ciesielski. Algebraic approach to arithmetic design verification. In Formal Methods in Computer-Aided Design (FMCAD), 2011, pages 67–71. IEEE, 2011.
 [9] J. Baumgartner, H. Mony, V. Paruthi, R. Kanzelman, and G. Janssen. Scalable sequential equivalence checking across arbitrary design transformations. In International Conference on Computer Design (ICCD 2006), pages 259–266. IEEE, 2007.
[10] R. Brayton and A. Mishchenko. ABC: An academic industrial-strength verification tool. In Computer Aided Verification, pages 24–40. Springer, 2010.
[11] R. Brummayer and A. Biere. Boolector: An efficient SMT solver for bit-vectors and arrays. In Tools and Algorithms for the Construction and Analysis of Systems, pages 174–177. Springer, 2009.
[12] M. L. Case, A. Mishchenko, and R. K. Brayton. Automated extraction of inductive invariants to aid model checking. In Formal Methods in Computer Aided Design (FMCAD'07), pages 165–172. IEEE, 2007.
[13] P. Chauhan, D. Goyal, G. Hasteer, A. Mathur, and N. Sharma. Non-cycle-accurate sequential equivalence checking. In Proceedings of the 46th Annual Design Automation Conference, pages 460–465. ACM, 2009.
[14] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani. The MathSAT5 SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, pages 93–107. Springer, 2013.
[15] J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang. High-level synthesis for FPGAs: From prototyping to deployment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(4):473–491, 2011.
[16] L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.
[17] B. Dutertre and L. De Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf, 2:2, 2006.
[18] M. Fujita. Equivalence checking between behavioral and RTL descriptions with virtual controllers and datapaths. ACM Transactions on Design Automation of Electronic Systems (TODAES), 10(4):610–626, 2005.
[19] S. Jha, R. Limaye, and S. A. Seshia. Beaver: Engineering an efficient SMT solver for bit-vector arithmetic. In Computer Aided Verification, pages 668–674. Springer, 2009.
[20] A. Koelbl, R. Jacoby, H. Jain, and C. Pixley. Solver technology for system-level to RTL equivalence checking. In Design, Automation & Test in Europe (DATE'09), pages 196–201. IEEE, 2009.
[21] D. Kroening, E. Clarke, and K. Yorav. Behavioral consistency of C and Verilog programs using bounded model checking. In Proceedings of DAC 2003, pages 368–371. ACM Press, 2003.
[22] A. Kuehlmann and J. Baumgartner. Transformation-based verification using generalized retiming. In Computer Aided Verification, pages 104–117. Springer, 2001.
[23] J. Long, S. Ray, B. Sterin, A. Mishchenko, and R. Brayton. Enhancing ABC for LTL stabilization verification of SystemVerilog/VHDL models. Ganesh Gopalakrishnan University of Utah USA, page 38, 2011.
[24] A. Mishchenko, S. Chatterjee, R. Brayton, and N. Een. Improvements to combinational equivalence checking. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD '06), pages 836–843, 2006.
[25] V. Paruthi and A. Kuehlmann. Equivalence checking combining a structural SAT-solver, BDDs, and simulation. In International Conference on Computer Design (ICCD 2000), pages 459–464, 2000.
[26] E. Pavlenko, M. Wedler, D. Stoffel, W. Kunz, A. Dreyer, F. Seelisch, and G. Greuel. STABLE: A new QF-BV SMT solver for hard verification problems combining boolean reasoning with computer algebra. In Design, Automation & Test in Europe (DATE 2011), pages 1–6. IEEE, 2011.
[27] N. Shekhar, P. Kalla, and F. Enescu. Equivalence verification of polynomial datapaths using ideal membership testing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(7):1320–1330, 2007.
[28] N. Shekhar, P. Kalla, F. Enescu, and S. Gopalakrishnan. Equivalence verification of polynomial datapaths with fixed-size bit-vectors using finite ring algebra. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD-2005), pages 291–296, 2005.
[29] W. Snyder, P. Wasson, and D. Galbi. Verilator: Convert Verilog code to C++/SystemC, 2012.
[30] B. Sterin, N. Een, A. Mishchenko, and R. Brayton. The benefit of concurrency in model checking. IWLS, 2011.
[31] D. Stoffel and W. Kunz. Equivalence checking of arithmetic circuits on the arithmetic bit level. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(5):586–597, 2004.
[32] D. Wang and J. Levitt. Automatic assume guarantee analysis for assertion-based formal verification. In Proceedings of the 2005 Asia and South Pacific Design Automation Conference, pages 561–566. ACM, 2005.
[33] M. Wedler, D. Stoffel, R. Brinkmann, and W. Kunz. A normalization method for arithmetic data-path verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(11):1909–1922, 2007.

   Trading-off incrementality and dynamic restart of
                multiple solvers in IC3
                                         G. Cabodi (*), A. Mishchenko (**), M. Palena (*)
                                                (*) Dip. di Automatica ed Informatica
                                                 Politecnico di Torino - Torino, Italy
                                  (**) Dept. of EECS, University of California, Berkeley, CA, USA


   Abstract—This paper¹ addresses the problem of SAT solver performance in IC3, one of the major recent breakthroughs in Model Checking algorithms. Unlike other Bounded and Unbounded Model Checking algorithms, IC3 is characterized by numerous SAT solver queries on small sets of problem clauses. Besides algorithmic issues, this scenario poses serious performance challenges for SAT solver configuration and tuning. As is well known in other application fields, finding a good compromise between learning and overhead is key to performance. We address solver cleanup and restart heuristics, as well as clause database minimality based on on-demand clause loading: transition relation clauses are loaded into solvers according to structural dependency and phase analysis. We also compare different solutions for multiple specialized solvers, and we provide an experimental evaluation on benchmarks from the HWMCC suite. Though not finding a clear winner, the work outlines several potential improvements for a portfolio-based verification tool with multiple engines and tunings.

   ¹This work was supported in part by SRC Contracts No. 2012-TJ-2328 and No. 2265.001.

                         I. INTRODUCTION

   IC3 [1] is a SAT-based invariant verification algorithm for bit-level Unbounded Model Checking (UMC). Since its introduction, IC3 has immediately generated strong interest, and is now considered one of the major recent breakthroughs in Model Checking. IC3 has proved to be impressively effective at solving industrial verification problems. Our experience with the algorithm shows that IC3 is the single invariant verification algorithm capable of solving the largest number of instances among the benchmarks of the last editions of the Hardware Model Checking Competition (HWMCC).

A. Motivations

   IC3 heavily relies on SAT solvers to drive several parts of the verification algorithm: a typical run of IC3 is characterized by a huge number of SAT queries. As stated by Bradley in [2], the queries posed by IC3 to SAT solvers differ significantly in character from those posed by other SAT-based invariant verification algorithms (such as Bounded Model Checking [3], k-induction [4] [5] or interpolation [6]). Most notably, SAT queries posed by IC3 don't involve the unrolling of the transition relation for more than one step and are thus characterized by small-sized formulas.
   IC3 can be thought of as composed of two different layers: at the top level, the algorithm itself drives the verification process by constantly refining a set of over-approximations to forward reachable states with new inductive clauses; at the bottom level, a SAT solving framework is exploited by the top-level algorithm to respond to queries about the system. As shown in [7], these two layers can be separated by means of a clean interface.
   Performance of IC3 turns out to be both highly sensitive to the various internal behaviours of SAT solvers, and strictly dependent on the way the top-level algorithm is integrated with the underlying SAT solving framework.
   The peculiar characteristics exposed by the SAT queries of IC3 can thus be exploited to improve the overall performance of the algorithm in two different manners:
     1) Tuning the internal behaviours of the particular SAT solver employed to better fit IC3 needs.
     2) Defining better strategies to manage the SAT solving work required by IC3.
   In this paper we address the second issue, proposing and comparing different implementation strategies for handling SAT queries in IC3. The aim of this paper is to identify the most efficient way to manage SAT solving in IC3. To achieve this goal we experimentally compare a number of different implementation strategies over a selected set of benchmarks from the recent HWMCC.
   The experimental work has been done by two different research groups, on two different state-of-the-art verification tools, ABC [8] and PdTRAV [9], that share similar architectural and algorithmic choices in their implementation of IC3.
   The focus of this paper is neither on the IC3 algorithm itself nor on the internal details of the SAT solving procedures employed, but rather on the implementation details of the integration between IC3 and the underlying SAT solving framework.

B. Contributions

   The main contributions of this paper are:
     • A characterization of the SAT queries posed by IC3.
     • Novel approaches to solver allocation, loading and cleanup in IC3.
     • An experimental evaluation of performance using two verification tools.


C. Outline

   First, in Section II we introduce the notation used and give some background on IC3. Then, in Section III we present a systematic characterization of the SAT solving work required by IC3. Section IV introduces the problem of handling the SAT queries posed by IC3 efficiently. Both commonly used and novel approaches to the allocation, loading and cleaning up of SAT solvers in IC3 are discussed in Sections V, VI and VII respectively. Experimental data comparing these approaches are presented in Section VIII. Finally, in Section IX we draw some conclusions and give summarizing remarks.

                         II. BACKGROUND

A. Notation

Definition 1. A transition system is a triple S = ⟨M, I, T⟩ where M is a set of boolean variables called the state variables of the system, I(M) is a boolean predicate over M representing the set of initial states of the system, and T(M, M′) is a predicate over M, M′ that represents the transition relation of the system.

Definition 2. A state of the system is represented by a complete assignment s to the state variables M. A set of states of the system is represented by a boolean predicate over M. Given a boolean predicate F over M, a complete assignment s such that s satisfies F (i.e. s |= F) represents a state contained in F and is called an F-state. Primed state variables M′ are used to represent future states and, accordingly, a boolean predicate over M′ represents a set of future states.

Definition 3. A boolean predicate F is said to be stronger than another boolean predicate G if F → G, i.e. every F-state is also a G-state.

Definition 4. A literal is a boolean variable or the negation of a boolean variable. A disjunction of literals is called a clause, while a conjunction of literals is called a cube. A formula is said to be in Conjunctive Normal Form (CNF) if it is a conjunction of clauses.

Definition 5. Given a transition system S = ⟨M, I, T⟩, if an assignment s, t′ satisfies the transition relation T (i.e. if s, t′ |= T) then s is said to be a predecessor of t and t is said to be a successor of s. A sequence of states s0, s1, . . . , sn is said to be a path in S if every couple of adjacent states si, si+1, 0 ≤ i < n, satisfies the transition relation (i.e. if si, s′i+1 |= T).

Definition 6. Given a transition system S = ⟨M, I, T⟩, a state s is said to be reachable in S if there exists a path s0, s1, . . . , s such that s0 is an initial state (i.e. s0 |= I). We denote with Rn(S) the set of states that are reachable in S in at most n steps. We denote with R(S) the overall set of states that are reachable in S. Note that R(S) = ∪i≥0 Ri(S).

Definition 7. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M (called a safety property), the invariant verification problem is the problem of determining if P holds for every reachable state in S: ∀s ∈ R(S) : s |= P. An algorithm used to solve the invariant verification problem is called an invariant verification algorithm.

Definition 8. Given a transition system S = ⟨M, I, T⟩, a boolean predicate F over M is called an inductive invariant for S if the following two conditions hold:
   • Base case: I → F
   • Inductive case: F ∧ T → F′

A boolean predicate F over M is called an inductive invariant for S relative to another boolean predicate G if the following two conditions hold:
   • Base case: I → F
   • Relative inductive case: G ∧ F ∧ T → F′

Lemma 1. Given a transition system S = ⟨M, I, T⟩, an inductive invariant F for S is an over-approximation to the set of reachable states R(S).

Definition 9. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M, an inductive strengthening of P for S is an inductive invariant F for S such that F is stronger than P.

Lemma 2. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M, if an inductive strengthening of P can be found, then the property P holds for every reachable state of S. The invariant verification problem can thus be seen as the problem of finding an inductive strengthening of P for S.

B. IC3

   Given a transition system S = ⟨M, I, T⟩ and a safety property P over M, IC3 aims to find an inductive strengthening of P for S. For this purpose, it maintains a sequence of formulas Fk = F0, F1, . . . , Fk such that, for every 0 ≤ i < k, Fi is an over-approximation of the set of states reachable in at most i steps in S. Each of these over-approximations is called a time frame and is represented by a set of clauses, denoted by clauses(Fi). The sequence of time frames Fk is called a trace and is maintained by IC3 in such a way that the following conditions hold throughout the algorithm:
   (1) F0 = I
   (2) Fi → Fi+1, for all 0 ≤ i < k
   (3) Fi ∧ T → F′i+1, for all 0 ≤ i < k
   (4) Fi → P, for all 0 ≤ i < k

   Condition (1) states that the first time frame of the trace is special and is simply assigned the set of initial states of S. The remaining conditions claim that, for every time frame Fi but the last one: (2) every Fi-state is also an Fi+1-state, (3) every successor of an Fi-state is an Fi+1-state, and (4) every Fi-state is safe. Condition (2) is maintained syntactically, enforcing the condition (2′) clauses(Fi+1) ⊆ clauses(Fi).

Lemma 3. Let S = ⟨M, I, T⟩ be a transition system, Fk = F0, F1, . . . , Fk a sequence of boolean formulas over M, and let conditions (1-3) hold for Fk. Then each Fi, with 0 ≤ i < k, is an over-approximation to the set of states reachable within i steps in S.
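   As a concrete illustration of Definition 8 and of conditions (1)-(4), the following sketch (ours, not part of the original IC3 formulation) checks the base and inductive cases of a candidate invariant on a toy 2-bit counter by brute-force enumeration; an actual implementation would pose both conditions as SAT queries instead.

# Toy transition system: a 2-bit counter over states (b1, b0) that
# counts 0 -> 1 -> 2 -> 0; the encoding 3 = (1, 1) is never reached.
states = [(b1, b0) for b1 in (0, 1) for b0 in (0, 1)]

def initial(s):                     # I: the counter starts at 0
    return s == (0, 0)

def transition(s, t):               # T: the successor relation
    value = 2 * s[0] + s[1]
    succ = (value + 1) % 3 if value < 3 else 0
    return 2 * t[0] + t[1] == succ

def invariant(s):                   # candidate F: "state 3 is unreachable"
    return s != (1, 1)

# Base case: I -> F (every initial state is an F-state).
base = all(invariant(s) for s in states if initial(s))
# Inductive case: F /\ T -> F' (F-states only reach F-states).
inductive = all(invariant(t)
                for s in states if invariant(s)
                for t in states if transition(s, t))
print("inductive invariant" if base and inductive else "not inductive")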


                                                                                                                                       20
Lemma 4. Let S = ⟨M, I, T⟩ be a transition system, P a safety property over M, Fk = F0, F1, . . . , Fk a sequence of boolean formulas over M, and let conditions (1-4) hold for Fk. Then P is satisfied up to k − 1 steps in S (i.e. there does not exist any counter-example to P of length less than or equal to k − 1).

   The main procedure of IC3 is described in Algorithm 1 and is composed of two nested iterations. Major iterations (lines 3-16) try to prove that P is satisfied up to k steps in S, for increasing values of k. To prove so, in minor iterations (lines 4-9), IC3 refines the trace Fk computed so far by adding new relative inductive clauses to some of its time frames. The algorithm iterates until either (i) an inductive strengthening of the property is produced (line 4), or (ii) a counter-example to the property is found (line 7).

 Input: S = ⟨M, I, T⟩; P(M)
 Output: SUCCESS or FAIL(σ), with σ counter-example
  1: k ← 0
  2: Fk ← I
  3: repeat
  4:   while ∃t : t |= Fk ∧ ¬P do
  5:     s ← Extend(t)
  6:     if BlockCube(s, Q, Fk) = FAIL(σ) then
  7:       return FAIL(σ)
  8:     end if
  9:   end while
 10:   Fk+1 ← ∅
 11:   k ← k + 1
 12:   Fk ← Propagate(Fk)
 13:   if Fi = Fi+1 for some 0 ≤ i < k then
 14:     return SUCCESS
 15:   end if
 16: until forever
                    Algorithm 1. IC3(S, P)

   At major iteration k, the algorithm has computed a trace Fk such that conditions (1-4) hold. From Lemma 4, it follows that P is satisfied up to k − 1 steps in S. IC3 then tries to prove that P is satisfied up to k steps as well. This is done by enumerating Fk-states that violate P and then trying to block them in Fk.

Definition 10. Blocking a state (or, more generally, a cube) s in a time frame Fk means proving s unreachable within k steps in S, and consequently refining Fk to exclude it.

   To enumerate each state of Fk that violates P (line 4), the algorithm poses SAT queries to the underlying SAT solving framework in the following form:

                        SAT?(Fk ∧ ¬P)                      (Q1)

If Q1 is SAT, a bad state t in Fk can be extracted from the satisfying assignment. That state must be blocked in Fk. To increase the performance of the algorithm, as suggested in [7], the bad state t generated this way is first (possibly) extended to a bad cube s. This is done by means of the Extend(t) procedure (line 5), not reported here, which employs ternary simulation to remove some literals from t. The resulting cube s includes t and violates the property P; it is thus a bad cube. The algorithm then tries to block the whole bad cube s rather than t. It is shown in [7] that extending bad states into bad cubes before blocking them dramatically improves IC3 performance.
   Once a bad cube s is found, it is blocked in Fk by calling the BlockCube(s, Q, Fk) procedure (line 6). This procedure is described in Algorithm 2.

 Input: s: bad cube in Fk; Q: priority queue; Fk: trace
 Output: SUCCESS or FAIL(σ), with σ counter-example
  1: add a proof-obligation (s, k) to the queue Q
  2: while Q is not empty do
  3:   extract (s, j) with minimal j from Q
  4:   if j > k or s ⊭ Fj then continue
  5:   if j = 0 then return FAIL(σ)
  6:   if ∃t, v′ : t, v′ |= Fj−1 ∧ T ∧ ¬s ∧ s′ then
  7:     p ← Extend(t)
  8:     add (p, j − 1) and (s, j) to Q
  9:   else
 10:     c ← Generalize(j, s, Fk)
 11:     Fi ← Fi ∪ {c} for 0 < i ≤ j
 12:     add (s, j + 1) to Q
 13:   end if
 14: end while
 15: return SUCCESS
                Algorithm 2. BlockCube(s, Q, Fk)

   Otherwise, if Q1 is UNSAT, every bad state of Fk has been blocked so far, conditions (1-4) hold for k + 1, and IC3 can safely move to the next major iteration, trying to prove that P is satisfied up to k + 1 steps. Before moving to the next iteration, a new empty time frame Fk+1 is created (line 10). Initially, clauses(Fk+1) = ∅ and such a time frame represents the entire state space, i.e. Fk+1 = Space(S). Note that Space(S) is a valid over-approximation to the set of states reachable within k + 1 steps in S. Then a phase called propagation takes place (line 12). The procedure Propagate(Fk) (Algorithm 4), which is discussed later, handles that phase. During propagation, IC3 tries to refine every time frame Fi, with 0 < i ≤ k, by checking if some clauses of one time frame can be pushed forward to the following time frame. Possibly, propagation refines the outermost time frame Fk so that Fk ⊂ Space(S). Propagation can lead to two adjacent time frames becoming equivalent. If that happens, the algorithm has found an inductive strengthening of P for S (equal to those time frames), so the property P holds for every reachable state of S and IC3 returns success (line 13).
   The procedure BlockCube(s, Q, Fk) (Algorithm 2) is responsible for refining the trace Fk in order to block a bad cube s found in Fk. To preserve condition (3), prior to blocking a cube in a certain time frame, IC3 has to recursively block its predecessor states in the preceding time frames.
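   The Extend procedure used at line 5 of Algorithm 1 and line 7 of Algorithm 2 is not reported in the paper; the sketch below shows one possible realization of its core idea (ours, with a hypothetical netlist representation): literals of the bad cube are tentatively replaced by the unknown value X, and a literal is dropped whenever three-valued simulation still forces the property-violation output bad to 1.

X = None  # the unknown value of ternary {0, 1, X} simulation

def and3(a, b):                               # three-valued AND
    if a == 0 or b == 0: return 0
    if a == 1 and b == 1: return 1
    return X

def not3(a):                                  # three-valued NOT
    return X if a is X else 1 - a

def simulate(net, node, assignment):
    """Evaluate `node` over {0, 1, X}. `net` maps internal nodes to
    ('AND', in1, in2) or ('NOT', in1); leaves are state variables,
    and unassigned leaves take the value X (no caching, for brevity)."""
    if node not in net:
        return assignment.get(node, X)
    op = net[node]
    if op[0] == 'AND':
        return and3(simulate(net, op[1], assignment),
                    simulate(net, op[2], assignment))
    return not3(simulate(net, op[1], assignment))

def extend(net, bad, cube):
    """Drop literals from `cube` (a dict var -> 0/1 for which the
    output `bad` evaluates to 1) while `bad` remains forced to 1."""
    for var in list(cube):
        saved = cube.pop(var)                 # tentatively set var to X
        if simulate(net, bad, cube) != 1:
            cube[var] = saved                 # literal needed: restore it
    return cube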


To keep track of the states (or cubes) that must be blocked in certain time frames, IC3 uses the formalism of proof-obligations.

Definition 11. Given a cube s and a time frame Fj, a proof-obligation is a pair (s, j) formalizing the fact that s must be blocked in Fj.

   Given a proof-obligation (s, j), the cube s can either represent a set of bad states or a set of states that can all reach a bad state in one or more transitions. The number j indicates the position in the trace where s must be proved unreachable, or else the property fails.

Definition 12. A proof-obligation (s, j) is said to be discharged when s becomes blocked in Fj.

   IC3 maintains a priority queue Q of proof-obligations. During the blocking of a cube, proof-obligations (s, j) are extracted from Q and discharged for increasing values of j, ensuring that every predecessor of a bad cube s will be blocked in Fj (j < k) before s is blocked in Fk. In the BlockCube(s, Q, Fk) procedure, first a proof-obligation encoding the fact that s must be blocked in Fk is added to Q (line 1). Then proof-obligations are iteratively extracted from the queue and discharged (lines 2-14).
   Prior to discharging the extracted proof-obligation (s, j), IC3 checks if that proof-obligation still needs to be discharged. It is in fact possible for an enqueued proof-obligation to become discharged as a result of the discharging of some previously extracted proof-obligations. To perform this check, the following SAT query is posed (line 4):

                         SAT?(Fj ∧ s)                      (Q2)

   If the result of Q2 is SAT, then the cube s is still in Fj and (s, j) still needs to be discharged. Otherwise, s has already been blocked in Fj and the procedure can move on to the next iteration.
   If the proof-obligation (s, j) still needs to be discharged, then IC3 checks if the time frame identified by j is the initial time frame (line 5). If so, the states represented by s are initial states that can reach a violation of the property P. A counter-example σ to P can be constructed by going up the chain of proof-obligations that led to (s, 0). In that case, the procedure terminates with failure, returning the counter-example found.
   To discharge a proof-obligation (s, j), i.e. to block a cube s in Fj, IC3 tries to derive a clause c such that c ⊆ ¬s and c is inductive relative to Fj−1. This is done by posing the following SAT query (line 6):

                  SAT?(Fj−1 ∧ ¬s ∧ T ∧ s′)                 (Q3)

   If Q3 is UNSAT (lines 10-12), then the clause ¬s is inductive relative to Fj−1 and can be used to refine Fj, ruling out s. To pursue a stronger refinement of Fj, the inductive clause found undergoes a process called inductive generalization (line 10) prior to being added to the trace (line 11). Inductive generalization is carried out by the procedure Generalize(j, s, Fk) (Algorithm 3), which tries to minimize the number of literals in the clause c = ¬s while maintaining its inductiveness relative to Fj−1, in order to preserve condition (3). The resulting clause is added not only to Fj, but also to every time frame Fi, 0 < i < j (line 11). Doing so discharges the proof-obligation (s, j). In fact, this refinement rules out s from every Fi with 0 < i ≤ j. Since the sets Fi with i > j are larger than Fj, s may still be present in one of them and (s, j + 1) may become a new proof-obligation. To address this issue, Algorithm 2 adds (s, j + 1) to the priority queue (line 12).
   Otherwise, if Q3 is SAT (lines 7-8), a predecessor t of s in Fj−1 can be extracted from the satisfying assignment. To preserve condition (3), before blocking a cube s in a time frame Fj, every predecessor of s must be blocked in Fj−1. So the predecessor t is extended with ternary simulation (line 7) into the cube p, and then both proof-obligations (p, j − 1) and (s, j) are added to the queue (line 8).

 Input: j: time frame index; s: cube such that ¬s is inductive
     relative to Fj−1; Fk: trace
 Output: c: a sub-clause of ¬s
  1: c ← ¬s
  2: for all literals l in c do
  3:   try ← the clause obtained by deleting l from c
  4:   if ∄t, v′ : t, v′ |= Fj−1 ∧ T ∧ try ∧ ¬try′ then
  5:     if ∄t : t |= I ∧ ¬try then
  6:       c ← try
  7:     end if
  8:   end if
  9: end for
 10: return c
               Algorithm 3. Generalize(j, s, Fk)

   The Generalize(j, s, Fk) procedure (Algorithm 3) tries to remove literals from ¬s while keeping it inductive relative to Fj−1. To do so, a clause c initialized with ¬s (line 1) is used to represent the current minimal inductive clause. For every literal of c, the clause try obtained by dropping that literal from c (line 3) is checked for inductiveness relative to Fj−1 by means of the following SAT query (line 4):

               SAT?(Fj−1 ∧ try ∧ T ∧ ¬try′)                (Q4)

   If Q4 is UNSAT, the inductive case for the reduced clause still holds. Since dropping literals from a relatively inductive clause can break both the inductive case and the base case, the latter must be explicitly checked too for the reduced clause try (line 5). This is done by posing the following SAT query:

                        SAT?(I ∧ ¬try)                     (Q5)

   If both the inductive case and the base case hold for the reduced clause try, the current minimal inductive clause c is updated with try (line 6).
   The Propagate(Fk) procedure (Algorithm 4) handles the propagation phase.
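   Before turning to propagation, note that Algorithm 3 maps almost line-by-line onto code. The sketch below is ours; the two callbacks, which stand for queries Q4 and Q5 on the underlying SAT framework, are hypothetical and would be answered by the solver of time frame j − 1 and by a dedicated initial-state solver respectively.

def generalize(j, s, q4_is_sat, q5_is_sat):
    """Mirror of Algorithm 3. A cube/clause is a frozenset of literals,
    a literal being a (variable, polarity) pair; c = not(s).

    q4_is_sat(j, clause): True iff SAT?(F(j-1) /\ clause /\ T /\ not(clause'))
    q5_is_sat(clause):    True iff SAT?(I /\ not(clause))"""
    c = frozenset((var, not pol) for var, pol in s)       # line 1: c <- not(s)
    for lit in list(c):                                   # line 2
        try_clause = c - {lit}                            # line 3
        if not try_clause:             # never reduce to the empty clause
            break
        # Lines 4-6: keep the reduction only if the relative inductive
        # case (Q4 UNSAT) and the base case (Q5 UNSAT) still hold.
        if not q4_is_sat(j, try_clause) and not q5_is_sat(try_clause):
            c = try_clause
    return c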


 Input: Fk: trace
 Output: Fk: updated trace
  1: for j = 0 to k − 1 do
  2:   for all c ∈ Fj do
  3:     if ∄t, v′ : t, v′ |= Fj ∧ T ∧ ¬c′ then
  4:       Fj+1 ← Fj+1 ∪ {c}
  5:     end if
  6:   end for
  7: end for
  8: return Fk
                 Algorithm 4. Propagate(Fk)

For every clause c of each time frame Fj, with 0 ≤ j ≤ k − 1, the procedure checks if c can be pushed forward to Fj+1 (line 3). To do so, it poses the following SAT query:

                     SAT?(Fj ∧ T ∧ ¬c′)                    (Q6)

   If the result of Q6 is UNSAT, then it is safe, with respect to condition (3), to push clause c forward to Fj+1. Otherwise, c can't be pushed and the procedure iterates to the next clause.

C. Related works

   In [2], Aaron Bradley outlined the opportunity for SAT and SMT researchers to directly address the problem of improving IC3's performance by exploiting the peculiar character of the SAT queries it poses. A description of the IC3 algorithm, specifically addressing implementation issues, is given in [7].

                   III. SAT SOLVING IN IC3

   SAT queries posed by IC3 have some specific characteristics:
   • Small-sized formulas: they employ no more than a single instance of the transition relation;
   • Large number of calls: reasoning during the verification process is highly localized and takes place at the granularity of relatively small cubes;
   • Separated contexts: each SAT query is relative to a single time frame;
   • Related calls: subsequent calls may expose a certain correlation (for instance, inductive generalization calls take place on progressively reduced formulas).
   We performed an analysis of the implementation of IC3 within the academic model checking suite PdTRAV, closely following the description of IC3 given in [7] (there called PDR: Property Directed Reachability). The experimental analysis led us to identify six different types of SAT queries that the algorithm poses during its execution. These queries are the ones already outlined in Section II-B. The type of each of these queries is reported in Table I.

   Name          Query Type                      Query
    Q1       Target Intersection          SAT?(Fk ∧ ¬P)
    Q2       Blocked Cube                 SAT?(Fi ∧ s)
    Q3       Relative Induction           SAT?(Fi ∧ ¬s ∧ T ∧ s′)
    Q4       Inductive Generalization     SAT?(Fi ∧ c ∧ T ∧ ¬c′)
    Q5       Base of Induction            SAT?(I ∧ ¬c)
    Q6       Clause Propagation           SAT?(Fi ∧ T ∧ ¬c′)

              Table I: SAT query breakdown in IC3.

   For each of the queries identified, we have measured the average number of calls and the average solving time. These statistics are reported in Table II. The results were collected by running PdTRAV's implementation of IC3 on the complete set of 310 single-property benchmarks of the HWMCC'12, using time and memory limits of 900 s and 2 GB, respectively.

   Query      Calls [Number]    Calls [%]    Avg Time [ms]
    Q1               483           0.1             81
    Q2             27891           6.8            219
    Q3             31172           7.6            334
    Q4            142327          34.7            575
    Q5            147248          35.9            112
    Q6             61114          14.9            681

 Table II: SAT query statistics in IC3: number of calls, percentage,
 and average time spent in different SAT queries during an IC3 run.

   Such statistics can be summarized as follows:
   • SAT calls involved in inductive generalization are by far the most frequent ones. These are in fact the calls that appear at the finest granularity. In the worst-case scenario, one call is posed for every literal of every inductive clause found.
   • Inductive generalization and propagation checks are the most expensive queries in terms of average SAT solving time required.
   • Target intersection calls are very infrequent and don't take much time to be solved.
   • Blocked cube and relative induction checks are close in the number of calls and solving time.

              IV. HANDLING SAT SOLVING IN IC3

   Subsequent SAT calls in IC3 are often applied to highly different formulas. In the general case, two subsequent calls can in fact be posed in the context of different time frames, thus involving different sets of clauses. In addition, one of them may require the use of the transition relation, while the other may not, and each query can involve the use of temporary clauses/cubes that are needed only to respond to that particular query (e.g. the candidate inductive clause used to check relative inductiveness during cube blocking or inductive generalization).
   In the general case, whenever a new query is posed by IC3 to the underlying SAT solving framework, the formula currently loaded in the solver must be modified to accommodate the new query. For this reason, IC3 requires the use of SAT solvers that expose an incremental SAT interface. An incremental SAT interface for IC3 must support the following features:
   • Pushing and popping clauses to or from the formula.
   • Specifying literal assumptions.
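   As an illustration of these two features, the following sketch (ours; `core` stands for a hypothetical MiniSAT-like object offering only new_var(), add_clause() and solve(assumptions), with negative integers denoting negated variables) layers a removable-clause interface on top of plain clause addition; the activation-literal mechanism it relies on is discussed below.

class IncrementalSolver:
    """Minimal incremental interface for IC3-style use: permanent
    clauses, removable clauses simulated via activation literals,
    and queries under literal assumptions."""

    def __init__(self, core):
        self.core = core              # hypothetical MiniSAT-like solver
        self.active = []              # activation literals still enabled

    def add_clause(self, lits):
        self.core.add_clause(lits)    # permanent clause

    def push_clause(self, lits):
        """Load a removable clause as (lits OR not a), with a fresh
        activation variable a; effective only while a is assumed."""
        a = self.core.new_var()
        self.core.add_clause(lits + [-a])
        self.active.append(a)
        return a

    def pop_clause(self, a):
        """'Remove' the clause guarded by a: assert the unit not a.
        The clause stays in the database, and in BCP, until cleanup."""
        self.active.remove(a)
        self.core.add_clause([-a])

    def solve(self, assumptions):
        # Enabled activation literals travel as extra assumptions.
        return self.core.solve(assumptions + self.active)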




   Many state-of-the-art SAT solvers, like MiniSAT [10], feature an incremental interface capable of pushing new clauses into the formula and allowing literal assumptions. Removing clauses from the current formula is a more difficult task, since one has to remove not only the single clause specified, but also every learned clause that has been derived from it. Although solvers such as zChaff [11] directly support clause removal, the majority of state-of-the-art SAT solvers feature an interface like the one of MiniSAT, in which clause removal can only be simulated. This is done through the use of literal assumptions and the introduction of auxiliary variables known as activation variables, as described in [12]. In such solvers, clauses aren't actually removed from the formula but only made redundant for the purpose of determining the satisfiability of the rest of the formula. Since such clauses are not removed from the formula, they still participate in Boolean Constraint Propagation (BCP) and thus degrade the overall SAT solving performance. In order to minimize this degradation, each solver employed by IC3 must be periodically cleaned up, i.e. emptied and re-loaded with only the relevant clauses. In this work we assume the use of a SAT solver exposing an incremental interface similar to the one of MiniSAT.
   Once an efficient incremental SAT solving tool has been chosen, any implementation of IC3 must face the problem of deciding how to integrate the top-level algorithm with the underlying SAT solving layer. This problem can be divided into the following three sub-problems:
   • SAT solver allocation: decide how many different SAT solvers to employ and how to distribute the workload among them.
   • SAT solver loading: decide which clauses must be loaded in each solver to make it capable of responding correctly to the SAT queries posed by IC3.
   • SAT solver cleanup: decide when and how often the algorithm must clean up each solver, in order to avoid performance degradation.

                 V. SAT SOLVERS ALLOCATION

   We assume in this work the use of multiple SAT solvers, one for each different time frame. Using multiple solvers, we observed that performance is highly related to:
   • Solver cleanup frequency: cleaning up the solver means removal of incrementally loaded problem clauses and learned clauses.
   • Clause loading strategy: on-demand loading of transition relation clauses based on topological dependency.

A. Specialized solvers

   From the statistical results reported in Table II it's easy to see that inductive generalization and clause propagation queries are by far the most expensive ones in terms of average SAT solving time. Inductive generalization queries, in addition to being expensive, are also the most frequent type of query posed.
   The reason why inductive generalization calls are so expensive can be explained by the fact that, during the inductive generalization of a clause, at every iteration a slightly changing formula is queried for a satisfying assignment in increasingly larger subspaces. Given two subsequent queries in inductive generalization, in fact, it can be noticed that their formulas can differ only by one literal of the present-state clause try and one literal of the next-state cube ¬try′. As the subspace becomes larger, solving times for those queries increase. The average expensiveness of clause propagation calls can be intuitively motivated by noticing that they are posed once for every clause of every time frame. The innermost time frames are the ones with the largest number of clauses, and thus require the largest number of propagation calls. Unfortunately, given the high number of clauses in those time frames, the CNF formulas upon which such calls act are highly constrained and usually harder to solve. So during propagation there are, in general, more hard queries than simple queries, making the average SAT solving time for those queries high.
   In an attempt to reduce the burden on each time frame's SAT solver, we have experimented with the use of specialized solvers for handling such queries. For every time frame, a second solver is instantiated and used to respond to a particular type of query (Q4 or Q6). Table III summarizes the results of this experiment.

                     VI. SOLVER LOADING

   To minimize the size of the formula to be loaded into each solver, a common approach is to load, for every SAT call that queries the dynamic behavior of the system, only the necessary part of the transition relation.
   It is easy to observe that every SAT call that uses the transition relation involves a constraint on the next-state variables of the system in the form of a cube c′. Such queries ask if there is a state of the system satisfying some constraints on the present state which can reach, in one transition, states represented by c′. Since c′ is a cube, the desired state must have a successor p such that p corresponds to c′ for the value of every variable of c′. It's easy to see that the only present-state variables that are relevant in determining if such a state exists are those in the structural support of the next-state variables of c′. It follows that the only portions of the transition relation required to answer such queries are the logic cones of the next-state variables of c′.
   This loading strategy, known as lazy loading of the transition relation, is commonly employed in various implementations of IC3, such as the ones of PdTRAV and ABC. We observed on average a 50% reduction in the size of the CNF formula for the transition relation using lazy loading.
   We noticed that, for these queries, the portions of the transition relation loaded can be further minimized by employing a CNF encoding technique called Plaisted-Greenbaum CNF encoding [13] (henceforth simply called PG encoding). The AIG representation of the transition relation, together with the assumptions specified by the next-state cube c′, can be viewed as a constrained boolean circuit [14], i.e. a boolean circuit in which some of the outputs are constrained to assume certain values.
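   Returning briefly to lazy loading, the support computation behind it can be sketched as follows (our illustration, assuming a hypothetical AIG map from each internal node to its two fanins, with leaves absent from the map): for a query constrained by a next-state cube c′, only the clauses encoding the returned nodes need to be loaded.

def cone_nodes(aig, roots):
    """Collect the AIG nodes in the logic cones of the next-state
    functions in `roots`; `aig` maps an internal node to its
    (fanin0, fanin1) pair, while leaves (state variables and primary
    inputs) do not appear as keys."""
    cone, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in cone or node not in aig:
            continue                    # visited already, or a leaf
        cone.add(node)
        stack.extend(aig[node])
    return cone

# Sketch of its use: for a query whose cube c' mentions x3' and x7',
# load only the CNF of cone_nodes(aig, [next_fn[3], next_fn[7]]),
# skipping clauses already loaded by earlier queries.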


certain values. The Plaisted-Greenbaum encoding is a special          These heuristics clean up each solver as soon as the computed
encoding that can be applied in the translation of a constrained      estimate meets some, often static, threshold.
boolean circuit into a CNF formula.                                      Our experiments with IC3 prove that the frequency of
   Below we give an informal description of the PG encoding.          the cleanups play a crucial role in determining the overall
For a complete description refer to [13] or [14].                     verification performance. We explored the use of new cleanup
   Given an AIG representation of a circuit, a CNF encoding           heuristics based on more precise measures of the number of
first subdivides that circuit into a set of functional blocks, i.e.   irrelevant clauses loaded and able to exploit correlation among
gates or group of connected gates representing certain boolean        different SAT calls to dynamically adjust the frequency of
functions, and introduces a new CNF variable a for each of            cleanups.
these blocks. For each functional block representing a function          For SAT calls in IC3, notice that there are two sources of
f on the input variables x1 , x2 , . . . , xn , a set of clauses is   irrelevant clauses loaded in a solver:
derived translating into CNF the formula:
                                                                        1) Deactivated clauses loaded for previous inductive checks
                     a ↔ f (x1 , x2 , . . . , xn )             (1)         (Q3 and Q4 queries).
                                                                        2) Portions of logic cones loaded for previous calls query-
The final CNF formula is obtained by the conjunction of these
                                                                           ing the dynamic behavour of the system.
sets of clauses. Different CNF encodings differ in the way the
gates are grouped together to form functional blocks and in              Cleanup heuristics commonly used, such as the ones used
the way these blocks are translated into clauses. The idea of         in baseline versions of ABC and PdTRAV, typically take into
PG encoding is to start from a base CNF encoding, and then            account only the number of deactivated clauses in the solver to
use both output assumptions and topological information of            compute an estimate of the number of irrelevant clauses. We
the circuit to get rid of some of the derived clauses, while          investigated the use of a new heuristic measure taking into
still producing an equi-satisfiable CNF formula. Based on the         account the second source of irrelevant clauses, i.e. irrelevant
output constraints and the number of negations that can be            portions of previously loaded cones.
found in every path from a gate to an output node, it can be             Every time a new query requires to load a new portion of the
shown that, for some gates of the circuit, an equi-satisfiable        transition relation, to compute a measure of the number of the
encoding can be produced by only translating one of the two           irrelevant transition relation’s clauses, the following quantities
sides of the bi-implication (1). The CNF formula produced             are computed (assuming that c′ is the cube constraining the
by PG encoding will be a subset of the one produced by the            next state variables for that query):
traditional encoding.                                                   • A: the number of transition relation clauses already
   PG encoding proves to be effective in reducing the size of loaded formulas, but it is not certain whether it is more efficient for SAT solving, since it may have worse propagation behaviour [15]. We observed a 10-20% reduction in the size of the CNF formula for the transition relation.

                    VII. SOLVERS CLEANUP

   A natural question arises regarding how frequently and under what conditions SAT solver cleanups should be scheduled. Cleaning up a SAT solver, in fact, introduces a certain overhead. This is because:
   • Relevant clauses for the current query must be reloaded.
   • Relevant inferences previously derived must be derived again from those clauses.
   A heuristic cleanup strategy is needed in order to achieve a trade-off between the overhead introduced and the slowdown in BCP avoided. The purpose of that heuristic is to determine when the number of irrelevant clauses (w.r.t. the current query) loaded in a solver becomes large enough to justify a cleanup. To do so, a heuristic measure representing an estimate of the number of currently loaded irrelevant clauses is compared to a certain threshold. If that measure exceeds the threshold, then the solver is cleaned up.
   Cleanup heuristics currently used in state-of-the-art tools, like ABC and PdTRAV, rely on loose estimates of the size of irrelevant portions of the formula loaded into each solver. We investigate heuristics based on more precise measures of the number of irrelevant clauses loaded, able to exploit correlation among different SAT calls to dynamically adjust the frequency of cleanups.
   For SAT calls in IC3, notice that there are two sources of irrelevant clauses loaded in a solver:
   1) Deactivated clauses loaded for previous inductive checks (Q3 and Q4 queries).
   2) Portions of logic cones loaded for previous calls querying the dynamic behaviour of the system.
   Cleanup heuristics commonly used, such as the ones in the baseline versions of ABC and PdTRAV, typically take into account only the number of deactivated clauses in the solver to compute an estimate of the number of irrelevant clauses. We investigated the use of a new heuristic measure taking into account the second source of irrelevant clauses, i.e. irrelevant portions of previously loaded cones.
   Every time a new query requires loading a new portion of the transition relation, the following quantities are computed in order to measure the number of irrelevant transition relation clauses (assuming that c′ is the cube constraining the next state variables for that query):
   • A: the number of transition relation clauses already loaded into the solver (before loading the logic cones required by the current query);
   • S(c′): the size (number of clauses) of the logic cones required for solving the current query;
   • L(c′): the number of clauses in the required logic cones still to be loaded into the solver (equal to the size of the required logic cone minus the number of clauses that such cone shares with previously loaded cones).
   A measure of the number of irrelevant transition relation clauses loaded for c′, denoted by u(c′), can be computed as follows:

    u(c′) = A − (S(c′) − L(c′))    (2)

   Such a heuristic measure, divided by the number of clauses loaded into the solver, indicates the percentage of irrelevant clauses loaded in the solver w.r.t. the current query. In Section VIII we investigate the use of new cleanup strategies based on this measure. In order to take into account correlation between subsequent SAT calls, we consider this measure averaged over a time window of the last SAT calls.
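   To make the bookkeeping concrete, the following sketch (our own illustration with hypothetical names, not PdTRAV or ABC code) tracks the three quantities above, computes u(c′) as in (2), and averages the normalized measure over a window of recent SAT calls.

    from collections import deque

    class CleanupMonitor:
        """Sketch: fraction of irrelevant TR clauses, averaged per SAT call."""
        def __init__(self, window=1000, threshold=0.5):
            self.samples = deque(maxlen=window)   # last `window` SAT calls
            self.threshold = threshold

        def record_query(self, A, S, L, total_clauses):
            # Equation (2): u(c') = A - (S(c') - L(c')), i.e. clauses already
            # in the solver that are not part of the cone needed by this query.
            u = A - (S - L)
            # Normalize by the clauses loaded in the solver: fraction of
            # irrelevant clauses w.r.t. the current query.
            self.samples.append(u / total_clauses)

        def should_cleanup(self):
            # Schedule a cleanup once the windowed average exceeds the threshold.
            if not self.samples:
                return False
            return sum(self.samples) / len(self.samples) > self.threshold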
                 VIII. EXPERIMENTAL RESULTS

   We have compared different cleanup and clause loading heuristics in both PdTRAV and ABC. In this section, we briefly summarize the obtained results.
A. PG encoding

   A first set of experiments was done on the full set of 310 HWMCC'12 benchmarks [16], with a lower timeout of 900 seconds, in order to evaluate the use of PG encoding. Results were controversial. A run in PdTRAV, with 900 seconds timeout, showed a reduction in the number of solved instances from 79 to 63 (3 of which previously unsolved). The percentage reduction of loaded transition relation clauses was 21.32%.
   A similar run on ABC showed a more stable comparison, from 80 to 79 solved instances (3 of which previously unsolved). So a first conclusion is that PG encoding was not able to win over the standard encoding, suggesting it can indeed suffer from worse propagation behaviour. Nonetheless, it was very interesting to observe that the overall number of solved problems grew from 79 to 82 in PdTRAV and from 80 to 85 in ABC.
   The different results between the two tools in this experimentation could be partially due to the different light-weight preprocessing done by default within them. We started a more detailed comparison, in order to better understand the partially controversial results.

B. Experiments with PdTRAV

   We then focused on a subset of 70 selected circuits, for which preprocessing was done off-line, and the tools were run on the same problem instances. In the following tables, the P0 rows always represent the default setting of PdTRAV. We report the number of solved instances (out of 70) and the average execution time (time limit 900 seconds). The column labeled New, when present, shows the number of instances not solved by P0.

    Configuration     Solved [#]    New [#]    Avg Time [s]
    P0 (baseline)     64                       137.00
    P1 (Q4 spec.)     66            4          144.18
    P2 (Q6 spec.)     60            3          134.25

            Table III: Tests on specialized solvers.

   Table III shows two different implementations with specialized solvers (so two solvers per time frame): in the first one (P1) the second solver handles generalization calls, while in the second one (P2) it handles propagation calls. Overall, we see a little improvement by P1, with two more instances solved w.r.t. P0, including 4 instances not solved by P0.
   The next two tables show an evaluation of different solver cleanup heuristics. Let a be the number of activation variables in the solver, |vars| the total number of variables in the solver, |clauses| the total number of clauses in the solver, u(c′) the heuristic measure discussed in Section VII, and W(x, n) a sliding window containing the values of x for the last n SAT calls. The heuristics compared are the following:
   • H1: a > 300;
   • H2: a > (1/2)|vars|;
   • H3: avg(W(u(c′)/|clauses|, 1000)) > 0.5;
   • H4: H2 || H3.
   Heuristic H1 is the cleanup heuristic used in the baseline versions of both PdTRAV and ABC. The static threshold of 300 activation literals was determined experimentally. Heuristic H2 cleans up each solver as soon as half of its variables are used for activation literals. It can be seen as a first simple attempt to adopt a dynamic cleanup threshold. Heuristic H3 is the first heuristic proposed to take into account the second source of irrelevant clauses described in Section VII, i.e. irrelevant portions of previously loaded cones. In H3 a solver is cleaned up as soon as the percentage of irrelevant transition relation clauses loaded, averaged over a window of the last 1000 calls, reaches 50%. Heuristic H4 takes into account both sources of irrelevant clauses, combining H2 and H3.
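   Expressed as predicates over the solver state, the four heuristics are tiny; a minimal sketch of the rules above (our own formulation, with the windowed average assumed to be maintained as in the Section VII sketch):

    # Hypothetical inputs: a = activation literals, num_vars = |vars|,
    # win_avg = avg(W(u(c')/|clauses|, 1000)).
    def h1(a):                 # static threshold
        return a > 300

    def h2(a, num_vars):       # dynamic: half of the variables are activations
        return a > num_vars / 2

    def h3(win_avg):           # windowed fraction of irrelevant TR clauses
        return win_avg > 0.5

    def h4(a, num_vars, win_avg):
        return h2(a, num_vars) or h3(win_avg)   # H4: H2 || H3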
    Configuration    Solved [#]    New [#]    Avg Time [s]
    P0 (H1)          64                       137.00
    P3 (H2)          60            1          122.19
    P4 (H3)          44                       116.28
    P5 (H4)          62            3          125.94

            Table IV: Tests on cleanup heuristics.

   Table IV shows a comparison among H1 (row P0), H2, H3, and H4 (rows P3, P4, and P5, respectively). No PG encoding nor specialized solvers were used. Heuristic H1, which employs a static threshold, was able to solve the largest number of instances. Among the dynamic threshold heuristics, both H2 and H3 take into account a single source of irrelevant loaded clauses, respectively the deactivated clauses in H2 and the unused portions of the transition relation in H3. The data clearly indicates that H3 has worse performance. This suggests that deactivated clauses play a bigger role in degrading SAT solvers' performance than irrelevant transition relation clauses do; taking into account only the latter source of irrelevant clauses is not sufficient. It can be noticed that heuristic H4, which takes into account both sources, outperforms both H2 and H3. This shows that considering irrelevant clauses arising from previously loaded cones, in addition to deactivated clauses, can be beneficial. In addition, Table IV shows that the dynamic heuristics were able to solve some instances that could not be solved by the static heuristic H1, and vice versa. In terms of the overall number of solved instances, the static heuristic H1 outperforms our best dynamic heuristic H4. This can be due to the fact that the threshold parameter of H1 results from extensive experimentation, while a narrower experimentation has been performed to determine the parameters of H4 (window size and percentage thresholds).
   We then focused on H4, and benchmarked it in different setups. Results are shown in Table V. Here PG encoding was used in configurations P6 and P8, a single solver per time frame in P6, and an additional specialized solver for generalization in P7 and P8. We see that the specialized solver configuration appears to perform worse with PG encoding. Also, adding a specialized solver for generalization to the dynamic heuristic H4 does not seem to be as effective as it is when using the static heuristic H1.
    Configuration      Solved [#]    New [#]    Avg Time [s]
    P0 (baseline)      64                       137.00
    P6 (PG)            59            3          208.85
    P7 (Q4 spec)       58            1          111.59
    P8 (PG+Q4 spec)    50            1          178.56

    Table V: Tests on mixed strategies for cleanup heuristic H4.

This can be due to the fact that irrelevant clauses arising from generalization calls are not taken into account to schedule the cleanup of the main solver, which, in turn, is cleaned up less frequently.

C. Experiments with ABC

   The 70 selected circuits were also benchmarked with ABC, with the same pre-processing used in PdTRAV: Table VI reports in row A0 the default setting of ABC. Row A1 shows the variant with PG encoding, row A2 shows a run without dynamic TR clause loading. Row A3 finally shows a different period for solver cleanup (1000 variables instead of 300).

    Configuration    Solved [#]    New [#]    Avg Time [s]
    A0               64                       138.66
    A1               63            1          152.18
    A2               63            2          158.75
    A3               64                       138.45

      Table VI: Tests on ABC with different strategies.

   Overall, results show little variance among the different settings, which could suggest a lesser sensitivity of ABC to different tunings. Nonetheless, further experimentation with ABC on the full set of 310 benchmarks (with a 300 seconds time limit) showed a 14% improvement in the number of solved problems (from 71 to 81), which indicates a potential improvement for a portfolio-based tool, able to concurrently exploit multiple settings.

                      IX. CONCLUSIONS

   The paper shows a detailed analysis and characterization of the SAT queries posed by IC3. We also discuss new ideas for solver allocation/loading/restarting. The experimental evaluation done on two different state-of-the-art academic tools shows lights and shadows, as no breakthrough or clear winner emerges from the new ideas.
   PG encoding turned out to be less effective than expected. This is probably because the benefits introduced in terms of loaded formula size are overwhelmed by the supposedly worse propagation behaviour of that formula.
   The use of specialized solvers seems to be effective when a static cleanup heuristic is used, less effective when combined with PG encoding or a dynamic heuristic.
   Our experiments showed that, when a dynamic cleanup heuristic is used, IC3's performance can be improved by taking into account both deactivated clauses and irrelevant portions of previously loaded cones. Even if a parameter configuration for H4 that is able to outperform the currently used, well-rounded static heuristic H1 has not been found yet, we believe that a more extensive experimentation could lead to better results.
   Nonetheless, the results are more interesting if we consider them from the standpoint of a portfolio-based tool, since the overall coverage (by the union of all setups) is definitely higher.
   So we believe that more effort in implementation, experimentation, and detailed analysis of case studies needs to be done. We also deem that this work contributes to the discussion of new developments in the research related to IC3.

                        REFERENCES

 [1] A. R. Bradley, "SAT-based model checking without unrolling," in VMCAI, Austin, Texas, Jan. 2011, pp. 70–87.
 [2] A. R. Bradley, "Understanding IC3," in SAT, ser. Lecture Notes in Computer Science, A. Cimatti and R. Sebastiani, Eds., vol. 7317. Springer, 2012, pp. 1–14.
 [3] A. Biere, A. Cimatti, E. M. Clarke, M. Fujita, and Y. Zhu, "Symbolic Model Checking using SAT procedures instead of BDDs," in Proc. 36th Design Automation Conf. New Orleans, Louisiana: IEEE Computer Society, Jun. 1999, pp. 317–320.
 [4] M. Sheeran, S. Singh, and G. Stålmarck, "Checking Safety Properties Using Induction and a SAT Solver," in Proc. Formal Methods in Computer-Aided Design, ser. LNCS, W. A. Hunt and S. D. Johnson, Eds., vol. 1954. Austin, Texas, USA: Springer, Nov. 2000, pp. 108–125.
 [5] P. Bjesse and K. Claessen, "SAT-Based Verification without State Space Traversal," in Proc. Formal Methods in Computer-Aided Design, ser. LNCS, vol. 1954. Austin, TX, USA: Springer, 2000.
 [6] K. L. McMillan, "Interpolation and SAT-Based Model Checking," in Proc. Computer Aided Verification, ser. LNCS, vol. 2725. Boulder, CO, USA: Springer, 2003, pp. 1–13.
 [7] N. Eén, A. Mishchenko, and R. K. Brayton, "Efficient implementation of property directed reachability," in FMCAD, 2011, pp. 125–134.
 [8] A. Mishchenko, "ABC: A System for Sequential Synthesis and Verification, http://www.eecs.berkeley.edu/~alanmi/abc/," 2005.
 [9] G. Cabodi, S. Nocco, and S. Quer, "Benchmarking a model checker for algorithmic improvements and tuning for performance," Formal Methods in System Design, vol. 39, no. 2, pp. 205–227, 2011.
[10] N. Eén and N. Sörensson, "The MiniSat SAT Solver, http://minisat.se," Apr. 2009.
[11] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, "Chaff: Engineering an Efficient SAT Solver," in Proc. 38th Design Automation Conf. Las Vegas, Nevada: IEEE Computer Society, Jun. 2001.
[12] N. Eén and N. Sörensson, "Temporal induction by incremental SAT solving," Electr. Notes Theor. Comput. Sci., vol. 89, no. 4, pp. 543–560, 2003.
[13] D. A. Plaisted and S. Greenbaum, "A structure-preserving clause form translation," J. Symb. Comput., vol. 2, no. 3, pp. 293–304, 1986.
[14] M. Järvisalo, A. Biere, and M. Heule, "Simulating circuit-level simplifications on CNF," J. Autom. Reasoning, vol. 49, no. 4, pp. 583–619, 2012.
[15] N. Eén, "Practical SAT: a tutorial on applied satisfiability solving," Slides of invited talk at FMCAD, 2007.
[16] A. Biere and T. Jussila, "The Model Checking Competition Web Page, http://fmv.jku.at/hwmcc."
                   Lemmas on Demand for Lambdas
                              Mathias Preiner, Aina Niemetz, and Armin Biere
                                Institute for Formal Models and Verification
                                 Johannes Kepler University, Linz, Austria


   Abstract—We generalize the lemmas on demand decision procedure for array logic as implemented in Boolector to handle non-recursive and non-extensional lambda terms. We focus on the implementation aspects of our new approach and discuss the involved algorithms and optimizations in more detail. Further, we show how arrays, array operations and SMT-LIB v2 macros are represented as lambda terms and lazily handled with lemmas on demand. We provide experimental results that demonstrate the effect of native lambda support within an SMT solver and give an outlook on future work.

   (This work was funded by the Austrian Science Fund (FWF) under NFN Grant S11408-N23 (RiSE).)

                   I. INTRODUCTION

   The theory of arrays as axiomatized by McCarthy [14] enables us to reason about memory (components) in software and hardware verification, and is particularly important in the context of deciding satisfiability of first order formulas w.r.t. first order theories, also known as Satisfiability Modulo Theories (SMT). However, it is restricted to array operations on single array indices and lacks support for efficiently modeling operations such as memory initialization and parallel updates (memset and memcpy in the standard C library).
   In 2002, Seshia et al. [4] introduced an approach to overcome these limitations by using restricted λ-terms to model array expressions (such as memset and memcpy), ordered data structures and partially interpreted functions within the SMT solver UCLID [17]. The SMT solver UCLID employs an eager SMT solving approach and therefore eliminates all λ-terms through β-reduction, which replaces each argument variable with the corresponding argument term as a preliminary rewriting step. Other SMT solvers that employ a lazy SMT solving approach and natively support λ-terms, such as CVC4 [1] or Yices [8], also treat them eagerly, similarly to UCLID, and eliminate all occurrences of λ-terms by substituting them with their instantiated function body (cf. C-style macros). Eagerly eliminating λ-terms via β-reduction, however, may result in an exponential blow-up in the size of the formula [17]. Recently, an extension of the theory of arrays was proposed [10], which uses λ-terms similarly to UCLID. This extension provides support for modeling memset, memcpy and loop summarizations. However, it does not make use of native support of λ-terms provided by an SMT solver. Instead, it reduces instances in the theory of arrays with λ-terms to a theory combination supported by solvers such as Boolector [3] (without native support for λ-terms), CVC4, STP [12], and Z3 [6].
   In this paper, we generalize the decision procedure for the theory of arrays with bit vectors as introduced in [3] to lazily handle non-recursive and non-extensional λ-terms. We show how arrays, array operations and SMT-LIB v2 macros are represented in Boolector as λ-terms and introduce a lemmas on demand procedure for lazily handling λ-terms in Boolector in detail. We summarize an experimental evaluation, compare our results to solvers with SMT-LIB v2 macro support (CVC4, MathSAT [5], SONOLAR [13] and Z3) and, finally, give an outlook on future work.

                  II. PRELIMINARIES

   We assume the usual notions and terminology of first order logic and are mainly interested in many-sorted languages, where bit vectors of different bit width correspond to different sorts and array sorts correspond to a mapping (τi ⇒ τe) from index sort τi to element sort τe. Our approach is focused primarily on the quantifier-free first order theories of fixed size bit vectors, arrays and equality with uninterpreted functions, but is not restricted to the above.
   We call 0-arity function symbols constant symbols, and a, b, i, j, and e denote constants, where a and b are used for array constants, i and j for array indices, and e for an array value. For each bit vector of size n, the equality =n is interpreted as the identity relation over bit vectors of size n. We further interpret the if-then-else bit vector term iten as ite(⊤, t, e) =n t and ite(⊥, t, e) =n e. As a notational convention, the subscript might be omitted in the following. We identify read and write as basic operations on array elements, where read(a, i) denotes the value of array a at index i, and write(a, i, e)
denotes the modified array a overwritten at position i with value e. The theory of arrays (without extensionality) is axiomatized by the following axioms, originally introduced by McCarthy in [14]:

    i = j → read(a, i) = read(a, j)                   (A1)
    i = j → read(write(a, i, e), j) = e               (A2)
    i ≠ j → read(write(a, i, e), j) = read(a, j)      (A3)

The array congruence axiom A1 asserts that accessing array a at two equal indices i and j produces the same element. The read-over-write axioms A2 and A3 ensure a basic characteristic of arrays: A2 asserts that accessing a modification to an array a at the index it has most recently been updated (i) produces the value it has been updated with (e). A3 captures the case when a modification to an array a is accessed at an index other than the one it has most recently been updated at (j), which produces the unchanged value of the original array a at position j. Note that we assume that all variables a, i, j and e in axioms A1, A2 and A3 are universally quantified.
   From the theory of equality with uninterpreted functions we primarily focus on the following axiom:

    ∀x̄, ȳ . (x1 = y1 ∧ . . . ∧ xn = yn) → f(x̄) = f(ȳ)    (EUF)

The function congruence axiom (EUF) asserts that a function evaluates to the same value for the same argument values.
   We only consider a non-recursive λ-calculus, assuming the usual notation and terminology, including the notion of function application, currying and β-reduction. In general, we denote a λ-term λx as λx . t(x), where x is a variable bound by λx and t(x) is a term in which x may or might not occur. We interpret t(x) as defining the scope of the bound variable x. Without loss of generality, the number of bound variables per λ-term is restricted to exactly one. Functions with more than one parameter are transformed into a chain of nested λ-terms by means of currying (e.g. f(x, y) := x + y is rewritten as λx . λy . x + y). As a notational convention, we will use λx̄ as a shorthand for λx0 . . . λxk . t(x0, . . . , xk) for k ≥ 0. We identify the function application as an explicit operation on λ-terms and interpret it as instantiating a bound variable (all bound variables) of a λ-term (a curried λ-chain). We interpret β-reduction as a form of function application, where all formal parameter variables (bound variables) are substituted with their actual parameter terms. We will use λx̄[x0\a0, . . . , xn\an] to indicate β-reduction of a λ-term λx̄, where the formal parameters x0, . . . , xn are substituted with the actual argument terms a0, . . . , an.
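   Since only a non-recursive λ-calculus is considered, currying and β-reduction behave exactly like nested host-language closures; a minimal sketch (our own illustration, unrelated to Boolector's internals):

    # f(x, y) := x + y as a curried chain of unary lambdas.
    f = lambda x: lambda y: x + y

    g = f(3)       # instantiates x only: corresponds to the term lambda y . 3 + y
    print(g(4))    # 7: beta-reduction with [x\3, y\4]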
              III. λ-TERMS IN BOOLECTOR

   In contrast to λ-term handling in other SMT solvers such as e.g. UCLID or CVC4, where λ-terms are eagerly eliminated, in Boolector we provide lazy λ-term handling with lemmas on demand. We generalized the lemmas on demand decision procedure for the extensional theory of arrays introduced in [3] to handle lemmas on demand for λ-terms as follows.
   In order to provide a uniform handling of arrays and λ-terms within Boolector, we generalized all arrays (and array operations) to λ-terms (and operations on λ-terms) by representing array variables as uninterpreted functions (UF), read operations as function applications, and write and if-then-else operations on arrays as λ-terms. We further interpret macros (as provided by the command define-fun in the SMT-LIB v2 format) as (curried) λ-terms. Note that in contrast to [3], our implementation currently does not support extensionality (equality) over arrays (λ-terms).
   We represent an array as exactly one λ-term with exactly one bound variable (parameter) and define its representation as λj . t(j). Given an array of sort (τi ⇒ τe) and its λ-term representation λj . t(j), we require that the bound variable j is of index sort τi and the term t(j) is of element sort τe. Term t(j) is not required to contain j, and if it does not contain j, it represents a constant λ-term (e.g. λj . 0). In contrast to SMT-LIB v2 macros, it is not required to represent arrays with curried λ-chains, as arrays are accessed at one single index at a time (cf. read and write operations on arrays).
   We treat array variables as UF with exactly one argument and represent them as fa for array variable a.
   We interpret read operations as function applications on either UF or λ-terms with read index i as argument, and represent them as read(a, i) ≡ fa(i) and read(λj . t(j), i) ≡ (λj . t(j))(i), respectively.
   We interpret write operations as λ-terms modeling the result of the write operation on array a at index i with value e, and represent them as write(a, i, e) ≡ λj . ite(i = j, e, fa(j)). A function application on a λ-term λw representing a write operation yields value e if j is equal to the modified index i, and the unmodified value fa(j) otherwise. Note that applying β-reduction to a λ-term λw yields the same behaviour described by array axioms A2 and A3. Consider a function application λw(k), where k represents the position to be read from. If k = i (A2), β-reduction yields the written value e, whereas if k ≠ i (A3), β-reduction returns the unmodified value of array a at position k, represented by fa(k). Hence, these
axioms do not need to be explicitly checked during consistency checking. This is in essence the approach to handling arrays taken by UCLID [17].
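   The following sketch (our own illustration with ordinary closures, not Boolector's data structures) mirrors this representation and shows how β-reduction of the write λ-term reproduces the behaviour of axioms A2 and A3:

    # Array variable a as an uninterpreted function f_a (a concrete stub here).
    f_a = lambda j: 0                        # hypothetical base array, 0 everywhere

    def write(f_a, i, e):
        # write(a, i, e) == lambda j . ite(i = j, e, f_a(j))
        return lambda j: e if j == i else f_a(j)

    w = write(f_a, 5, 42)                    # function application == read
    print(w(5))   # 42 -> A2: read(write(a, i, e), i) = e
    print(w(7))   # 0  -> A3: read(write(a, i, e), j) = read(a, j) for j != i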
   We interpret if-then-else operations on arrays a and b as λ-terms, and represent them as ite(c, a, b) ≡ λj . ite(c, fa(j), fb(j)). Condition c selects either function application fa(j) or fb(j), which represent the values of arrays a and b at index j, respectively.
   In addition to the base array operations introduced above, λ-terms enable us to succinctly model array operations like e.g. memcpy and memset from the standard C library, which we previously were not able to efficiently express by means of read, write and ite operations on arrays. In particular, both memcpy and memset could only be represented by a fixed sequence of read and write operations within a constant index range, such as copying exactly 5 words etc. It was not possible to express a variable range, e.g. copying n words, where n is a symbolic (bit vector) variable.
   With λ-terms, however, we do not require a sequence of array operations, as it usually suffices to model a parallel array operation by means of exactly one λ-term. Further, the index range does not have to be fixed and can therefore be within a variable range. This type of high level modeling turned out to be useful for applications in software model checking [10]. See also [17] for more examples. For instance, memset with signature memset(a, i, n, e), which sets each element of array a within the range [i, i + n[ to value e, can be represented as λj . ite(i ≤ j ∧ j < i + n, e, fa(j)). Note that n can be symbolic, and does not have to be a constant. In the same way, memcpy with signature memcpy(a, b, i, k, n), which copies all elements of array a within the range [i, i + n[ to array b, starting from index k, is represented as λj . ite(k ≤ j ∧ j < k + n, fa(i + j − k), fb(j)). As a special case of memset, we represent array initialization operations, where all elements of an array are initialized with some (constant or symbolic) value e, as λj . e.
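   Rendering these λ-terms as closures again (our own illustration; the bound n is an ordinary value here, standing in for a symbolic bit vector):

    def memset(f_a, i, n, e):
        # lambda j . ite(i <= j /\ j < i + n, e, f_a(j))
        return lambda j: e if i <= j < i + n else f_a(j)

    def memcpy(f_a, f_b, i, k, n):
        # lambda j . ite(k <= j /\ j < k + n, f_a(i + j - k), f_b(j))
        return lambda j: f_a(i + j - k) if k <= j < k + n else f_b(j)

    a = lambda j: j                        # hypothetical source array: a[j] = j
    b = memset(lambda j: 0, 8, 4, 7)       # b[8..11] = 7, 0 elsewhere
    c = memcpy(a, b, 2, 8, 4)              # c[8..11] = a[2..5], b elsewhere
    print(b(9), c(9), c(12))               # 7 3 0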
   Introducing λ-terms does not only enable us to model arrays and array operations, but further provides support for arbitrary functions (macros) by means of currying, with the following restrictions: (1) functions may not be recursive, and (2) arguments to functions may not be functions. The first restriction keeps the implementation of λ-term handling in Boolector as simple as possible, whereas the second restriction limits λ-term handling in Boolector to non-higher-order functions. Relaxing these restrictions would make the considered λ-calculus Turing-complete and, in general, render the decision problem undecidable. As future work it might be interesting to consider some relaxations.
   In contrast to treating SMT-LIB v2 macros as C-style macros, i.e., substituting every function application with the instantiated function body, in Boolector we directly translate SMT-LIB v2 macros into λ-terms, which are then handled lazily via lemmas on demand. Formulas are represented as directed acyclic graphs (DAG) of bit vector and array expressions. Further, in this paper, we propose to treat arrays and array operations as λ-terms and operations on λ-terms, which results in an expression graph with no expressions of array sort (τi ⇒ τe). Instead, we introduce the following four additional expression types of bit vector sort:
   • a param expression serves as a placeholder variable for a variable bound by a λ-term
   • a lambda expression binds exactly one param expression, which may occur in a bit vector expression that represents the body of the λ-term
   • an args expression is a list of function arguments
   • an apply expression represents a function application that applies arguments args to a lambda expression

   Fig. 1: DAG representation of formula ψ1.

   Example 1: Consider ψ1 ≡ f(i) = f(j) ∧ i ≠ j with functions f(x) := ite(x < 0, g(x), x), g(y) := −y, as depicted in Fig. 1. Both functions are represented as λ-terms, where function g(y) returns the negation of y and is used in function f(x), which computes the absolute value of x. Dotted nodes indicate parameterized expressions, i.e., expressions that depend on param expressions, and serve as templates that are instantiated as soon as β-reduction is applied.
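   A rough sketch of these four expression types and of Example 1's term structure (our own simplification using tagged tuples; Boolector's actual expression nodes carry far more bookkeeping):

    # Expressions as tagged tuples; the four new bit vector node types.
    def param(name):   return ("param", name)          # placeholder variable
    def lam(p, body):  return ("lambda", p, body)      # binds exactly one param
    def args(*a):      return ("args",) + a            # list of function arguments
    def apply(f, a):   return ("apply", f, a)          # applies args to a lambda
    def op(kind, *c):  return ("op", kind) + c         # ordinary bit vector ops

    # Example 1: g(y) := -y, f(x) := ite(x < 0, g(x), x),
    # psi1 := f(i) = f(j) AND i != j
    y = param("y"); g = lam(y, op("neg", y))
    x = param("x")
    f = lam(x, op("ite", op("slt", x, op("const", 0)),
                  apply(g, args(x)), x))
    i, j = op("var", "i"), op("var", "j")
    psi1 = op("and", op("eq", apply(f, args(i)), apply(f, args(j))),
                  op("not", op("eq", i, j)))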
   In order to lazily evaluate λ-terms in Boolector we implemented two β-reduction approaches, which we will discuss in the next section in more detail.

                  IV. β-REDUCTION

   In this section we discuss how concepts from the λ-calculus have been adapted and implemented in
our SMT solver Boolector. We focus on reduction algorithms for the non-recursive λ-calculus, which is rather atypical for the (vast) literature on the λ-calculus. In the context of Boolector, we distinguish between full and partial β-reduction. They mainly differ in their application and the depth up to which λ-terms are expanded. In essence, given a function application λx̄(a0, . . . , an), partial β-reduction reduces only the topmost λ-term λx̄, whereas full β-reduction reduces λx̄ and every λ-term in the scope of λx̄.
   Full β-reduction of a function application on a λ-term λx̄ consists of a series of β-reductions, where the λ-term λx̄ and every λ-term λȳ within the scope of λx̄ are instantiated, substituting all formal parameters with actual parameter terms. Since we do not allow partial function applications, full β-reduction is guaranteed to yield a term which is free of λ-terms. Given a formula with λ-terms, we usually employ full β-reduction in order to eliminate all λ-terms, by substituting every function application with the term obtained by applying full β-reduction to that function application. In the worst case, full β-reduction results in an exponential blow-up. However, in practice it is often beneficial to employ full β-reduction, since it usually leads to significant simplifications through rewriting. In Boolector, we incorporate this method as an optional rewriting step. We will use λx̄[x0\a0, . . . , xn\an]f as a shorthand for applying full β-reduction to λx̄ with arguments a0, . . . , an.
   Partial β-reduction of a λ-term λx̄, on the other hand, essentially works in the same way as what is referred to as β-reduction in the λ-calculus. Given a function application λx̄(a0, . . . , an), partial β-reduction substitutes the formal parameters x0, . . . , xn with the actual argument terms a0, . . . , an without applying β-reduction to other λ-terms within the scope of λx̄. This has the effect that λ-terms are expanded function-wise, which we require for consistency checking. In the following, we use λx̄[x0\a0, . . . , xn\an]p to denote the application of partial β-reduction to λx̄ with arguments a0, . . . , an.

   (a) Original formula ψ2. (b) Formula ψ′2 after full β-reduction of ψ2.
   Fig. 2: Full β-reduction of formula ψ2.

A. Full β-reduction

   Given a function application λx̄(a0, . . . , an) and a DAG representation of λx̄, full β-reduction of λx̄ consecutively substitutes formal parameters with actual argument terms while traversing and rebuilding the DAG in depth-first-search (DFS) post-order as follows.
1) Initially, we instantiate λx̄ by assigning the arguments a0, . . . , an to the formal parameters x0, . . . , xn.
2) While traversing down, for any λ-term λȳ in the scope of λx̄, we need special handling for each function application λȳ(b0, . . . , bm) as follows:
   a) Visit the arguments b0, . . . , bm first, and obtain rebuilt arguments b′0, . . . , b′m.
   b) Assign the rebuilt arguments b′0, . . . , b′m to λȳ and apply β-reduction to λȳ(b′0, . . . , b′m).
   This ensures a bottom-up construction of the β-reduced DAG (see step 3), since all arguments b′0, . . . , b′m passed to a λ-term λȳ are β-reduced and rebuilt prior to applying β-reduction to λȳ.
3) During up-traversal of the DAG we rebuild all visited expressions bottom-up and require special handling for the following expressions:
   • param: substitute param expression yi with its current instantiation b′i
   • apply: substitute expression λȳ(b0, . . . , bm) with λȳ[y0\b′0, . . . , ym\b′m]f
We further employ the following optimizations to improve the performance of the full β-reduction algorithm.
   • Skip expressions that do not need rebuilding: given an expression e within the scope of a λ-term λx̄, if e is not parameterized and does not contain any λ-term, e does not depend on the arguments passed to λx̄ and may therefore be skipped.
   • λ-scope caching: we cache rebuilt expressions in a λ-scope to prevent rebuilding parameterized expressions several times.
   Example 2: Given a formula ψ2 ≡ f(i, j) = f(k, l) and two functions g(x) := ite(x = i, e, 2 ∗ x) and f(x, y) := ite(y < x, g(x), g(y)), as depicted in Fig. 2a. Applying full β-reduction to formula ψ2 yields formula ψ′2 as illustrated in Fig. 2b. Function application f(i, j) has been reduced to ite(j ≥ i ∧ i ≠ j, 2 ∗ j, e) and f(k, l) to ite(l < k, ite(k = i, e, 2 ∗ k), ite(l = i, e, 2 ∗ l)).
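   A condensed sketch of steps 1)–3) over the tagged-tuple representation from the Section III sketch (our own rendition; Boolector's implementation is iterative and considerably more involved). The cache realizes the λ-scope caching optimization, and non-tuple leaves are returned untouched, which also covers skipping constants that need no rebuilding:

    def full_beta(lam_node, actual_args):
        # Step 1: bind the params of the (curried) lambda chain to the arguments.
        env, node = {}, lam_node
        for a in actual_args:
            _, p, body = node            # ("lambda", param, body)
            env[p] = a
            node = body
        return _rebuild(node, env, {})

    def _rebuild(n, env, cache):
        if not isinstance(n, tuple):     # plain leaves (names, numbers)
            return n
        if n in cache:                   # lambda-scope caching
            return cache[n]
        kind = n[0]
        if kind == "param":
            out = env[n]                 # step 3, param: current instantiation
        elif kind == "apply":            # step 2: rebuild the arguments first,
            _, fn, a = n                 # then beta-reduce the nested lambda too
            out = full_beta(fn, tuple(_rebuild(c, env, cache) for c in a[1:]))
        else:                            # step 3: rebuild visited nodes bottom-up
            out = n[:2] + tuple(_rebuild(c, env, cache) for c in n[2:])
        cache[n] = out
        return out

    # g(y) := -y, f(x) := g(x) + 1, full reduction of f(a):
    y = ("param", "y"); g = ("lambda", y, ("op", "neg", y))
    x = ("param", "x")
    f = ("lambda", x, ("op", "add", ("apply", g, ("args", x)), ("op", "const", 1)))
    print(full_beta(f, (("op", "var", "a"),)))
    # ('op', 'add', ('op', 'neg', ('op', 'var', 'a')), ('op', 'const', 1))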
B. Partial β-reduction

   Given a function application λx̄(a0, ..., an) and a DAG representation of λx̄, the scope of a partial β-reduction βp(λx̄) is defined as the sub-DAG obtained by cutting off all λ-terms in the scope of λx̄. Assume that we have an assignment for arguments a0, ..., an and for all non-parameterized expressions in the scope of βp(λx̄). The partial β-reduction algorithm substitutes param expressions x0, ..., xn with a0, ..., an and rebuilds λx̄. Similar to full β-reduction, we perform a DFS post-order traversal of the DAG as follows (a sketch in C++ is given after the list).
1) Initially, we instantiate λx̄ by assigning arguments a0, ..., an to the formal parameters x0, ..., xn.
2) While traversing down the DAG, we require special handling for the following expressions:
   • function applications λȳ(b0, ..., bm)
     a) Visit arguments b0, ..., bm, obtain rebuilt arguments b′0, ..., b′m.
     b) Do not assign rebuilt arguments b′0, ..., b′m to λȳ and stop down-traversal at λȳ.
   • ite(c, t1, t2)
     Since we have an assignment for all non-parameterized expressions within the scope of βp(λx̄), we are able to evaluate condition c. Based on that we either select t1 or t2 to further traverse down (the other branch is omitted).
3) During up-traversal of the DAG we rebuild all visited expressions bottom-up and require special handling for the following expressions:
   • param: substitute param expression yi with current instantiation b′i
   • if-then-else: substitute expression ite(c, t1, t2) with t1 if c = ⊤, and t2 otherwise
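For illustration, the following C++ sketch renders steps 1)-3) as a recursive traversal. Node, Kind and the helpers below are hypothetical stand-ins for Boolector's internal data structures, which the paper does not show, and the sketch assumes that every reduced ite condition is evaluable under the current assignment σ, as required above.

    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <vector>

    // A minimal sketch; not Boolector's actual representation.
    enum class Kind { Var, Const, Param, Ite, Apply, Op };

    struct Node {
      Kind kind;
      std::vector<Node*> children;  // Ite: {cond, then, else}; Apply: {lambda, args...}
      bool parameterized = false;   // does a param occur in this sub-DAG?
      uint64_t value = 0;           // σ(n); meaningful for encoded nodes only
    };

    Node* rebuild(Kind k, std::vector<Node*> kids) {  // no hash-consing here
      return new Node{k, std::move(kids)};
    }

    // Partial β-reduction of one λ-scope: 'inst' maps each formal x_i to the
    // actual argument a_i (step 1); 'cache' implements λ-scope caching.
    Node* reduce(Node* n, std::map<Node*, Node*>& inst,
                 std::map<Node*, Node*>& cache) {
      if (!n->parameterized) return n;  // skip: cannot depend on the arguments
      if (auto it = cache.find(n); it != cache.end()) return it->second;

      Node* res;
      switch (n->kind) {
        case Kind::Param:               // up-traversal: substitute instantiation
          res = inst.at(n);
          break;
        case Kind::Apply: {             // rebuild the arguments, but stop the
          std::vector<Node*> kids{n->children[0]};  // down-traversal at the λ
          for (std::size_t i = 1; i < n->children.size(); ++i)
            kids.push_back(reduce(n->children[i], inst, cache));
          res = rebuild(Kind::Apply, std::move(kids));
          break;
        }
        case Kind::Ite: {               // evaluate c, traverse one branch only
          Node* c = reduce(n->children[0], inst, cache);
          res = reduce(c->value != 0 ? n->children[1] : n->children[2],
                       inst, cache);
          break;
        }
        default: {                      // any other operator: rebuild bottom-up
          std::vector<Node*> kids;
          for (Node* k : n->children)
            kids.push_back(reduce(k, inst, cache));
          res = rebuild(n->kind, std::move(kids));
        }
      }
      return cache[n] = res;
    }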
   For partial β-reduction, we have to modify the first of the two optimizations introduced for full β-reduction.
  • Skip expressions that do not need rebuilding
    Given an expression e in the scope of a partial β-reduction βp(λx̄), if e is not parameterized then, in the context of partial β-reduction, e does not depend on the arguments passed to λx̄ and may be skipped.

   Example 3: Consider ψ2 from Ex. 2. Applying partial β-reduction to ψ2 yields the formula depicted in Fig. 3, where function application f(i, j) has been reduced to ite(j < i, e, g(j)) and f(k, l) to ite(l < k, g(k), g(l)).

Fig. 3: Partial β-reduction of formula ψ2.

V. DECISION PROCEDURE

   The idea of lemmas on demand goes back to [7] and actually represents one extreme variant of the lazy SMT approach [16]. Around the same time, a related technique was developed in the context of bounded model checking [9], which lazily encodes all-different constraints over bit vectors (see also [2]). In constraint programming the related technique of lazy clause generation [15] is effective too.
   In this section, we introduce lemmas on demand for non-recursive λ-terms based on the algorithm introduced in [3]. A top-level view of our lemmas on demand decision procedure for λ-terms (DPλ) is illustrated in Fig. 4 and proceeds as follows. Given a formula φ, DPλ uses a bit vector skeleton of the preprocessed formula π as formula abstraction αλ(π). In each iteration, an underlying decision procedure DPB determines the satisfiability of the formula abstraction refined by formula refinement ξ, i.e., in DPB, we eagerly encode the refined formula abstraction Γ to SAT and determine

its satisfiability by means of a SAT solver. As Γ is an over-approximation of φ, we immediately conclude with unsatisfiable if Γ is unsatisfiable. If Γ is satisfiable, we have to check if the current satisfying assignment σ (as provided by procedure DPB) is consistent w.r.t. preprocessed formula π. If σ is consistent, i.e., if it can be extended to a valid satisfying assignment for the preprocessed formula π, we immediately conclude with satisfiable. Otherwise, assignment σ is spurious, consistentλ(π, σ) identifies a violation of the function congruence axiom EUF, and we generate a symbolic lemma lemmaλ(π, σ), which is added to formula refinement ξ in its abstracted form αλ(lemmaλ(π, σ)).

    procedure DPλ(φ)
        π ← preprocess(φ)
        ξ ← ⊤
        loop
            Γ ← αλ(π) ∧ ξ
            (r, σ) ← DPB(Γ)
            if r = unsatisfiable return unsatisfiable
            if consistentλ(π, σ) return satisfiable
            ξ ← ξ ∧ αλ(lemmaλ(π, σ))

Fig. 4: Lemmas on demand for λ-terms DPλ.
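Read as C++ against a hypothetical solver interface (all names below — preprocess, abstract_lambda, solve_bv, and so on — are ours, not Boolector's actual API), the loop of Fig. 4 is a plain abstraction-refinement loop:

    #include <utility>

    // Hypothetical interfaces standing in for Boolector internals.
    struct Formula {};
    struct Assignment {};
    enum class Result { Sat, Unsat };

    Formula preprocess(const Formula& phi);                  // π ← preprocess(φ)
    Formula abstract_lambda(const Formula& f);               // αλ
    Formula conjoin(const Formula& a, const Formula& b);     // ∧
    Formula top();                                           // ⊤
    std::pair<Result, Assignment> solve_bv(const Formula&);  // DPB: bit-blast + SAT
    bool consistent_lambda(const Formula& pi, const Assignment& sigma);
    Formula lemma_lambda(const Formula& pi, const Assignment& sigma);

    Result dp_lambda(const Formula& phi) {
      Formula pi = preprocess(phi);
      Formula xi = top();                      // refinement ξ, initially ⊤
      for (;;) {
        Formula gamma = conjoin(abstract_lambda(pi), xi);  // Γ ← αλ(π) ∧ ξ
        auto [r, sigma] = solve_bv(gamma);
        if (r == Result::Unsat) return Result::Unsat;  // Γ over-approximates φ
        if (consistent_lambda(pi, sigma)) return Result::Sat;
        // σ is spurious: add the abstracted symbolic lemma to ξ.
        xi = conjoin(xi, abstract_lambda(lemma_lambda(pi, sigma)));
      }
    }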
   Note that in φ, in contrast to the decision procedure introduced in [3], all array variables and array operations in the original input have been abstracted away and replaced by corresponding λ-terms and operations on λ-terms. Hence, various integral components of the original procedure (αλ, consistentλ, lemmaλ) have been adapted to handle λ-terms as follows.
VI. FORMULA ABSTRACTION

   In this section, we introduce a partial formula abstraction function αλ as a generalization of the abstraction approach presented in [3]. Analogous to [3], we replace function applications by fresh bit vector variables and generate a bit vector skeleton as formula abstraction. Given π as the preprocessed input formula φ, our abstraction function αλ traverses down the DAG structure starting from the roots, and generates an over-approximation of π as follows (a sketch in C++ is given after the list).
1) Each bit vector variable and symbolic constant is mapped to itself.
2) Each function application λx̄(a0, ..., an) is mapped to a fresh bit vector variable.
3) Each bit vector term t(y0, ..., ym) is mapped to t(αλ(y0), ..., αλ(ym)).
Note that by introducing fresh variables for function applications, we essentially cut off λ-terms and UF and therefore yield a pure bit vector skeleton, which is subsequently eagerly encoded to SAT.
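Continuing the hypothetical Node DAG and rebuild() helper sketched earlier, αλ can be written as a memoized traversal; fresh_bv_var() is again an assumed helper, and the shared memo table realizes the consecutive down-traversal from several roots:

    #include <map>
    #include <vector>

    Node* fresh_bv_var();  // assumed: creates a fresh bit vector variable

    // αλ over the Node DAG sketched above; 'memo' is shared between the
    // roots, so nodes reachable from several roots are abstracted once.
    Node* abstract_lambda(Node* n, std::map<Node*, Node*>& memo) {
      if (auto it = memo.find(n); it != memo.end()) return it->second;
      Node* res;
      switch (n->kind) {
        case Kind::Var:               // rule 1: variables and symbolic
        case Kind::Const:             // constants are mapped to themselves
          res = n;
          break;
        case Kind::Apply:             // rule 2: fresh variable; the λ-term
          res = fresh_bv_var();       // or UF below is cut off
          break;
        default: {                    // rule 3: rebuild t over αλ(children)
          std::vector<Node*> kids;
          for (Node* k : n->children)
            kids.push_back(abstract_lambda(k, memo));
          res = rebuild(n->kind, std::move(kids));
        }
      }
      return memo[n] = res;
    }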

   Example 4: Consider formula ψ1 from Ex. 1, which has two roots. The abstraction function αλ performs a consecutive down-traversal of the DAG from both roots. The resulting abstraction is a mapping of all bit vector terms encountered during traversal, according to rules 1-3 above. For function applications (e.g. applyi) fresh bit vector variables (e.g. αλ(applyi)) are introduced, where the remaining sub-DAGs are therefore cut off. The resulting abstraction αλ(ψ1) is given in Fig. 5.

Fig. 5: Formula abstraction αλ(ψ1).

VII. CONSISTENCY CHECKING

   In this section, we introduce a consistency checking algorithm consistentλ as a generalization of the consistency checking approach presented in [3]. However, in contrast to [3], we do not propagate so-called access nodes but function applications, and we further check axiom EUF (while applying partial β-reduction to evaluate function applications under a current assignment) instead of checking array axioms A1 and A2. Given a satisfiable over-approximated and refined formula Γ, procedure consistentλ determines whether a current satisfying assignment σ (as provided by the underlying decision procedure DPB) is spurious, or if it can be extended to a valid satisfying assignment for the preprocessed input formula π. Therefore, for each function application in π, we have to check both if the assignment of the corresponding abstraction variable is consistent with the value obtained by applying partial β-reduction, and if axiom EUF is violated. If consistentλ does not find any conflict, we immediately conclude that formula π is satisfiable. However, if current assignment σ is spurious w.r.t. preprocessed formula π, either axiom EUF is violated or partial β-reduction yields a conflicting value for some function application in π. In both cases, we generate a lemma as formula refinement. In the following we will equally use function symbols f, g, and h for UF symbols and λ-terms.
   In order to check axiom EUF, for each λ-term and UF symbol we maintain a hash table ρ, which maps λ-terms and UF symbols to function applications. We check consistency w.r.t. π by applying the following rules.
I: For each f(ā), if ā is not parameterized, add f(ā) to ρ(f).

C: For any pair s := g(ā), t := h(b̄) ∈ ρ(f), check

       ⋀_{i=0}^{n} σ(αλ(ai)) = σ(αλ(bi)) → σ(αλ(s)) = σ(αλ(t))

B: For any s := λȳ(a0, ..., an) ∈ ρ(λx̄) with t := λx̄[x0\a0, ..., xn\an]p, check rule P; if P fails, check eval(t) = σ(αλ(s)).

P: For any s := λȳ(a0, ..., an) ∈ ρ(λx̄) with t := g(b0, ..., bm) = λx̄[x0\a0, ..., xn\an]p: if n = m ∧ ⋀_{i=0}^{n} ai = bi, propagate s to ρ(g).
   Given a λ-term (UF symbol) f and a corresponding hash table ρ(f), rule I, the initialization rule, initializes ρ(f) with all non-parameterized function applications on f. Rule C corresponds to the function congruence axiom and is applied whenever we add a function application g(a0, ..., an) to ρ(f). Rule B is a consistency check w.r.t. the current assignment σ, i.e., for every function application s in ρ(f), we check if the assignment of σ(αλ(s)) corresponds to the assignment evaluated by the partially β-reduced term λx̄[x0\a0, ..., xn\an]p. Finally, rule P represents a crucial optimization of consistentλ, as it avoids unnecessary conflicts while checking B. If P applies, both function applications s and t have the same arguments. As function application s ∈ ρ(λx̄), rule C implies that s = λx̄(a0, ..., an). Therefore, function applications s and t must produce the same function value as t := λx̄[x0\a0, ..., xn\an]p = λȳ[x0\a0, ..., xn\an]p, i.e., function application t must be equal to the result of applying partial β-reduction to function application s. Assume we encode t and add it to the formula. If DPB guesses an assignment s.t. σ(αλ(t)) ≠ σ(αλ(s)) holds, we have a conflict and need to add a lemma. However, this conflict is unnecessary, as we know from the start that both function applications must map to the same function value in order to be consistent. We avoid this conflict by propagating s to ρ(g).
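As a concrete reading of rule C, the following sketch uses invented types: Fun identifies a λ-term or UF symbol, App bundles an application's arguments with its abstraction variable, and val() stands for σ(αλ(·)); none of this is Boolector's actual code.

    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <vector>

    struct Fun {};
    struct App { std::vector<Node*> args; Node* abstraction; };
    using Rho = std::map<Fun*, std::vector<App>>;

    uint64_t val(Node* n);  // assumed: σ(αλ(n)) for an encoded node

    // Rule C for a new application 'a' on f: equal argument values must
    // imply equal function values w.r.t. every s already in ρ(f).
    bool congruent(Rho& rho, Fun* f, const App& a) {
      for (const App& s : rho[f]) {
        bool same_args = s.args.size() == a.args.size();
        for (std::size_t i = 0; same_args && i < a.args.size(); ++i)
          same_args = val(s.args[i]) == val(a.args[i]);
        if (same_args && val(s.abstraction) != val(a.abstraction))
          return false;  // EUF violated; lemmaλ generates a lemma
      }
      return true;
    }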
   Figure 6 illustrates our consistency checking algorithm consistentλ, which takes the preprocessed input formula π and a current assignment σ as arguments, and proceeds as follows. First, we initialize stack S with all non-parameterized function applications in formula π (cf. nonparam_apps(π)) and order them top-down, according to their appearance in the DAG representation of π. The top-most function application then represents the top of stack S, which consists of tuples (g, f(a0, ..., an)), where f and g are initially equal and f(a0, ..., an) denotes the function application propagated to function g.

    procedure consistentλ(π, σ)
        S ← nonparam_apps(π)
        while S ≠ ∅
            (g, f(a0, ..., an)) ← pop(S)
            encode(f(a0, ..., an))
            /* check rule C */
            if not congruent(g, f(a0, ..., an))
                return ⊥
            add(f(a0, ..., an), ρ(g))
            if is_UF(g) continue
            encode(g)
            /* check rule B */
            t ← g[x0\a0, ..., xn\an]p
            if assigned(t)
                if σ(t) ≠ σ(αλ(f(a0, ..., an)))
                    return ⊥
            elif t = h(a0, ..., an)    /* check rule P */
                push(S, (h, f(a0, ..., an)))
                continue
            else
                apps ← fresh_apps(t)
                for a ∈ apps
                    encode(a)
                if eval(t) ≠ σ(αλ(f(a0, ..., an)))
                    return ⊥
                for h(b0, ..., bm) ∈ apps
                    push(S, (h, h(b0, ..., bm)))
        return ⊤

Fig. 6: Procedure consistentλ in pseudo-code.
   In the main consistency checking loop, we check rules C and B for each tuple as follows. First we check if f(a0, ..., an) violates the function congruence axiom EUF w.r.t. function g and return ⊥ if this is the case. Note that for checking rule C, we require an assignment for arguments a0, ..., an, hence we encode them on-the-fly. If rule C is not violated and function f is an uninterpreted function, we continue to check the next tuple on stack S. However, if f is a λ-term we still need to check rule B, i.e., we need to check if the assignment σ(αλ(f(a0, ..., an))) is consistent with the value produced by g[x0\a0, ..., xn\an]p. Therefore, we first encode all non-parameterized expressions in the scope of partial β-reduction βp(g) (cf. encode(g)) before applying partial β-reduction with arguments a0, ..., an, which yields term t. If term t has an assignment, we can immediately check if it differs from assignment σ(αλ(f(a0, ..., an))) and return ⊥ if this is the case. However, if term t does not have an assignment, which is the case when t has been instantiated from a parameterized expression, we have to compute the value for term t. Note that we could also encode term t to get an assignment σ(t), but this might add a considerable amount of superfluous clauses to the SAT solver. Before computing a value for t we check if rule P applies and propagate f(a0, ..., an) to h if applicable. Otherwise, we need to compute a value for t and check if t contains any function applications that were instantiated and not yet encoded (cf. fresh_apps(t)) and encode them if necessary. Finally, we compute the value for t (cf. eval(t)) and compare it to the assignment of αλ(f(a0, ..., an)). If the values differ, we found an inconsistency and return ⊥. Otherwise, we continue consistency checking the newly encoded function applications apps. We conclude with ⊤ if all function applications have been checked successfully and no inconsistencies have been found.

A. Lemma generation

   Following [3], we introduce a lemma generation procedure lemmaλ, which generates a symbolic lemma whenever our consistency checker detects an inconsistency. Depending on whether rule C or B was violated, we generate a symbolic lemma as follows. Assume that rule C was violated by function applications s := g(a0, ..., an), t := h(b0, ..., bn) ∈ ρ(f). We first collect all conditions that lead to the conflict as follows.
1) Find the shortest possible propagation path ps (pt) from function application s (t) to function f.
2) Collect all ite conditions c^s_0, ..., c^s_j (c^t_0, ..., c^t_l) on path ps (pt) that were ⊤ under given assignment σ.
3) Collect all ite conditions c̄^s_0, ..., c̄^s_k (c̄^t_0, ..., c̄^t_m) on path ps (pt) that were ⊥ under given assignment σ.
We generate the following (in general symbolic) lemma:

    ⋀_{i=0}^{j} c^s_i ∧ ⋀_{i=0}^{k} ¬c̄^s_i ∧ ⋀_{i=0}^{l} c^t_i ∧ ⋀_{i=0}^{m} ¬c̄^t_i ∧ ⋀_{i=0}^{n} ai = bi → s = t
   Assume that rule B was violated by a function application s := λȳ(a0, ..., an) ∈ ρ(λx̄). We obtained t := λx̄[x0\a0, ..., xn\an]p and collect all conditions that lead to the conflict as follows.
1) Collect ite conditions c^s_0, ..., c^s_j and c̄^s_0, ..., c̄^s_k for s as in steps 1-3 above.
2) Collect all ite conditions c^t_0, ..., c^t_l that evaluated to ⊤ under current assignment σ when partially β-reducing λx̄ to obtain t.
3) Collect all ite conditions c̄^t_0, ..., c̄^t_m that evaluated to ⊥ under current assignment σ when partially β-reducing λx̄ to obtain t.
We generate the following (in general symbolic) lemma:

    ⋀_{i=0}^{j} c^s_i ∧ ⋀_{i=0}^{k} ¬c̄^s_i ∧ ⋀_{i=0}^{l} c^t_i ∧ ⋀_{i=0}^{m} ¬c̄^t_i → s = t
   Example 5: Consider formula ψ1 and its preprocessed formula abstraction αλ(ψ1) from Ex. 1. For the sake of better readability, we will use λx and λy to denote functions f and g, and further use ai and aj as a shorthand for αλ(applyi) and αλ(applyj). Assume we run DPB on αλ(ψ1) and it returns a satisfying assignment σ such that σ(i) ≠ σ(j), σ(ai) = σ(aj), σ(i) < 0 and σ(ai) ≠ σ(−i). First, we check consistency for λx(i) and check rule C, which is not violated as σ(i) ≠ σ(j), and continue with checking rule B. We apply partial β-reduction and obtain term t := λx[x\i]p = λy(i) (since σ(i) < 0), for which rule P is applicable. We propagate λx(i) to λy, check if λx(i) is consistent w.r.t. λy, apply partial β-reduction, obtain t := λy[y\i]p = −i and find an inconsistency according to rule B: under σ we have σ(ai) ≠ σ(−i), but partial β-reduction requires σ(ai) = σ(−i). We generate lemma i < 0 → ai = −i. Assume that in the next iteration DPB returns a new satisfying assignment σ such that σ(i) ≠ σ(j), σ(ai) = σ(aj), σ(i) < 0, σ(ai) = σ(−i) and σ(j) > σ(−i). We first check consistency for λx(i), which is consistent due to the lemma we previously generated. Next, we check rule C for λx(j), which is not violated since σ(i) ≠ σ(j), and continue with checking rule B. We apply partial β-reduction and obtain term t := λx[x\j]p = j (since σ(j) > σ(−i) and σ(i) < 0) and find an inconsistency: σ(ai) = σ(−i), σ(ai) = σ(aj) and σ(j) > σ(−i) imply σ(aj) ≠ σ(j), but rule B requires σ(aj) = σ(j). We then generate lemma j > 0 → aj = j.

VIII. EXPERIMENTS

   We applied our lemmas on demand approach for λ-terms on three different benchmark categories: (1) crafted, (2) SMT'12, and (3) application. For the crafted category, we generated benchmarks using SMT-LIB v2 macros, where the instances of the first benchmark set (macro blow-up) tend to blow up in formula size if SMT-LIB v2 macros are treated as C-style macros. The benchmark sets fisher-yates SAT and fisher-yates UNSAT encode an incorrect and a correct but naive implementation of the Fisher-Yates shuffle algorithm [11], where the instances of fisher-yates SAT also tend to blow up in the size of the formula if SMT-LIB v2 macros are treated as C-style macros. The SMT'12 category consists of all non-extensional QF_AUFBV benchmarks used in the SMT competition 2012. For the application category, we considered the instantiation benchmarks¹ generated with LLBMC as presented in [10]. The authors also kindly provided the same benchmark family using λ-terms as arrays, which is denoted as lambda.
   We performed all experiments on 2.83GHz Intel Core 2 Quad machines with 8GB of memory running Ubuntu 12.04.2, setting a memory limit of 7GB and a time limit of 1200 seconds for the crafted and the SMT'12 benchmarks. For the application benchmarks, as in [10], we used a time limit of 60 seconds.

¹ http://llbmc.org/files/downloads/vstte-2013.tgz
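For reference, a standard (correct) Fisher-Yates shuffle is shown below in C++; the fisher-yates benchmarks encode variants of this loop in SMT-LIB v2, and this snippet only illustrates the algorithm itself, not the benchmark encoding.

    #include <cstddef>
    #include <random>
    #include <utility>
    #include <vector>

    void fisher_yates(std::vector<int>& a, std::mt19937& rng) {
      for (std::size_t i = a.size(); i > 1; --i) {
        // Pick j uniformly from [0, i-1] and swap. A classic incorrect
        // variant draws j from the full range [0, n-1] at every step,
        // which yields a biased permutation (the paper does not spell
        // out which incorrect variant its SAT instances encode).
        std::uniform_int_distribution<std::size_t> dist(0, i - 1);
        std::swap(a[i - 1], a[dist(rng)]);
      }
    }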
                        Solver          Solved   TO   MO   Time [10^3 s]   Space [GB]
  macro blow-up         Boolector         100     0    0       24.2            9.4
                        Boolectornop      100     0    0       18.2            8.4
                        Boolectorβ         28    49   23       91.5          160.0
                        CVC4               21     0   79       95.7          551.6
                        MathSAT            51     2   47       64.6          395.0
                        SONOLAR            26    74    0       90.2            1.7
                        Z3                 21     0   79       95.0          552.2
  fisher-yates SAT      Boolector           7    10    1       14.0            7.5
                        Boolectornop        4    13    1       17.3            7.0
                        Boolectorβ          6     1   11       15.0           76.4
                        CVC4                5     1   12       15.7           83.6
                        MathSAT             6    10    2       14.7           17.3
                        SONOLAR             3    14    1       18.1            6.9
                        Z3                  6    12    0       14.8            0.2
  fisher-yates UNSAT    Boolector           5    13    1       17.4            7.1
                        Boolectornop        4    14    1       18.2            6.9
                        Boolectorβ          9     0   10       12.1           72.0
                        CVC4                3     4   12       19.2           82.1
                        MathSAT             6    11    2       15.9           14.7
                        SONOLAR             3    15    1       19.2            6.8
                        Z3                 10     9    0       11.2            2.2

TABLE I: Results crafted benchmark.

            Solver          Solved   TO   MO   Time [10^3 s]   Space [GB]
  SMT'12    Boolector         139    10    0       19.9           14.8
            Boolectornop      134    15    0       26.3           14.5
            Boolectorβ        137    11    1       21.5           22.7
            Boolectorsc12     140     9    0       15.9           10.3

TABLE II: Results SMT'12 benchmark.
   We evaluated four different versions of Boolector: (1) our lemmas on demand for λ-terms approach DPλ (Boolector), (2) DPλ without optimization rule P (Boolectornop), (3) DPλ with full β-reduction (Boolectorβ), and (4) the version submitted to the SMT competition 2012 (Boolectorsc12). For comparison we used the following SMT solvers: CVC4 1.2, MathSAT 5.2.6, SONOLAR 2013-05-15, STP 1673 (svn revision), and Z3 4.3.1. Note that we limited the set of solvers to those which currently support SMT-LIB v2 macros and the theory of fixed-size bit vectors. As a consequence, we did not compare our approach to UCLID (no bit vector support) and Yices, which both have native λ-term support but lack support for the SMT-LIB v2 standard.
   As indicated in Tables I, II and III, we measured the number of solved instances (Solved), timeouts (TO), memory outs (MO), total CPU time (Time), and total memory consumption (Space) required by each solver for solving an instance. If a solver ran into a timeout, 1200 seconds (60 seconds for category application) were added to the total time as a penalty. In case of a memory out, 1200 seconds (60 seconds for application) and 7GB were added to the total CPU time and total memory consumption, respectively.
   Table I summarizes the results of the crafted benchmark category. On the macro blow-up benchmarks, Boolector and Boolectornop benefit from lazy λ-term handling and thus outperform all those solvers that try to eagerly eliminate SMT-LIB v2 macros, with very high memory consumption as a result. The only solver not having memory problems on this benchmark set is SONOLAR. However, it is not clear how SONOLAR handles SMT-LIB v2 macros. Surprisingly, on these benchmarks Boolectornop performs better than Boolector with optimization rule P, which needs further investigation. On the fisher-yates SAT benchmarks Boolector not only solves the most instances, but requires 107 seconds for the first 6 instances, for which Boolectorβ, MathSAT and Z3 need more than 300 seconds each. Boolectornop does not perform as well as Boolector due to the fact that on these benchmarks optimization rule P is heavily applied. In fact, on these benchmarks, rule P applies to approx. 90% of all propagated function applications on average. On the fisher-yates UNSAT benchmarks Z3 and Boolectorβ solve the most instances, whereas Boolector and Boolectornop do not perform so well. This is mostly due to the fact that these benchmarks can be simplified significantly when macros are eagerly eliminated, whereas partial β-reduction does not yield as many simplifications. We measured the overhead of β-reduction in Boolector on these benchmarks, and it turned out that for the macro blow-up and fisher-yates UNSAT instances the overhead is negligible (max. 3% of total run time), whereas for the fisher-yates SAT instances β-reduction requires over 50% of total run time.
   Table II summarizes the results of running all four Boolector versions on the SMT'12 benchmark set. We compared our three approaches Boolector, Boolectornop, and Boolectorβ to Boolectorsc12, which won the QF_AUFBV track in the SMT competition 2012. In comparison to Boolectorβ, Boolector solves 5 unique instances, whereas Boolectorβ solves 3 unique instances. In comparison to Boolectorsc12, both solvers combined solve 2 unique instances. Overall, on the SMT'12 benchmarks Boolectorsc12 still outperforms the other approaches. However, our results still look promising, since none of the approaches Boolector, Boolectornop and Boolectorβ is heavily optimized yet. On these benchmarks, the overhead of β-reduction in Boolector is around 7% of the total run time.
   Finally, Table III summarizes the results of the application category. We used the benchmarks obtained from the instantiation-based reduction approach presented in [10] (instantiation benchmarks) and compared our new approaches to STP, the same version of the solver that outperformed all other solvers on these benchmarks in the experimental evaluation of [10].
                   Solver          Solved   TO   MO   Time [s]   Space [MB]
  instantiation    Boolector          37     8    0     576          235
                   Boolectornop       35    10    0     673          196
                   Boolectorβ         44     1    0     138          961
                   Boolectorsc12      39     6    0     535          308
                   STP                44     1    0     141         3814
  lambda           Boolector          37     8    0     594          236
                   Boolectornop       35    10    0     709          166
                   Boolectorβ         45     0    0      52          676
                   Boolectorsc12       -     -    -       -            -
                   STP                 -     -    -       -            -

TABLE III: Results application benchmarks.
On the instantiation benchmarks Boolectorβ and STP solve the same number of instances in roughly the same time. However, Boolectorβ requires less memory for solving those instances. Boolector, Boolectornop and Boolectorsc12 did not perform so well on these benchmarks because, in contrast to Boolectorβ and STP, they do not eagerly eliminate read operations, which is beneficial on these benchmarks. The lambda benchmarks consist of the same problems as instantiation, using λ-terms for representing arrays. On these benchmarks, Boolectorβ clearly outperforms Boolector and Boolectornop and solves all 45 instances within a fraction of the time. Boolectorsc12 and STP do not support λ-terms as arrays and therefore were not able to participate on this benchmark set. By exploiting the native λ-term support for arrays in Boolectorβ, we achieve even better results than on the instantiation benchmarks. Note that on the lambda (instantiation) benchmarks, the overhead in Boolectorβ for applying full β-reduction was around 15% (less than 2%) of the total run time.
   Benchmarks, binaries of Boolector and all log files of our experiments can be found at: http://fmv.jku.at/difts-rev-13/lloddifts13.tar.gz.
IX. CONCLUSION

   In this paper, we introduced a new decision procedure for handling non-recursive and non-extensional λ-terms as a generalization of the array decision procedure presented in [3]. We showed how arrays, array operations and SMT-LIB v2 macros are represented in Boolector and evaluated our new approach with 3 different benchmark categories: crafted, SMT'12 and application. The crafted category showed the benefit of lazily handling SMT-LIB v2 macros, where eager macro elimination tends to blow up the formula in size. We further compared our new implementation to the version of Boolector that won the QF_AUFBV track in the SMT competition 2012. With the application benchmarks, we demonstrated the potential of native λ-term support within an SMT solver. Our experiments look promising even though we employ a rather naive implementation of β-reduction in Boolector and also do not incorporate any λ-term specific rewriting rules except full β-reduction.
   In future work we will address the performance bottleneck of the β-reduction implementation and will further add λ-term specific rewriting rules. We will analyze the impact of various β-reduction strategies on our lemmas on demand procedure and will further add support for extensionality over λ-terms. Finally, with the recent and ongoing discussion within the SMT-LIB community to add support for recursive functions, we consider extending our approach to recursive λ-terms.

X. ACKNOWLEDGEMENTS

   We would like to thank Stephan Falke, Florian Merz and Carsten Sinz for sharing benchmarks, and Bruno Dutertre for explaining the implementation and limits of lambdas in SMT solvers, and more specifically in Yices.

REFERENCES

[1] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds, and C. Tinelli. CVC4. In CAV, volume 6806 of LNCS, pages 171-177. Springer, 2011.
[2] A. Biere and R. Brummayer. Consistency Checking of All Different Constraints over Bit-Vectors within a SAT Solver. In FMCAD, pages 1-4. IEEE, 2008.
[3] R. Brummayer and A. Biere. Lemmas on Demand for the Extensional Theory of Arrays. JSAT, 6(1-3):165-201, 2009.
[4] R. E. Bryant, S. K. Lahiri, and S. A. Seshia. Modeling and Verifying Systems Using a Logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions. In CAV, volume 2404 of LNCS, pages 78-92. Springer, 2002.
[5] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani. The MathSAT5 SMT Solver. In TACAS, volume 7795 of LNCS, pages 93-107. Springer, 2013.
[6] L. De Moura and N. Bjørner. Z3: an efficient SMT solver. In Proc. ETAPS'08, pages 337-340, 2008.
[7] L. M. de Moura, H. Rueß, and M. Sorea. Lazy Theorem Proving for Bounded Model Checking over Infinite Domains. In CADE, volume 2392 of LNCS. Springer, 2002.
[8] B. Dutertre and L. de Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf, Aug. 2006.
[9] N. Eén and N. Sörensson. Temporal induction by incremental SAT solving. ENTCS, 89(4):543-560, 2003.
[10] S. Falke, F. Merz, and C. Sinz. Extending the Theory of Arrays: memset, memcpy, and Beyond. In Proc. VSTTE'13.
[11] R. Fisher and F. Yates. Statistical tables for biological, agricultural and medical research. Oliver and Boyd, 1953.
[12] V. Ganesh and D. L. Dill. A Decision Procedure for Bit-Vectors and Arrays. In Proc. CAV'07. Springer-Verlag, 2007.
[13] F. Lapschies, J. Peleska, E. Gorbachuk, and T. Mangels. SONOLAR SMT-Solver. System Desc. SMT-COMP'12. http://smtcomp.sourceforge.net/2012/reports/sonolar.pdf, 2012.
[14] J. McCarthy. Towards a Mathematical Science of Computation. In IFIP Congress, pages 21-28, 1962.
[15] O. Ohrimenko, P. J. Stuckey, and M. Codish. Propagation via lazy clause generation. Constraints, 14(3):357-391, 2009.
[16] R. Sebastiani. Lazy Satisfiability Modulo Theories. JSAT, 3(3-4):141-224, 2007.
[17] S. A. Seshia. Adaptive Eager Boolean Encoding for Arithmetic Reasoning in Verification. PhD thesis, CMU, 2005.
CHIMP: a Tool for Assertion-Based Dynamic Verification of SystemC Models

Sonali Dutta, Rice University, Email: Sonali.Dutta@rice.edu
Deian Tabakov, Schlumberger, Email: deian@dtabakov.com
Moshe Y. Vardi, Rice University, Email: vardi@cs.rice.edu

Work supported in part by NSF grants CNS 1049862 and CCF-1139011, by NSF Expeditions in Computing project "ExCAPE: Expeditions in Computer Augmented Program Engineering", by BSF grant 9800096, and by a gift from Intel.


   Abstract—CHIMP is a tool for assertion-based dynamic verification of SystemC models. The various features of CHIMP include automatic generation of monitors from temporal assertions, automatic instrumentation of the model-under-verification (MUV), and three-way communication among the MUV, the generated monitors, and the SystemC simulation kernel during the monitored execution of the instrumented MUV. Empirical results show that CHIMP puts minimal runtime overhead on the monitored execution of the MUV.
   A newly added path in CHIMP results in a significant (over 75%) reduction of average monitor generation and compilation time. The average size of the monitors is reduced by over 60%, without increasing runtime overhead.

I. INTRODUCTION

   SystemC (IEEE Standard 1666-2005) has emerged as a de facto standard for modeling of hardware/software systems [4], supporting different levels of abstraction, iterative model refinement, and execution of the model during each design stage. SystemC is implemented as a C++ library with macros and base classes for modeling processes, modules, channels, signals, ports, and the like, while an event-driven simulation kernel allows efficient simulation of concurrent models. Thus, on one hand, SystemC leverages the natural object-oriented encapsulation, data hiding, and well-defined inheritance mechanism of C++, and, on the other hand, it allows modeling and efficient simulation of hardware/software designs by its simulation kernel and predefined hardware-specific macros and classes. The SystemC code is available as open source, including a single-core reference simulation kernel, referred to as the OSCI kernel; see http://www.accellera.org/downloads/standards/systemc.
   The growing popularity of SystemC has motivated research aimed at assertion-based dynamic verification (ABDV) of SystemC models [9]. ABDV involves two steps: generating run-time monitors from input assertions [8], and executing the model-under-verification (MUV) while running the monitors along with the model. The monitors observe the execution of the MUV and report if the observed behavior is consistent with the specified behavior. For discussion of related work in ABDV of SystemC see [7].
   CHIMP, available as an open-source tool¹, implements the monitoring framework for temporal SystemC properties described in [9]; see discussion below. CHIMP consists of (1) off-the-shelf components, (2) modified components, and (3) an original component. CHIMP has two off-the-shelf components: (1) spot-1.1.1 [3], a C++ library used for LTL-to-Büchi conversion, and (2) AspectC++-1.1 [6], a C++ aspect compiler that is used for instrumentation of the MUV. CHIMP has two modified components: (1) a patched version of the OSCI kernel-2.2 to facilitate communication between the kernel and the monitors [9], and (2) an extension of Automaton-1.11 [5], a Java tool used for determinization and minimization of finite automata, with the ability to read automata descriptions from file. Finally, the original component is MONASGEN, a C++ tool for automatic generation of monitors from assertions [8] and for automatic generation of an aspect advice file for instrumentation [11]. Fig. 1 shows the five components of CHIMP as described above. The component Automaton-1.11 is dotted because a newly added improved path in CHIMP has been able to remove the dependency of CHIMP on Automaton-1.11 (explained in Section V).

¹ www.sourceforge.net/projects/chimp-rice

   CHIMP takes the MUV and a set of temporal assertions about the behavior of that MUV as inputs,
Fig. 1. CHIMP components

and outputs "FAILED" or "NOT FAILED" for each assertion. CHIMP performs white-box validation of user code and black-box validation of library code. (If a user wishes to do white-box validation of library code, it can be accomplished by treating the library code as part of the user code.) The two main features of CHIMP are: (1) CHIMP generates C++ monitors, one for each assertion to be verified, and (2) CHIMP automatically instruments the user model [11] to expose the user model's states and syntax to the monitors. CHIMP puts nominal overhead on the runtime of the MUV, supports a rich set of assertions, and can handle a wide array of abstractions, from statement level to system level.
   A recently added path in CHIMP from LTL to monitor can bypass the old performance bottleneck, Automaton-1.1, and improves the monitor generation and compilation time by a significant amount. It also reduces the size of the generated monitors notably. This entirely removes the dependency of CHIMP on Automaton-1.1.
   The theoretical and algorithmic foundations of CHIMP were described in [8]-[11]. In this paper we describe the architecture, usage, and recent evolution of CHIMP. The rest of the paper is organized as follows. Section II describes the syntax and semantics of assertions. Section III presents an overall picture of the usage, implementation and performance of CHIMP. Section IV describes the C++ monitor generated by CHIMP. Section V describes the new improved path in CHIMP and its improved performance. Finally, Section VI presents a summary and talks about future work.

II. ASSERTIONS: SYNTAX AND SEMANTICS

   CHIMP accepts input assertions, defined using the temporal specification framework for SystemC described in [10], where a temporal assertion consists of a linear temporal formula accompanied by a Boolean expression serving as a clock. This supports assertions written at different levels of abstraction with different temporal resolutions. The framework of [10] proposes a set of SystemC-specific Boolean variables that refer to SystemC's software features and its simulation semantics; see examples below. Input assertions in CHIMP are of the form "⟨LTL formula⟩@⟨clock expression⟩", where the LTL formula expresses a temporal property and the clock expression denotes when CHIMP should sample the execution trace of the MUV.
   Several of the additional Boolean variables proposed in [10] refer to the simulation phases. According to SystemC's formal semantics, there are 18 predefined kernel phases. CHIMP has Boolean variables that enable referring to these phases. For example, MON_DELTA_CYCLE_END denotes the end of each delta cycle and MON_THREAD_SUSPEND denotes the suspension moment of each thread. By using these variables as clock expressions, we can sample the execution trace at different temporal resolutions. (By default, CHIMP samples at each kernel phase.) Other Boolean variables refer to SystemC events, which are key elements of SystemC's event-driven simulation semantics. CHIMP is also able to sample the execution at various key locations, e.g., function calls and function returns. This gives the user the flexibility to write assertions at different levels of abstraction, from the level of individual C++ statements to the level of SystemC kernel phases.
   The Boolean primitives supported by CHIMP are summarized below; see [10] and [11] for further details:
   Function primitives: Let f() be a C++ function in the user or library code. The Boolean primitives f:call and f:return refer to locations in the source code that contain the function call, and to locations immediately after the function call, respectively. The primitives f:entry and f:exit refer to the locations immediately before the first executable statement and immediately after the last executable statement in f(), respectively. If f() is a library function then the entry and exit primitives are not supported (black-box verification model for libraries).
Section IV describes the C++ monitor generated by           supported (black-box verification model for libraries).
CHIMP. Section V describes the new improved path in            Value primitives: If a function f() has k arguments,
CHIMP and its improved performance. Finally, Section        CHIMP defines variables f : 1, . . . , f : k , where the
VI presents a summary and talks about future work.          value and type of f : i are equal to the value and type


                                                                                                               39
   Phase primitives: The user can refer to the 18 predefined kernel states [10] in the assertions to specify when the state of the MUV should be sampled. For example, the assertion G ("p == 0") @ MON_DELTA_CYCLE_END requires the value of variable p to be zero at the end of every delta cycle.
   Event primitives: For each SystemC event E, CHIMP provides a Boolean primitive E.notified that is true only when the OSCI kernel actually notifies E. For example, the assertion G (s.value_changed_event().notified) @ MON_UPDATE_PHASE_END says that the signal s changes value at the end of every update phase.
III.   U SAGE , I MPLEMENTATION AND P ERFORMANCE
Fig. 3. MUV instrumentation flow

   Running CHIMP consists of three steps: (1) In the first step, the user writes a configuration file containing all assertions to be verified, as well as other necessary information. The user can also provide inputs through command-line switches. For each LTL assertion in the configuration file, MONASGEN first generates a nondeterministic Büchi automaton on words (NBW) using SPOT, then converts that NBW to a minimal deterministic finite automaton on words (DFW), using the Automaton-1.11 tool for determinization and minimization. Then, MONASGEN generates the C++ monitors from the DFW, one monitor for each assertion (Fig. 2). (2) MONASGEN produces an aspect-advice file that is then used by AspectC++ to generate the instrumented MUV (Fig. 3). (3) Finally, the monitors and the instrumented MUV are compiled together and linked to the patched OSCI kernel, and the code is then run to execute the simulation with inputs provided by the user. The inputs can be generated using any standard stimuli-generation technique. For every assertion, CHIMP produces output indicating whether the assertion held or failed for that particular input (Fig. 4).
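   To make step (2) concrete, an aspect-advice file conceptually resembles the AspectC++ sketch below. This is our own minimal sketch under assumed names (MonitorHooks and divide are hypothetical); the advice MONASGEN actually generates is more elaborate and hands control to the observer classes described in Section IV:

#include <iostream>

// Hypothetical sketch of generated aspect advice.
aspect MonitorHooks {
  // Intercept every call to divide() (the f:call location), so the
  // observer can sample argument values and trigger monitor steps.
  advice call("% divide(...)") : before() {
    // CHIMP's generated advice would notify the observer here; this
    // sketch only records that the instrumented location was reached.
    std::cout << "divide:call reached" << std::endl;
  }
};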
   For experimental evaluation we used a SystemC model with about 3,000 LOC, implementing a system for reserving and purchasing airplane tickets. The users of the system submit requests, and the system uses a randomly generated flight database to find a direct flight or a sequence of up to three connecting flights. Those are returned to the user for approval, payment, and booking. This model is intended to run forever. It is inspired by actual request/grant subsystems currently used in hardware design.
   We used a patched version of the 2.2.0 OSCI kernel and compiled it using the default settings in the Makefile. The empirical results below were measured on a Pentium 4, 3.20GHz CPU, 1 GB RAM machine running GNU/Linux.


Fig. 4. Running instrumented MUV with monitors using patched OSCI kernel

Fig. 5. Run time overhead for monitoring Properties (1) and (2)
   To assess the runtime overhead imposed by the monitors, we measured the effect of running with different assertions and also with an increasing number of assertions [9]. In each case we first ran the model without monitors and user-code instrumentation to establish the baseline, and then ran several simulations with instrumentation and an increasing number of copies of the same monitor. The results report runtime overhead per monitor as a percentage of the baseline. (For these experiments monitors were constructed manually.)
   The first property we checked is a safety property asserting that whenever the event new_requests_nonfull is notified, the corresponding queue new_planning_requests must have space for at least one request.

Fig. 6. Instrumentation overhead per monitor call as a percentage of the baseline run time. The y axis is ((instrumentation overhead − baseline)/(baseline × number of monitors)) × 100%

 G "new_planning_requests.size() < capacity"
@ new_requests_nonfull.notified
                                                        (1)
    The second property says that the system must propagate each request through each channel (or through each module) within 5 cycles of the slow clock. This property is a conjunction of 16 bounded liveness assertions similar to the one shown here.

...
// Propagate through module within 5 clock ticks
ALWAYS (io_module_receive_transaction($1) ->
 ( within [5 slow_clock.pos()]
    io_module_send_to_master($2) & ($1 == $2)
 ) AND ...
                                                        (2)

   Fig. 5 presents the runtime overhead of monitoring the above two properties as a percentage of the baseline. Checking Property 2 is relatively expensive because it requires a lot of communication from the model to the monitor. It is shown in [1] that the finite state monitors have less runtime overhead than transaction-based monitors generated by tools like Synopsys VCS.
   We also evaluated the cost of instrumentation separately by simulating one million SystemC clock cycles with focus on the overhead of instrumentation [11]. The average wall-clock execution time of the system over 10 runs without instrumentation was 33 seconds. We call this "baseline execution". Fig. 6 shows the cost of the instrumentation per monitor call, as a percentage of the
baseline execution. The data suggest that there is a fixed
cost of the instrumentation, which, when amortized over
more and more calls, leads to lower average cost. The
average cost per call stabilizes after 300,000 calls, and is
less than 0.5 × 10⁻⁴%.
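   A simple cost model, our own illustration rather than an analysis from the paper, is consistent with this behavior: if the instrumentation incurs a one-time fixed cost c0 and a marginal cost c1 per monitor call, then after n calls the measured per-call overhead is

    overhead(n) = c0/n + c1,

which decreases toward the constant c1 as n grows, matching the flattening of the curve after roughly 300,000 calls.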
    Another testbench we used is an Adder model that implements the squaring function by repeated increment by 1. It uses all three kinds of event notifications. It is scalable, as it spawns a SystemC thread for each addition in the squaring function. We monitored several properties of the Adder model using CHIMP.

Fig. 7. The DFW generated by MONASGEN from the LTL formula G(a → Xb). All the states are accepting. The DFW rejects bad prefixes by having no available transition. State 0 is the initial state.

   To study the effect of monitor encoding on runtime overhead, Tabakov and Vardi [8] describe 33 different monitor encodings, and show that the front_det_switch encoding with state minimization and no alphabet minimization is the best in terms of runtime overhead. The downside is that this flow suffers from a rather slow monitor generation and compilation time. This led us to develop a new flow for the tool, as described below.

                IV. CHIMP MONITORS

   In CHIMP, the LTL formula in every assertion ⟨LTL formula⟩ @ ⟨clock expression⟩ is converted into a C++ monitor class. Each C++ monitor has a step() function that implements the transition function of the DFW generated from the LTL formula (as described in Fig. 9). This step() function is called at every sampling point defined by the clock expression. The monitor.h and monitor.cc files, generated by MONASGEN, contain one monitor class for every assertion and a class called local_observer that is responsible for invoking the callback function, which invokes the step() function of the appropriate monitor class at the right sampling point during the monitored simulation. Different encodings have been explored for writing the monitor's step() function. For the new flow of CHIMP (described below), Fig. 13 shows that the front_det_ifelse encoding is the best among all of them in terms of runtime overhead.
   In the front_det_ifelse encoding, the C++ monitor produced is deterministic in terms of transitions. This means that from one state, either one or no transition is possible. If no transition is possible from a state, the monitor rejects and outputs "FAILED". Else, the monitor keeps executing until the end of simulation and outputs "NOT FAILED". In the front_det_ifelse encoding, each state of the monitor is encoded as an integer, from 0 up to the total number of states. The step() function of the monitor uses an outer if-elseif statement block to determine the next state. The possible transitions from each state are encoded using an inner if-elseif block, the condition statement being the guard on a transition.
   As a running example, we show how the monitor step() function is encoded for an assertion G(a → Xb) @ clk. Here clk is some Boolean expression specifying when the step() function needs to be called during the simulation. This assertion asserts that, always, if a is true, then in the next clock cycle b has to be true. The DFW generated by CHIMP for this assertion is shown in Fig. 7.
   Listing 1 shows the step() function of the C++ monitor generated by CHIMP for the assertion G(a → Xb) @ clk. The variables current_state and next_state are member variables of the monitor class and store the current automaton state and the next automaton state, respectively. If next_state becomes −1 after the execution of the step() function, it means that no transition can be made from the current state. In this case, the monitor outputs "FAILED". The initial state 0 of the monitor is assigned to the variable next_state inside the constructor of the monitor class (and transferred to current_state on the first call to step()), as shown in Listing 2.

Listing 1. The step() function of the monitor for G(a → Xb)

/**
 * Simulate a step of the monitor.
 */
void
monitor0::step() {
  // If the property has not failed yet
  if (status == NOT_FAILED) {
    // number of steps executed so far
    num_steps++;
    // assign next state to current state
    current_state = next_state;
    // make next state invalid
    next_state = -1;
    if (current_state == 0) {
      if (!(a))
        { next_state = 0; }
      else if ((a))
        { next_state = 1; }
    } // if (current_state == 0)

    else if (current_state == 1) {
      if ((b) && !(a))
        { next_state = 0; }
      else if ((a) && (b))
        { next_state = 1; }
    } // if (current_state == 1)

    // FAILED if no transition possible
    bool not_stuck = (next_state != -1);
    if (!not_stuck) {
      property_failed();
    }
  } // if (status == NOT_FAILED)
} // step()

Listing 2. The constructor of the monitor for G(a → Xb)

/**
 * The constructor
 */
monitor0::monitor0(...) :
  sc_core::mon_prototype() {
  next_state = 0; // initial state id
  current_state = -1;
  status = NOT_FAILED;
  num_steps = 0;
  ...
} // Constructor

   The sampling points can be kernel phases, e.g., MON_DELTA_CYCLE_END, or event notifications, e.g., E.notified (where E is an event). In such cases, the OSCI kernel needs to communicate with the local_observer at the right time (when a delta cycle ends or when event E is notified) to call the step() function of the monitor. This communication is done by the patch put on the OSCI kernel. This patch contains a class called mon_observer, which communicates with the local_observer class on behalf of the OSCI kernel.
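   The dispatch chain can be pictured with the following sketch. It is our own simplification: the member names and the assumption that step() is a virtual member of sc_core::mon_prototype are ours, and the real local_observer generated by MONASGEN is considerably richer:

#include <vector>

// Sketch of the callback chain: the kernel patch (mon_observer) forwards
// each sampling point to local_observer, which invokes step() on every
// monitor registered for that sampling point.
class local_observer {
public:
  void register_delta_monitor(sc_core::mon_prototype* m) {
    delta_monitors_.push_back(m);
  }
  // Invoked by mon_observer when a delta cycle ends
  // (the MON_DELTA_CYCLE_END sampling point).
  void on_delta_cycle_end() {
    for (sc_core::mon_prototype* m : delta_monitors_)
      m->step();
  }
private:
  std::vector<sc_core::mon_prototype*> delta_monitors_;
};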
                                                                 formulas and 1200 random formulas as mentioned in [8].
                                                                 The scatter plot on Fig. 10 shows the comparison of
                V. EVOLUTION OF CHIMP

   Fig. 8 shows the old path of generating a C++ monitor from an LTL property in CHIMP. First, MONASGEN uses SPOT to convert the LTL formula to a nondeterministic Büchi automaton (NBW). Then MONASGEN prunes the NBW to remove all states that do not lead to any accepting state, and generates the corresponding nondeterministic finite automaton (NFW), see [2]. MONASGEN then uses the Automaton tool to determinize and minimize the NFW to generate a minimal deterministic finite automaton (DFW). Finally, MONASGEN converts the DFW to a C++ monitor.

Fig. 8. Old LTL to monitor path in CHIMP

Fig. 9. New LTL to monitor path in CHIMP

   The main bottleneck in this flow was the minimization and determinization of the NFW using the Automaton tool, as it consumes 90% of the total monitor generation and compilation time. Also, the Automaton tool may generate multiple edges between two states, resulting in quite large monitors. The newest version of CHIMP introduces a new path, as shown in Fig. 9, which bypasses the Automaton tool completely and uses only SPOT. So the component Automaton-1.11 in Fig. 1 is not needed by CHIMP anymore. This new path leverages the new functionality of SPOT-1.1.1 to convert an LTL formula to a minimal DFW that explicitly rejects all bad prefixes.
   After replacing the Automaton tool by SPOT to convert the NBW to a minimal DFW, the improvement in compilation time and monitor size is evident. This new flow results in a 75.93% improvement in monitor generation and compilation time. To evaluate the performance of the revised CHIMP, we use the same set of 162 pattern formulas and 1200 random formulas as mentioned in [8]. The scatter plot in Fig. 10 shows the comparison of monitor generation and compilation time of the new flow vs the old flow. Most of the points are above the line with slope 1, which indicates that for most of the monitors, the
generation and compilation time by old CHIMP is more
than by new CHIMP. Fig. 10–Fig. 14 are all scatter plots² and are interpreted in the same way.
   The new flow also merges multiple edges between
two states into one edge guarded by the disjunction of
the guards on all edges between the states. In this way
the average reduction in monitor size in bytes is 61.27%.
Fig. 11 shows the comparison of the size in bytes of the
monitors generated by the new flow vs the old flow.
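   The merge itself is mechanical. The sketch below is our own illustration of the idea; in MONASGEN the guards are formula objects over the assertion's propositions rather than strings, and the Edge type and function name here are hypothetical:

#include <map>
#include <string>
#include <utility>
#include <vector>

// Merge parallel DFW edges with the same (src, dst) pair into a single
// edge guarded by the disjunction of the individual guards.
struct Edge { int src; int dst; std::string guard; };

std::vector<Edge> mergeParallelEdges(const std::vector<Edge>& edges) {
  std::map<std::pair<int, int>, std::string> merged;
  for (const Edge& e : edges) {
    auto key = std::make_pair(e.src, e.dst);
    auto it = merged.find(key);
    if (it == merged.end())
      merged[key] = "(" + e.guard + ")";
    else
      it->second += " || (" + e.guard + ")";  // disjunction of guards
  }
  std::vector<Edge> result;
  for (const auto& kv : merged)
    result.push_back(Edge{kv.first.first, kv.first.second, kv.second});
  return result;
}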
Fig. 10. Comparison of monitor generation and compilation time of the old CHIMP vs new CHIMP

    Since the main focus of CHIMP has always been minimizing runtime overhead, we need to ensure that this new flow does not incur more runtime overhead compared to the old flow as the cost of reduced monitor generation and compilation time. So we ran the same set of 162 pattern formulas and 1200 random formulas as mentioned above, to compare the runtime overhead incurred by the new flow vs the old flow. Fig. 12 shows that the runtime overhead of CHIMP using the new flow has been reduced compared to the old flow. The reduction is 7.97% on average.
Fig. 11. Comparison of monitor size (bytes) generated by the old CHIMP vs new CHIMP

    As CHIMP has evolved to follow a different and more efficient path and the monitors have been reduced in size, it was not clear a priori which monitor encoding is the best. We conducted the same experiment as in [8] with the same set of formulas, 162 pattern formulas and 1200 random formulas, to identify the best encodings in terms of both runtime overhead and monitor generation and compilation time. We identified two new best encodings. Fig. 13 shows that the new best encoding in terms of runtime overhead is now front_det_ifelse. Fig. 14 shows that the new best encoding in terms of monitor generation and compilation time is back_ass_alpha. CHIMP now provides two configurations for monitor generation: best_runtime, which has minimum runtime overhead, and best_compiletime, which has minimum monitor generation and compilation time. Since the bigger concern is usually runtime overhead, best_runtime is the default configuration, given that its monitor generation and compilation time is very close to that of best_compiletime.

                          VI. CONCLUSION
    We present CHIMP, an Assertion-Based Dynamic Verification tool for SystemC models, and show that it puts minimal overhead on the runtime of the MUV. We show how the new path in CHIMP results in a significant reduction of monitor generation and compilation time and monitor size, as well as runtime overhead. In the future we plan to look at the possibility of verifying parametric properties, for example G(send(id) → F(receive(id))), where one can pass a parameter (here id) to the variables in the LTL formula of the assertion. The above property means that always, if the message with ID id is sent, it should eventually be received. Also, at present the user needs to know the system architecture to declare an assertion. We would like to make CHIMP work with assertions that are declared in the elaboration phase.

  ² http://en.wikipedia.org/wiki/Scatter_plot

Fig. 12. Comparison of runtime overhead incurred by old CHIMP vs new CHIMP

Fig. 13. Comparison of runtime overhead of front_det_ifelse encoding vs all other encodings

Fig. 14. Comparison of monitor generation time and compile time overhead of back_ass_alpha encoding vs all other encodings



                          REFERENCES
 [1]  R. Armoni, D. Korchemny, A. Tiemeyer, M. Vardi, and Y. Zbar. Deterministic dynamic monitors for linear-time assertions. In Proc. Workshop on Formal Approaches to Testing and Runtime Verification, volume 4262 of Lecture Notes in Computer Science. Springer, 2006.
 [2]  M. d'Amorim and G. Roşu. Efficient monitoring of ω-languages. In Proc. 17th International Conference on Computer Aided Verification, pages 364–378, 2005.
 [3]  A. Duret-Lutz and D. Poitrenaud. SPOT: An extensible model checking library using transition-based generalized Büchi automata. Modeling, Analysis, and Simulation of Computer Systems, 0:76–83, 2004.
 [4]  C. Helmstetter, F. Maraninchi, L. Maillet-Contoz, and M. Moy. Automatic generation of schedulings for improving the test coverage of Systems-on-a-Chip. In FMCAD '06: Proceedings of the Formal Methods in Computer Aided Design, pages 171–178, Washington, DC, USA, 2006. IEEE Computer Society.
 [5]  A. Møller. dk.brics.automaton. http://www.brics.dk/automaton/, 2004.
 [6]  O. Spinczyk, A. Gal, and W. Schröder-Preikschat. AspectC++: an aspect-oriented extension to the C++ programming language. In CRPIT '02: Proceedings of the Fortieth International Conference on Tools Pacific, pages 53–60, Darlinghurst, Australia, 2002. Australian Computer Society, Inc.
 [7]  D. Tabakov. Dynamic Assertion-Based Verification for SystemC. PhD thesis, Rice University, Houston, 2010.
 [8]  D. Tabakov, K. Rozier, and M. Y. Vardi. Optimized temporal monitors for SystemC. Formal Methods in System Design, 41(3):236–268, 2012.
 [9]  D. Tabakov and M. Vardi. Monitoring temporal SystemC properties. In Proc. 8th Int'l Conf. on Formal Methods and Models for Codesign, pages 123–132. IEEE, July 2010.
 [10] D. Tabakov, M. Vardi, G. Kamhi, and E. Singerman. A temporal language for SystemC. In FMCAD '08: Proc. Int. Conf. on Formal Methods in Computer-Aided Design, pages 1–9. IEEE Press, 2008.
 [11] D. Tabakov and M. Y. Vardi. Automatic aspectization of SystemC. In Proceedings of the 2012 workshop on Modularity in Systems Software, MISS '12, pages 9–14, New York, NY, USA, 2012. ACM.



         Abstraction-Based Livelock/Deadlock Checking for
                       Hardware Verification

                                              In-Ho Moon and Kevin Harer
                                                          Synopsys Inc.
                                            {mooni, kevinh}@synopsys.com


ABSTRACT
Livelock/deadlock is a well known and important problem in both hardware and software systems. In hardware verification, a livelock is a situation where the state of a design changes within only a smaller subset of the states reachable from the initial states of the design. Deadlock is a special case in which there is only one state in a livelock. However, livelock/deadlock checking has never been actively used in hardware verification in practice, mainly due to the complexity of the computation, which involves finding strongly connected components.
   This paper presents a practical abstraction-based livelock/deadlock checking algorithm for hardware verification. The proposed livelock/deadlock checking works on FSMs rather than the whole design. For each FSM, we make an abstract machine of manageable size from the cone of influence of the FSM. Once a livelock is found on an abstract machine, the livelock is justified on the concrete machine with trace concretization. Experimental results show that the proposed abstraction-based livelock checking finds real livelock errors in industrial designs.

1. INTRODUCTION
   Livelock/deadlock is a well known and important problem in both hardware and software systems. In hardware verification, a livelock is a situation where the state of a design changes within only a subset of the states reachable from the initial states of the design. In a state transition graph, a livelock is a set of states from which there is no path going to any other states that are reachable from the initial state. Since deadlock is a special case in which there is only one state in a livelock, deadlock checking can be done by livelock checking. Thus, livelock implies both livelock and deadlock in this paper. However, livelock checking¹ has never been actively used in hardware verification in practice, mainly due to the complexity of the computation, which involves finding SCCs (Strongly Connected Components). Thus, livelock checking has been on hardware designers' wish lists to verify their designs.

¹ Livelock checking is different from liveness checking; the difference will be explained in Section 2.3.

   There have been many approaches to finding SCCs [13, 27, 28, 3, 20, 12]. Among these works, Xie and Beerel proposed a symbolic method for finding terminal SCCs (in short, TSCCs) using BDDs (Binary Decision Diagrams [4]) in [27]. In a state transition graph, a TSCC is an SCC that does not have any outgoing edges to any state outside the SCC. Thus, a TSCC becomes a livelock group when the TSCC has any incoming edges to the SCC in the state transition graph representing a hardware design. However, even though the method in [27] is an improvement over its previous work [13] in symbolic approaches, it is still infeasible to apply the method to industrial designs, simply due to the capacity problem of symbolic methods. In general, any BDD-based method can handle only up to several hundred latches without any abstraction or approximation techniques, whereas there can be millions of latches in industrial designs.
   In this paper, we first present a BDD-based algorithm for finding TSCCs, improved from [27] in the following aspects. First, the initial state is taken into account in finding TSCCs. Especially, the improved algorithm handles multiple initial states efficiently. Secondly, unreachable TSCCs are distinguished from reachable TSCCs, which are more interesting to designers. Thirdly, we provide a more intuitive state classification into main, transient, and livelock groups, as opposed to the transient and recurrence classes in [27]. In our classification, the set of transient states is further classified into main and transient groups. The recurrence class in [27] is mapped into the livelock group in our classification. The main group is an SCC that contains the initial state. The transient group is a set of states that belong to neither the main nor a livelock group. There is one or zero main group in a design per initial state.
   This paper also presents a practical approach for checking livelock using abstraction techniques. The proposed livelock checking works on FSMs (Finite State Machines)² rather than the whole design. For each FSM, we make an abstract machine (by localization reduction [15]) of manageable size by the improved BDD method from the COI (Cone of Influence) of the FSM. Once a livelock is found on an abstract machine, the livelock is justified on the concrete machine with trace concretization using SAT (Satisfiability [9]) and simulation. When there is no livelock on the abstract machine, there is no guarantee that there is no livelock on the concrete machine. However, the bigger the abstract size is, the more confidence we have that no livelock exists on the concrete machine. The key benefit of this abstraction-based livelock checking is that it enables finding real livelock groups that cannot be found by tackling the whole design directly.

² FSMs are either automatically extracted [26] or any sets of sequential elements that are user-specified.

   Once an FSM is given, its COI is first computed. Then, an abstract machine is computed by finding N influential latches from the COI. Influential latches are the latches that are likely related with the FSM. N is either a pre-defined or a user-defined number of latches in the abstract machine, or
gradually increased. In general, N is up to a few hundred latches. Influential latches are computed mainly by approximate state decomposition [6]. However, in many cases, the size of the COI is too big even for approximate state decomposition. Thus, a structural abstraction using connectivity and sequential depth is applied before approximate state decomposition. This structural abstraction reduces the COI to a size manageable by approximate state decomposition.
   There is another important hardware property, called toggle deadlock. A state variable has a toggle deadlock if the state variable initially toggles and then becomes a constant after a certain number of transitions. Notice that this is not a constant variable, since it initially toggles; for example, a status bit that toggles for a few cycles after reset and then stays 0 forever exhibits a toggle deadlock. Toggle deadlock may or may not happen depending on the input stimuli in simulation. Therefore, toggle deadlock is also an important property to check with formal approaches.
   Experimental results show that the proposed abstraction-based approach finds real livelock and toggle deadlock errors in industrial designs.
   The contributions of this paper are in three aspects.
    • Improved algorithm for livelock checking
      The proposed algorithm improves the existing algorithm [27] in many aspects, such as providing a new state classification with initial states, handling multiple initial states, refining the search space efficiently with care states, early termination, and trimming out transient states.
    • Abstraction-based livelock checking
      This paper presents theories and an implementation of abstraction-based livelock checking to handle large designs in practice.
    • Toggle deadlock checking
      To the best of our knowledge, this paper presents the first method to solve the toggle deadlock problem.
   The remainder of this paper is organized as follows. Section 2 briefly recapitulates finding SCCs and TSCCs, and describes related work. Section 3 describes our improved algorithm for finding TSCCs. Section 4 explains how livelock is checked on FSMs using abstraction. Section 5 describes how to check toggle deadlocks. Experimental results are presented and discussed in Section 6. We conclude with Section 7.

2. PRELIMINARIES
2.1 Finding SCCs
   Given a graph G = (V, E), where G is a finite transition system of the Kripke structure [8], V is a finite set of states, and E ⊆ V × V is the set of edges, a strongly connected component (SCC) is a maximal set of states U ⊆ V such that for every pair (u, v) ∈ U, u and v are reachable from each other; that is, u is reachable from v and v is reachable from u [27, 28].
   Finding SCCs has a variety of applications in formal verification, such as Büchi emptiness [11], LTL model checking [25], CTL model checking with fairness constraints [10], liveness checking [16, 1], and so on.
   The traditional approach to finding SCCs is to use Tarjan's method [23]. Since this method manipulates the states of the graph explicitly, even though it runs in linear time in the size of the graph, the size of the graph grows exponentially as the number of state variables grows.
   To overcome this state explosion problem of explicit algorithms, there have been many publications on symbolic algorithms. Ravi et al. [20] provided a taxonomy of those symbolic algorithms. One kind is SCC-hull algorithms (without enumerating SCCs) [11, 14, 24], and the other is SCC enumeration algorithms [28, 3, 12, 20]. The details are in [20].

2.2 Finding TSCCs
   Even though the TSCCs are a subset of the SCCs in the states of a design, the algorithms for finding TSCCs can be significantly optimized, since not all SCCs are of interest.
   This section recapitulates the work on finding TSCCs by Xie and Beerel [27]. This algorithm classifies all states into either the recurrence or the transient class. The recurrence class is a set of TSCCs, and the rest belongs to the transient class. Let S be the set of states. With i, j ∈ S, i → j denotes that there is at least one path from i to j. Definition 1 defines the forward set and the backward set of a state.
   Definition 1. The forward set of state i ∈ S, denoted by F(i), is the set of states that have a path from i. That is, F(i) = {j ∈ S | i → j}. Similarly, the backward set of state i, denoted by B(i), is the set of states that have a path to i. That is, B(i) = {j ∈ S | j → i}.
   Lemma 1. Let i, j ∈ S. If j ∈ F(i), then F(j) ⊆ F(i). Similarly, if j ∈ B(i), then B(j) ⊆ B(i).
   Theorem 1. A state i ∈ S is recurrent if and only if F(i) ⊆ B(i). In other words, i is transient if and only if F(i) ⊈ B(i).
   Theorem 2. If state i ∈ S is transient, then the states in B(i) are all transient. If state i is recurrent, on the other hand, the states in F(i) are all recurrent. In the latter case, the set F(i) is a recurrence class, and the set B(i)\F(i) (if not empty) contains only transient states.
   Lemma 1 and Theorems 1 and 2 are from [27]. Lemma 1 shows a subset relation between two forward sets, as well as between two backward sets, when j is in either F(i) or B(i). Theorems 1 and 2 show how it is determined whether a state belongs to the recurrence or the transient class. Based on Theorems 1 and 2, all TSCCs can be found by performing forward and backward reachability iteratively. The detailed algorithm can be found in [27], and our improved algorithm is described in Section 3.2 with comparisons to the original algorithm.
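   As a concrete illustration of such symbolic reachability, the sketch below computes a forward set with CUDD's C++ BDD interface. This is our own sketch, not code from the paper: the function name and the quantCube parameter are assumptions, and the paper's actual procedure (Figure 2 in Section 3.2) additionally restricts the transition relation to care states, records onion rings, and supports early termination:

#include <vector>
#include "cuddObj.hh"  // CUDD's C++ BDD interface (assumed available)

// Compute F(seed): all states reachable from 'seed' by repeated image
// computation. 'trans' encodes Q(x, w, y); 'quantCube' is the conjunction
// of all present-state (x) and input (w) variables to quantify out.
BDD forwardSet(Cudd& mgr, const BDD& trans, const BDD& seed,
               const BDD& quantCube,
               std::vector<BDD>& xVars, std::vector<BDD>& yVars) {
  BDD reached = mgr.bddZero();
  BDD frontier = seed;
  while (frontier != mgr.bddZero()) {
    // image(y) = exists x, w . Q(x, w, y) AND frontier(x)
    BDD imageY = trans.AndAbstract(frontier, quantCube);
    // Rename next-state variables y back to present-state variables x.
    BDD imageX = imageY.SwapVariables(yVars, xVars);
    BDD newStates = imageX & !reached;  // first-time reached states
    reached |= newStates;
    frontier = newStates;
  }
  return reached;
}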
2.3 Related work
   There are two types of properties in model checking: safety and liveness properties [16]. A safety property represents "something bad never happens", whereas a liveness property represents "something good eventually happens". Liveness checking with a liveness property can be performed by finding SCCs [20]. Liveness checking can also be performed by safety checking with proper transformations [1].
   Livelock checking is different from liveness checking in the sense that liveness checking requires a liveness property to work on a design, whereas livelock checking does not require any property and works on a design directly.
   There have been many publications on finding all SCCs [24, 28, 3, 12]. Even though all TSCCs can be found by any of these approaches to finding all SCCs, it is not necessary to find all SCCs when finding all TSCCs, since we are interested in finding only the TSCCs for livelock checking.
   Hachtel et al. [13] proposed a symbolic approach to find all recurrence classes, concurrently identifying all TSCCs by computing the transitive closure [17] on the transition graph
with the Markov chain. Due to the complexity of transitive closure, this approach takes significantly more time and memory than a reachability-based approach does.
   Qadeer et al. [19] proposed an algorithm to find a single TSCC in the context of safe replacement in sequential equivalence checking [21, 22]. In this approach, multiple TSCCs are not considered.
   Xie and Beerel proposed a reachability-based algorithm to find all TSCCs iteratively [27]. This is also a symbolic approach, and it outperforms the method in [13]. However, this approach does not consider initial states.
   None of the above previous work on finding TSCCs was used on real designs in practice, due to the design sizes. Our abstraction-based approach is the first in publication to handle large designs in practice.
   Case et al. [5] proposed a method for finding transient signals using ternary simulation. A transient signal is a toggle deadlock on over-approximate reachable states. The toggle deadlock checking in this paper finds transient signals in the exact reachable states.

3. IMPROVED LIVELOCK CHECKING
3.1 State Classification
   The state classification in [27] consists of one transient class and one or more recurrence classes. However, in hardware verification, initial states are given to verify the hardware behavior only in the reachable state space. One problem of the state classification in [27] is that there is no distinction between TSCCs that are reachable from the initial states and those that are unreachable. Also, the reachable TSCCs may vary depending on the initial states.
   We propose a new state classification, shown in Figure 1, assuming that there is one single initial state. Handling multiple initial states is explained in Section 3.3.
   Definition 2. An STSCC is a sink TSCC that has incoming edges from states outside the TSCC.
   We first define sink TSCCs (in short, STSCCs) in Definition 2. The new state classification consists of the main group, the transient group, livelock groups (reachable STSCCs), and unreachable TSCCs for a given initial state. The transient class of [27] is further classified into the main group or the transient group. The main group is an SCC containing the initial state, and there exists either one or no main group. The recurrence classes of [27] are further classified into livelock groups (reachable STSCCs) and unreachable TSCCs. When there is no livelock, there exists only one SCC, which is the main group.

        Figure 1: State classification.

   In Figure 1, there are states a through o, and a is the initial state, marked with a thick circle. Among all states, the reachable states are a through i, inside the dotted rectangle. The unreachable states are j through o, outside the dotted rectangle. There are five SCCs: {a, b, c}, {e, f, g}, {h, i}, {j, k, l}, and {m, n, o}. Since a is the initial state, {a, b, c} becomes the main group. {h, i} and {m, n, o} are TSCCs, and only {h, i} is a livelock group (reachable STSCC), since it is reachable from a. {m, n, o} is called an unreachable TSCC. The remaining states, {d, e, f, g, j, k, l}, belong to the transient group, in which the states are contained in neither the main group nor the TSCCs.

3.2 Finding Livelock
   We first define the transition relation in Definition 3 to explain our algorithms for checking livelock.
   Definition 3. Let x = {x1, ..., xn}, y = {y1, ..., yn}, and w = {w1, ..., wp} be sets of variables ranging over B = {0, 1}. A (finite state) machine is a pair of Boolean functions ⟨Q(x, w, y), I(x)⟩, where Q : B^(2n+p) → B is 1 if and only if there is a transition from the state encoded by x to the state encoded by y under the input encoded by w, and I : B^n → B is 1 if the state encoded by x is an initial state. Q(x, w, y) is called the transition relation. The sets x, y, and w are called the present state, next state, and input variables, respectively.
   The procedure ComputeForwardSet in Figure 2 is a modified version of the procedure forward_set in [27]: it computes the forward set of a given state s only within the given care states careSet, and it performs early termination when stop is not ZERO (the empty BDD). ⇓ represents the restrict operator [7], which is used to minimize the transition relation with respect to careSet in Line 2. The minimized transition relation is denoted by Q̃. In Line 7, y ← x represents that the y variables are replaced by the x variables by BDD substitution. Early termination is another big difference from forward_set in [27] and is used in Figure 3. It bails out of computing the forward set as soon as any newly reached state intersects the states in stop, as in Line 11. BddIteConstant is a BDD ITE (if-then-else) operation that does not create any new BDD node. O is an array of states that stores the newly reached states at each iteration; O is called the onion rings. These onion rings are used later in Section 3.3. ComputeForwardSet returns the forward set F(s) and the onion rings O.

   ComputeForwardSet(Q, careSet, s, stop) {
   1     F(s) = ZERO;
   2     Q̃(x, w, y) = Q(x, w, y) ⇓ careSet;
   3     frontier(x) = s;
   4     Put s in O;
   5     while (frontier(x) ≠ ZERO) {
   6         image(y) = ∃x, w. Q̃(x, w, y) ∧ frontier(x);
   7         image(x) = image(y)|y←x;
   8         frontier = image(x) ∧ ¬F(s);
   9         F(s) = F(s) ∨ image(x);
   10        Put frontier in O;
   11        if (BddIteConstant(frontier, stop, ZERO) != ZERO)
   12            break;
   13    }
   14    return (F(s), O);
   }

            Figure 2: Computing forward set.

   ComputeBackwardSet is a dual procedure to ComputeForwardSet, except that it does not use stop and does not compute the onion rings O.
   Figure 3 is a procedure for finding TSCCs in a given set of states S. The procedure FindTSCCs is a modified version of the procedure State_classification in [27]. The modified procedure utilizes the care states careSet, assuming S
is not necessarily the whole state space. T is a set of transient states in S, and R is an array of TSCCs in S. PickOneState in Line 5 picks a random state from careSet as a seed state for finding a TSCC. In Line 7, early termination is used in computing the forward set F(s), by setting stop in Figure 2 to the negation of B(s). This is because, while we compute F(s) within B(s) for the state s, once any state outside B(s) is reachable from s, all states in B(s) are transient. Another big difference is the trimming of transient states in Lines 12 and 16. TrimTransient(Q, careSet, T, dir) trims the transient states out of the current care states in the given direction (dir), which is either PREFIX, SUFFIX, or BOTH. PREFIX (SUFFIX) means to trim out the lasso prefix (suffix) states. This is the same technique used in finding SCCs [20]. Finally, FindTSCCs returns R (a set of TSCCs) and T (a set of transient states).

   FindTSCCs(Q, S) {
   1     R = { };
   2     T = ZERO;
   3     careSet = S;
   4     while (careSet ≠ ZERO) {
   5         s = PickOneState(careSet);
   6         B(s) = ComputeBackwardSet(Q, careSet, s);
   7         F(s) = ComputeForwardSet(Q, careSet, s, ¬B(s));
   8         if (F(s) ⊆ B(s)) {
   9             R = R ∪ F(s);
   10            T = T ∨ (B(s) ∧ ¬F(s));
   11            careSet = careSet ∧ ¬B(s);
   12            TrimTransient(Q, careSet, T, PREFIX);
   13        } else {
   14            T = T ∨ (s ∨ B(s));
   15            careSet = careSet ∧ ¬(s ∨ B(s));
   16            TrimTransient(Q, careSet, T, BOTH);
   17        }
   18    }
   19    return (R, T);
   }

            Figure 3: Finding TSCCs.

   FindLivelock(Q, S, s) {
   1     (F(s), O) = ComputeForwardSet(Q, S, s, ZERO);
   2     B(s) = ComputeBackwardSet(Q, S, s);
   3     reached = F(s) ∨ s;
   4     if (F(s) ⊆ B(s)) {
   5         M = F(s);
   6         R = { };
   7         T = ZERO;
   8     } else {
   9         M = F(s) ∧ B(s);
   10        careSet = F(s) ∧ ¬(M ∨ s);
   11        TrimTransient(Q, careSet, T, PREFIX);
   12        (R, T_R) = FindTSCCs(Q, careSet);
   13        if (s ∉ M)
   14            T_R = T_R ∨ s;
   15        T_U = B(s) ∧ ¬M;
   16        T = T_R ∨ T_U;
   17    }
   18    return (M, R, T, reached, O);
   }

            Figure 4: Finding livelock.

   Figure 4 shows the procedure that performs our new state classification. As explained in Section 3.1, we find the main group (M), the transient group (T), and the livelock groups (R) from the given initial state (s) within the given care states (S). FindLivelock starts by computing the forward set F(s) and the backward set B(s) in Lines 1 and 2. In Line 3, reached is the set of states reached from s in S. If F(s) ⊆ B(s) in Line 4, there is no livelock in S. In this case, F(s) becomes the main group, and both R and T are set to empty in Lines 5-7. If F(s) ⊈ B(s) in Line 8, there must exist at least one livelock group. In this case, M is computed by intersecting F(s) and B(s) in Line 9. careSet is set to a subset of F(s) in Line 10. The lasso prefix states in careSet are trimmed out in Line 11. T_R represents the set of transient states that are reachable from s. In Line 12, R and T_R are computed by calling FindTSCCs with careSet. If s ∉ M (meaning that the main group is empty), s is added to T_R in Lines 13-14. T_U represents the set of transient states that are unreachable from s, and T_U is computed in Line 15. T is computed as the union of T_R and T_U in Line 16.

3.3 Multiple Initial States
   It is possible for a design to have multiple initial states when some of the state variables do not have concrete initial values. In the presence of multiple initial states, finding livelock groups has to be devised correctly to avoid false positives and redundant computations.
   Figure 5 shows an example with multiple initial states. In this example, there are six states, S = {a, b, c, d, e, f}. There are two SCCs, {a, b, c} and {d, e, f}. We can see that {d, e, f} is a TSCC. a and d are initial states, I = {a, d}, as shown with thick circles. Suppose that we compute livelock by calling FindLivelock(Q, S, I). Then we get F(I) = B(I) = M = {a, b, c, d, e, f} and R = { }, which is not correct, since there is a reachable TSCC. Now, let us try calling FindLivelock for each single initial state. First, for the initial state a, we get F(a) = {a, b, c, d, e, f} and B(a) = {a, b, c}. This gives us M_a = {a, b, c} and R_a = {d, e, f}. There is a livelock group R_a for the initial state a. Now, for the initial state d, F(d) = {d, e, f} and B(d) = {a, b, c, d, e, f}. This gives us M_d = {d, e, f}, R_d = { }, and T_U = {a, b, c}. There is no livelock group for the initial state d. Therefore, we can see that livelock checking has to be applied to each single initial state separately in the presence of multiple initial states.

            Figure 5: Multiple initial states.

   Theorem 3. When there are two initial states (i0 and i1), if i1 is included in the reached states from i0, the livelock groups from i1 are a subset of the livelock groups from i0.
   Proof. Since i1 is included in the reached states from i0, i1 is in either the main, transient, or livelock groups from i0. When i1 is in the main group, the same livelock groups are obtained from i1. When i1 is in the transient group, all or a subset of the livelock groups of i0 is obtained from i1. When i1 is in one of the livelock groups, the livelock group including i1 becomes the main group from i1, and no livelock group exists from i1, since the other livelock groups from i0 become unreachable TSCCs from i1. From the above three cases, no new livelock group is obtained from i1 compared to the ones from i0. Therefore, the livelock groups from i1 are a subset of the livelock groups from i0.
   Theorem 3 says that when there is a large number of initial states, we can skip livelock checking for any initial state that is already in the forward set of another initial state. In Figure 5, livelock checking for the initial state d can be skipped because d ∈ F(a), assuming that a is used first. However, there is an order dependency on which initial state is used first. If d is used first, we still need to run livelock checking with a. In practice, the number of calls to
   Figure 6 is the top-level procedure that checks livelock with multiple initial states. CheckLivelock takes a transition relation (Q), a set of states (S), a set of initial states (I), and a concrete machine (C) as inputs. The use of C is explained in Section 4. CheckLivelock first finds livelock groups in the reachable states in Lines 1-17 and then finds TSCCs in the unreachable states in Lines 18-23. The while loop (Lines 6-17) performs livelock checking for the current initial state s until all initial states are covered, with iteration index k. For this, remaining is initially set to I in Line 3 and updated by eliminating the newly reached states reached_k from remaining in Line 13. reached is the set of states reached from all initial states; it is initially set to ZERO in Line 1 and updated in Line 12 by adding reached_k, the states reached from s. Then, the next initial state is chosen from remaining in Line 15. T_U is the union of the unreachable transient states from each initial state; it is initially set to ZERO in Line 2 and updated in Line 14 by adding the unreachable states of T_k. For the current initial state s, FindLivelock is called in Line 7. |R_k| in Line 8 is the number of livelock groups in R_k. For each R_k^j, a trace trace_k^j is generated in Line 9 and the livelock is reported with the trace in Line 10. Generating the trace is explained in Section 4.2 and reporting the livelock is explained in Section 4.3.

   CheckLivelock(Q, S, I, C) {
1    reached = ZERO;
2    T_U = ZERO;
3    remaining = I;
4    k = 0;
5    s = PickOneState(I);
6    while (s ≠ ZERO) {
7        (M_k, R_k, T_k, reached_k, O_k) = FindLivelock(Q, S, s);
8        for (j = 0; j < |R_k|; j++) {
9            trace_k^j = GenerateTrace(C, R_k^j, s, O_k);
10           ReportLivelock(s, M_k, R_k^j, T_k, trace_k^j);
11       }
12       reached = reached ∨ reached_k;
13       remaining = remaining ∧ ¬reached_k;
14       T_U = T_U ∨ (T_k ∧ ¬reached_k);
15       s = PickOneState(remaining);
16       k++;
17   }
18   careSet = ¬(reached ∨ T_U);
19   if (careSet ≠ ZERO) {
20       R_k = FindTSCCs(Q, careSet);
21       for (j = 0; j < |R_k|; j++)
22           ReportUnreachLivelock(R_k^j);
23   }
   }

Figure 6: Checking livelock.

   Once all reachable livelock groups are found, we next find the unreachable TSCCs. We set the care states careSet to the negation of all states visited so far in Line 18, then call FindTSCCs with careSet in Line 20. If there is any unreachable TSCC, it is reported in Line 22.
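   As a companion to Figure 6, the following is a minimal explicit-state sketch of the multiple-initial-state loop, including the Theorem 3 skipping via remaining. All names and the toy machine are hypothetical; the tool performs these steps with BDD image/pre-image operations and also separates the transient states, which this sketch omits.

    def reach(trans, s):
        # Forward set of s under the transition map `trans`.
        seen, stack = {s}, [s]
        while stack:
            for t in trans.get(stack.pop(), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def invert(trans):
        # Reverse the transition map, for backward reachability.
        inv = {}
        for u, succs in trans.items():
            for t in succs:
                inv.setdefault(t, set()).add(u)
        return inv

    def check_livelock(trans, initials):
        inv, remaining, k = invert(trans), set(initials), 0
        while remaining:                       # Lines 6-17 of Figure 6
            s = remaining.pop()                # PickOneState
            reached_k = reach(trans, s)        # F(s)
            main = reached_k & reach(inv, s)   # main group: F(s) ∩ B(s)
            groups = set()
            for r in reached_k - main:
                fwd = reach(trans, r)
                scc = fwd & reach(inv, r)
                if fwd <= scc:                 # terminal SCC: a livelock group
                    groups.add(frozenset(scc))
            for g in groups:
                print(f"iteration {k}: livelock group {sorted(g)} from {s}")
            remaining -= reached_k             # Line 13: Theorem 3 skipping
            k += 1

    # Initial state 3 lies inside the TSCC {2, 3}, so from 3 that TSCC is
    # the main group; from 0 it is reported as a livelock group.
    check_livelock({0: [1], 1: [0, 2], 2: [3], 3: [2]}, {0, 3})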
4. LIVELOCK CHECKING ON FSM
   To check whether a livelock exists in a design, the checking should be done on the whole design. In practice, however, this is infeasible due to the size of the design. Thus, we propose a practical method for checking livelock on the FSMs in the design.
   Even when we check livelock on an FSM, the entire COI logic of the FSM must be considered in order to get an exact result on livelock. However, this is still computationally very expensive, or not feasible, for most real designs. Thus, we propose a framework for abstraction-based livelock checking on an abstracted COI of the FSM. Once we find a livelock on the abstract machine, we justify whether the livelock exists on the concrete machine. Notice that a livelock on the abstract machine can be mapped to more than one livelock on the concrete machine.
   Figure 7 shows how an abstract machine is obtained from the COI of an FSM. Suppose an FSM has two state variables f and g. Then, we compute the COI of the FSM; suppose that there are state variables {a, b, c, d, e} in the COI. The size of the abstract machine is predefined; let us suppose that the size is N. Then, a set of influential latches from the COI is computed from the FSM variables. The minimum abstract machine is the FSM itself and the maximum abstract machine is the concrete machine. In this example, N=4 and we get the abstract machine {f, g, d, e}.

Figure 7: Abstract machine. (The concrete machine contains the COI of the FSM, with latches {a, b, c, d, e}, and the FSM latches {f, g}; the abstract machine consists of the FSM {f, g} together with the influential latches {d, e}.)

   Theorem 4. If any state in a livelock group on an abstract machine is reachable from the initial state on the concrete machine, the livelock exists on the concrete machine.

   Proof. Since the abstraction is an over-approximation, the set of all transitions on the abstract machine is a superset of the set of all transitions on the concrete machine. Since there is no path from any state in the livelock group to any state in the main group on the abstract machine, there is still no path from any projected state of the livelock group to any projected state of the main group on the concrete machine. Now, suppose that the livelock does not exist on the concrete machine. The only way for the livelock not to exist on the concrete machine is that there is no path from the projected main group to the projected livelock group; in other words, the projected livelock group has to be unreachable from the initial state. However, this contradicts the assumption that a state of the livelock group is reachable from the initial state on the concrete machine. Therefore, the livelock group still exists on the concrete machine.

   Thanks to Theorem 4, abstraction-based livelock checking finds a livelock on a small abstract machine using a BDD-based symbolic method, and then justifies the existence of the livelock on the concrete machine by trace concretization (Section 4.2), using SAT techniques that can handle large designs. Abstraction-based livelock checking is incomplete in the sense that it does not provide a proof of the absence of livelock unless the checking is performed on the concrete machine: no livelock on an abstract machine does not guarantee no livelock on the concrete machine. However, it enables finding real livelock errors on large industrial designs.
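   The sound direction given by Theorem 4 can be sketched explicitly as follows (hypothetical code: states are (x, v) pairs, projection keeps x, and reachability is explicit search; the tool performs this justification with SAT-based BMC instead).

    def reachable(trans, init):
        # All concrete states reachable from init.
        seen, frontier = {init}, [init]
        while frontier:
            s = frontier.pop()
            for t in trans.get(s, ()):
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
        return seen

    def livelock_is_concrete(conc_trans, init, abstract_livelock, project):
        # Theorem 4: if any concrete state projecting into the abstract
        # livelock group is reachable, the livelock exists concretely.
        return any(project(s) in abstract_livelock
                   for s in reachable(conc_trans, init))

    # Concrete states are (x, v); the abstract livelock group is {x = 2}.
    conc = {(0, 0): [(1, 0)], (1, 0): [(2, 1)], (2, 1): [(2, 0)], (2, 0): [(2, 1)]}
    print(livelock_is_concrete(conc, (0, 0), {2}, lambda s: s[0]))  # True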
4.1 Causality Checking
   Let V be the set of state variables in an abstract machine for livelock checking. Suppose that R(V) is the reached states in the abstract machine and L(V) is a livelock group containing a TSCC. Also, suppose that v is a state variable in V. We are interested in whether v contributes to the livelock, as in Definition 4. This is called variable causality.

   Definition 4. When a livelock exists in the abstract machine, a variable v in V contributes to the livelock if the livelock disappears by eliminating v from the abstract machine. In other words, there is no livelock in the abstract machine composed of the variables V\v.

   Equation 1 shows the condition for the existence of a livelock.

      L(V) ⊂ R(V)    (1)

   Now, let R̃ be the quantified reached states and L̃ be the quantified livelock states with respect to a state variable v, as shown in Equations 2 and 3.

      R̃(V\v) = ∃v . R(V)    (2)
      L̃(V\v) = ∃v . L(V)    (3)

   Whether the variable v contributes to the livelock is then determined by Equation 4. Theorem 5 says that if Equation 4 holds, v contributes to the livelock.

      L̃(V\v) = R̃(V\v)    (4)

   Theorem 5. When a livelock group is found on an abstract machine (L(V) ⊂ R(V)), if L̃(V\v) = R̃(V\v) holds for a variable v, the variable v contributes to the livelock.

   Proof. Let M1 be the machine consisting of V, and suppose that a livelock group exists in M1. Let M2 be the machine consisting of V\v, obtained by eliminating v from M1. Also, let T1 (T2) be the set of transitions in M1 (M2), respectively. Since M2 is an over-approximation of M1, M2 has more transitions than M1 (T1 ⊂ T2). Let Td be the difference between T1 and T2. If any transition in Td makes a path from a state in the livelock group to a state in the main group in M2, the livelock group merges into the main group and both become a single SCC, yielding L̃(V\v) = R̃(V\v). Thus, M2 becomes a machine without the livelock. This means that v is a necessary variable for the livelock in M1. Therefore, if L̃(V\v) = R̃(V\v), v contributes to the livelock.

   This causality checking can also be applied to a set of variables, especially the FSM variables, in order to report whether the livelocks are related to the FSM. Let F be the set of variables in an FSM and C be the set of variables in the COI of the FSM. Suppose that R(F, C) is the reached states in the abstract machine and L(F, C) is a livelock group containing a TSCC. The quantified reached states and the quantified livelock states are computed in Equations 5 and 6 with respect to the FSM variables, respectively.

      R̃(C) = ∃F . R(F, C)    (5)
      L̃(C) = ∃F . L(F, C)    (6)

   Then, Equation 7 shows the causality check with the FSM variables, determining whether the FSM variables contribute to the livelock.

      L̃(C) = R̃(C)    (7)
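   A small explicit-state sketch of this quantification-based check follows (hypothetical encoding: a state is a set of variable/value pairs, and existential quantification over v simply drops v from every state; the tool quantifies on BDDs).

    def quantify(states, v):
        # Existential quantification ∃v: drop variable v from every state.
        return {frozenset((k, x) for (k, x) in s if k != v) for s in states}

    def contributes(reached, livelock, v):
        # Equation 4: v contributes iff the quantified livelock states
        # equal the quantified reached states (the livelock disappears).
        return quantify(livelock, v) == quantify(reached, v)

    # Reached states over {a, b}; the livelock group is the b = 1 half.
    R = {frozenset([("a", 0), ("b", 0)]), frozenset([("a", 1), ("b", 0)]),
         frozenset([("a", 0), ("b", 1)]), frozenset([("a", 1), ("b", 1)])}
    L = {frozenset([("a", 0), ("b", 1)]), frozenset([("a", 1), ("b", 1)])}

    print(contributes(R, L, "a"))   # False: the livelock survives without a
    print(contributes(R, L, "b"))   # True: eliminating b merges the groups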
                                                                    have any prefix behavior. This means it is not possible for
4.2 Trace Concretization
   Once a livelock group is found on an abstract machine, we need to justify whether the livelock group is reachable on the concrete machine. This is done in three steps. The first step is to pick a target state in the livelock group. The target state is chosen randomly from the livelock group, but is one of the closest states to the initial states, by using the onion rings O_k in Figure 6. The second step is to generate an abstract trace: starting from the target state, an abstract trace can be computed by applying BDD-based pre-image computation iteratively until the initial state is reached. The third step is to generate a concrete trace by forming a BMC (Bounded Model Checking [2]) problem from the abstract trace, in order to see whether the livelock group is reachable on the concrete machine. An efficient approach for concretization was proposed in [18].
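   The second step can be sketched explicitly as follows (hypothetical code: rings[i] holds the states first reached at BFS depth i, standing in for the onion rings O_k; the tool uses BDD pre-images, and the final BMC concretization step is not shown).

    def preimage(trans, targets):
        # States with at least one successor inside `targets`.
        return {s for s, succs in trans.items() if set(succs) & targets}

    def abstract_trace(trans, rings, target):
        # Walk backward from `target`, picking one state per onion ring.
        depth = next(i for i, ring in enumerate(rings) if target in ring)
        trace = [target]
        for i in range(depth - 1, -1, -1):
            # Any ring-i state that can step to the current trace head.
            prev = (preimage(trans, {trace[0]}) & rings[i]).pop()
            trace.insert(0, prev)
        return trace

    trans = {0: [1], 1: [2], 2: [3], 3: [2]}
    rings = [{0}, {1}, {2}, {3}]
    print(abstract_trace(trans, rings, 3))   # [0, 1, 2, 3]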




4.3 Reporting Livelock
   Once a concrete trace is generated for a livelock group, the livelock is real on the concrete machine. We report the livelock group with the state classification mentioned in Section 3.1. A livelock group is reported with its initial state, the main group, the transient group, and the unreachable states, in terms of the number of states and the percentage of each group on the abstract machine.
   By looking at the transient and livelock groups, we can see what fraction of the state space lies in the problematic zone. A good design is expected to have only one main group per initial state, without any transient or livelock groups, unless the design has an intended reset sequence to a normal mode.
5. TOGGLE DEADLOCK CHECKING
   There is another important design property, called toggle deadlock, that is related to livelock. A livelock may involve multiple state variables of a design, whereas a toggle deadlock occurs on a single state variable. A state variable has a toggle deadlock if the variable initially toggles but gets stuck at a constant value after a certain number of cycles.
   Figure 8 shows an example of a toggle deadlock. There are two state variables {a, b} and four states {s0, s1, s2, s3}. Provided that s0 is the initial state, the main group is {s0, s1} and the livelock group is {s2, s3}. Once the state transitions reach s2, a state in the livelock group, the value of b gets stuck at 1, whereas a still toggles. Thus, we say that b has a toggle deadlock.

Figure 8: Toggle deadlock. (Four states: s0 (a=0, b=0) and s1 (a=1, b=0) form the main group; s2 (a=1, b=1) and s3 (a=0, b=1) form the livelock group, in which b stays at 1.)

   Theorem 6. If there is no STSCC in a design, there is no toggle deadlock on any variable.

   Proof. For a toggle deadlock, a variable must toggle at some cycle and then hold its value forever from that cycle on. No STSCC implies that there is only a main group in the design. If a variable appears constant in the main group, the variable is a constant. However, the main group has no prefix behavior, so it is not possible for the variable to toggle before entering the main group. Therefore, no STSCC implies no toggle deadlock.

   Theorem 6 shows that a toggle deadlock occurs only in the presence of a livelock; it is also possible that a design with a livelock has no toggle deadlock. Thus, toggle deadlock on a state variable is checked in two steps. First, we find STSCCs on an abstract machine built from the state variable; the abstract machine is made in the same way as for livelock checking on an FSM. Second, we evaluate the value of the state variable in the livelock group if a livelock exists.

   Figure 9 shows the procedure that checks toggle deadlock on a given state variable t. CheckToggleDeadlock takes a transition relation (Q), a set of states (S), a set of initial states (I), a concrete machine (C), and the state variable (t) as inputs. CheckToggleDeadlock is similar to CheckLivelock in Figure 6. For each reachable livelock group R_k^j in Line 6, TestToggleDeadlock checks whether the value of t toggles in the livelock group and returns dlk and c in Line 7. dlk represents whether the state variable is in toggle deadlock, and c is the constant value (0 or 1) in the case of a toggle deadlock.

   CheckToggleDeadlock(Q, S, I, C, t) {
1    remaining = I;
2    k = 0;
3    s = PickOneState(I);
4    while (s ≠ ZERO) {
5        (R_k, reached_k, O_k) = FindLivelock(Q, S, s);
6        for (j = 0; j < |R_k|; j++) {
7            (dlk, c) = TestToggleDeadlock(R_k^j, t);
8            if (dlk) {
9                trace_k^j = GenerateTrace(C, R_k^j, s, O_k);
10               ReportToggleDeadlock(s, R_k^j, trace_k^j, c);
11           }
12       }
13       remaining = remaining ∧ ¬reached_k;
14       s = PickOneState(remaining);
15       k++;
16   }
   }

Figure 9: Checking toggle deadlock.
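   A minimal explicit-state sketch of the per-variable test follows (hypothetical encoding: a state is a dict of variable values; the tool's TestToggleDeadlock performs the same check symbolically on the BDD of the livelock group). The data encodes the example of Figure 8.

    def test_toggle_deadlock(livelock_group, t):
        # t is toggle-deadlocked if it holds one constant value across
        # the entire livelock group; return (dlk, constant value c).
        values = {state[t] for state in livelock_group}
        if len(values) == 1:
            return True, values.pop()
        return False, None

    # Figure 8's livelock group {s2, s3}: b is stuck at 1, a still toggles.
    s2 = {"a": 1, "b": 1}
    s3 = {"a": 0, "b": 1}
    print(test_toggle_deadlock([s2, s3], "b"))   # (True, 1)
    print(test_toggle_deadlock([s2, s3], "a"))   # (False, None)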
                                                                               difference between N ew1 and N ew2 in the table. This is be-
6. EXPERIMENTAL RESULTS
   We have implemented the proposed livelock checking and toggle deadlock checking algorithms. Table 1 shows our experimental results on livelock and toggle deadlock checking, generated on a 1.4 GHz Intel processor machine with 4 GB of memory running Red Hat Linux.

  Design     L      I    F   T    COI    N     Llk  Dlk     New1: Time / Mem / Ops       New2: Time / Mem / Ops    TraceGen: Time / Len
  D1       1163   1330   7   -    632    30      0    -      16:20 / 107.1 / 44           15:31 / 107.9 / 41              - / -
                                          60      0    -      41:11 / 136.7 / 66           41:31 / 137.7 / 66              - / -
                                          90      0    -    2:30:33 / 224.1 / 81         2:24:50 / 224.3 / 81              - / -
                                         120      -    -     time-out (> 24h)             time-out (> 24h)                - / -
  D2        385    352  25   -     68    30      0    -       0:01 / 21.0 / 12             0:01 / 21.1 / 12                - / -
                                          60      0    -       0:40 / 38.7 / 28             0:40 / 38.7 / 28                - / -
                                          68  12032    -    4:47:52 / 95.7 / 386439      0:53:58 / 96.5 / 70329            - / -
  D3-F1   32541    912   2   -  28941    30      1    -       0:49 / 140.4 / 54            0:49 / 140.5 / 55            6:37 / 66
  D3-F2   32541    912   4   -  28930     4      1    -       0:09 / 129.1 / 9             0:09 / 129.1 / 12            1:54 / 14
                         -   4  28930    30      -    1       1:31 / 132.8 / 168           1:31 / 132.8 / 172           2:11 / 14

Table 1: Experimental results.

   The first column lists the design names. The next five columns present statistics on the designs: the number of latches (L), the number of inputs (I), the number of latches in the FSM (F), the number of toggle signals to check (T), and the number of latches in the COI of either the FSM or a toggle signal (COI). The next three columns show the results of livelock and toggle deadlock checking: column N shows how many latches were in the abstract machine, column Llk shows how many livelock groups were found, and column Dlk shows how many toggle deadlocks were found. The next six columns compare the performance of the two methods (New1 and New2) in terms of time (Time), memory (Mem), and the number of image/pre-image computations (Ops). New1 is the proposed method without the trimming technique, whereas New2 is the proposed method with it. Times are given as hh:mm:ss and memory consumption in MB. The final two columns (TraceGen) show the results of trace generation on the concrete machine for the livelock or toggle deadlock found by New2: Time is the time spent on trace generation and Len is the trace length.
   We have chosen three industrial designs (D1, D2, and D3). For each design, we have run livelock or toggle deadlock checking on abstract machines of several sizes, in multiples of 30 latches. We set the maximum run time to 24 CPU hours.
   In D1, one FSM was automatically extracted. The FSM consists of 7 latches and has 632 latches in its COI. The run time increases exponentially with the size of the abstract machine; on this design, livelock checking became infeasible when N=120.
   In D2, there is one FSM, which was user-specified. The FSM consists of 25 latches and has only 68 latches in its COI. This design has a livelock group. However, the livelock was not detected when N=30 or N=60; it was detected only when all the latches in the COI were included in the abstract machine. In other words, the abstract machine is the concrete machine from the FSM's point of view. Since the livelock was found on the concrete machine, trace concretization is not required: the abstract trace of Section 4.2 is already a concrete trace.
   D2 is the only design in the table showing a significant performance difference between New1 and New2. This is because the design has many transient states as well as many livelock groups. In this case, the trimming technique reduced the number of image/pre-image operations from 386K to 70K (a 5.5X reduction), which gave a speed-up from almost 5 hours to under 1 hour (5X). This shows that the trimming technique helps performance when there are many transient states. When there are no transient states, the trimming technique becomes pure overhead, as shown in D3; however, the overhead is almost negligible in the experiments.
   In D3, there are two FSMs (F1 and F2). F1 is composed of 2 latches, and a livelock was found with N=30 within 49 seconds. The livelock was justified by trace concretization, which took 397 seconds; the trace length was 66. F2 is composed of 4 latches, and a livelock was found with N=4 (the FSM itself) in 9 seconds. This livelock was also justified by trace concretization, which took 114 seconds; the trace length was 14.
lock checking on F 2 separately from the livelock checking.           [5] M. Case, H. Mony, J. Baumgartner, and R. Kanzelman.
A toggle deadlock was found in 90 seconds and the concrete                Enhanced verification by temporal decomposition. In Formal
                                                                          Methods in Computer Aided Design, pages 37–54, 2009.
trace was generated in 131 seconds. D3 shows the value of             [6] H. Cho, G. D. Hachtel, E. Macii, M. Poncino, and F. Somenzi.
abstraction-based livelock and toggle deadlock checking.                  Automatic state space decomposition for approximate fsm
   Table 2 shows a comparison on finding all SCCs with four                traversal based on circuit analysis. IEEE Transactions on
algorithms (XB [28], Lockstep [3], Skeleton [12], IXB [20])               Computer-Aided Design, 15(12):1451–1464, Dec. 1996.
                                                                      [7] O. Coudert and J. C. Madre. A unified framework for the
on the design D2 from Table 1. In this design, the number of              formal verification of sequential circuits. In Proceedings of the
recurrent states is 2.07e8 and the number of transient states             International Conference on Computer-Aided Design, pages
is 1.2e6 that is only 0.6% of all states. However, it turned out          126–129, Nov. 1990.
                                                                      [8] O. G. E. M. Clarke and D. Peled. Model Checking. The MIT
that how to handle these transient states efficiently is the key            Press, 1999.
factor in the performance. One main difference between XB              [9] N. Een and N. Sorensson. MiniSat.
and IXB is that IXB trims out those transient states as much              http://www.cs.chalmers.se/Cs/Research/FormalMethods/
as possible. This trimming technique makes the IXB method                 MiniSat.
                                                                     [10] E. A. Emerson and C. Lei. Modalities for model checking:
outperform on this design: faster in time (more than 15X)                 Branching time logic strikes back. Science of Computer
and fewer number of image operations (more than 10X) than                 Programming, 8:275–306, 1987.
the other methods. This explains why N ew2 outperformed              [11] E. A. Emerson and C.-L. Lei. Efficient model checking in
on D2 in Table 1. Table 2 also shows why livelock checking is             fragments of the propositional mu-calculus. In Proceedings of
                                                                          the First Annual Symposium of Logic in Computer Science,
done by finding TSCCs instead of SCCs. Finding all livelock                pages 267–278, June 1986.
groups took 54 minutes, whereas finding all SCCs took 100             [12] R. Gentilini, C. Piazza, and A. Policriti. Computing strongly
minutes (2X) even with IXB.                                               connected components in a linear number of symbolic steps. In
                                                                          SODA ’03: Proceedings of the fourteenth annual ACM-SIAM
   Method         Time    Memory         Ops    SCCs     States           symposium on Discrete algorithms, pages 573–582, 2003.
   XB          84:07:56      98.2    1013333                         [13] G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Markovian
   Lockstep    45:55:53     237.3    2590724    19458    2.08e8           analysis of large finite state machines. IEEE Transactions on
   Skeleton    26:01:54     266.5    2609008                              Computer-Aided Design, 15(12):1479–1493, Dec. 1996.
   IXB          1:39:47      92.5     102990                         [14] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton.
                                                                          Efficient ω-regular language containment. In Computer Aided
           Table 2: Finding all SCCs in D2.                               Verification, pages 371–382, Montréal, Canada, June 1992.
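   The trimming idea behind IXB and New2 can be sketched explicitly as follows (hypothetical code showing one direction of trimming; the symbolic version strips prefixes with image/pre-image operations on BDDs): repeatedly remove states with no predecessor, which peels off lasso prefixes before the cycle analysis runs.

    def trim_transients(trans, states):
        # Drop states with no predecessor inside the current set; what
        # survives still contains every SCC, so cycle analysis can run
        # on a much smaller care set.
        states = set(states)
        while True:
            with_pred = {t for s in states
                           for t in trans.get(s, ()) if t in states}
            roots = states - with_pred      # no predecessor within `states`
            if not roots:
                return states
            states -= roots

    trans = {0: [1], 1: [2], 2: [3], 3: [2]}     # 0 -> 1 -> (2 <-> 3)
    print(trim_transients(trans, {0, 1, 2, 3}))  # {2, 3}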
7. CONCLUSIONS
   We have presented a framework for abstraction-based livelock and toggle deadlock checking, in order to handle large designs in practice. Since exact livelock and toggle deadlock checking directly on real designs is infeasible, our approach checks livelock and toggle deadlock on an abstract machine built from either an FSM or a toggle signal. Once we find a livelock or a toggle deadlock, we justify it on the concrete machine by concretizing the abstract trace.
   Even though the proposed approach does not prove the non-existence of livelock or toggle deadlock on a design, unless the design is small enough to handle exactly, it finds livelocks and toggle deadlocks on the design if they exist.
   To the best of our knowledge, this is the first approach to use abstraction-based livelock checking, and also the first approach for checking toggle deadlock. The experimental results showed that the abstraction-based approach finds livelock errors on real designs.
   As future work, we are interested in improving the concretization, finding more accurate influential latches, and optimizing the computations with multiple FSMs or toggle signals by considering the overlaps in their COIs.

8. REFERENCES
 [1] A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. In International Workshop on Formal Methods for Industrial Critical Systems, pages 160–177, 2002.
 [2] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In Fifth International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS'99), pages 193–207, Amsterdam, The Netherlands, Mar. 1999. LNCS 1579.
 [3] R. Bloem, H. Gabow, and F. Somenzi. An algorithm for strongly connected component analysis in n log n symbolic steps. In Formal Methods in Computer Aided Design, pages 37–54, 2000.
 [4] R. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, Aug. 1986.
 [5] M. Case, H. Mony, J. Baumgartner, and R. Kanzelman. Enhanced verification by temporal decomposition. In Formal Methods in Computer Aided Design, pages 37–54, 2009.
 [6] H. Cho, G. D. Hachtel, E. Macii, M. Poncino, and F. Somenzi. Automatic state space decomposition for approximate FSM traversal based on circuit analysis. IEEE Transactions on Computer-Aided Design, 15(12):1451–1464, Dec. 1996.
 [7] O. Coudert and J. C. Madre. A unified framework for the formal verification of sequential circuits. In Proceedings of the International Conference on Computer-Aided Design, pages 126–129, Nov. 1990.
 [8] E. M. Clarke, O. Grumberg, and D. Peled. Model Checking. The MIT Press, 1999.
 [9] N. Een and N. Sorensson. MiniSat. http://www.cs.chalmers.se/Cs/Research/FormalMethods/MiniSat.
[10] E. A. Emerson and C.-L. Lei. Modalities for model checking: Branching time logic strikes back. Science of Computer Programming, 8:275–306, 1987.
[11] E. A. Emerson and C.-L. Lei. Efficient model checking in fragments of the propositional mu-calculus. In Proceedings of the First Annual Symposium on Logic in Computer Science, pages 267–278, June 1986.
[12] R. Gentilini, C. Piazza, and A. Policriti. Computing strongly connected components in a linear number of symbolic steps. In SODA '03: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 573–582, 2003.
[13] G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Markovian analysis of large finite state machines. IEEE Transactions on Computer-Aided Design, 15(12):1479–1493, Dec. 1996.
[14] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton. Efficient ω-regular language containment. In Computer Aided Verification, pages 371–382, Montréal, Canada, June 1992.
[15] R. P. Kurshan. Computer-Aided Verification of Coordinating Processes. Princeton University Press, Princeton, NJ, 1994.
[16] L. Lamport. Proving the correctness of multiprocess programs. IEEE Transactions on Software Engineering, SE-3(2):125–143, Mar. 1977.
[17] Y. Matsunaga, P. C. McGeer, and R. K. Brayton. On computing the transitive closure of a state transition relation. In Proceedings of the Design Automation Conference, pages 260–265, June 1993.
[18] K. Nanshi and F. Somenzi. Constraints in one-to-many concretization for abstraction refinement. In Proceedings of the Design Automation Conference, pages 569–574, 2009.
[19] S. Qadeer, R. K. Brayton, V. Singhal, and C. Pixley. Latch redundancy removal without global reset. In Proceedings of the International Conference on Computer Design, pages 432–439, 1996.
[20] K. Ravi, R. Bloem, and F. Somenzi. A comparative study of symbolic algorithms for the computation of fair cycles. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 143–160. Springer-Verlag, Nov. 2000. LNCS 1954.
[21] V. Singhal. Design Replacements for Sequential Circuits. Ph.D. dissertation, University of California at Berkeley, 1996.
[22] V. Singhal, C. Pixley, A. Aziz, and R. K. Brayton. Theory of safe replacements for sequential circuits. IEEE Transactions on Computer-Aided Design, 20(2):249–265, Feb. 2001.
[23] R. Tarjan. Depth first search and linear graph algorithms. SIAM Journal of Computing, 1:146–160, 1972.
[24] H. J. Touati, R. K. Brayton, and R. P. Kurshan. Testing language containment for ω-automata using BDD's. Information and Computation, 118(1):101–109, Apr. 1995.
[25] M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Proceedings of the First Symposium on Logic in Computer Science, pages 322–331, Cambridge, UK, June 1986.
[26] T.-H. Wang and T. Edsall. Practical FSM analysis for Verilog. In IVC-VIUF '98: Proceedings of the International Verilog HDL Conference and VHDL International Users Forum, pages 52–58, 1998.
[27] A. Xie and P. A. Beerel. Efficient state classification of finite-state Markov chains. IEEE Transactions on Computer-Aided Design, 17(12):1334–1339, Dec. 1998.
[28] A. Xie and P. A. Beerel. Implicit enumeration of strongly connected components and an application to formal verification. IEEE Transactions on Computer-Aided Design, 19(10):1225–1230, Oct. 2000.


Keyword Index


          abstraction                        4
          abstraction-based                 46
          assertion                         38

          Boolector                         28

          Data-path equivalence checking     9
          deadlock checking                 46
          dynamic verification              38

          IC3                               19

          Lambda                            28
          Lemmas on Demand                  28
          livelock checking                 46

          Model Checking                    19
          modelchecking                      4

          Polynomial equivalence checking    9

          QF BV SMT solving                  9

          reparameterization                 4

          SAT solver                        19
          SMT                               28
          synthesis                          4
          systemC                           38

          verification                       4


Author Index


               Biere, Armin         28
               Brayton, Robert       9

               Cabodi, Gianpiero    19
               Case, Michael         9
               Cleaveland, Rance     1

               Dutta, Sonali        38

               Een, Niklas           4

               Fujita, Masahiro      2

               Goswami, Dhiraj       3

               Harer, Kevin         46

               Long, Jiang           9

               Mishchenko, Alan    4, 19
               Moon, In-Ho            46

               Niemetz, Aina          28

               Palena, Marco          19
               Preiner, Mathias       28

               Tabakov, Deian         38

               Vardi, Moshe Y.        38