=Paper= {{Paper |id=None |storemode=property |title=A Fast Reparameterization Procedure |pdfUrl=https://ceur-ws.org/Vol-1130/paper_1.pdf |volume=Vol-1130 |dblpUrl=https://dblp.org/rec/conf/fmcad/EenM13 }} ==A Fast Reparameterization Procedure== https://ceur-ws.org/Vol-1130/paper_1.pdf
                                A Fast Reparameterization Procedure

                                               Niklas Een, Alan Mishchenko
                                                 {een,alanmi}@eecs.berkeley.edu

                                        Berkeley Verification and Synthesis Research Center
                                                         EECS Department
                                              University of California, Berkeley, USA.


 Abstract. Reparameterization, also known as range pre-            the original node can be removed. The old primary in-
 serving logic synthesis, replaces a logic cone by another logic   puts dominated by the given node are also removed by this
 cone, which has fewer inputs while producing the same out-        procedure.
 put combinations as the original cone. It is expected that
 a smaller circuit leads to a shorter verification time. This        Example. Suppose a design contains inputs x1 , x2 ,
 paper describes an approach to reparameterization, which            and a gate Xor(x1 , x2 ); and that furthermore, x1 has
 is faster but not as general as the previous work. The new          no other fanouts besides this Xor-gate. Then, no mat-
 procedure is particularly well-suited for circuits derived by
                                                                     ter which value x2 takes, both a zero and a one can
 localization abstraction.
                                                                     be forced at the output of the Xor by setting x1 ap-
                                                                     propriately, and thus the Xor-gate can, for verification
                                                                     purposes, be replaced by a primary input.
1     Introduction
                                                                   The proposed method to find similar situations starts by
The use of reparameterization as a circuit transformation in
                                                                   computing all dominators of the netlist graph, then for each
the verification flow was pioneered by Baumgartner et. al.
                                                                   candidate node dominating at least one PI the following
in [1]. In their work, new positions for the primary inputs
                                                                   quantification problem is solved: “for each assignment to
(PIs) are determined by finding a minimum cut between
                                                                   the non-dominated gates, does there exist a pair of assign-
the current PI positions and the next-state variables (flop
                                                                   ments to the dominated PIs that results in a zero and a
inputs). BDDs are then used to compute the range (or im-
                                                                   one at the candidate node”. More formally, assuming that
age) on the cut. Finally, a new logic cone with the same
                                                                   x represents non-dominated gates (“external inputs”) and
range (but fewer PIs), is synthesized from the BDDs and
                                                                   yi represents dominated PIs (“internal inputs”), the follow-
grafted onto the original design in place of the current logic
                                                                   ing is always true for the node’s function φ:
cone. This is a very powerful transformation, but it has po-
tential drawbacks: (i) the BDDs may blow up and exhaust                          ∀x ∃y0 , y1 . ¬φ(x, y0 ) ∧ φ(x, y1 )
the memory, (ii) the extracted circuit may be larger than
the logic it replaces, and (iii) the runtime overhead may be       Important features of this approach are:
too high.
   In contrast, the proposed approach is based on greedy lo-        (i) It is circuit based, while early work on reparameteri-
cal transformations, capturing only a subset of optimization            zation was based on transition relations [4].
opportunities. However, memory consumption is modest,              (ii) In its simplest form, the proposed restructuring re-
runtimes are very low, and the resulting design is always               places some internal nodes by new primary inputs and
smaller, or of the same size, as the original design. It is             remove dangling logic.
shown experimentally that the proposed method leads to
sizeble reductions when applied for circuits produced by           (iii) The analysis is completely combinational: no informa-
localization abstraction [5].                                            tion on the reachable state-space is used.
                                                                   (iv) If the property was disproved after reparametrization,
2     Fast Reparameterization                                           it is straight-forward to remap the resulting counter-
                                                                        example to depend on the original primary inputs.
The fast reparameterization algorithm is based on the fol-
lowing observation: if a node dominates1 a set of PIs, and         It is important to realize that by analyzing and applying
those PIs are sufficient to force both a zero and a one at           reductions in topological order from PIs to POs, regions
that node, regardless of the values given to the other PIs         amenable to reparameterization are gradually reduced to
and state variables, then that node can be replaced by a           contain fewer gates and PIs. By this process, the result of
new primary input, while the unused logic cone driving             repeatedly applying local transformations can lead to a sub-
                                                                   stantial global reduction. In the current implementation,
   1 A node n dominates another node m iff every path from m to a
                                                                   the above formula is evaluated by exhaustive simulation of
primary output goes through node n.
                [cand]                       [cand]
                                                                        Fast Reparameterization
                                                                        − Compute dominators. For a DAG with bounded
                    &                            &                      in-degree (such as an And-Inverter-Graph), this is a
                                                                        linear operation in the number of nodes (see Figure 3).
                                                                        − For each PI, add all its dominators to the set of
                                                                        “candidates”.
            &               &            !            &
                                                                        − For each candidate c, in topological order from inputs
                                                                        to outputs:
                                                                          - Compute the set D of nodes dominated by c (Figure 4).
            !               ?                    !            ?
                                                                          - Denote the PIs in D “internal” inputs.

                            [cand]                                        - Denote any node outside D, but being a direct fanin of a
                                                                          node inside D, an “external” input (may be any gate type).
                                                                          - Simulate all possible assignments to the internal and ex-
                                                                          ternal inputs. If for all external assignments there exists
                                &                                         an internal assignment that gives a 0 at c, and another in-
                                                                          ternal assignment that gives a 1 at c, then substitute c by
                                                                          a new primary input.

                        !            &
                                                                           Figure 2. Steps of the reparameterization algorithm.


                                &            &
                                                                        Compute Dominators
                                                                        − Initialize all POs to dominate themselves
                !               ?            !            ?             − Traverse the netlist in reverse topological order (from POs
                                                                        to PIs), and mark the children of each node as being domi-
                                                                        nated by the same dominator as yourself unless the child has
Figure 1. Example of subgraphs that can be reduced. Here “!”
                                                                        already been assigned a dominator
denote nodes we can control (a PI dominated by the output
node); “?” denote nodes that can assume any value beyond our            − For already marked children, compute the “meet” of the
control. In the above examples, the output node “[cand]” can            two dominators, i.e. find the first common dominator. If
be forced to both a zero and a one by choosing the right values         there is no common dominator, mark the node as dominat-
for the “!”, regardless of the values of the “?”. In such cases,        ing itself.
the output node is replaced by a new PI.
                                                                       Figure 3. Review of the “finding direct dominators” algorithm
                                                                       for DAGs. For a more complete treatment of this topic, see [8],
the logic cone rooted in the given node while the cone is              or for a more precise description, the source code of “compute-
limited to 8 inputs. Limiting the scope to cones with 8 in-            Dominators()” in “ZZ/Netlist/StdLib.cc” of ABC-ZZ [11].
puts and simulating 256 bit patterns (or eight 32-bit words)
seems to be enough to saturate the reduction achievable on
the benchmarks where the method is applicable.                          Compute Dominated Area
   Some typical reductions are shown in Figure 1. The
graphs should be understood as sub-circuits of a netlist be-            area = {w dom}             – init. set of gates to the dominator
ing reparameterized. The exclamation marks denote “inter-               count = [0, 0, . . ., 0]    – count is a map “gate → integer”
                                                                        for w ∈ area:
nal” PIs dominated by the top-node, and hence under our
                                                                              for v ∈ faninsOf (w ):
control; and the question marks denote gates with fanouts
                                                                                    count[v ]++
outside the displayed logic cone, for which no assumption                           if count[v ] == num of fanouts[v ]:
on their values can be made. The full algorithm is described                                 area = area ∪ {v }
in Figure 2.

Counterexample reconstruction. There are several                       Figure 4. Find nodes used only by gate “w dom”. This set is
                                                                       sometimes called maximum fanout free cone (MFFC). The outer
ways that a trace on the reduced netlist can be lifted to the
                                                                       for-loop over the elements of area is meant to visit all elements
original netlist. For instance, the removed logic between              added by the inner for-loop. For this procedure, flops are treated
the new PIs and the old PIs can be stored in a separate                as sinks, i.e. having no fanins.
netlist. The trace on the reduced netlist can then be pro-

                                                                   2
jected onto the original PIs by rerunning the simulation            ing Competition 2012 were considered. Localization ab-
used to produce the reparameterized circuit, and for each           straction [5] was applied with a timeout of one hour to each
candidate pick an assignment that gives the correct value.          benchmark and the resulting models meeting the following
But even simpler, one can just put the original netlist into        criteria were kept:
a SAT solver and assert the values from the trace onto the
appropriate variables and call the SAT solver to complete             – At least half of the flops were removed by abstraction.
the assignment. In practice, this seems to always work well.
                                                                      – The abstraction was accurate (no spurious counterex-
                                                                        amples).
3    Improvements
                                                                      – At least one of the verification engines could prove the
The algorithm described in the previous section replaces
                                                                        property within one hour.
internal nodes with inputs, and thus only removes logic. If
we are prepared to forgo this admittedly nice property and
                                                                    The sizes of benchmarks selected in this way are listed in
occasionally add a bit of new logic, then nodes that are
                                                                    table Table 1. All those models were given to the reparam-
not completely controllable by their dominated inputs can
                                                                    eterization engine, both in weak mode and strong mode,
still be reparameterized by the following method: for node
                                                                    the latter using the improvements described in section 3.
with function φ(x, y), where x are external inputs and y are
                                                                    The reparameterized models were also post-processed with
internal inputs, compute the following two functions:
                                                                    a quick simplification method called “shrink” which is part
                    φ0 (x) ≡ ∀y.¬φ(x, y)                            of the ABC package [7]. The longest runtime for weak repa-
                    φ1 (x) ≡ ∀y. φ(x, y)                            rameterization was 16 ms, for strong reparameterization
                                                                    28 ms and for the simplification phase 50 ms.2 Reductions
Using these two functions, φ can be resynthesized using a           are listed in table Table 2.
single input ynew by the expression:                                   For comparison, Table 2 also include the results of run-
                  ¬φ0 (x) ∧ (φ1 (x) ∨ ynew )                        ning an industrial implementation of the BDD based algo-
                                                                    rithm of [1]. Because runtimes are significantly longer with
In other words, if two or more inputs are dominated by              this algorithm, they are given their own column. These
the node φ, a reduction in the number of inputs is guar-            results were given to us from IBM, and according to their
anteed. Depending on the shape of the original logic, and           statement “are not tweaked as much as they could be”.
how well new logic for φ0 and φ1 is synthesized, the num-              All benchmarks were given to three engines: Property Di-
ber of logic gates may either increase or decrease. In our          rected Reachability [2, 6], BDD-based reachability [3], and
implementation, logic for φ0 and φ1 is created by the fast          Interpolation-based Model Checking [9]. The complete table
irredundant sum-of-product (“isop”) proposed by Shin-ichi           of results is given in Table 3. A slice of this table, showing
Minato in [10]. We greedily apply this extended method              only results for PDR, with and without (strong) reparam-
for all nodes with two or more dominated PIs, even if it            eterization, is given in Table 4 together with a scatter plot.
leads to a blow-up in logic size. To counter such cases, fast
logic synthesis can be applied after the reparameterization.
Obviously, there are many ways to refine this scheme.               Analysis. Firstly, we see a speedup of 100x-1000x over
                                                                    previous work in the runtime of the reparameterization al-
                                                                    gorithm itself, with comparable quality of results for the
4    Future Work                                                    application under consideration (models resulting from lo-
                                                                    calization abstraction). This means the algorithm can al-
Another possible improvement to the method is extending             ways be applied without the need for careful orchestration.
the reparameterization algorithm to work for multi-output           Secondly, we see an average speedup of 2.5x in verifica-
cones. As an example, consider a two-output cone where              tion times when applying reparameterization in conjunc-
the outputs can be forced to all four combinations {00,             tion with PDR, which is also the best overall engine on
01, 10, 11} by choosing appropriate values for dominated            these examples. For two benchmarks, 6s121 and 6s150,
inputs. In such a case, the cone can be replaced by two             BDD reachability do substantially better than PDR, and
free inputs. If some of the four combinations at the out-           for the latter (where runtimes are meaningful) the speedup
puts are impossible under conditions expressed in terms of          due to reparameterization is greater than 3x. Furthermore,
non-controllable signals, a logic cone can be constructed to        for BDD reachability one can see that on several occa-
characterize these conditions and reduce the number of PIs          sions (6s30 in particular), reparameterization is completely
by adding logic similar to the case of a single-output cone.        crucial for performance. Finally, interpolation based mod-
                                                                    elchecking (IMC) seems to be largely unaffected by repa-
                                                                    rameterization.
5    Experiments
                                                                       2 Benchmarks from HWMCC’12 are quite small. For comparison:
As part of the experimental evaluation, all benchmarks              running reparameterization on a 7 million gate design from one of our
from the single-property track of the Hardware Modelcheck-          industrial collaborators took 4.1 s.


                                                                3
                                                                        Abstraction Phase
                                       !                                                !
                          Design       !     #And         #PI             #FF           ! Depth
                                       !                                                !
                          6s102        !
                                       !       6,594         72      1,121 → 56         !
                                                                                        !    23
                          6s121        !
                                       !       1,636         99        419 → 110        !
                                                                                        !    19
                          6s132        !
                                       !       1,216         94        139 → 113        !
                                                                                        !     7
                          6s144        !
                                       !      41,862        480      3,337 → 236        !
                                                                                        !    18
                          6s150        !
                                       !       5,448        146      1,044 → 323        !
                                                                                        !   103
                          6s159        !
                                       !       1,469         13        252 → 18         !
                                                                                        !     4
                          6s164        !
                                       !       1,095         91        198 → 93         !
                                                                                        !    16
                          6s189        !
                                       !      36,851        479      2,434 → 259        !
                                                                                        !    18
                          6s194        !
                                       !      12,049        532      2,389 → 198        !
                                                                                        !    45
                          6s30         !
                                       !   1,043,139     32,994      1,195 → 128        !
                                                                                        !    32
                          6s43         !
                                       !       7,408         30        965 → 310        !
                                                                                        !    25
                          6s50         !
                                       !      16,700      1,570      3,107 → 207        !
                                                                                        !    52
                          6s51         !
                                       !      16,701      1,570      3,107 → 209        !
                                                                                        !    65
                          bob05        !
                                       !      18,043        224      2,404 → 146        !
                                                                                        !   106
                          bob1u05cu    !      32,063        224      4,377 → 146        !   106

Table 1. Sizes of original designs and abstract models. Column #FF shows how many flip-flops were turned into unconstrained
inputs by localization abstraction. The removed FFs show up as PIs in the other tables. Last column shows the BMC depth used
by the abstraction engine (see [5] for more details).
                                                                                        Reparameterization
                  !                !                !             !
                  ! No Reparam. ! Weak Rep. ! Strong Rep. !              BDD Reparam.
                  !                !                !             !
                  !                !               !             !
        Design    ! #And     #PI ! #And #PI ! #And          #PI ! #And #PI Runtime
                  !                !               !             !
        6s102     !
                  !   6,594  1,137 !! 1,247    331 !! 1,188  267 !! 1,283   285    71.91 s
        6s121     !
                  !   1,636    408  !
                                    !    627   120  !
                                                    !   559   66  !
                                                                  !    732   41     0.27 s
        6s132     !
                  !   1,216    120 ! 1,108
                                    !           62 ! 1,102
                                                    !         55 ! 2,731
                                                                  !          36     0.80 s
        6s144     ! 41,862   3,580 !! 10,172 1,038 !! 9,494  812 !! 13,259  889    60.50 s
                  !
        6s150     !
                  !   5,448    867 !! 3,062    506 !! 2,213   89 !! 2,231    46     1.81 s
        6s159     !
                  !   1,469    247 !!    116    20 !!   114   19 !!    256   13     0.02 s
        6s164     !
                  !   1,077    196 !!    661   109 !!   499   41 !! 3,413    40    11.61 s
        6s189     ! 36,851   2,654 !! 10,033 1,004 !! 9,552  794 !! 12,051  814    63.11 s
                  !
        6s194     ! 12,049   2,723 ! 1,366     184 ! 1,348   166 !! 1,612   121    10.37 s
                  !                 !               !
        6s30      ! 102,535 34,061  ! 1,508    307  ! 1,184  205  !    603   50     0.74 s
                  !                 !               !             !
        6s43      !
                  !   7,408    685  ! 3,451
                                    !          304  ! 3,218
                                                    !        202  ! 17,691
                                                                  !         101     2.07 s
        6s50      ! 16,700   4,470  ! 1,841    350  ! 1,652  270  ! 4,319   752    46.57 s
                  !                 !               !             !
        6s51      ! 16,701   4,468  ! 1,828    350  ! 1,639  268  ! 1,255    62    16.34 s
                  !                 !               !             !
        bob05     ! 18,043   2,358  ! 1,618    187  ! 1,388   52  ! 1,586    43     0.31 s
                  !                 !               !             !
        bob1u05cu ! 32,063   4,455 ! 1,618     187 ! 1,388    52 ! 1,586     43     0.33 s

Table 2. Effect of reparameterization on the abstracted models. “Weak” reparameterization refers to the basic method described
in section 2, “Strong” additionally includes the improvements discussed in section 3. Runtimes are omitted as the average CPU
time was 4-8 ms (and the longest 28 ms). However, BDD based reparameterization is not as scalable as the method presented in
this paper, and runtimes (typically between 100x-1000x longer) are listed for reference.

References                                                        [4] Pankaj Chauhan, Edmund Clarke, and Daniel Kroening.
                                                                      A SAT-Based Algorithm for Reparameterization in
 [1] J. Baumgartner and H. Mony. Maximal Input Reduc-                 Symbolic Simulation. In Proc. of DAC, 2004.
     tion of Sequential Netlists via Synergistic Reparam-
     eterization and Localization Strategies. In Proc. of         [5] N. Een, A. Mishchenko, and N. Amla. A Single-
     CHARME, pages 222–237, 2005.                                     Instance Incremental SAT Formulation of Proof-
 [2] Aaron Bradley. IC3: SAT-Based Model Checking                     and Counterexample-Based Abstraction. In FM-
     Without Unrolling. In Proc. of VMCAI, 2011.                      CAD, 2010.

 [3] R. E. Bryant. Graph-Based Algorithms for Boolean             [6] Niklas Een, Alan Mishchenko, and Robert Brayton. Effi-
     Function Manipulation. In IEEE Transactions on Com-              cient Implementation of Property Directed Reach-
     puters, vol. c-35, no.8, Aug., 1986.                             ability. In Proc. of FMCAD, 2011.


                                                             4
                                                                                                                               Verification Runtimes
             !                                    !                            !
             !
             !                 PDR                !
                                                  !          BDD reach.        !
                                                                               !                                                            IMC
             !                                    !                            !
             ! NoRep.           Weak      Strong   ! NoRep.     Weak Strong ! NoRep.                                                        Weak     Strong
             !                                    !                            !
   6s102     !
             !     1.7             0.4        0.4 !!   121.2      90.2   204.3 !!      –                                                    2619.2         –
   6s121     !
             !    64.0            12.3        5.4 !!     0.5       0.3     0.4 !!   10.2                                                       5.8      36.0
   6s132     !
             !     7.8             8.1        7.9 !!  2327.3    1375.5       – !!  201.4                                                     384.9     267.9
   6s144     !
             !    10.3            12.4        9.6 !!       –         –       – !! 1177.5                                                     942.1    1413.0
   6s150     !
             !       –          2323.7          – !!   539.3     143.6   189.1 !!      –                                                         –         –
   6s159     !
             !     0.0             0.0        0.0 !!     0.1       0.1     0.1 !!    0.1                                                       0.1       0.1
   6s164     !
             !     7.0            34.1        6.8 !!       –      33.6     0.4 !!    4.4                                                       8.5       3.0
   6s189     !
             !    11.9             7.8        8.6 !!       –         –       – !!  738.0                                                     396.5     366.0
   6s194     !
             !     4.7             4.6        3.7 !!   307.5      42.5   742.2 !!  798.9                                                    1042.8    1103.3
   6s30      !
             !    15.9            45.3       12.3 !!       –      17.6    49.2 !!      –                                                         –         –
   6s43      !
             !    13.6            11.7        9.1 !!       –    1191.1  1390.4 !!      –                                                         –         –
   6s50      !
             !    14.7            10.8        5.4 !!   941.6     241.0  1680.8 !! 1795.6                                                     820.0    2348.9
   6s51      !
             !    18.7             7.1        4.1 !!   303.8     147.6  1891.2 !! 1806.8                                                     954.8    1737.8
   bob05     !
             !     0.3             0.3        0.3 !!  2450.5    2371.4  1253.2 !!   30.3                                                      26.3      25.1
   bob1u05cu !     0.3             0.2        0.3 !        –    2157.6  1088.3 !    30.3                                                      24.0      24.8

Table 3. Full table of results. Each design after abstraction is given to three engines: Property Directed Reachability (PDR),
BDD-based reachability (BDD), and Interpolation-based Model Checking (IMC). Each engine is run on three versions of the model
with no/weak/strong reparameterization applied. A dash represents a timeout after one hour.


                      !
            Design    ! NoRep.         Rep.
                                                          PDR on reparametrized abstract model




            6s102
                      !
                      !    1.66        0.40                                                      1000
                      !
            6s121     !
                      !   64.00        5.43
            6s132     !
                      !   7.82         7.90
            6s144     !   10.34        9.56                                                          100
                      !
            6s159     !
                      !    0.04        0.03
            6s164     !
                      !    6.97        6.83
            6s189     !
                      !   11.86        8.59                                                           10
            6s194     !
                      !    4.71        3.72
            6s30      !
                      !   15.92       12.29
            6s43      !
                      !   13.63        9.15                                                            1
            6s50      !
                      !   14.66        5.44
            6s51      !
                      !   18.72        4.08
            bob05     !    0.33        0.26
                      !                                                                              0.1
            bob1u05cu !    0.33        0.26                                                                0.1     1      10       100      1000
                                                                                                                    PDR on abstract model

Table 4. Runtime improvements for PDR. Column “NoRep.” shows runtimes (in seconds) for proving the property of each
benchmark using PDR after abstraction, but without reparameterization; column “Rep.” shows runtimes after reparameterization.
The scatter plot on the right places these runtime pairs on a log-scale. The average speedup is 2.5x.


 [7] Berkeley Logic Synthesis Group.        ABC: A Sys-                                                    Products Forms from Binary Decision Diagrams. In
     tem for Sequential Synthesis and Verification.                                                        Proc. of SASIMI, 1992.
     http://www.eecs.berkeley.edu/˜alanmi/abc/, v00127p.                                             [11] Berkeley Verification and Synthesis Research Center.
 [8] T. Lengauer and R.E. Tarjan. A Fast Algorithm for                                                    ABC-ZZ: A C++ framework for verification & syn-
     Finding Dominators in a Flowgraph. In ACM Trans-                                                     thesis. https://bitbucket.org/niklaseen/abc-zz.
     actions on Programming Languages and Systems, Vol. 1,
     No. 1, July, pages 121–141, 1979.
 [9] K. L McMillan. Interpolation and SAT-based Model
     Checking. In Proc. of CAV, 2003.
[10] S. Minato. Fast Generation of Irredundant Sum-Of-


                                                                                                 5