=Paper=
{{Paper
|id=None
|storemode=property
|title=A Fast Reparameterization Procedure
|pdfUrl=https://ceur-ws.org/Vol-1130/paper_1.pdf
|volume=Vol-1130
|dblpUrl=https://dblp.org/rec/conf/fmcad/EenM13
}}
==A Fast Reparameterization Procedure==
A Fast Reparameterization Procedure
Niklas Een, Alan Mishchenko
{een,alanmi}@eecs.berkeley.edu
Berkeley Verification and Synthesis Research Center
EECS Department
University of California, Berkeley, USA.
Abstract. Reparameterization, also known as range pre- the original node can be removed. The old primary in-
serving logic synthesis, replaces a logic cone by another logic puts dominated by the given node are also removed by this
cone, which has fewer inputs while producing the same out- procedure.
put combinations as the original cone. It is expected that
a smaller circuit leads to a shorter verification time. This Example. Suppose a design contains inputs x1 , x2 ,
paper describes an approach to reparameterization, which and a gate Xor(x1 , x2 ); and that furthermore, x1 has
is faster but not as general as the previous work. The new no other fanouts besides this Xor-gate. Then, no mat-
procedure is particularly well-suited for circuits derived by
ter which value x2 takes, both a zero and a one can
localization abstraction.
be forced at the output of the Xor by setting x1 ap-
propriately, and thus the Xor-gate can, for verification
purposes, be replaced by a primary input.
1 Introduction
The proposed method to find similar situations starts by
The use of reparameterization as a circuit transformation in
computing all dominators of the netlist graph, then for each
the verification flow was pioneered by Baumgartner et. al.
candidate node dominating at least one PI the following
in [1]. In their work, new positions for the primary inputs
quantification problem is solved: “for each assignment to
(PIs) are determined by finding a minimum cut between
the non-dominated gates, does there exist a pair of assign-
the current PI positions and the next-state variables (flop
ments to the dominated PIs that results in a zero and a
inputs). BDDs are then used to compute the range (or im-
one at the candidate node”. More formally, assuming that
age) on the cut. Finally, a new logic cone with the same
x represents non-dominated gates (“external inputs”) and
range (but fewer PIs), is synthesized from the BDDs and
yi represents dominated PIs (“internal inputs”), the follow-
grafted onto the original design in place of the current logic
ing is always true for the node’s function φ:
cone. This is a very powerful transformation, but it has po-
tential drawbacks: (i) the BDDs may blow up and exhaust ∀x ∃y0 , y1 . ¬φ(x, y0 ) ∧ φ(x, y1 )
the memory, (ii) the extracted circuit may be larger than
the logic it replaces, and (iii) the runtime overhead may be Important features of this approach are:
too high.
In contrast, the proposed approach is based on greedy lo- (i) It is circuit based, while early work on reparameteri-
cal transformations, capturing only a subset of optimization zation was based on transition relations [4].
opportunities. However, memory consumption is modest, (ii) In its simplest form, the proposed restructuring re-
runtimes are very low, and the resulting design is always places some internal nodes by new primary inputs and
smaller, or of the same size, as the original design. It is remove dangling logic.
shown experimentally that the proposed method leads to
sizeble reductions when applied for circuits produced by (iii) The analysis is completely combinational: no informa-
localization abstraction [5]. tion on the reachable state-space is used.
(iv) If the property was disproved after reparametrization,
2 Fast Reparameterization it is straight-forward to remap the resulting counter-
example to depend on the original primary inputs.
The fast reparameterization algorithm is based on the fol-
lowing observation: if a node dominates1 a set of PIs, and It is important to realize that by analyzing and applying
those PIs are sufficient to force both a zero and a one at reductions in topological order from PIs to POs, regions
that node, regardless of the values given to the other PIs amenable to reparameterization are gradually reduced to
and state variables, then that node can be replaced by a contain fewer gates and PIs. By this process, the result of
new primary input, while the unused logic cone driving repeatedly applying local transformations can lead to a sub-
stantial global reduction. In the current implementation,
1 A node n dominates another node m iff every path from m to a
the above formula is evaluated by exhaustive simulation of
primary output goes through node n.
[cand] [cand]
Fast Reparameterization
− Compute dominators. For a DAG with bounded
& & in-degree (such as an And-Inverter-Graph), this is a
linear operation in the number of nodes (see Figure 3).
− For each PI, add all its dominators to the set of
“candidates”.
& & ! &
− For each candidate c, in topological order from inputs
to outputs:
- Compute the set D of nodes dominated by c (Figure 4).
! ? ! ?
- Denote the PIs in D “internal” inputs.
[cand] - Denote any node outside D, but being a direct fanin of a
node inside D, an “external” input (may be any gate type).
- Simulate all possible assignments to the internal and ex-
ternal inputs. If for all external assignments there exists
& an internal assignment that gives a 0 at c, and another in-
ternal assignment that gives a 1 at c, then substitute c by
a new primary input.
! &
Figure 2. Steps of the reparameterization algorithm.
& &
Compute Dominators
− Initialize all POs to dominate themselves
! ? ! ? − Traverse the netlist in reverse topological order (from POs
to PIs), and mark the children of each node as being domi-
nated by the same dominator as yourself unless the child has
Figure 1. Example of subgraphs that can be reduced. Here “!”
already been assigned a dominator
denote nodes we can control (a PI dominated by the output
node); “?” denote nodes that can assume any value beyond our − For already marked children, compute the “meet” of the
control. In the above examples, the output node “[cand]” can two dominators, i.e. find the first common dominator. If
be forced to both a zero and a one by choosing the right values there is no common dominator, mark the node as dominat-
for the “!”, regardless of the values of the “?”. In such cases, ing itself.
the output node is replaced by a new PI.
Figure 3. Review of the “finding direct dominators” algorithm
for DAGs. For a more complete treatment of this topic, see [8],
the logic cone rooted in the given node while the cone is or for a more precise description, the source code of “compute-
limited to 8 inputs. Limiting the scope to cones with 8 in- Dominators()” in “ZZ/Netlist/StdLib.cc” of ABC-ZZ [11].
puts and simulating 256 bit patterns (or eight 32-bit words)
seems to be enough to saturate the reduction achievable on
the benchmarks where the method is applicable. Compute Dominated Area
Some typical reductions are shown in Figure 1. The
graphs should be understood as sub-circuits of a netlist be- area = {w dom} – init. set of gates to the dominator
ing reparameterized. The exclamation marks denote “inter- count = [0, 0, . . ., 0] – count is a map “gate → integer”
for w ∈ area:
nal” PIs dominated by the top-node, and hence under our
for v ∈ faninsOf (w ):
control; and the question marks denote gates with fanouts
count[v ]++
outside the displayed logic cone, for which no assumption if count[v ] == num of fanouts[v ]:
on their values can be made. The full algorithm is described area = area ∪ {v }
in Figure 2.
Counterexample reconstruction. There are several Figure 4. Find nodes used only by gate “w dom”. This set is
sometimes called maximum fanout free cone (MFFC). The outer
ways that a trace on the reduced netlist can be lifted to the
for-loop over the elements of area is meant to visit all elements
original netlist. For instance, the removed logic between added by the inner for-loop. For this procedure, flops are treated
the new PIs and the old PIs can be stored in a separate as sinks, i.e. having no fanins.
netlist. The trace on the reduced netlist can then be pro-
2
jected onto the original PIs by rerunning the simulation ing Competition 2012 were considered. Localization ab-
used to produce the reparameterized circuit, and for each straction [5] was applied with a timeout of one hour to each
candidate pick an assignment that gives the correct value. benchmark and the resulting models meeting the following
But even simpler, one can just put the original netlist into criteria were kept:
a SAT solver and assert the values from the trace onto the
appropriate variables and call the SAT solver to complete – At least half of the flops were removed by abstraction.
the assignment. In practice, this seems to always work well.
– The abstraction was accurate (no spurious counterex-
amples).
3 Improvements
– At least one of the verification engines could prove the
The algorithm described in the previous section replaces
property within one hour.
internal nodes with inputs, and thus only removes logic. If
we are prepared to forgo this admittedly nice property and
The sizes of benchmarks selected in this way are listed in
occasionally add a bit of new logic, then nodes that are
table Table 1. All those models were given to the reparam-
not completely controllable by their dominated inputs can
eterization engine, both in weak mode and strong mode,
still be reparameterized by the following method: for node
the latter using the improvements described in section 3.
with function φ(x, y), where x are external inputs and y are
The reparameterized models were also post-processed with
internal inputs, compute the following two functions:
a quick simplification method called “shrink” which is part
φ0 (x) ≡ ∀y.¬φ(x, y) of the ABC package [7]. The longest runtime for weak repa-
φ1 (x) ≡ ∀y. φ(x, y) rameterization was 16 ms, for strong reparameterization
28 ms and for the simplification phase 50 ms.2 Reductions
Using these two functions, φ can be resynthesized using a are listed in table Table 2.
single input ynew by the expression: For comparison, Table 2 also include the results of run-
¬φ0 (x) ∧ (φ1 (x) ∨ ynew ) ning an industrial implementation of the BDD based algo-
rithm of [1]. Because runtimes are significantly longer with
In other words, if two or more inputs are dominated by this algorithm, they are given their own column. These
the node φ, a reduction in the number of inputs is guar- results were given to us from IBM, and according to their
anteed. Depending on the shape of the original logic, and statement “are not tweaked as much as they could be”.
how well new logic for φ0 and φ1 is synthesized, the num- All benchmarks were given to three engines: Property Di-
ber of logic gates may either increase or decrease. In our rected Reachability [2, 6], BDD-based reachability [3], and
implementation, logic for φ0 and φ1 is created by the fast Interpolation-based Model Checking [9]. The complete table
irredundant sum-of-product (“isop”) proposed by Shin-ichi of results is given in Table 3. A slice of this table, showing
Minato in [10]. We greedily apply this extended method only results for PDR, with and without (strong) reparam-
for all nodes with two or more dominated PIs, even if it eterization, is given in Table 4 together with a scatter plot.
leads to a blow-up in logic size. To counter such cases, fast
logic synthesis can be applied after the reparameterization.
Obviously, there are many ways to refine this scheme. Analysis. Firstly, we see a speedup of 100x-1000x over
previous work in the runtime of the reparameterization al-
gorithm itself, with comparable quality of results for the
4 Future Work application under consideration (models resulting from lo-
calization abstraction). This means the algorithm can al-
Another possible improvement to the method is extending ways be applied without the need for careful orchestration.
the reparameterization algorithm to work for multi-output Secondly, we see an average speedup of 2.5x in verifica-
cones. As an example, consider a two-output cone where tion times when applying reparameterization in conjunc-
the outputs can be forced to all four combinations {00, tion with PDR, which is also the best overall engine on
01, 10, 11} by choosing appropriate values for dominated these examples. For two benchmarks, 6s121 and 6s150,
inputs. In such a case, the cone can be replaced by two BDD reachability do substantially better than PDR, and
free inputs. If some of the four combinations at the out- for the latter (where runtimes are meaningful) the speedup
puts are impossible under conditions expressed in terms of due to reparameterization is greater than 3x. Furthermore,
non-controllable signals, a logic cone can be constructed to for BDD reachability one can see that on several occa-
characterize these conditions and reduce the number of PIs sions (6s30 in particular), reparameterization is completely
by adding logic similar to the case of a single-output cone. crucial for performance. Finally, interpolation based mod-
elchecking (IMC) seems to be largely unaffected by repa-
rameterization.
5 Experiments
2 Benchmarks from HWMCC’12 are quite small. For comparison:
As part of the experimental evaluation, all benchmarks running reparameterization on a 7 million gate design from one of our
from the single-property track of the Hardware Modelcheck- industrial collaborators took 4.1 s.
3
Abstraction Phase
! !
Design ! #And #PI #FF ! Depth
! !
6s102 !
! 6,594 72 1,121 → 56 !
! 23
6s121 !
! 1,636 99 419 → 110 !
! 19
6s132 !
! 1,216 94 139 → 113 !
! 7
6s144 !
! 41,862 480 3,337 → 236 !
! 18
6s150 !
! 5,448 146 1,044 → 323 !
! 103
6s159 !
! 1,469 13 252 → 18 !
! 4
6s164 !
! 1,095 91 198 → 93 !
! 16
6s189 !
! 36,851 479 2,434 → 259 !
! 18
6s194 !
! 12,049 532 2,389 → 198 !
! 45
6s30 !
! 1,043,139 32,994 1,195 → 128 !
! 32
6s43 !
! 7,408 30 965 → 310 !
! 25
6s50 !
! 16,700 1,570 3,107 → 207 !
! 52
6s51 !
! 16,701 1,570 3,107 → 209 !
! 65
bob05 !
! 18,043 224 2,404 → 146 !
! 106
bob1u05cu ! 32,063 224 4,377 → 146 ! 106
Table 1. Sizes of original designs and abstract models. Column #FF shows how many flip-flops were turned into unconstrained
inputs by localization abstraction. The removed FFs show up as PIs in the other tables. Last column shows the BMC depth used
by the abstraction engine (see [5] for more details).
Reparameterization
! ! ! !
! No Reparam. ! Weak Rep. ! Strong Rep. ! BDD Reparam.
! ! ! !
! ! ! !
Design ! #And #PI ! #And #PI ! #And #PI ! #And #PI Runtime
! ! ! !
6s102 !
! 6,594 1,137 !! 1,247 331 !! 1,188 267 !! 1,283 285 71.91 s
6s121 !
! 1,636 408 !
! 627 120 !
! 559 66 !
! 732 41 0.27 s
6s132 !
! 1,216 120 ! 1,108
! 62 ! 1,102
! 55 ! 2,731
! 36 0.80 s
6s144 ! 41,862 3,580 !! 10,172 1,038 !! 9,494 812 !! 13,259 889 60.50 s
!
6s150 !
! 5,448 867 !! 3,062 506 !! 2,213 89 !! 2,231 46 1.81 s
6s159 !
! 1,469 247 !! 116 20 !! 114 19 !! 256 13 0.02 s
6s164 !
! 1,077 196 !! 661 109 !! 499 41 !! 3,413 40 11.61 s
6s189 ! 36,851 2,654 !! 10,033 1,004 !! 9,552 794 !! 12,051 814 63.11 s
!
6s194 ! 12,049 2,723 ! 1,366 184 ! 1,348 166 !! 1,612 121 10.37 s
! ! !
6s30 ! 102,535 34,061 ! 1,508 307 ! 1,184 205 ! 603 50 0.74 s
! ! ! !
6s43 !
! 7,408 685 ! 3,451
! 304 ! 3,218
! 202 ! 17,691
! 101 2.07 s
6s50 ! 16,700 4,470 ! 1,841 350 ! 1,652 270 ! 4,319 752 46.57 s
! ! ! !
6s51 ! 16,701 4,468 ! 1,828 350 ! 1,639 268 ! 1,255 62 16.34 s
! ! ! !
bob05 ! 18,043 2,358 ! 1,618 187 ! 1,388 52 ! 1,586 43 0.31 s
! ! ! !
bob1u05cu ! 32,063 4,455 ! 1,618 187 ! 1,388 52 ! 1,586 43 0.33 s
Table 2. Effect of reparameterization on the abstracted models. “Weak” reparameterization refers to the basic method described
in section 2, “Strong” additionally includes the improvements discussed in section 3. Runtimes are omitted as the average CPU
time was 4-8 ms (and the longest 28 ms). However, BDD based reparameterization is not as scalable as the method presented in
this paper, and runtimes (typically between 100x-1000x longer) are listed for reference.
References [4] Pankaj Chauhan, Edmund Clarke, and Daniel Kroening.
A SAT-Based Algorithm for Reparameterization in
[1] J. Baumgartner and H. Mony. Maximal Input Reduc- Symbolic Simulation. In Proc. of DAC, 2004.
tion of Sequential Netlists via Synergistic Reparam-
eterization and Localization Strategies. In Proc. of [5] N. Een, A. Mishchenko, and N. Amla. A Single-
CHARME, pages 222–237, 2005. Instance Incremental SAT Formulation of Proof-
[2] Aaron Bradley. IC3: SAT-Based Model Checking and Counterexample-Based Abstraction. In FM-
Without Unrolling. In Proc. of VMCAI, 2011. CAD, 2010.
[3] R. E. Bryant. Graph-Based Algorithms for Boolean [6] Niklas Een, Alan Mishchenko, and Robert Brayton. Effi-
Function Manipulation. In IEEE Transactions on Com- cient Implementation of Property Directed Reach-
puters, vol. c-35, no.8, Aug., 1986. ability. In Proc. of FMCAD, 2011.
4
Verification Runtimes
! ! !
!
! PDR !
! BDD reach. !
! IMC
! ! !
! NoRep. Weak Strong ! NoRep. Weak Strong ! NoRep. Weak Strong
! ! !
6s102 !
! 1.7 0.4 0.4 !! 121.2 90.2 204.3 !! – 2619.2 –
6s121 !
! 64.0 12.3 5.4 !! 0.5 0.3 0.4 !! 10.2 5.8 36.0
6s132 !
! 7.8 8.1 7.9 !! 2327.3 1375.5 – !! 201.4 384.9 267.9
6s144 !
! 10.3 12.4 9.6 !! – – – !! 1177.5 942.1 1413.0
6s150 !
! – 2323.7 – !! 539.3 143.6 189.1 !! – – –
6s159 !
! 0.0 0.0 0.0 !! 0.1 0.1 0.1 !! 0.1 0.1 0.1
6s164 !
! 7.0 34.1 6.8 !! – 33.6 0.4 !! 4.4 8.5 3.0
6s189 !
! 11.9 7.8 8.6 !! – – – !! 738.0 396.5 366.0
6s194 !
! 4.7 4.6 3.7 !! 307.5 42.5 742.2 !! 798.9 1042.8 1103.3
6s30 !
! 15.9 45.3 12.3 !! – 17.6 49.2 !! – – –
6s43 !
! 13.6 11.7 9.1 !! – 1191.1 1390.4 !! – – –
6s50 !
! 14.7 10.8 5.4 !! 941.6 241.0 1680.8 !! 1795.6 820.0 2348.9
6s51 !
! 18.7 7.1 4.1 !! 303.8 147.6 1891.2 !! 1806.8 954.8 1737.8
bob05 !
! 0.3 0.3 0.3 !! 2450.5 2371.4 1253.2 !! 30.3 26.3 25.1
bob1u05cu ! 0.3 0.2 0.3 ! – 2157.6 1088.3 ! 30.3 24.0 24.8
Table 3. Full table of results. Each design after abstraction is given to three engines: Property Directed Reachability (PDR),
BDD-based reachability (BDD), and Interpolation-based Model Checking (IMC). Each engine is run on three versions of the model
with no/weak/strong reparameterization applied. A dash represents a timeout after one hour.
!
Design ! NoRep. Rep.
PDR on reparametrized abstract model
6s102
!
! 1.66 0.40 1000
!
6s121 !
! 64.00 5.43
6s132 !
! 7.82 7.90
6s144 ! 10.34 9.56 100
!
6s159 !
! 0.04 0.03
6s164 !
! 6.97 6.83
6s189 !
! 11.86 8.59 10
6s194 !
! 4.71 3.72
6s30 !
! 15.92 12.29
6s43 !
! 13.63 9.15 1
6s50 !
! 14.66 5.44
6s51 !
! 18.72 4.08
bob05 ! 0.33 0.26
! 0.1
bob1u05cu ! 0.33 0.26 0.1 1 10 100 1000
PDR on abstract model
Table 4. Runtime improvements for PDR. Column “NoRep.” shows runtimes (in seconds) for proving the property of each
benchmark using PDR after abstraction, but without reparameterization; column “Rep.” shows runtimes after reparameterization.
The scatter plot on the right places these runtime pairs on a log-scale. The average speedup is 2.5x.
[7] Berkeley Logic Synthesis Group. ABC: A Sys- Products Forms from Binary Decision Diagrams. In
tem for Sequential Synthesis and Verification. Proc. of SASIMI, 1992.
http://www.eecs.berkeley.edu/˜alanmi/abc/, v00127p. [11] Berkeley Verification and Synthesis Research Center.
[8] T. Lengauer and R.E. Tarjan. A Fast Algorithm for ABC-ZZ: A C++ framework for verification & syn-
Finding Dominators in a Flowgraph. In ACM Trans- thesis. https://bitbucket.org/niklaseen/abc-zz.
actions on Programming Languages and Systems, Vol. 1,
No. 1, July, pages 121–141, 1979.
[9] K. L McMillan. Interpolation and SAT-based Model
Checking. In Proc. of CAV, 2003.
[10] S. Minato. Fast Generation of Irredundant Sum-Of-
5