         Program Proceedings

  International Workshop on Design and
Implementation of Formal Tools and Systems




              Portland, OR
            October 19, 2013
Co-located with FMCAD and MEMOCODE
Preface
The second DIFTS (Design and Implementation of Formal Tools and Systems) workshop was held in Portland, Oregon on October 19, 2013, co-located with the Formal Methods in Computer-Aided Design conference (FMCAD) and Formal Methods and Models for Co-design (MEMOCODE). The workshop emphasized insightful experiences in tool and system design. Its goal was to provide a forum for sharing original challenges and solutions with groundbreaking results, and it provided an opportunity for discussing the engineering aspects and the various design decisions required to put such formal tools and systems into practical use. It took a broad view of the formal tools/systems area and solicited contributions from hardware and software domains such as decision procedures, verification, testing, validation, diagnosis, debugging, and synthesis.
The workshop received 10 original submissions, of which three were chosen in the tool category and three in the system category. There were also three invited talks: the first was given by Rance Cleaveland (Reactive Systems Inc., USA) on “Approximate Formal Verification using Model-based Testing”; the second by Masahiro Fujita (University of Tokyo) on “Diagnosis and Correction of Buggy Hardware/Software with Formal Approaches”; and the third by Dhiraj Goswami (Synopsys Inc.) on “Stimulus Generation, Enhancement and Debug in Constraint Random Verification.”
First of all, we thank FMCAD’s steering committee for their continual support. We also thank the FMCAD chairs, Sandip Ray and Barbara Jobstmann, and the MEMOCODE chairs, Marly Roncken and Fei Xie, for a seamless organization, and Joe Leslie-Hurd for his help with local arrangements. We thank Boğaziçi University, Turkey for hosting the DIFTS website. We sincerely thank the program committee members and sub-reviewers for selecting the papers and providing candid review feedback to the authors. Last but not least, we thank all the authors for contributing to the workshop, and all the participants.

Malay K. Ganai and Alper Sen
Program Chairs
DIFTS 2013






General Program Chairs

 Malay Ganai        NEC Labs America, USA
 Alper Sen          Boğaziçi University, Turkey


Program Committee

 Armin Biere        Johannes Kepler University, Austria
 Gianpiero Cabodi   Politecnico di Torino, Italy
 Franco Fummi       University of Verona, Italy
 Malay Ganai        NEC Labs America, USA
 Daniel Grosse      University of Bremen, Germany
 William Hung       Synopsys Inc, USA
 Daniel Kroening    Oxford University, UK
 Alper Sen          Boğaziçi University, Turkey
 Ofer Strichman     Technion - Israel Institute of Technology, Israel
 Chao Wang          Virginia Tech, USA






Additional Reviewers

                       Balakrishnan, Gogul
                       Eldib, Hassan
                       Horn, Alexander
                       Ivrii, Alexander
                       Kusano, Markus
                       Le, Hoang
                       Schrammel, Peter
                       Sinz, Carsten
                       Sousa, Marcelo
                       Suelflow, Andre






Table of Contents
Approximate Formal Verification using Model-based Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                         1
 Rance Cleaveland (Invited speaker)
Diagnosis and Correction of Buggy Hardware/Software with Formal Approaches . . . . . . . . . .                                                2
  Masahiro Fujita (Invited speaker)
Stimulus generation, enhancement and debug in constraint random verification . . . . . . . . . . .                                            3
  Dhiraj Goswami (Invited speaker)
A Fast Reparameterization Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
  Niklas Een and Alan Mishchenko
LEC: Learning driven data-path equivalence checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                   9
  Jiang Long, Robert Brayton and Michael Case
Trading-off Incrementality and Dynamic Restart of Multiple Solvers in IC3 . . . . . . . . . . . . . . 19
  Marco Palena, Gianpiero Cabodi and Alan Mishchenko
Lemmas on Demand for Lambdas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
  Mathias Preiner, Aina Niemetz and Armin Biere
CHIMP: a Tool for Assertion-Based Dynamic Verification of SystemC Models . . . . . . . . . . . . 38
 Sonali Dutta, Moshe Y. Vardi and Deian Tabakov
Abstraction-Based Livelock/Deadlock Checking for Hardware Verification . . . . . . . . . . . . . . . . 46
 In-Ho Moon and Kevin Harer




  Approximate Formal Verification using Model-based Testing
                                       Rance Cleaveland
                                  University of Maryland, USA

Abstract: In model-based testing, (semi-)formal models of systems are used to drive the derivation of test cases to be applied to the system under test (SUT). The technology has long been part of traditional hardware-design workflows, and it is beginning to find application in embedded-software development processes as well. In automotive and land-vehicle control-system design in particular, models in languages such as MATLAB®/Simulink®/Stateflow® are used to drive the testing of the software used to control vehicle behavior, with tools like Reactis®, developed by a team including the speaker, providing automated test-case generation support for this endeavor. This talk will discuss how test-case generation capabilities may also be used to help verify that models meet formal specifications of their behavior. The method we advocate, Instrumentation-Based Verification (IBV), involves the formalization of behavior specifications as models that are used to instrument the model to be verified, and the use of coverage testing of the instrumented model to search for specification violations. The presentation will discuss the foundations of IBV, the test-generation approach and other features in Reactis® that are used to support IBV, and the results of several case studies involving the use of these methods.
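
As a toy illustration of the IBV idea, here is a minimal Python sketch (ours, not Reactis®, which operates on Simulink®/Stateflow® models): a formalized requirement becomes a monitor, the model is instrumented by evaluating the monitor after every step, and coverage-style random tests search for violations. The plant model and the speed bound are made-up examples.

    import random

    def model_step(speed, throttle):
        # Toy discrete-time plant model (hypothetical): speed responds to
        # throttle and saturates at 120.
        return min(120.0, max(0.0, speed + 0.5 * throttle - 0.1))

    def monitor(speed):
        # Formalized requirement (made up): speed must never exceed 100.
        return speed <= 100.0

    def instrumented_run(steps, seed):
        # "Instrument" the model by checking the monitor after every step,
        # mirroring how IBV attaches specification models to the design model.
        random.seed(seed)
        speed = 0.0
        for t in range(steps):
            speed = model_step(speed, random.uniform(-1.0, 1.0))
            if not monitor(speed):
                return ("violation", t)
        return ("ok", steps)

    # Coverage-style search: many short randomized tests instead of one long one.
    print([instrumented_run(100, seed) for seed in range(5)])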




   Diagnosis and Correction of Buggy Hardware/Software with
                      Formal Approaches
                                        Masahiro Fujita
                                   University of Tokyo, Japan

Abstract: There has been intensive research on debugging hardware as well as software. Some approaches are ad hoc and based on simple heuristics, while others utilize formal methods and are mathematically modeled. In this talk, we first review various proposals on debugging from a historical viewpoint, and then summarize the state of the art in terms of both diagnosis and automatic correction of designs. In particular, we show various approaches based on SAT formulations of the diagnosis and correction problems. We also discuss them in relation to manufacturing test techniques: if the design errors are within pre-determined types and/or areas, there can be very efficient methods for formal verification, diagnosis, and correction that use small numbers of test vectors. In the last part of the talk, future perspectives including post-silicon issues are discussed.




  Stimulus Generation, Enhancements and Debug in Constraint
                     Random Verification
                                         Dhiraj Goswami
                                         Synopsys, USA

Abstract: Verification occupies 60-80% of the total resources and effort of an IC design team. Functional verification, performed at the earliest stage of the IC design flow, determines the customers' ability to find bugs quickly, and thereby their time-to-results (TTR) and cost-of-results (COR). Consequently, functional verification has been the focus of the EDA industry for the last several decades.

Constrained random simulation methodologies have become increasingly popular for functional
verification of complex designs, as an alternative to directed-test based simulation. In a
constrained random simulation methodology, random vectors are generated to satisfy certain
operating constraints of the design. These constraints are usually specified as part of a testbench
program (using industry-standard testbench languages, like SystemVerilog from Synopsys, e
from Cadence, OpenVera, etc.). The testbench automation tool (TBA) is then expected to
generate random solutions for specified random variables, such that all the specified constraints
over these random variables are satisfied. These random solutions are then used to generate valid
random stimulus for the Design Under Verification (DUV). This stimulus is simulated using
industry-standard simulation tools, like VCS from Synopsys, NC-Verilog from Cadence, etc.
The results of simulation are then typically examined within the testbench program to monitor functional coverage, which gives a measure of confidence in the verification quality and completeness.
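
To make the flow concrete, the following minimal Python sketch uses the Z3 SMT solver in place of a commercial testbench automation (TBA) tool; the variables and constraints (word-aligned address, bounded length) are invented for illustration.

    # Z3 stands in for a commercial testbench automation (TBA) tool; the
    # constraints are made-up examples of "operating constraints".
    from z3 import BitVec, Solver, ULT, Or, sat

    addr, length = BitVec('addr', 16), BitVec('len', 8)
    s = Solver()
    s.add(addr & 0x3 == 0)                # operating constraint: aligned address
    s.add(ULT(length, 64), length != 0)   # operating constraint: 1 <= len < 64

    stimulus = []
    while len(stimulus) < 5 and s.check() == sat:
        m = s.model()
        stimulus.append((m[addr].as_long(), m[length].as_long()))
        # Block the previous solution so the next vector differs -- a crude
        # stand-in for the randomization a real constraint solver performs.
        s.add(Or(addr != m[addr], length != m[length]))
    print(stimulus)   # five distinct random solutions usable as DUV stimulus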

In this talk we will review the challenges of stimulus and configuration generation in a constrained random verification methodology. We will also explore why state-of-the-art debug solutions are important to handle complexity and improve the quality of stimulus.




                                A Fast Reparameterization Procedure

                                               Niklas Een, Alan Mishchenko
                                                 {een,alanmi}@eecs.berkeley.edu

                                        Berkeley Verification and Synthesis Research Center
                                                         EECS Department
                                              University of California, Berkeley, USA.


Abstract. Reparameterization, also known as range-preserving logic synthesis, replaces a logic cone by another logic cone which has fewer inputs while producing the same output combinations as the original cone. It is expected that a smaller circuit leads to a shorter verification time. This paper describes an approach to reparameterization which is faster, but not as general, as previous work. The new procedure is particularly well-suited for circuits derived by localization abstraction.
1  Introduction

The use of reparameterization as a circuit transformation in the verification flow was pioneered by Baumgartner et al. in [1]. In their work, new positions for the primary inputs (PIs) are determined by finding a minimum cut between the current PI positions and the next-state variables (flop inputs). BDDs are then used to compute the range (or image) on the cut. Finally, a new logic cone with the same range (but fewer PIs) is synthesized from the BDDs and grafted onto the original design in place of the current logic cone. This is a very powerful transformation, but it has potential drawbacks: (i) the BDDs may blow up and exhaust the memory, (ii) the extracted circuit may be larger than the logic it replaces, and (iii) the runtime overhead may be too high.

In contrast, the proposed approach is based on greedy local transformations, capturing only a subset of the optimization opportunities. However, memory consumption is modest, runtimes are very low, and the resulting design is always smaller than, or of the same size as, the original design. It is shown experimentally that the proposed method leads to sizeable reductions when applied to circuits produced by localization abstraction [5].
2  Fast Reparameterization

The fast reparameterization algorithm is based on the following observation: if a node dominates¹ a set of PIs, and those PIs are sufficient to force both a zero and a one at that node, regardless of the values given to the other PIs and state variables, then that node can be replaced by a new primary input, while the unused logic cone driving the original node can be removed. The old primary inputs dominated by the given node are also removed by this procedure.

¹ A node n dominates another node m iff every path from m to a primary output goes through node n.

Example. Suppose a design contains inputs x1, x2, and a gate Xor(x1, x2); and that, furthermore, x1 has no other fanouts besides this Xor-gate. Then, no matter which value x2 takes, both a zero and a one can be forced at the output of the Xor by setting x1 appropriately, and thus the Xor-gate can, for verification purposes, be replaced by a primary input.

The proposed method for finding such situations starts by computing all dominators of the netlist graph; then, for each candidate node dominating at least one PI, the following quantification problem is solved: “for each assignment to the non-dominated gates, does there exist a pair of assignments to the dominated PIs that results in a zero and a one at the candidate node?” More formally, assuming that x represents non-dominated gates (“external inputs”) and yi represents dominated PIs (“internal inputs”), the following is always true for the node’s function φ:

    ∀x ∃y0, y1 . ¬φ(x, y0) ∧ φ(x, y1)

Important features of this approach are:

(i) It is circuit based, while early work on reparameterization was based on transition relations [4].
(ii) In its simplest form, the proposed restructuring replaces some internal nodes by new primary inputs and removes dangling logic.
(iii) The analysis is completely combinational: no information on the reachable state-space is used.
(iv) If the property is disproved after reparameterization, it is straightforward to remap the resulting counterexample to depend on the original primary inputs.

It is important to realize that, by analyzing and applying reductions in topological order from PIs to POs, regions amenable to reparameterization are gradually reduced to contain fewer gates and PIs. By this process, the result of repeatedly applying local transformations can lead to a substantial global reduction. In the current implementation, the above formula is evaluated by exhaustive simulation of the logic cone rooted in the given node, where the cone is limited to 8 inputs.
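
The check above is easy to prototype. The following Python sketch is ours, for illustration only (the actual implementation works on And-Inverter-Graphs and simulates 256 bit patterns); it enumerates all assignments to a small cone’s external and internal inputs and decides whether the candidate node can be forced to both values:

    from itertools import product

    def controllable(phi, n_ext, n_int):
        # Check "forall x, exists y0, y1: not phi(x, y0) and phi(x, y1)" by
        # exhaustive simulation; phi maps (external, internal) bit-tuples to
        # a Boolean.
        for x in product([0, 1], repeat=n_ext):
            values = {phi(x, y) for y in product([0, 1], repeat=n_int)}
            if values != {False, True}:  # some external assignment pins the node
                return False
        return True

    # The Xor example from the text: x1 is dominated (internal), x2 is external.
    xor_gate = lambda x, y: bool(x[0] ^ y[0])
    print(controllable(xor_gate, n_ext=1, n_int=1))  # True: replaceable by a PI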
Limiting the scope to cones with 8 inputs and simulating 256 bit patterns (or eight 32-bit words) seems to be enough to saturate the reduction achievable on the benchmarks where the method is applicable.

Some typical reductions are shown in Figure 1. The graphs should be understood as sub-circuits of a netlist being reparameterized. The exclamation marks denote “internal” PIs dominated by the top node, and hence under our control; the question marks denote gates with fanouts outside the displayed logic cone, for which no assumption on their values can be made. The full algorithm is described in Figure 2.

[Figure 1 shows two And-gate subgraphs rooted in a candidate node “[cand]”, with leaves marked “!” and “?”.]

Figure 1. Example of subgraphs that can be reduced. Here “!” denotes nodes we can control (a PI dominated by the output node); “?” denotes nodes that can assume any value beyond our control. In the above examples, the output node “[cand]” can be forced to both a zero and a one by choosing the right values for the “!”, regardless of the values of the “?”. In such cases, the output node is replaced by a new PI.

Fast Reparameterization
− Compute dominators. For a DAG with bounded in-degree (such as an And-Inverter-Graph), this is a linear operation in the number of nodes (see Figure 3).
− For each PI, add all its dominators to the set of “candidates”.
− For each candidate c, in topological order from inputs to outputs:
  - Compute the set D of nodes dominated by c (Figure 4).
  - Denote the PIs in D “internal” inputs.
  - Denote any node outside D that is a direct fanin of a node inside D an “external” input (it may be any gate type).
  - Simulate all possible assignments to the internal and external inputs. If for every external assignment there exists an internal assignment that gives a 0 at c, and another internal assignment that gives a 1 at c, then substitute c by a new primary input.

Figure 2. Steps of the reparameterization algorithm.

Compute Dominators
− Initialize all POs to dominate themselves.
− Traverse the netlist in reverse topological order (from POs to PIs), and mark the children of each node as being dominated by the same dominator as the node itself, unless the child has already been assigned a dominator.
− For already marked children, compute the “meet” of the two dominators, i.e. find the first common dominator. If there is no common dominator, mark the node as dominating itself.

Figure 3. Review of the “finding direct dominators” algorithm for DAGs. For a more complete treatment of this topic, see [8], or for a more precise description, the source code of “computeDominators()” in “ZZ/Netlist/StdLib.cc” of ABC-ZZ [11].

Compute Dominated Area
    area = {w_dom}              – initialize the set of gates to the dominator
    count = [0, 0, ..., 0]      – count is a map “gate → integer”
    for w ∈ area:
        for v ∈ faninsOf(w):
            count[v]++
            if count[v] == num_of_fanouts[v]:
                area = area ∪ {v}

Figure 4. Find nodes used only by gate “w_dom”. This set is sometimes called the maximum fanout-free cone (MFFC). The outer for-loop over the elements of area is meant to visit all elements added by the inner for-loop. For this procedure, flops are treated as sinks, i.e. having no fanins.
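
The two helper procedures of Figures 3 and 4 can be summarized in Python as follows. This is our sketch, not the ABC-ZZ source: the netlist is assumed to be given as an explicit topological order plus fanin/fanout maps, and the dominator pass uses a standard meet-based chain intersection in the spirit of [8]:

    def compute_dominators(topo, fanouts, is_po):
        # Single-dominator pass per Figure 3 (sketch). `topo` lists nodes in
        # topological order (PIs first, POs last). Returns a map from each
        # node to its direct dominator (the node itself when none exists).
        order = {n: i for i, n in enumerate(topo)}
        dom = {}

        def meet(a, b):
            # First common dominator of a and b, walking dominator chains
            # toward the POs; None if the chains never meet.
            while a != b:
                if order[a] > order[b]:
                    a, b = b, a
                if dom[a] == a:      # chain ended (PO or self-dominating node)
                    return None
                a = dom[a]
            return a

        for n in reversed(topo):     # reverse topological order: POs first
            if is_po(n):
                dom[n] = n           # POs dominate themselves
                continue
            d = None
            for f in fanouts[n]:     # every path to a PO passes some fanout
                d = f if d is None else meet(d, f)
                if d is None:
                    break
            dom[n] = d if d is not None else n
        return dom

    def dominated_area(w_dom, fanins, n_fanouts):
        # MFFC computation per Figure 4: the nodes used only by `w_dom`.
        area, count = [w_dom], {}
        for w in area:               # `area` grows while being traversed
            for v in fanins(w):
                count[v] = count.get(v, 0) + 1
                if count[v] == n_fanouts(v):
                    area.append(v)
        return set(area)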
Counterexample reconstruction. There are several ways that a trace on the reduced netlist can be lifted to the original netlist. For instance, the removed logic between the new PIs and the old PIs can be stored in a separate netlist. The trace on the reduced netlist can then be projected onto the original PIs by rerunning the simulation used to produce the reparameterized circuit and, for each candidate, picking an assignment that gives the correct value. But even simpler, one can just put the original netlist into a SAT solver, assert the values from the trace onto the appropriate variables, and call the SAT solver to complete the assignment. In practice, this seems to always work well.
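
A toy sketch of this SAT-based lifting, using the Z3 solver for convenience (the Xor example from Section 2; variable names are illustrative): the trace value of the new PI is asserted on the original cone, and the solver completes the assignment of the removed PI x1.

    from z3 import Bools, Solver, Xor, sat

    x1, x2 = Bools('x1 x2')
    cand = Xor(x1, x2)        # original logic cone of the replaced node

    s = Solver()
    s.add(cand == True)       # value the reduced-netlist trace gives the new PI
    s.add(x2 == False)        # value of the surviving original PI in the trace
    assert s.check() == sat
    print(s.model())          # the solver completes x1 (here: x1 = True)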
3  Improvements

The algorithm described in the previous section replaces internal nodes with inputs, and thus only removes logic. If we are prepared to forgo this admittedly nice property and occasionally add a bit of new logic, then nodes that are not completely controllable by their dominated inputs can still be reparameterized by the following method: for a node with function φ(x, y), where x are external inputs and y are internal inputs, compute the following two functions:

    φ0(x) ≡ ∀y. ¬φ(x, y)
    φ1(x) ≡ ∀y. φ(x, y)

Using these two functions, φ can be resynthesized using a single input ynew by the expression:

    ¬φ0(x) ∧ (φ1(x) ∨ ynew)

In other words, if two or more inputs are dominated by the node φ, a reduction in the number of inputs is guaranteed. Depending on the shape of the original logic, and on how well new logic for φ0 and φ1 is synthesized, the number of logic gates may either increase or decrease. In our implementation, logic for φ0 and φ1 is created by the fast irredundant sum-of-products (“isop”) procedure proposed by Shin-ichi Minato in [10]. We greedily apply this extended method to all nodes with two or more dominated PIs, even if it leads to a blow-up in logic size. To counter such cases, fast logic synthesis can be applied after the reparameterization. Obviously, there are many ways to refine this scheme.
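
The rewrite can be illustrated with a small Python sketch (ours; the actual implementation synthesizes φ0 and φ1 with Minato’s isop procedure rather than by enumeration):

    from itertools import product

    def resynthesize(phi, n_ext, n_int):
        # phi0(x) = forall y. not phi(x,y);  phi1(x) = forall y. phi(x,y),
        # both computed here by enumeration; the dominated inputs y are then
        # replaced by the single fresh input y_new.
        ext = list(product([0, 1], repeat=n_ext))
        ints = list(product([0, 1], repeat=n_int))
        phi0 = {x: all(not phi(x, y) for y in ints) for x in ext}
        phi1 = {x: all(phi(x, y) for y in ints) for x in ext}
        return lambda x, y_new: (not phi0[x]) and (phi1[x] or bool(y_new))

    # Sanity check on a 3-input And with two dominated inputs: for every
    # external assignment, the new cone produces exactly the set of output
    # values the old cone could produce.
    old = lambda x, y: bool(x[0] and y[0] and y[1])
    new = resynthesize(old, n_ext=1, n_int=2)
    for x in product([0, 1], repeat=1):
        assert ({old(x, y) for y in product([0, 1], repeat=2)}
                == {new(x, y) for y in [0, 1]})
    print("range preserved")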
4  Future Work

Another possible improvement to the method is extending the reparameterization algorithm to work for multi-output cones. As an example, consider a two-output cone where the outputs can be forced to all four combinations {00, 01, 10, 11} by choosing appropriate values for the dominated inputs. In such a case, the cone can be replaced by two free inputs. If some of the four combinations at the outputs are impossible under conditions expressed in terms of non-controllable signals, a logic cone can be constructed to characterize these conditions and reduce the number of PIs by adding logic, similar to the case of a single-output cone.

5  Experiments

As part of the experimental evaluation, all benchmarks from the single-property track of the Hardware Model Checking Competition 2012 were considered. Localization abstraction [5] was applied with a timeout of one hour to each benchmark, and the resulting models meeting the following criteria were kept:

– At least half of the flops were removed by abstraction.
– The abstraction was accurate (no spurious counterexamples).
– At least one of the verification engines could prove the property within one hour.

The sizes of the benchmarks selected in this way are listed in Table 1. All these models were given to the reparameterization engine, both in weak mode and in strong mode, the latter using the improvements described in section 3. The reparameterized models were also post-processed with a quick simplification method called “shrink”, which is part of the ABC package [7]. The longest runtime for weak reparameterization was 16 ms, for strong reparameterization 28 ms, and for the simplification phase 50 ms.² Reductions are listed in Table 2.

For comparison, Table 2 also includes the results of running an industrial implementation of the BDD-based algorithm of [1]. Because runtimes are significantly longer with this algorithm, they are given their own column. These results were given to us by IBM and, according to their statement, “are not tweaked as much as they could be”.

All benchmarks were given to three engines: Property Directed Reachability [2, 6], BDD-based reachability [3], and Interpolation-based Model Checking [9]. The complete table of results is given in Table 3. A slice of this table, showing only the results for PDR with and without (strong) reparameterization, is given in Table 4 together with a scatter plot.

Analysis. Firstly, we see a speedup of 100x-1000x over previous work in the runtime of the reparameterization algorithm itself, with comparable quality of results for the application under consideration (models resulting from localization abstraction). This means the algorithm can always be applied without the need for careful orchestration. Secondly, we see an average speedup of 2.5x in verification times when applying reparameterization in conjunction with PDR, which is also the best overall engine on these examples. For two benchmarks, 6s121 and 6s150, BDD reachability does substantially better than PDR, and for the latter (where runtimes are meaningful) the speedup due to reparameterization is greater than 3x. Furthermore, for BDD reachability one can see that on several occasions (6s30 in particular), reparameterization is completely crucial for performance. Finally, interpolation-based model checking (IMC) seems to be largely unaffected by reparameterization.

² Benchmarks from HWMCC’12 are quite small. For comparison: running reparameterization on a 7 million gate design from one of our industrial collaborators took 4.1 s.
Abstraction Phase

Design      #And       #PI      #FF (before → after)   Depth
6s102          6,594       72   1,121 →  56               23
6s121          1,636       99     419 → 110               19
6s132          1,216       94     139 → 113                7
6s144         41,862      480   3,337 → 236               18
6s150          5,448      146   1,044 → 323              103
6s159          1,469       13     252 →  18                4
6s164          1,095       91     198 →  93               16
6s189         36,851      479   2,434 → 259               18
6s194         12,049      532   2,389 → 198               45
6s30       1,043,139   32,994   1,195 → 128               32
6s43           7,408       30     965 → 310               25
6s50          16,700    1,570   3,107 → 207               52
6s51          16,701    1,570   3,107 → 209               65
bob05         18,043      224   2,404 → 146              106
bob1u05cu     32,063      224   4,377 → 146              106

Table 1. Sizes of original designs and abstract models. Column #FF shows how many flip-flops were turned into unconstrained inputs by localization abstraction. The removed FFs show up as PIs in the other tables. The last column shows the BMC depth used by the abstraction engine (see [5] for more details).
Reparameterization

            No Reparam.       Weak Rep.       Strong Rep.      BDD Reparam.
Design      #And     #PI     #And    #PI     #And    #PI     #And    #PI   Runtime
6s102       6,594   1,137    1,247    331    1,188    267    1,283    285   71.91 s
6s121       1,636     408      627    120      559     66      732     41    0.27 s
6s132       1,216     120    1,108     62    1,102     55    2,731     36    0.80 s
6s144      41,862   3,580   10,172  1,038    9,494    812   13,259    889   60.50 s
6s150       5,448     867    3,062    506    2,213     89    2,231     46    1.81 s
6s159       1,469     247      116     20      114     19      256     13    0.02 s
6s164       1,077     196      661    109      499     41    3,413     40   11.61 s
6s189      36,851   2,654   10,033  1,004    9,552    794   12,051    814   63.11 s
6s194      12,049   2,723    1,366    184    1,348    166    1,612    121   10.37 s
6s30      102,535  34,061    1,508    307    1,184    205      603     50    0.74 s
6s43        7,408     685    3,451    304    3,218    202   17,691    101    2.07 s
6s50       16,700   4,470    1,841    350    1,652    270    4,319    752   46.57 s
6s51       16,701   4,468    1,828    350    1,639    268    1,255     62   16.34 s
bob05      18,043   2,358    1,618    187    1,388     52    1,586     43    0.31 s
bob1u05cu  32,063   4,455    1,618    187    1,388     52    1,586     43    0.33 s

Table 2. Effect of reparameterization on the abstracted models. “Weak” reparameterization refers to the basic method described in section 2; “Strong” additionally includes the improvements discussed in section 3. Runtimes for these are omitted, as the average CPU time was 4-8 ms (and the longest 28 ms). However, BDD-based reparameterization is not as scalable as the method presented in this paper, and its runtimes (typically 100x-1000x longer) are listed for reference.

Verification Runtimes

                     PDR                     BDD reach.                  IMC
Design     NoRep.    Weak  Strong   NoRep.    Weak  Strong   NoRep.    Weak  Strong
6s102         1.7     0.4     0.4    121.2    90.2   204.3        –  2619.2       –
6s121        64.0    12.3     5.4      0.5     0.3     0.4     10.2     5.8    36.0
6s132         7.8     8.1     7.9   2327.3  1375.5       –    201.4   384.9   267.9
6s144        10.3    12.4     9.6        –       –       –   1177.5   942.1  1413.0
6s150           –  2323.7       –    539.3   143.6   189.1        –       –       –
6s159         0.0     0.0     0.0      0.1     0.1     0.1      0.1     0.1     0.1
6s164         7.0    34.1     6.8        –    33.6     0.4      4.4     8.5     3.0
6s189        11.9     7.8     8.6        –       –       –    738.0   396.5   366.0
6s194         4.7     4.6     3.7    307.5    42.5   742.2    798.9  1042.8  1103.3
6s30         15.9    45.3    12.3        –    17.6    49.2        –       –       –
6s43         13.6    11.7     9.1        –  1191.1  1390.4        –       –       –
6s50         14.7    10.8     5.4    941.6   241.0  1680.8   1795.6   820.0  2348.9
6s51         18.7     7.1     4.1    303.8   147.6  1891.2   1806.8   954.8  1737.8
bob05         0.3     0.3     0.3   2450.5  2371.4  1253.2     30.3    26.3    25.1
bob1u05cu     0.3     0.2     0.3        –  2157.6  1088.3     30.3    24.0    24.8

Table 3. Full table of results. Each design after abstraction is given to three engines: Property Directed Reachability (PDR), BDD-based reachability (BDD), and Interpolation-based Model Checking (IMC). Each engine is run on three versions of the model, with no/weak/strong reparameterization applied. A dash represents a timeout after one hour.


Design      NoRep.    Rep.
6s102         1.66    0.40
6s121        64.00    5.43
6s132         7.82    7.90
6s144        10.34    9.56
6s159         0.04    0.03
6s164         6.97    6.83
6s189        11.86    8.59
6s194         4.71    3.72
6s30         15.92   12.29
6s43         13.63    9.15
6s50         14.66    5.44
6s51         18.72    4.08
bob05         0.33    0.26
bob1u05cu     0.33    0.26

[Scatter plot: “PDR on abstract model” (x-axis) vs. “PDR on reparametrized abstract model” (y-axis), both on a log scale from 0.1 to 1000 seconds.]

Table 4. Runtime improvements for PDR. Column “NoRep.” shows runtimes (in seconds) for proving the property of each benchmark using PDR after abstraction but without reparameterization; column “Rep.” shows runtimes after reparameterization. The scatter plot places these runtime pairs on a log scale. The average speedup is 2.5x.


References

[1] J. Baumgartner and H. Mony. Maximal Input Reduction of Sequential Netlists via Synergistic Reparameterization and Localization Strategies. In Proc. of CHARME, pages 222–237, 2005.

[2] A. Bradley. IC3: SAT-Based Model Checking Without Unrolling. In Proc. of VMCAI, 2011.

[3] R. E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, vol. C-35, no. 8, Aug. 1986.

[4] P. Chauhan, E. Clarke, and D. Kroening. A SAT-Based Algorithm for Reparameterization in Symbolic Simulation. In Proc. of DAC, 2004.

[5] N. Een, A. Mishchenko, and N. Amla. A Single-Instance Incremental SAT Formulation of Proof- and Counterexample-Based Abstraction. In Proc. of FMCAD, 2010.

[6] N. Een, A. Mishchenko, and R. Brayton. Efficient Implementation of Property Directed Reachability. In Proc. of FMCAD, 2011.

[7] Berkeley Logic Synthesis Group. ABC: A System for Sequential Synthesis and Verification. http://www.eecs.berkeley.edu/~alanmi/abc/, v00127p.

[8] T. Lengauer and R. E. Tarjan. A Fast Algorithm for Finding Dominators in a Flowgraph. ACM Transactions on Programming Languages and Systems, vol. 1, no. 1, pages 121–141, July 1979.

[9] K. L. McMillan. Interpolation and SAT-based Model Checking. In Proc. of CAV, 2003.

[10] S. Minato. Fast Generation of Irredundant Sum-Of-Products Forms from Binary Decision Diagrams. In Proc. of SASIMI, 1992.

[11] Berkeley Verification and Synthesis Research Center. ABC-ZZ: A C++ framework for verification & synthesis. https://bitbucket.org/niklaseen/abc-zz.
        LEC: Learning-Driven Data-path Equivalence
                        Checking
                                       Jiang Long∗ , Robert K. Brayton∗ , Michael Case†
                                               ∗ EECS Department, UC-Berkeley

                                              {jlong, brayton}@eecs.berkeley.edu
                                                   † Calypto Design Systems

                                                     {mcase}@calypto.com



Abstract—In the LEC system, we employ a learning-driven approach for solving combinational data-path equivalence checking problems. The data-path logic is specified using Boolean and word-level operators in VHDL/Verilog. The targeted application area is C-to-RTL equivalence checking as found in an industrial setting. These problems are difficult because of the algebraic transformations applied to the data-path logic to obtain highly optimized implementations. Without high-level knowledge, existing techniques in bit-level equivalence checking and QF_BV SMT solving are unable to solve these problems effectively. It is crucial to reverse engineer such transformations to bring more similarity to the two sides of the logic. However, it is difficult to extract algebraic logic embedded in a cloud of Boolean and word-level arithmetic operators. To address this, LEC uses a compositional proof methodology and analysis beyond the bit and word level, incorporating algebraic reasoning through polynomial reconstruction. LEC’s open architecture allows new solver techniques to be integrated progressively. It builds sub-model trees, recursively transforming the sub-problems to simplify and expose the actual bottleneck arithmetic logic. In addition to rewriting rules that normalize the arithmetic operators, LEC supports conditional rewriting, where the application of a rule depends on the existence of invariants in the design itself. LEC utilizes both functional and structural information of the data-path logic to recognize and reconstruct algebraic transformations. A case study illustrates the steps used to extract the arithmetic embedded in a data-path design as a linear sum of signed integers, and shows the procedures that collaboratively led to a successful compositional proof.

I. INTRODUCTION

With the increasing popularity of high-level design methodologies there is renewed interest in data-path equivalence checking [3][13][18][20]. In such an application, a design prototype is first implemented and validated in C/C++, and then used as the golden specification. A corresponding Verilog/VHDL design is implemented either manually or automatically through a high-level synthesis tool [2][4][15]. In both cases, a miter logic for equivalence checking is formed to prove the correctness of the generated RTL model by comparing it against the original C/C++ implementation.

The data-path logic targeted in this paper is specified using Verilog/VHDL. The bit and word-level operators in Verilog/VHDL have the same semantic expressiveness as the SMT QF_BV theory [5]. Table I gives a one-to-one correspondence between Verilog and QF_BV unsigned operators. Signed arithmetic operators are also supported. The complexity of such an equivalence problem is NP-complete. However, at the other extreme, the complexity becomes O(1) in the size of the network if the two designs are structurally the same. An NP-complete problem can be tackled by using SAT solvers as a general procedure. To counter the capacity limitation of SAT solving, it is crucial to reduce the complexity by identifying internal match points and by conducting transformations to bring more structural similarity to the two sides of the miter logic.

              Verilog operators        SMT QF_BV operators
Boolean       &&, ||, !, ⊕, mux        and, or, not, xor, ite
bit-wise      &, |, ∼, ⊕, mux          bvand, bvor, bvnot, bvxor, bvite
arithmetic    +, −, ∗, /, %            bvadd, bvsub, bvmul, bvdiv, bvmod
extract       []                       extract
concat        {}                       concat
comparator    <, >, ≤, ≥               bvugt, bvult, bvuge, bvule
shifter       <<, >>                   bvshl, bvshr

TABLE I. SUPPORTED OPERATORS (UNSIGNED)

A. Motivation

The differences between the two data-path logics under equivalence checking are introduced by various arithmetic transformations for timing, area and power optimizations. These optimizations are domain specific and can be very specialized towards a particular data-path design and underlying technology. They have the following characteristics:
• The two sides of the miter logic are architecturally different and have no internal match points.
• Many expensive operators such as adders and multipliers are converted to cheaper but more complex implementations, and the order of computations is changed. It is not a scalable solution to rely on SAT solving on the bit-blasted model.
• The transformed portions are embedded in a cloud of bit and word-level operators. Algebraic extraction [8][26] of arithmetic logic based on structural patterns is generally too restrictive to handle real-world post-optimization data-path logic.
• Word-level rewriting uses local transformations. Without high-level information, local rewriting is not able to make the two sides of the miter logic structurally more similar.

Lacking high-level knowledge of the data-path logic, the equivalence problems can be very difficult for gate-level equivalence checking and general QF_BV SMT solvers.
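
As a concrete (toy) illustration of the miter construction over the QF_BV operators of Table I, the following Python sketch uses the Z3 SMT solver rather than LEC itself: a C-style specification a*9 is compared against a strength-reduced implementation (a<<3)+a; an unsat miter target means the two data-paths are equivalent. The expressions are made-up examples.

    from z3 import BitVec, Solver, unsat

    a = BitVec('a', 16)
    spec = a * 9              # C-style reference computation
    impl = (a << 3) + a       # strength-reduced, RTL-style implementation

    miter = Solver()
    miter.add(spec != impl)   # the miter target
    print(miter.check() == unsat)   # True: the two data-paths are equivalent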
alence checking and general QF BV SMT solvers. Strategi-                                Original
                                                                                         miter
cally, LEC views the bottleneck of such problems as having                                                                                          new miter target
been introduced by high-level optimizations and employs a                                          Verific Parser Frontend

collaborative approach to isolate, recognize and reconstruct
the high-level transformations to simplify the miter model by bringing in more structural similarities.

B. Contributions

   The LEC system incorporates compositional proof strategies, uses rewriting to normalize arithmetic operators, and conducts analysis beyond the bit and word level. The collaborating procedures help to expose the actual bottleneck in a proof of equivalence. The novel aspects of this system are:
   1) It uses global algebraic reasoning through polynomial reconstruction. In the case-study, it uses the functional information of the design to reverse engineer the arithmetic expression as a linear sum, and also uses a structural skeleton of the original data-path to achieve the equivalence proof.
   2) It supports conditional rewriting and proves the required invariants as pre-conditions.
   3) It uses recursive transformations that target making both sides of the miter logic structurally more similar and hence more amenable to bit-level SAT sweeping.
   4) It has an open architecture, allowing new solver techniques to be integrated progressively.
   Through a case study, we demonstrate the steps that were used to reconstruct the arithmetic embedded in a data-path design as a linear sum of signed integers, as well as all the procedures that compositionally led to a successful equivalence proof. The experimental results demonstrate the effectiveness of these collaborating procedures.

C. Overview

   The overall tool flow is described in Section II. Learning techniques and system integration are presented in Sections III and IV. A case study is presented in Section V. Experimental results are presented in Section VI, followed by a comparison with related work and a conclusion.

                         II. TOOL FLOW

   LEC takes Verilog/VHDL as the input language for the data-path logic under comparison. Internally, a miter network, as in Figure 2(a), is constructed comparing combinational logic functions F and G. Figure 1 illustrates the overall tool flow.

[Fig. 1. Overall tool flow: VeriABC compiles the RTL into an SSA network, from which a word-level simulator, a bit-level AIG, and learning-based transformations (producing transformed RTL) are derived; the ABC solvers return SAT or UNSAT.]

   First, the Verific RTL parser front-end[6] is used to compile the input RTL into the Verific Netlist Database. VeriABC[23] processes the Verific netlist, flattens the hierarchy and produces an intermediate DAG representation in static single assignment (SSA) form, consisting of Boolean and word-level operators as shown in Table I. Except for the hierarchical information, the SSA is a close-to-verbatim representation of the original RTL description. From the SSA, a bit-blasting procedure generates a corresponding bit-level network as an AIG (And-Inverter Graph). Word-level simulation models can be created at the SSA level. ABC[1] equivalence checking solvers are integrated as external solvers.
   LEC tries to solve the miter directly using random simulation on the word-level simulator or by ABC[1]'s equivalence checking procedure dcec, which is a re-implementation of iprove[24]. If unresolved, LEC applies transformations to the SSA and produces sub-models in Verilog miter format, to which LEC can be recursively applied. The overall system integration is described in Section IV.
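The recursive structure of this flow can be summarized as follows. This is a minimal sketch with hypothetical names; simulate, dcec and transforms stand in for LEC's internal procedures, and for brevity every sub-model is treated as equivalence-preserving, whereas the full calculus of Section IV distinguishes conjunctive and disjunctive sub-models.

    def solve(miter, simulate, dcec, transforms):
        """Hypothetical sketch of LEC's outer loop (cf. Figure 1)."""
        if simulate(miter) == "SAT":        # random simulation falsified the miter
            return "SAT"
        result = dcec(miter)                # direct, time-limited bit-level attempt
        if result in ("SAT", "UNSAT"):
            return result
        for transform in transforms:        # learning-based transformations
            for sub in transform(miter):    # each yields a simpler Verilog miter
                result = solve(sub, simulate, dcec, transforms)
                if result in ("SAT", "UNSAT"):
                    return result
        return "UNKNOWN"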
                   III. LEARNING TECHNIQUES

   In this section, we present the techniques implemented in LEC. Even though some are simple and intuitive, they are powerful when integrated together, as demonstrated in the experimental results. All techniques are essential because LEC may not achieve a final proof if any one of them is omitted. Their interactions are illustrated in the case-study in Section V.

[Fig. 2. Miter network: (a) the miter network F = G over inputs x̄; (b) the structurally hashed miter with red, blue and purple regions; (c) the abstraction, in which a cut through the common logic is replaced by abstract inputs x̄′ and the redundant logic below the cut is removed.]

A. Structural information

   An SSA netlist is a DAG of bit and word-level operators annotated with bit-width and sign information. In the tool flow, both Verific and VeriABC perform simple structural hashing at the SSA level, merging common sub-expressions. After merging, the miter logic is divided into three colored regions using cone of influence (COI) relations, as in Figure 2(b).
   • Red: if the node is in the COI of F only
   • Blue: if the node is in the COI of G only
   • Purple: if the node is in the COI of both sides of the miter, i.e., common logic
The purple region is the portion of the miter logic that has already been proved equivalent, while the red and blue regions are the unresolved ones. LEC makes progress by reducing the red/blue regions and increasing the purple region. The common logic constrains the logic of the red and blue regions, and it may be abstracted (see Section III-E) to reduce redundancy and possibly expose the real bottleneck in a proof.
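The coloring itself is a pair of reachability computations over the SSA DAG. The following sketch (our own illustration; the fanin map from each node to its operands is a hypothetical representation) captures the idea:

    def cone(root, fanins):
        """Cone of influence: all nodes in the transitive fanin of root."""
        seen, stack = set(), [root]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(fanins.get(n, ()))
        return seen

    def color_miter(F, G, fanins):
        cf, cg = cone(F, fanins), cone(G, fanins)
        return {n: "purple" if n in cf and n in cg    # shared (proven) logic
                else "red" if n in cf                 # unresolved, F side only
                else "blue"                           # unresolved, G side only
                for n in cf | cg}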
B. Simulation model

   Two word-level simulators are generated from the SSA network. One is a native interpreted model. The other uses the open-source Verilator[29] for compiled simulation. From the SSA network, LEC automatically generates C++ code for pseudo-random input drivers and for monitoring design behavior. Verilator compiles the Verilog miter logic, links in the generated C++ code, and produces a simulator as a standalone executable. Efficient and effective simulation is crucial in our current flow for capturing potential constants and potential internal equivalent points at the SSA level. Simulation is also used to reduce common logic in the abstraction computation procedure.

C. Bit-level model

   As shown in Figure 1, an AIG is created from the SSA network by bit-blasting. LEC calls ABC[1]'s SAT sweeping procedure dcec to perform direct solving at the bit level. Using the AIG model, the native SAT solver embedded in LEC can be used to obtain a formal proof for a particular query. Typical queries are for extracting constant nodes, proving suspected equivalent pairs of points, or conducting particular learning for rewriting. Book-keeping information between the SSA nodes and the AIG nodes allows queries to be constructed at the word level and verified at the bit level. The result is then used to simplify the SSA network.

D. Constants and Potential Equivalent Points (PEPs)

   At the word level, candidates for constants and PEPs are identified through simulation, and SAT queries are posed. Each such SAT query is constructed and checked at the bit level. SAT-solving is configured at a low-effort level (run for a few seconds) for these types of queries. Proven constants and PEPs are used immediately to simplify the SSA network, leading to a new sub-model of lower complexity. LEC then continues to process the sub-model. In the presence of unproven PEPs, LEC can choose one as the next miter target, normally the smallest in terms of the number of nodes in its COI. The proof progresses as constants and PEPs are identified and used to simplify the miter model.
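A common way to harvest such candidates is to bucket nodes by their simulation signatures. The sketch below (our own illustration) assumes values maps each SSA node to the tuple of word-level values it took over all random vectors, which is a hypothetical representation:

    from collections import defaultdict

    def candidates(values):
        """Group nodes with identical simulation signatures."""
        buckets = defaultdict(list)
        for node, sig in values.items():
            buckets[sig].append(node)
        consts = [ns for sig, ns in buckets.items() if len(set(sig)) == 1]
        peps = [ns for ns in buckets.values() if len(ns) > 1]
        return consts, peps   # candidates only: each needs a bit-level SAT proof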
E. Abstraction

   As illustrated in Figure 2(c), in computing an abstraction, LEC computes a cut in the purple region (common logic) and removes the logic between the cut and the inputs. An abstract model is formed by replacing the cut signals with free inputs x̄′. If this abstracted miter is UNSAT, then the original miter is UNSAT. In our current implementation, LEC traverses the SSA network in topological order from the inputs. As each node is tentatively replaced with new PIs, simulation is used to validate the replacement. If successful, the node is omitted and replaced with the new PIs, and the next node is processed similarly.
   A successful abstraction step removes irrelevant logic and exposes a smaller unresolved region of the miter logic, allowing LEC to continue using other procedures. In addition, as seen in the experimental results, the reduction of common logic can significantly reduce the complexity of downstream SAT-solving, e.g., when common multipliers are removed from the miter logic. An unsuccessful abstraction, when the abstract miter becomes SAT, indicates the existence of a rare event not captured during random simulations. Often, this gives hints for selecting case-splitting candidates.
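Procedurally, the cut computation is a greedy topological sweep. The sketch below uses hypothetical callbacks for the tentative replacement and its simulation-based validation, and is an illustration rather than LEC's actual code:

    def abstract(nodes_in_topo_order, in_purple, replace_with_pi, sims_still_agree):
        """Greedy cut computation for abstraction (Section III-E sketch)."""
        for n in nodes_in_topo_order:      # from the inputs toward the miter output
            if not in_purple(n):           # cut only through common logic
                continue
            undo = replace_with_pi(n)      # tentatively make n a free input
            if not sims_still_agree():     # random simulation found a mismatch
                undo()                     # keep n: it carries a needed constraint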
F. Rewriting

   Similar to [20], word-level rewriting transforms an SSA network into a structurally different but functionally equivalent one. Through rewriting, certain equivalence checking problems can become much simpler. In our experience, a multiplier is often a source of difficulty in data-path equivalence checking. If two multipliers from opposite sides of the miter are matched exactly, LEC can simplify the miter through structural hashing and treat them as common logic. This is most effective when combined with the abstraction procedure, as the common multiplier can then be totally removed.
   In LEC, a few rules are hard-coded through pattern matching applied to the SSA network. The goal is to process multiplications so that they can be matched exactly. This rewriting is implementation specific; for illustration purposes, we list a few rewriting rules in Table II using Verilog notation and the semantics of the operators.
   The first rule is the normalization of multiplier operands. If a multiplier uses a partial product generator and a compressor tree, switching the operands of the multiplication becomes a very hard SAT problem, because at the bit level the implementation is not symmetrical. It is almost imperative to apply this rule whenever possible. The second and third rules use the distributive laws of multiplication and multiplexing. Rules 4 and 5 remove the shift operator ≫ when it is used with extract and concat, because it is hard for multiplication to be restructured through the ≫ operator. Rule 6 distributes multiplication through the concat of two bit vectors using +. It uses the fact that the concatenation {a, b[n − 1 : 0]} is equivalent to a ∗ 2^n + b[n − 1 : 0].
   The following is a more complex rule that distributes + over the extract operator. The right hand side is corrected with a third term, which is the carry bit from adding the lower n bits of a and b.

      (a + b)[m : n] = a[m : n] + b[m : n] + (a[n − 1 : 0] + b[n − 1 : 0])[n]   (1)
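Rule (1) is easy to validate numerically. The following self-contained check (our own illustration, not part of LEC) exercises it for one choice of m and n, masking the right-hand side to the extract width to mirror Verilog's fixed-width addition:

    import random

    def extract(v, m, n):               # v[m:n] as an unsigned integer
        return (v >> n) & ((1 << (m - n + 1)) - 1)

    m, n = 11, 4
    for _ in range(10000):
        a, b = random.getrandbits(16), random.getrandbits(16)
        carry = (extract(a, n - 1, 0) + extract(b, n - 1, 0)) >> n
        lhs = extract(a + b, m, n)
        rhs = (extract(a, m, n) + extract(b, m, n) + carry) & ((1 << (m - n + 1)) - 1)
        assert lhs == rhs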
         Before                            After
  1      a ∗ b                             b ∗ a
  2      mux(cond, d0, d1) ∗ c             mux(cond, d0 ∗ c, d1 ∗ c)
  3      mux(cond, d0, d1)[m : n]          mux(cond, d0[m : n], d1[m : n])
  4      a[m : 0] ≫ n                      {n'b0, a[m : n]}
  5      (a[m : 0] ≫ n)[m − n : 0]         a[m : n]
  6      {a, b[n − 1 : 0]} ∗ c             (a ∗ c) ≪ n + b[n − 1 : 0] ∗ c
                             TABLE II
                          REWRITING RULES
Repeatedly applying the above rules, LEC transforms the SSA network and keeps only the ∗ and + operators, enhancing the possibility of multipliers being matched. Note that rule (1) above and Rules 4-6 in Table II are correct for unsigned operators. Currently, for signed operators, due to sign extension and the two's complement representation of the operands, we have not implemented a good set of rewriting rules.
   1) Conditional rewriting: The following equation

                    (a ≪ c) ∗ b = (a ∗ b) ≪ c                  (2)

reduces the bit-width of a multiplier on the left hand side to a smaller one on the right. It is correct if a, b, c are integers, but incorrect in Verilog semantics, which uses modulo integer arithmetic. However, if the following is true within the miter model in modulo integer semantics,

                      ((a ≪ c) ≫ c) == a                       (3)

then equation (2) is valid. In such a situation, LEC identifies the pattern on the left hand side of (2) in the SSA network and executes a SAT query concerning (3) using the AIG model through bit-level solvers. The rewrite to the right hand side of (2) is carried out only if the query is proven to be an invariant. Such a transformation may produce an exact match of a ∗ b afterwards, which can be crucial for achieving the final proof.

G. Case-split

   Case-splitting on a binary signal cofactors the original model into two sub-models. The miter is proven if both sub-models are proven, and falsified if any sub-model is falsified. Although exponential in nature if many signals are chosen, case-splitting can simplify the underlying bit-level SAT solving significantly. For example, it is difficult to prove the following miter structure directly through bit-blasting and SAT solving at the AIG level,

        (x + y) ∗ (x + y) == x ∗ x + 2 ∗ x ∗ y + y ∗ y         (4)

where x is a 32-bit integer and y a single binary signal. However, it can be proven easily if case-splitting is done on y = 0 and y = 1. After constant propagation, the bit-level solver can prove both sub-models easily.
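A quick spot-check of example (4) (our own illustration) shows why the cofactors are easy: fixing the one-bit input y turns each side into a form that constant propagation collapses. The assertions below merely sample the two cofactors; the actual proof of each cofactor is done by the bit-level solver:

    import random

    MASK = (1 << 32) - 1                        # 32-bit modular arithmetic

    def miter(x, y):
        lhs = ((x + y) * (x + y)) & MASK
        rhs = (x * x + 2 * x * y + y * y) & MASK
        return lhs == rhs

    for y in (0, 1):                            # the two cofactors of the split
        assert all(miter(random.getrandbits(32), y) for _ in range(1000))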
   The current case-splitting mechanism supports cofactoring on an input bit or an input bit-vector. In verifying the test cases encountered so far, the case splits are conducted on a single bit, on whether a bit-vector equals zero, or on whether the lsb or msb of a bit-vector equals zero. A heuristic procedure can be implemented to trace back from the sel port of a mux node through its Boolean fanins and choose the candidates that have the highest controllability.
   Another advantage of case-splitting is that the cofactored sub-models contain new candidates for constants and PEPs, which lead to other down-stream transformations not possible before. Case-splitting also reduces the amount of Boolean logic in the SSA network and exposes the data-path logic to high-level learning such as polynomial construction.

H. Polynomial construction

   Reasoning at the word level, rewriting rules are based on the arithmetic properties of the corresponding operators, such as the commutative law of integer multiplication. However, rewriting applies only local transformations and does not have a global view. In situations where the miter logic is constructed from arithmetic optimization at the polynomial level, local rewriting is not able to bring similarity into the miter for further simplification. In such a situation, LEC tries to reconstruct the polynomial of the whole miter model, establish equivalence through arithmetic or algebraic reasoning, and then use high-level transformations to prove the equivalence of the original miter.
   As a generic procedure, LEC follows four steps to prove a miter network F(x̄) = G(x̄), where F and G are the top-level signals being compared and x̄ is the vector of input variables (bit-vectors):
   1) Conjecture (possibly by design knowledge) the algebraic domain of the polynomial, e.g., signed vs. unsigned integer, modulo integer arithmetic, the order of the polynomial, etc. These conjectures set up the framework and semantics for polynomial reconstruction, as illustrated in the case-study of Section V.
   2) Determine a polynomial f and create a logic network F′ such that the following can be proved formally.

                        F′ implements f                    (5)
                         miter F′ = F                      (6)

      How f is constructed is domain and test-case dependent. In the case-study of Section V, we use simulation patterns to probe for the coefficients of a linear function.
   3) Determine a polynomial g and create a logic network G′ such that the following can be proved formally.

                        G′ implements g                    (7)
                         miter G′ = G                      (8)

   4) Establish the following equivalence formally at the algebraic level.

                             f = g                         (9)

The combination of Items 2, 3, and 4 establishes the equivalence proof of the original miter model F = G. In constructing F′ and G′, we try to make them as structurally similar to F and G as possible. Details are given in Section V.
                     IV. SYSTEM INTEGRATION

   The above learning techniques are integrated in LEC as a set of logically independent procedures. Each procedure produces one or more sub-models, illustrated as a tree in Figure 3. The root node is the current targeted Verilog miter model. It has eight children. The simulator and AIG models are the ones described in Figure 1. The simplified sub-model is generated by constant propagation and the merging of proven PEPs. The abstraction and rewrite sub-models are created by the abstraction and rewrite procedures of the previous section. The case-split sub-model consists of a set of sub-models corresponding to the cofactoring variables selected. In the current implementation, the user needs to input the set of signals to case-split on; eventually they will be selected by heuristics. The linear-construction node has two sub-models, which will be explained in detail in Section V. When PEPs are identified through simulation, a PEP node is created with the set of unproven PEPs as sub-models.

[Fig. 3. Branching sub-model tree: the root Verilog miter model branches into simulator, AIG, simplified, abstraction, rewrite, case-split (case 0, case 1, ..., case n), linear-construction (case F, case G), and PEP (pep 0, pep 1, ..., pep m) sub-models.]

   Two nodes in the sub-model tree are terminal. One is the simulator model, which can falsify the miter through random simulation. The other is the AIG model, where ABC's bit-level dcec procedure is applied. The rest of the leaf models (in bold font) are generated as Verilog miter models, which have the same format as the root node. LEC procedures can be applied recursively to these leaf nodes to extend the sub-model tree to simpler ones. The LEC proof process progresses by expanding the sub-model tree. A sub-model is resolved as SAT or UNSAT from its sub-models' proof results.
   Since there are no logical dependencies between sibling sub-models, any branch can be chosen to continue the proof process. Sibling sub-models can be forked in parallel from a parent process. A node in the sub-model tree determines its proof result from its children. Table III gives the possible return values from the first-level sub-models. SIMPLIFY is returned by a PEP node to its parent model when at least one of its sub-models, pep_i, is proven UNSAT, notifying the parent node to simplify further with the newly proved pep_i.

        Sub-model               Return
        simulator               SAT
        AIG                     SAT/UNSAT
        simplified              SAT/UNSAT
        abstraction             UNSAT
        rewrite                 SAT/UNSAT
        case-split              SAT/UNSAT
        linear construction     SAT/UNSAT
        PEP                     SIMPLIFY
                        TABLE III
                  SUB-MODEL RETURN VALUES

   Depending on the logical relationship between a parent and its immediate sub-models, a node is either disjunctive or conjunctive in semantics. In Figure 3, a Verilog miter model node is disjunctive, which includes the root and all the leaf nodes in bold font. The case-split and linear-construction nodes are conjunctive; a PEP node is disjunctive. The semantics, shown in the following tables, are used to resolve the proof result of the parent model from its immediate sub-models. To complete the calculus, we introduced two values, CON and BOT, where CON stands for an internal conflict indicating a potential LEC software bug, and BOT is the bottom of the value lattice and acts like an uninitialized value.

         ∥     SAT    UNS    UNK    SMP    CON    BOT
        SAT    SAT    CON    SAT    SAT    CON    SAT
        UNS    CON    UNS    UNS    UNS    CON    UNS
        UNK    SAT    UNS    UNK    SMP    CON    UNK
        SMP    SAT    UNS    SMP    SMP    CON    SMP
        CON    CON    CON    CON    CON    CON    CON
        BOT    SAT    UNS    UNK    SMP    CON    BOT
                         TABLE IV
                   DISJUNCTION OF MODELS

         &     SAT    UNS    UNK    SMP    CON    BOT
        SAT    SAT    SAT    SAT    n/a    CON    SAT
        UNS    SAT    UNS    UNK    n/a    CON    UNS
        UNK    SAT    UNK    UNK    n/a    CON    UNK
        SMP    n/a    n/a    n/a    n/a    n/a    n/a
        CON    CON    CON    CON    n/a    CON    CON
        BOT    SAT    UNS    UNK    n/a    CON    BOT
                         TABLE V
                   CONJUNCTION OF MODELS

   Tables IV and V are the truth tables for the disjunction and conjunction semantics of the return values, in which UNS, UNK, and SMP stand for UNSAT, UNKNOWN, and SIMPLIFY. Assuming a bug-free situation, at a disjunctive node, if either SAT or UNSAT is returned from a sub-model, this is the final proof result for the parent. In conjunction, the parent must wait until all sub-models are resolved as UNSAT before deciding that its result is UNSAT, while any SAT sub-model implies that the current model is SAT. A PEP node returns SIMPLIFY to its parent if one of its sub-models, say pep_i, is proven UNSAT. In this case, the parent model can apply another round of simplification to obtain a new simplified sub-model by merging the node pair in the just-proved pep_i. The proof log in Figure 4 is a sample sub-model tree where only the branches that contributed to the final proof are shown. Indentation indicates the parent-child relationship. Recursively, the proof result of the top-level target is evaluated as UNSAT.
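This calculus is small enough to encode directly. The following sketch is our own rendering of Tables IV and V with the same value names; it resolves a parent result from two child results:

    VALUES = ("SAT", "UNS", "UNK", "SMP", "CON", "BOT")

    def disj(p, q):
        """Table IV: disjunctive join of two sub-model results."""
        if "CON" in (p, q) or {p, q} == {"SAT", "UNS"}:
            return "CON"                        # conflict: potential software bug
        for v in ("SAT", "UNS", "SMP", "UNK"):  # precedence read off Table IV
            if v in (p, q):
                return v
        return "BOT"

    def conj(p, q):
        """Table V: conjunctive join of two sub-model results (SMP is n/a)."""
        if "CON" in (p, q):
            return "CON"
        if "SAT" in (p, q):
            return "SAT"                        # any SAT cofactor falsifies the miter
        for v in ("UNK", "UNS"):                # all must be UNS to conclude UNS
            if v in (p, q):
                return v
        return "BOT"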
{
    "case split": {
       "case_0": "UNSAT by AIG",
       "case_1": {
         "simplified": {
            "abstraction": {
               "case split": {
                 "case_00": "UNSAT by AIG",
                 "case_01": "UNSAT by AIG",
                 "case_10": "UNSAT by AIG",
                 "case_11": "UNSAT by AIG"
               },
            },
         },
       },
    },
}
-------------------------------------
Miter proof result: [Resolved: UNSAT]
-------------------------------------
Fig. 4. Illustration of proof log

   Using this sub-model tree infrastructure, any new procedures discovered in the future can be plugged into the system easily. Also, the system is fully parallelizable in that siblings can be executed at the same time. The proof process can be retrieved from the expanded sub-model tree.

                        V. CASE STUDY

   The design in this case-study is an industrial example taken from the image processing domain. We verify specification = implementation, where the "specification" is a manually specified high-level description of the design. The "implementation" is a machine-generated and highly optimized RTL implementation of the same design using [2]. The miter logic is obtained through SLEC[3]. Therefore, the miter problem is verifying that the high-level synthesis (HLS) tool did not modify the design behavior.
   This miter is sequential in nature, but here we examine a bounded model checking (BMC) problem which checks the correctness of the implementation at cycle N. This renders the problem combinational. This is industrially relevant because the sequential problem is too hard to solve in general, and even the BMC problem at cycle N becomes too difficult for industrial tools.
   The original design (specification) consists of 150 lines of C++. It went through the Calypto front-end[3] and was synthesized into a word-level netlist in Verilog. The generated miter model has 1090 lines of structural Verilog code with 36 input ports: 29 of which are 7 bits wide, 2 are 9 bits, 4 are 28 bits, and one is a single-bit wire. The miter is comparing two 28-bit values. We do not have knowledge of what the design does except through structural statistics: no multipliers, but many adders, subtractors, comparators, shifters, etc., together with Boolean logic. From a schematic produced from the Verilog, there seems to be a sorting network implemented using comparators, but we cannot tell anything further.
   Figure 5 illustrates the compositional proof produced by the LEC system by showing the sub-model tree created during the proof process. Indentations indicate parent/sub-model relations, and nodes are listed in the order they were created. The three numbers on the right are the node counts in the red, blue and purple regions (common logic) of the SSA network, as distinguished in Figure 2(a). Only those sub-models that contributed to the final proof are shown in the figure; the others are omitted.

1. original model            : 366 332 776
2.  case-split
3.    case_0                 : 366 331 844
4.      AIG                  : UNSAT
6.    case_1                 : 366 332 776
7.      simplified           : 344 289 675
8.        abstraction        : 344 289 29
9.          case-split
10.           case_0         : 344 289 31
11.             AIG          : UNSAT
12.           case_1         : 344 289 31
13.             simplified   : 343 288 27
14.               PEP
15.                 pep_0    : 335 280 27
16.                   linear construction
17.                     case_F
18.                       AIG : UNSAT
19.                     case_G
20.                       AIG : UNSAT
21.               simplified : 10 10 305
22.                 AIG      : UNSAT

Fig. 5. Sub-model proof tree

   As seen in Figure 5, the case-split procedure is applied twice, at lines 2 and 9. Both models have a single-bit input port, which was selected for cofactoring. ABC[1] immediately proved the first cofactored case, case_0 (lines 3 and 10), using the AIG model at lines 4 and 11. The time-out for the dcec run was set to two seconds. Abstraction was applied at line 8, significantly reducing the common logic from 675 to 29 SSA nodes and effectively removing all the comparator logic. We tried abstraction on the original model without the case-split procedure, and it failed to produce any result. The case-split at line 2 removed enough Boolean logic and eliminated some corner cases such that the abstraction procedure was able to produce an abstract model successfully.
   Model 15 is the smallest unproved PEP from model 13. It is proved using the linear construction procedure at line 16, which we shall describe in detail in Section V-A. Model 21 is the simplified model of model 13 after merging the just-proved pep_0. After simplification, most of the logic in model 21 became common logic through structural hashing, leaving only 10 nodes in each of the blue and red regions. Model 21 was proved quickly by ABC, which concludes the proof of the original miter. In this case, the linear-construction procedure was crucial in attaining the proof. However, the case-split, simplification, abstraction, and PEP models are also very important, because they collaborate in removing Boolean, mux and comparator logic while keeping only the part of the original miter logic which constitutes a linear function. Only at this point can a proof by the linear construction procedure succeed.

A. Linear construction

   For model 15 in Figure 5, the SSA network contains many +, −, ≪ and ≫ operators along with extract and concat
operators, but contains no Boolean operators or muxes. The input ports consist of twenty-five 7-bit or 12-bit wide ports. The miter is comparing two 15-bit wide output ports. At this point, simplification and abstraction cannot simplify the model further. Also, there are no good candidates for case-splitting. The local rewriting rules cannot be applied effectively without some global information to help converge the two sides of the miter logic. High-level information must be extracted and applied to prove this miter model.
   After the linear construction procedure, LEC finds that the miter logic implements the following linear sum in the signed integer domain using two's complement representation:

   −16 ∗ x0 + 2 ∗ x1 + 2 ∗ x2 + 2 ∗ x3 + 2 ∗ x4 + 2 ∗ x5 + 2 ∗ x6 + 2 ∗ x7
   + 2 ∗ x8 + 2 ∗ x9 + 2 ∗ x10 + x11 + x12 + 2 ∗ x13 + 2 ∗ x14 + 2 ∗ x15
   + 2 ∗ x16 + 2 ∗ x17 + 2 ∗ x18 + 2 ∗ x19 − 2 ∗ x20 + 2 ∗ x21 + 2 ∗ x22
   + 2 ∗ x23 + 2 ∗ x24 + 14

One side of the miter implements the above sum as a plain linear adder chain (Figure 6(a)); the other side is a highly optimized implementation using a balanced binary tree structure (Figure 6(b)) and optimization tricks which we do not fully understand. This is a hard problem for bit-level engines because there are no internal match points to utilize. Therefore, LEC resorts to a high-level method to establish equivalence at the polynomial level. The following are the detailed steps for this specific case.

[Fig. 6. Addition implementation: (a) linear adder chain; (b) balanced adder tree.]

   1) The conjecture: Assume the miter logic is F(x̄) = G(x̄), as in Figure 2(a). LEC conjectures the following for the arithmetic domain.
   • Signed integer arithmetic. The numbers are in two's complement representation.
   • F(x̄) and G(x̄) implement linear sums f and g of the forms

                      f(x̄) = Σ_i a_i · x_i + b          (10)
                      g(x̄) = Σ_i a′_i · x_i + b′        (11)

   2) Determining the coefficients of f, g and proving f = g algebraically: Given the data-path logic F(x̄) and the linear sum formula (10), it takes n + 1 simulation patterns on the n input variables to compute the coefficients:

                      b = F(0, 0, ..., 0)
                      a_0 = F(1, 0, ..., 0) − b
                      a_1 = F(0, 1, ..., 0) − b
                      ...
                      a_{n−1} = F(0, 0, ..., 1) − b

Another round of random simulation on both the logic and the polynomial can be done to increase confidence in the conjecture. The same is repeated for G(x̄) to obtain g(x̄).
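The probing step is straightforward to express in code. In this sketch (our own illustration), F is any callable evaluating the data-path logic on a vector of input words:

    def probe_linear(F, n):
        """Conjecture coefficients of f(x) = sum(a_i * x_i) + b from n+1 patterns."""
        b = F([0] * n)                    # constant term from the all-zero vector
        a = []
        for i in range(n):
            e_i = [0] * n
            e_i[i] = 1                    # unit vector for input i
            a.append(F(e_i) - b)          # a_i = F(e_i) - b
        return a, b

    # example: probing a toy 3-input function recovers (a, b) = ([2, -1, 4], 7)
    assert probe_linear(lambda x: 2*x[0] - x[1] + 4*x[2] + 7, 3) == ([2, -1, 4], 7)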
   In integer arithmetic, f is equal to g if and only if the coefficients match exactly for each term:

          f = g  ⟺  ∀i. a_i = a′_i  and  b = b′            (12)

So checking f = g is trivial in this case. In other algebraic domains, domain-specific reasoning may have to be applied to derive algebraic equivalence, e.g., as in [28].
   3) Synthesizing implementations F′/G′ for f (f = g), structurally similar to F/G: We want to find a Verilog implementation F′(x̄) of f such that
   1) F′ implements f
   2) F′ is structurally similar to F
To do this, all nodes in the SSA network with the arithmetic operators +, − are marked, and edges connecting single bits are removed. A reduced graph is then created from the marked nodes in the remaining graph, maintaining the input/output relations between the marked nodes. This graph is a skeleton of the implementation structure of F. We annotate each of its nodes with a conjectured linear sum, computed in the same way as in the above steps. The root node F is annotated with f, and the internal nodes are annotated with linear sums f_s, f_t, etc. For illustration purposes, Figure 7(a) shows such an annotated reduced graph. For an arbitrary node w in the reduced graph with inputs from nodes s and t, from the annotation we have the following:

                      s = f_s(x̄)
                      t = f_t(x̄)
                      w = f_w(x̄)

[Fig. 7. Annotated reduced graph: (a) the annotated reduced graph, with internal nodes carrying linear sums such as s = f_s(x̄), t = f_t(x̄), w = f_w(x̄); (b) the substituted annotation, in which w = c_s · s + c_t · t + f_st(x̄).]

We would like to substitute f_w with the variables s and t, such that w is a function of s and t, in order to follow the structure of the skeleton reduced graph. Because all the functions are linear sums, we can compute, using algebraic division, two constants c_s and c_t such that the following holds:

                  w = c_s · s + c_t · t + f_st(x̄)
Here c_s is the quotient of f_w/f_s and c_t = (f_w − c_s · f_s)/f_t, while f_st is the remainder of the latter division. The substitution is conducted in one topological traversal from the inputs to the miter output. After substitution, the annotated reduced graph is essentially a multi-level implementation of the above linear sum. Because the linear sum annotated at each node in the reduced graph is only a conjecture, it may not be exactly the same function as in the original miter logic. However, in re-implementing f using this structure, certain similarities are still captured through the construction process.
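One way to realize this "algebraic division" of linear sums is ordinary multivariate division with constant quotients. The sketch below (our own illustration; a linear sum is represented hypothetically as a coefficient dictionary plus a constant, whereas LEC's actual routine works on annotated SSA nodes) computes one division step, so that c_s, r = divide(f_w, f_s) followed by c_t, f_st = divide(r, f_t) reproduces the decomposition above:

    from fractions import Fraction

    def divide(fw, fs):
        """Return constant c and remainder r with fw = c * fs + r (linear sums)."""
        (cw, bw), (cs, bs) = fw, fs
        pivot = next(v for v, a in cs.items() if a)      # a leading variable of fs
        c = Fraction(cw.get(pivot, 0), cs[pivot])
        rem = {v: cw.get(v, 0) - c * cs.get(v, 0)
               for v in set(cw) | set(cs)}
        return c, (rem, bw - c * bs)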
   This multi-level circuit is implemented by traversing the substituted reduced graph and creating a corresponding Verilog file for F′. It is generated by allocating a bit-vector at each internal node, with bit-width equal to that of the output port of F. The same can be done for G to obtain G′. The reason why LEC goes through so much trouble to obtain separate RTL implementations for F′ and G′ (even though they both implement f) is that we need to prove F = F′ and G = G′ separately next. Without the structural similarities created through this procedure, proving equivalence with an arbitrary F′ and G′ would be as difficult as proving the original F = G. Generally, only with these extra similarities injected can the miter model be simplified enough to allow SAT sweeping to succeed for F = F′ and G = G′.
   4) Proving F = F′ and G = G′: We construct two miter models, F = F′ and G = G′, as case_F and case_G in Figure 3, and apply LEC separately to each. By construction, each miter should be simpler than the original F = G because of the increased structural similarity between the two sides of the miter. Another round of LEC might reduce this miter logic, if not prove it directly through bit-level solvers. In the present case, F = F′ was proven instantly, because F is a simple linear adder chain and so is F′. Proving G = G′ takes more time using ABC's dcec, because G is a highly optimized implementation of f and the multi-level implementation from the annotated reduced graph only captures part of the similarity. But the injected similarity was sufficient to reduce the complexity to within dcec's capacity.
   5) Proving that F′/G′ implements f/g: To complete the proof, we still have the proof obligation that F′ and G′ implement f and g, respectively. By construction from the reduced graph, the generated Verilog is a verbatim translation of the multi-level form of f. However, we need to bridge the gap between Verilog's bit-vector arithmetic and the integer arithmetic of the linear sum. To do so, we created SVA assertions to check that every Verilog statement captures the integer value in full, without losing precision due to underflows or overflows.

    c[n : 0] = a[n − 1 : 0] + b[n − 1 : 0];
        assert (a[n − 1] & b[n − 1] =⇒ c[n])
        assert (!a[n − 1] & !b[n − 1] =⇒ !c[n])

    c[n : 0] = a[n : 0] / b[m : 0];
        assert (a[n : 0] == (c[n : 0] ∗ b[m : 0])[n : 0])

    c[m : 0] = {a[n : 0]}[m : 0];
        assert (a[n : 0] == {(n − m){c[m]}, c[m : 0]})

The first two sets of assertions ensure that there is no overflow in a signed integer addition and no non-zero remainder in a division. The third one ensures that extraction does not change the value in two's complement representation. The SVA checkers are formally verified separately using the VeriABC[23] flow, which in turn uses ABC's model checker.
   From the above procedures, we established the following:

                      f(x̄) = g(x̄)                         (13)
                      F′(x̄) implements f(x̄)               (14)
                      G′(x̄) implements g(x̄)               (15)
                      F = F′                               (16)
                      G = G′                               (17)

Altogether, they establish the proof of F = G.
   Combining the above procedures into a single run, LEC took about 10 minutes on an i7 processor to complete the full proof. Roughly 80% of the time is spent on compiling and running random simulation; the rest is used for SAT solving. In Table VI, we also compare this run-time against Boolector[11], Z3[16] and ABC's iprove[24] solver, all run on the same server. It is clear that LEC expedites the proof significantly by using the knowledge of the linear sum formulation inside the miter logic.

        Miter      Boolector    Z3          iprove      LEC
        model 1    time-out     time-out    time-out    10 min
                            TABLE VI
          COMPARISON WITH OTHER SOLVERS (24-HOUR TIME-OUT)

   In summary, polynomial reconstruction was a key technique in proving the underlying miter problem. The case-study illustrates the major steps and the proof obligations encountered during the process. The actual techniques used for different domains of the underlying arithmetic would differ. Each algebraic domain would require special-purpose heuristics and automated proof procedures to guarantee the correctness of the reconstructions and transformations used in the method. The goal of the linear construction procedure was to inject increased structural similarity by using global algebraic transformations.

                  VI. EXPERIMENTAL RESULTS

   Table VII shows the experimental results comparing Boolector[11], Z3[16] and iprove[24] using a 24-hour time-out limit on a 2.6 GHz Intel Xeon processor. These models are generated directly using SLEC[3] for checking C-to-RTL equivalence, or extracted as sub-targets from PEPs. The first column is the miter design name. The second column is the number of lines of Verilog for the miter model specification. Run-time or time-out results are reported for each solver in columns 3 to 6. Although the miter models are not big in terms of lines of Verilog, they are quite challenging for Boolector, Z3 and iprove. The run-time of LEC is the total CPU time
including Verilog compilation. It was expected that iprove would not prove any of them, because it works on the bit-blasted model without any of the high-level information that the other solvers have.

        Design       Lines   Boolector   Z3         iprove     LEC
        mul_64_64    125     20 sec      200 sec    time-out   10 sec
        d1           24      time-out    time-out   time-out   15 sec
        d2           507     time-out    time-out   time-out   2 min
        d3           191     time-out    time-out   time-out   15 min
        d4           473     time-out    time-out   time-out   60 sec
        d5_pep_0     674     time-out    9 hours    time-out   4 min
                             TABLE VII
                BENCHMARK COMPARISON (24-HOUR TIME-OUT)

   The miter mul_64_64 compares a 64x64 multiplier with an implementation using four 32x32 multipliers, as follows:

   {a_H, a_L} ∗ {b_H, b_L} = ((a_H ∗ b_H) ≪ 64) + ((a_H ∗ b_L + a_L ∗ b_H) ≪ 32) + a_L ∗ b_L

where a_H, a_L, b_H, b_L are the 32-bit slices of the original 64-bit bit-vectors. Both Boolector and Z3 are able to prove it. LEC proves it by first utilizing rewriting rules to transform the 64x64 multiplier into four 32x32 multipliers, matching the other four on the RHS of the miter. As they are matched exactly, they become common logic in the miter model. LEC then produces an abstraction and obtains a reduced model with all the multipliers removed: the outputs of the multipliers become free inputs in the abstract model. The abstract model is then proven instantly by ABC's dcec on the AIG model.
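The decomposition identity itself is exact over the integers, which a short self-contained check (our own illustration, not part of LEC) confirms; masking both sides with 2^64 − 1 would model the corresponding fixed-width Verilog product:

    import random

    for _ in range(10000):
        a, b = random.getrandbits(64), random.getrandbits(64)
        aH, aL = a >> 32, a & 0xFFFFFFFF       # 32-bit slices
        bH, bL = b >> 32, b & 0xFFFFFFFF
        rhs = ((aH * bH) << 64) + ((aH * bL + aL * bH) << 32) + aL * bL
        assert a * b == rhs                    # exact in unbounded integers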
   The miter d1, extracted from a PEP sub-model, is a demonstration of rewrite rule 6 in Table II using 32-bit multiplication. As both Boolector and Z3 fail to prove equivalence within the time limit, they likely do not have this rewriting rule implemented.
   To prove d2, LEC conducts conditional rewriting using rule (2) by first statically proving an invariant of the form (3). After the transformation, the multipliers are matched exactly on both sides of the miter and are removed in the subsequent abstract model. The final miter model is proved instantly by ABC on the bit-level AIG.
   The miter model d3 has part of its logic similar to mul_64_64 embedded inside. LEC proves d3 by first applying rewriting rules repeatedly until no more rewriting is possible. Then, LEC computes a reduced model through abstraction. In the reduced model, LEC conducts a case-split on a one-bit input. The case-0 AIG model is proven instantly, while case-1 is proven in about 10 minutes by ABC.
   The miter d4 is proven by first conducting a case-split on two bit-vector inputs, cofactoring on whether each bit-vector equals zero or not. Three of the four cofactored cases are proven instantly. The unresolved one goes through a round of simplification and abstraction. On the resulting sub-model, three one-bit inputs are identified and cofactored through the case-split procedure. LEC proves all eight cases quickly, within a few seconds.
   Miter d5 is extracted from model 15 in Figure 5, which contains the purely linear sum miter described in the case-study section. For this simpler miter, Z3 is able to prove it in 9 hours, while both iprove and Boolector time out. This shows that LEC's transformations through collaborating procedures successfully reduce the surrounding logic in the original model, which was preventing Z3 from proving it within 24 hours.
   The above experiments demonstrate the effectiveness of LEC's collaborating procedures of simplification, rewriting, case-splitting and abstraction computation. The LEC architecture allows these procedures to be applied recursively through a sub-model tree: the model obtained by one procedure introduces new opportunities for applying other procedures in the next iteration. As exemplified by miter d4, the initial case-split gives rise to new opportunities for simplification, as new constants are introduced by cofactoring. Then a new round of abstraction is able to remove enough common logic and expose three one-bit inputs as case-split candidates in the reduced model, which in turn gives rise to another case-split transformation that leads to the final proof. None of this is possible without the transformations being applied in sequence.

            VII. COMPARISON WITH RELATED WORK

   In bit-level equivalence-checking procedures [24][25], simulation, SAT-sweeping, AIG rewriting and internal equivalence identification are all relevant to data-path equivalence checking. In LEC, these types of procedures are conducted at the word level. Word-level rewriting is difficult if only a bit-level model is available. For example, with no knowledge of the boundary of a multiplier, normalizing its operands is impractical at the bit level. Although the abstraction and case-split techniques in LEC could in theory be applied at the bit level, they are not used there due to the difficulty of computing an abstraction boundary or of finding good cofactoring candidates.
   SMT solving is relevant because data-path verification is a subset of the QF_BV theory. Methods such as [7][11][16][14][17][19] are state-of-the-art QF_BV solvers. These employ different implementations of word-level techniques in rewriting, abstraction, case-splitting, and simplification, and interleave Boolean and word-level reasoning via a generalized DPLL framework or through abstraction refinements of various forms. Hector[20] is closest to LEC in terms of technology and targeted application domains, and has a rich set of word-level rewriting rules along with some theorem-prover [7] procedures to validate every rewriting applied. Hector also has an orchestration of a set of bit-level solvers using SAT and BDD engines, employed once the bit-level miter model is constructed. Strategically, LEC relies less on the capacity of the SAT solver; instead, it builds a compositional proof infrastructure and employs iterative transformations to finally obtain a proof through sub-model trees. The goal of the LEC learning procedures is to reverse engineer the embedded high-level algebraic transformations and bring more similarity to the two sides of the miter model.
   The techniques in [26][31][33] also try to reconstruct an algebraic model from the underlying logic, but they employ a bottom-up approach and their primitive element is a half-adder.
The method in [8] simplifies the algebraic construction by solving an integer linear programming problem. The limitation of these approaches is that they rely on the structural pattern of the underlying logic to reconstruct the algebraic model. On the other hand, the linear construction case-study in Section V-A constructs the polynomial through probing with simulation patterns. This is more general as it uses only the functional information of the data-path logic. For different domains, other techniques may well be more applicable, such as the bottom-up approach. The use of vanishing polynomials and Gröbner bases in [27][28] to prove equivalence between polynomials in the modulo-integer domain can be utilized once a polynomial form is reconstructed in LEC. In many data-path miter models, such a polynomial in a certain domain or theory is likely embedded in other control and data-path logic. Direct application of algebraic techniques is often not practical. Thus the collaborating procedures in LEC are designed to bridge this gap and isolate such polynomials so that these high-level theories can then be applied.
   In conducting consistency checking between C and Verilog RTL, the work in [21] focuses on how to process a C program to generate formal models. The tool relies on SMT solvers [11][16][14] as the back-end solving engines.
   In terms of tool architecture, [9][10][22] all employ a sophisticated set of transformations to simplify the target model during verification. These are done at the bit level. The LEC infrastructure allows future extension to take advantage of multi-core parallelization as demonstrated in [30]. [12][32] use dedicated data structures to represent the proof obligations, while LEC relies on the sub-model tree to track the compositional proof strategy used at each node.

                          VIII. CONCLUSION

   In LEC, we build a system of collaborating procedures for data-path equivalence-checking problems found in an industrial setting. The strategy is to utilize Boolean-level solvers, conduct transformations at the word level, and synthesize internal similarities by lifting the reasoning to the algebraic level. Using a real industrial case-study, we demonstrated the applicability of the sub-model tree infrastructure for integrating a compositional proof methodology using LEC.

                           REFERENCES

 [1] ABC - a system for sequential synthesis and verification. Berkeley Verification and Synthesis Research Center, http://www.bvsrc.org.
 [2] Calypto® Catapult Design Product. http://www.calypto.com.
 [3] Calypto® SLEC. http://www.calypto.com.
 [4] Forte Design Systems. http://www.forteds.com.
 [5] SMT-LIB. http://www.smt-lib.org.
 [6] Verific Design Automation. http://www.verific.com.
 [7] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanović, T. King, A. Reynolds, and C. Tinelli. CVC4. In Computer Aided Verification, pages 171–177. Springer, 2011.
 [8] M. A. Basith, T. Ahmad, A. Rossi, and M. Ciesielski. Algebraic approach to arithmetic design verification. In Formal Methods in Computer-Aided Design (FMCAD), 2011, pages 67–71. IEEE, 2011.
 [9] J. Baumgartner, H. Mony, V. Paruthi, R. Kanzelman, and G. Janssen. Scalable sequential equivalence checking across arbitrary design transformations. In International Conference on Computer Design (ICCD 2006), pages 259–266. IEEE, 2007.
[10] R. Brayton and A. Mishchenko. ABC: An academic industrial-strength verification tool. In Computer Aided Verification, pages 24–40. Springer, 2010.
[11] R. Brummayer and A. Biere. Boolector: An efficient SMT solver for bit-vectors and arrays. In Tools and Algorithms for the Construction and Analysis of Systems, pages 174–177. Springer, 2009.
[12] M. L. Case, A. Mishchenko, and R. K. Brayton. Automated extraction of inductive invariants to aid model checking. In Formal Methods in Computer Aided Design (FMCAD'07), pages 165–172. IEEE, 2007.
[13] P. Chauhan, D. Goyal, G. Hasteer, A. Mathur, and N. Sharma. Non-cycle-accurate sequential equivalence checking. In Proceedings of the 46th Annual Design Automation Conference, pages 460–465. ACM, 2009.
[14] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani. The MathSAT5 SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, pages 93–107. Springer, 2013.
[15] J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang. High-level synthesis for FPGAs: From prototyping to deployment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(4):473–491, 2011.
[16] L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.
[17] B. Dutertre and L. De Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf, 2:2, 2006.
[18] M. Fujita. Equivalence checking between behavioral and RTL descriptions with virtual controllers and datapaths. ACM Transactions on Design Automation of Electronic Systems (TODAES), 10(4):610–626, 2005.
[19] S. Jha, R. Limaye, and S. A. Seshia. Beaver: Engineering an efficient SMT solver for bit-vector arithmetic. In Computer Aided Verification, pages 668–674. Springer, 2009.
[20] A. Koelbl, R. Jacoby, H. Jain, and C. Pixley. Solver technology for system-level to RTL equivalence checking. In Design, Automation & Test in Europe (DATE'09), pages 196–201. IEEE, 2009.
[21] D. Kroening, E. Clarke, and K. Yorav. Behavioral consistency of C and Verilog programs using bounded model checking. In Proceedings of DAC 2003, pages 368–371. ACM Press, 2003.
[22] A. Kuehlmann and J. Baumgartner. Transformation-based verification using generalized retiming. In Computer Aided Verification, pages 104–117. Springer, 2001.
[23] J. Long, S. Ray, B. Sterin, A. Mishchenko, and R. Brayton. Enhancing ABC for LTL stabilization verification of SystemVerilog/VHDL models. Ganesh Gopalakrishnan University of Utah USA, page 38, 2011.
[24] A. Mishchenko, S. Chatterjee, R. Brayton, and N. Een. Improvements to combinational equivalence checking. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD '06), pages 836–843, 2006.
[25] V. Paruthi and A. Kuehlmann. Equivalence checking combining a structural SAT-solver, BDDs, and simulation. In International Conference on Computer Design (ICCD 2000), pages 459–464, 2000.
[26] E. Pavlenko, M. Wedler, D. Stoffel, W. Kunz, A. Dreyer, F. Seelisch, and G. Greuel. STABLE: A new QF-BV SMT solver for hard verification problems combining boolean reasoning with computer algebra. In Design, Automation & Test in Europe (DATE 2011), pages 1–6. IEEE, 2011.
[27] N. Shekhar, P. Kalla, and F. Enescu. Equivalence verification of polynomial datapaths using ideal membership testing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(7):1320–1330, 2007.
[28] N. Shekhar, P. Kalla, F. Enescu, and S. Gopalakrishnan. Equivalence verification of polynomial datapaths with fixed-size bit-vectors using finite ring algebra. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD-2005), pages 291–296, 2005.
[29] W. Snyder, P. Wasson, and D. Galbi. Verilator: Convert Verilog code to C++/SystemC, 2012.
[30] B. Sterin, N. Een, A. Mishchenko, and R. Brayton. The benefit of concurrency in model checking. IWLS, 2011.
[31] D. Stoffel and W. Kunz. Equivalence checking of arithmetic circuits on the arithmetic bit level. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(5):586–597, 2004.
[32] D. Wang and J. Levitt. Automatic assume guarantee analysis for assertion-based formal verification. In Proceedings of the 2005 Asia and South Pacific Design Automation Conference, pages 561–566. ACM, 2005.
[33] M. Wedler, D. Stoffel, R. Brinkmann, and W. Kunz. A normalization method for arithmetic data-path verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(11):1909–1922, 2007.

   Trading-off incrementality and dynamic restart of
                multiple solvers in IC3
                                         G. Cabodi (*), A. Mishchenko (**), M. Palena (*)
                                                (*) Dip. di Automatica ed Informatica
                                                 Politecnico di Torino - Torino, Italy
                                  (**) Dept. of EECS, University of California, Berkeley, CA, USA


   Abstract—This paper¹ addresses the problem of SAT solver performance in IC3, one of the major recent breakthroughs in Model Checking algorithms. Unlike other Bounded and Unbounded Model Checking algorithms, IC3 is characterized by numerous SAT solver queries on small sets of problem clauses. Besides algorithmic issues, this scenario poses serious performance challenges for SAT solver configuration and tuning. As is well known in other application fields, finding a good compromise between learning and overhead is key to performance. We address solver cleanup and restart heuristics, as well as clause database minimality based on on-demand clause loading: transition relation clauses are loaded into solvers according to structural dependency and phase analysis. We also compare different solutions for multiple specialized solvers, and we provide an experimental evaluation on benchmarks from the HWMCC suite. Though not finding a clear winner, the work outlines several potential improvements for a portfolio-based verification tool with multiple engines and tunings.

   ¹This work was supported in part by SRC Contracts No. 2012-TJ-2328 and No. 2265.001.

                         I. INTRODUCTION

   IC3 [1] is a SAT-based invariant verification algorithm for bit-level Unbounded Model Checking (UMC). Since its introduction, IC3 has immediately generated strong interest, and is now considered one of the major recent breakthroughs in Model Checking. IC3 has proved to be impressively effective at solving industrial verification problems. Our experience with the algorithm shows that IC3 is the single invariant verification algorithm capable of solving the largest number of instances among the benchmarks of the last editions of the Hardware Model Checking Competition (HWMCC).

A. Motivations

   IC3 heavily relies on SAT solvers to drive several parts of the verification algorithm: a typical run of IC3 is characterized by a huge number of SAT queries. As stated by Bradley in [2], the queries posed by IC3 to SAT solvers differ significantly in character from those posed by other SAT-based invariant verification algorithms (such as Bounded Model Checking [3], k-induction [4] [5] or interpolation [6]). Most notably, SAT queries posed by IC3 don't involve the unrolling of the transition relation for more than one step and are thus characterized by small-sized formulas.
   IC3 can be thought of as composed of two different layers: at the top level, the algorithm itself drives the verification process by constantly refining a set of over-approximations to forward reachable states with new inductive clauses; at the bottom level, a SAT solving framework is exploited by the top-level algorithm to respond to queries about the system. As shown in [7], these two layers can be separated by means of a clean interface.
   Performance of IC3 turns out to be both highly sensitive to the various internal behaviours of SAT solvers, and strictly dependent on the way the top-level algorithm is integrated with the underlying SAT solving framework.
   The peculiar characteristics exposed by the SAT queries of IC3 can thus be exploited to improve the overall performance of the algorithm in two different manners:
     1) Tuning the internal behaviours of the particular SAT solver employed to better fit IC3 needs.
     2) Defining better strategies to manage the SAT solving work required by IC3.
   In this paper we address the second issue, proposing and comparing different implementation strategies for handling SAT queries in IC3. The aim of this paper is to identify the most efficient way to manage SAT solving in IC3. To achieve this goal we experimentally compare a number of different implementation strategies over a selected set of benchmarks from the recent HWMCC.
   The experimental work has been done by two different research groups, on two different state-of-the-art verification tools, ABC [8] and PdTRAV [9], that share similar architectural and algorithmic choices in their implementation of IC3.
   The focus of this paper is neither on the IC3 algorithm itself nor on the internal details of the SAT solving procedures employed, but rather on the implementation details of the integration between IC3 and the underlying SAT solving framework.

B. Contributions

   The main contributions of this paper are:
     • A characterization of the SAT queries posed by IC3.
     • Novel approaches to solver allocation, loading and cleanup in IC3.
     • An experimental evaluation of performance using two verification tools.


C. Outline

   First, in Section II we introduce the notation used and give some background on IC3. Then, in Section III we present a systematic characterization of the SAT solving work required by IC3. Section IV introduces the problem of handling the SAT queries posed by IC3 efficiently. Both commonly used and novel approaches to the allocation, loading and cleaning up of SAT solvers in IC3 are discussed in Sections V, VI and VII respectively. Experimental data comparing these approaches are presented in Section VIII. Finally, in Section IX we draw some conclusions and give summarizing remarks.

                         II. BACKGROUND

A. Notation

Definition 1. A transition system is a triple S = ⟨M, I, T⟩ where M is a set of boolean variables called the state variables of the system, I(M) is a boolean predicate over M representing the set of initial states of the system, and T(M, M′) is a predicate over M, M′ that represents the transition relation of the system.

Definition 2. A state of the system is represented by a complete assignment s to the state variables M. A set of states of the system is represented by a boolean predicate over M. Given a boolean predicate F over M, a complete assignment s such that s satisfies F (i.e. s |= F) represents a state contained in F and is called an F-state. Primed state variables M′ are used to represent future states and, accordingly, a boolean predicate over M′ represents a set of future states.

Definition 3. A boolean predicate F is said to be stronger than another boolean predicate G if F → G, i.e. every F-state is also a G-state.

Definition 4. A literal is a boolean variable or the negation of a boolean variable. A disjunction of literals is called a clause, while a conjunction of literals is called a cube. A formula is said to be in Conjunctive Normal Form (CNF) if it is a conjunction of clauses.

Definition 5. Given a transition system S = ⟨M, I, T⟩, if an assignment s, t′ satisfies the transition relation T (i.e. if s, t′ |= T) then s is said to be a predecessor of t and t is said to be a successor of s. A sequence of states s0, s1, . . . , sn is said to be a path in S if every couple of adjacent states si, si+1, 0 ≤ i < n, satisfies the transition relation (i.e. if si, s′i+1 |= T).

Definition 6. Given a transition system S = ⟨M, I, T⟩, a state s is said to be reachable in S if there exists a path s0, s1, . . . , s such that s0 is an initial state (i.e. s0 |= I). We denote with Rn(S) the set of states that are reachable in S in at most n steps. We denote with R(S) the overall set of states that are reachable in S. Note that R(S) = ∪i≥0 Ri(S).

Definition 7. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M (called a safety property), the invariant verification problem is the problem of determining if P holds for every reachable state in S: ∀s ∈ R(S) : s |= P. An algorithm used to solve the invariant verification problem is called an invariant verification algorithm.

Definition 8. Given a transition system S = ⟨M, I, T⟩, a boolean predicate F over M is called an inductive invariant for S if the following two conditions hold:
   • Base case: I → F
   • Inductive case: F ∧ T → F′

A boolean predicate F over M is called an inductive invariant for S relative to another boolean predicate G if the following two conditions hold:
   • Base case: I → F
   • Relative inductive case: G ∧ F ∧ T → F′

Lemma 1. Given a transition system S = ⟨M, I, T⟩, an inductive invariant F for S is an over-approximation to the set of reachable states R(S).

Definition 9. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M, an inductive strengthening of P for S is an inductive invariant F for S such that F is stronger than P.

Lemma 2. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M, if an inductive strengthening of P can be found, then the property P holds for every reachable state of S. The invariant verification problem can thus be seen as the problem of finding an inductive strengthening of P for S.

B. IC3

   Given a transition system S = ⟨M, I, T⟩ and a safety property P over M, IC3 aims to find an inductive strengthening of P for S. For this purpose, it maintains a sequence of formulas Fk = F0, F1, . . . , Fk such that, for every 0 ≤ i < k, Fi is an over-approximation of the set of states reachable in at most i steps in S. Each of these over-approximations is called a time frame and is represented by a set of clauses, denoted by clauses(Fi). The sequence of time frames Fk is called a trace and is maintained by IC3 in such a way that the following conditions hold throughout the algorithm:
   (1) F0 = I
   (2) Fi → Fi+1, for all 0 ≤ i < k
   (3) Fi ∧ T → F′i+1, for all 0 ≤ i < k
   (4) Fi → P, for all 0 ≤ i < k

   Condition (1) states that the first time frame of the trace is special and is simply assigned the set of initial states of S. The remaining conditions claim that, for every time frame Fi but the last one: (2) every Fi-state is also an Fi+1-state, (3) every successor of an Fi-state is an Fi+1-state, and (4) every Fi-state is safe. Condition (2) is maintained syntactically, enforcing the condition (2′) clauses(Fi+1) ⊆ clauses(Fi).

Lemma 3. Let S = ⟨M, I, T⟩ be a transition system, Fk = F0, F1, . . . , Fk a sequence of boolean formulas over M, and let conditions (1-3) hold for Fk. Then each Fi, with 0 ≤ i < k, is an over-approximation to the set of states reachable within i steps in S.
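   As a concrete illustration of Definition 8 and of conditions (1)-(4), the following sketch (ours, not part of the original IC3 formulation) checks the base and inductive cases of a candidate invariant on a toy 2-bit counter by brute-force enumeration; an actual implementation would pose both conditions as SAT queries instead.

# Toy transition system: a 2-bit counter over states (b1, b0) that
# counts 0 -> 1 -> 2 -> 0; the encoding 3 = (1, 1) is never reached.
states = [(b1, b0) for b1 in (0, 1) for b0 in (0, 1)]

def initial(s):                     # I: the counter starts at 0
    return s == (0, 0)

def transition(s, t):               # T: the successor relation
    value = 2 * s[0] + s[1]
    succ = (value + 1) % 3 if value < 3 else 0
    return 2 * t[0] + t[1] == succ

def invariant(s):                   # candidate F: "state 3 is unreachable"
    return s != (1, 1)

# Base case: I -> F (every initial state is an F-state).
base = all(invariant(s) for s in states if initial(s))
# Inductive case: F /\ T -> F' (F-states only reach F-states).
inductive = all(invariant(t)
                for s in states if invariant(s)
                for t in states if transition(s, t))
print("inductive invariant" if base and inductive else "not inductive")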


                                                                                                                                       20
Lemma 4. Let S = ⟨M, I, T⟩ be a transition system, P a safety property over M, Fk = F0, F1, . . . , Fk a sequence of boolean formulas over M, and let conditions (1-4) hold for Fk. Then P is satisfied up to k − 1 steps in S (i.e. there does not exist any counter-example to P of length less than or equal to k − 1).

   The main procedure of IC3 is described in Algorithm 1 and is composed of two nested iterations. Major iterations (lines 3-16) try to prove that P is satisfied up to k steps in S, for increasing values of k. To prove so, in minor iterations (lines 4-9), IC3 refines the trace Fk computed so far by adding new relative inductive clauses to some of its time frames. The algorithm iterates until either (i) an inductive strengthening of the property is produced (line 4), or (ii) a counter-example to the property is found (line 7).

 Input: S = ⟨M, I, T⟩; P(M)
 Output: SUCCESS or FAIL(σ), with σ counter-example
  1: k ← 0
  2: Fk ← I
  3: repeat
  4:   while ∃t : t |= Fk ∧ ¬P do
  5:     s ← Extend(t)
  6:     if BlockCube(s, Q, Fk) = FAIL(σ) then
  7:       return FAIL(σ)
  8:     end if
  9:   end while
 10:   Fk+1 ← ∅
 11:   k ← k + 1
 12:   Fk ← Propagate(Fk)
 13:   if Fi = Fi+1 for some 0 ≤ i < k then
 14:     return SUCCESS
 15:   end if
 16: until forever
                    Algorithm 1. IC3(S, P)

   At major iteration k, the algorithm has computed a trace Fk such that conditions (1-4) hold. From Lemma 4, it follows that P is satisfied up to k − 1 steps in S. IC3 then tries to prove that P is satisfied up to k steps as well. This is done by enumerating Fk-states that violate P and then trying to block them in Fk.

Definition 10. Blocking a state (or, more generally, a cube) s in a time frame Fk means proving s unreachable within k steps in S, and consequently refining Fk to exclude it.

   To enumerate each state of Fk that violates P (line 4), the algorithm poses SAT queries to the underlying SAT solving framework in the following form:

                        SAT?(Fk ∧ ¬P)                      (Q1)

If Q1 is SAT, a bad state t in Fk can be extracted from the satisfying assignment. That state must be blocked in Fk. To increase the performance of the algorithm, as suggested in [7], the bad state t generated this way is first (possibly) extended to a bad cube s. This is done by means of the Extend(t) procedure (line 5), not reported here, which employs ternary simulation to remove some literals from t. The resulting cube s includes t and violates the property P; it is thus a bad cube. The algorithm then tries to block the whole bad cube s rather than t. It is shown in [7] that extending bad states into bad cubes before blocking them dramatically improves IC3 performance.
   Once a bad cube s is found, it is blocked in Fk by calling the BlockCube(s, Q, Fk) procedure (line 6). This procedure is described in Algorithm 2.

 Input: s: bad cube in Fk; Q: priority queue; Fk: trace
 Output: SUCCESS or FAIL(σ), with σ counter-example
  1: add a proof-obligation (s, k) to the queue Q
  2: while Q is not empty do
  3:   extract (s, j) with minimal j from Q
  4:   if j > k or s ⊭ Fj then continue
  5:   if j = 0 then return FAIL(σ)
  6:   if ∃t, v′ : t, v′ |= Fj−1 ∧ T ∧ ¬s ∧ s′ then
  7:     p ← Extend(t)
  8:     add (p, j − 1) and (s, j) to Q
  9:   else
 10:     c ← Generalize(j, s, Fk)
 11:     Fi ← Fi ∪ {c} for 0 < i ≤ j
 12:     add (s, j + 1) to Q
 13:   end if
 14: end while
 15: return SUCCESS
                Algorithm 2. BlockCube(s, Q, Fk)

   Otherwise, if Q1 is UNSAT, every bad state of Fk has been blocked so far, conditions (1-4) hold for k + 1, and IC3 can safely move to the next major iteration, trying to prove that P is satisfied up to k + 1 steps. Before moving to the next iteration, a new empty time frame Fk+1 is created (line 10). Initially, clauses(Fk+1) = ∅ and such a time frame represents the entire state space, i.e. Fk+1 = Space(S). Note that Space(S) is a valid over-approximation to the set of states reachable within k + 1 steps in S. Then a phase called propagation takes place (line 12). The procedure Propagate(Fk) (Algorithm 4), which is discussed later, handles that phase. During propagation, IC3 tries to refine every time frame Fi, with 0 < i ≤ k, by checking if some clauses of one time frame can be pushed forward to the following time frame. Possibly, propagation refines the outermost time frame Fk so that Fk ⊂ Space(S). Propagation can lead to two adjacent time frames becoming equivalent. If that happens, the algorithm has found an inductive strengthening of P for S (equal to those time frames), so the property P holds for every reachable state of S and IC3 returns success (line 13).
   The procedure BlockCube(s, Q, Fk) (Algorithm 2) is responsible for refining the trace Fk in order to block a bad cube s found in Fk. To preserve condition (3), prior to blocking a cube in a certain time frame, IC3 has to recursively block its predecessor states in the preceding time frames.
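   The Extend procedure used at line 5 of Algorithm 1 and line 7 of Algorithm 2 is not reported in the paper; the sketch below shows one possible realization of its core idea (ours, with a hypothetical netlist representation): literals of the bad cube are tentatively replaced by the unknown value X, and a literal is dropped whenever three-valued simulation still forces the property-violation output bad to 1.

X = None  # the unknown value of ternary {0, 1, X} simulation

def and3(a, b):                               # three-valued AND
    if a == 0 or b == 0: return 0
    if a == 1 and b == 1: return 1
    return X

def not3(a):                                  # three-valued NOT
    return X if a is X else 1 - a

def simulate(net, node, assignment):
    """Evaluate `node` over {0, 1, X}. `net` maps internal nodes to
    ('AND', in1, in2) or ('NOT', in1); leaves are state variables,
    and unassigned leaves take the value X (no caching, for brevity)."""
    if node not in net:
        return assignment.get(node, X)
    op = net[node]
    if op[0] == 'AND':
        return and3(simulate(net, op[1], assignment),
                    simulate(net, op[2], assignment))
    return not3(simulate(net, op[1], assignment))

def extend(net, bad, cube):
    """Drop literals from `cube` (a dict var -> 0/1 for which the
    output `bad` evaluates to 1) while `bad` remains forced to 1."""
    for var in list(cube):
        saved = cube.pop(var)                 # tentatively set var to X
        if simulate(net, bad, cube) != 1:
            cube[var] = saved                 # literal needed: restore it
    return cube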


To keep track of the states (or cubes) that must be blocked in certain time frames, IC3 uses the formalism of proof-obligations.

Definition 11. Given a cube s and a time frame Fj, a proof-obligation is a pair (s, j) formalizing the fact that s must be blocked in Fj.

   Given a proof-obligation (s, j), the cube s can either represent a set of bad states or a set of states that can all reach a bad state in one or more transitions. The number j indicates the position in the trace where s must be proved unreachable, or else the property fails.

Definition 12. A proof-obligation (s, j) is said to be discharged when s becomes blocked in Fj.

   IC3 maintains a priority queue Q of proof-obligations. During the blocking of a cube, proof-obligations (s, j) are extracted from Q and discharged for increasing values of j, ensuring that every predecessor of a bad cube s will be blocked in Fj (j < k) before s is blocked in Fk. In the BlockCube(s, Q, Fk) procedure, first a proof-obligation encoding the fact that s must be blocked in Fk is added to Q (line 1). Then proof-obligations are iteratively extracted from the queue and discharged (lines 2-14).
   Prior to discharging the extracted proof-obligation (s, j), IC3 checks if that proof-obligation still needs to be discharged. It is in fact possible for an enqueued proof-obligation to become discharged as a result of the discharging of some previously extracted proof-obligations. To perform this check, the following SAT query is posed (line 4):

                         SAT?(Fj ∧ s)                      (Q2)

   If the result of Q2 is SAT, then the cube s is still in Fj and (s, j) still needs to be discharged. Otherwise, s has already been blocked in Fj and the procedure can move on to the next iteration.
   If the proof-obligation (s, j) still needs to be discharged, then IC3 checks if the time frame identified by j is the initial time frame (line 5). If so, the states represented by s are initial states that can reach a violation of the property P. A counter-example σ to P can be constructed by going up the chain of proof-obligations that led to (s, 0). In that case, the procedure terminates with failure, returning the counter-example found.
   To discharge a proof-obligation (s, j), i.e. to block a cube s in Fj, IC3 tries to derive a clause c such that c ⊆ ¬s and c is inductive relative to Fj−1. This is done by posing the following SAT query (line 6):

                  SAT?(Fj−1 ∧ ¬s ∧ T ∧ s′)                 (Q3)

   If Q3 is UNSAT (lines 10-12), then the clause ¬s is inductive relative to Fj−1 and can be used to refine Fj, ruling out s. To pursue a stronger refinement of Fj, the inductive clause found undergoes a process called inductive generalization (line 10) prior to being added to the trace (line 11). Inductive generalization is carried out by the procedure Generalize(j, s, Fk) (Algorithm 3), which tries to minimize the number of literals in the clause c = ¬s while maintaining its inductiveness relative to Fj−1, in order to preserve condition (3). The resulting clause is added not only to Fj, but also to every time frame Fi, 0 < i < j (line 11). Doing so discharges the proof-obligation (s, j). In fact, this refinement rules out s from every Fi with 0 < i ≤ j. Since the sets Fi with i > j are larger than Fj, s may still be present in one of them and (s, j + 1) may become a new proof-obligation. To address this issue, Algorithm 2 adds (s, j + 1) to the priority queue (line 12).
   Otherwise, if Q3 is SAT (lines 7-8), a predecessor t of s in Fj−1 can be extracted from the satisfying assignment. To preserve condition (3), before blocking a cube s in a time frame Fj, every predecessor of s must be blocked in Fj−1. So the predecessor t is extended with ternary simulation (line 7) into the cube p, and then both proof-obligations (p, j − 1) and (s, j) are added to the queue (line 8).

 Input: j: time frame index; s: cube such that ¬s is inductive
     relative to Fj−1; Fk: trace
 Output: c: a sub-clause of ¬s
  1: c ← ¬s
  2: for all literals l in c do
  3:   try ← the clause obtained by deleting l from c
  4:   if ∄t, v′ : t, v′ |= Fj−1 ∧ T ∧ try ∧ ¬try′ then
  5:     if ∄t : t |= I ∧ ¬try then
  6:       c ← try
  7:     end if
  8:   end if
  9: end for
 10: return c
               Algorithm 3. Generalize(j, s, Fk)

   The Generalize(j, s, Fk) procedure (Algorithm 3) tries to remove literals from ¬s while keeping it inductive relative to Fj−1. To do so, a clause c initialized with ¬s (line 1) is used to represent the current minimal inductive clause. For every literal of c, the clause try obtained by dropping that literal from c (line 3) is checked for inductiveness relative to Fj−1 by means of the following SAT query (line 4):

               SAT?(Fj−1 ∧ try ∧ T ∧ ¬try′)                (Q4)

   If Q4 is UNSAT, the inductive case for the reduced clause still holds. Since dropping literals from a relatively inductive clause can break both the inductive case and the base case, the latter must be explicitly checked too for the reduced clause try (line 5). This is done by posing the following SAT query:

                        SAT?(I ∧ ¬try)                     (Q5)

   If both the inductive case and the base case hold for the reduced clause try, the current minimal inductive clause c is updated with try (line 6).
   The Propagate(Fk) procedure (Algorithm 4) handles the propagation phase.
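   Before turning to propagation, note that Algorithm 3 maps almost line-by-line onto code. The sketch below is ours; the two callbacks, which stand for queries Q4 and Q5 on the underlying SAT framework, are hypothetical and would be answered by the solver of time frame j − 1 and by a dedicated initial-state solver respectively.

def generalize(j, s, q4_is_sat, q5_is_sat):
    """Mirror of Algorithm 3. A cube/clause is a frozenset of literals,
    a literal being a (variable, polarity) pair; c = not(s).

    q4_is_sat(j, clause): True iff SAT?(F(j-1) /\ clause /\ T /\ not(clause'))
    q5_is_sat(clause):    True iff SAT?(I /\ not(clause))"""
    c = frozenset((var, not pol) for var, pol in s)       # line 1: c <- not(s)
    for lit in list(c):                                   # line 2
        try_clause = c - {lit}                            # line 3
        if not try_clause:             # never reduce to the empty clause
            break
        # Lines 4-6: keep the reduction only if the relative inductive
        # case (Q4 UNSAT) and the base case (Q5 UNSAT) still hold.
        if not q4_is_sat(j, try_clause) and not q5_is_sat(try_clause):
            c = try_clause
    return c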


 Input: Fk: trace
 Output: Fk: updated trace
  1: for j = 0 to k − 1 do
  2:   for all c ∈ Fj do
  3:     if ∄t, v′ : t, v′ |= Fj ∧ T ∧ ¬c′ then
  4:       Fj+1 ← Fj+1 ∪ {c}
  5:     end if
  6:   end for
  7: end for
  8: return Fk
                 Algorithm 4. Propagate(Fk)

For every clause c of each time frame Fj, with 0 ≤ j ≤ k − 1, the procedure checks if c can be pushed forward to Fj+1 (line 3). To do so, it poses the following SAT query:

                     SAT?(Fj ∧ T ∧ ¬c′)                    (Q6)

   If the result of Q6 is UNSAT, then it is safe, with respect to condition (3), to push clause c forward to Fj+1. Otherwise, c can't be pushed and the procedure iterates to the next clause.

C. Related works

   In [2], Aaron Bradley outlined the opportunity for SAT and SMT researchers to directly address the problem of improving IC3's performance by exploiting the peculiar character of the SAT queries it poses. A description of the IC3 algorithm, specifically addressing implementation issues, is given in [7].

                   III. SAT SOLVING IN IC3

   SAT queries posed by IC3 have some specific characteristics:
   • Small-sized formulas: they employ no more than a single instance of the transition relation;
   • Large number of calls: reasoning during the verification process is highly localized and takes place at the granularity of relatively small cubes;
   • Separated contexts: each SAT query is relative to a single time frame;
   • Related calls: subsequent calls may expose a certain correlation (for instance, inductive generalization calls take place on progressively reduced formulas).
   We performed an analysis of the implementation of IC3 within the academic model checking suite PdTRAV, closely following the description of IC3 given in [7] (there called PDR: Property Directed Reachability). The experimental analysis led us to identify six different types of SAT queries that the algorithm poses during its execution. These queries are the ones already outlined in Section II-B. The type of each of these queries is reported in Table I.

   Name          Query Type                      Query
    Q1       Target Intersection          SAT?(Fk ∧ ¬P)
    Q2       Blocked Cube                 SAT?(Fi ∧ s)
    Q3       Relative Induction           SAT?(Fi ∧ ¬s ∧ T ∧ s′)
    Q4       Inductive Generalization     SAT?(Fi ∧ c ∧ T ∧ ¬c′)
    Q5       Base of Induction            SAT?(I ∧ ¬c)
    Q6       Clause Propagation           SAT?(Fi ∧ T ∧ ¬c′)

              Table I: SAT query breakdown in IC3.

   For each of the queries identified, we have measured the average number of calls and the average solving time. These statistics are reported in Table II. The results were collected by running PdTRAV's implementation of IC3 on the complete set of 310 single-property benchmarks of the HWMCC'12, using time and memory limits of 900 s and 2 GB, respectively.

   Query      Calls [Number]    Calls [%]    Avg Time [ms]
    Q1               483           0.1             81
    Q2             27891           6.8            219
    Q3             31172           7.6            334
    Q4            142327          34.7            575
    Q5            147248          35.9            112
    Q6             61114          14.9            681

 Table II: SAT query statistics in IC3: number of calls, percentage,
 and average time spent in different SAT queries during an IC3 run.

   Such statistics can be summarized as follows:
   • SAT calls involved in inductive generalization are by far the most frequent ones. These are in fact the calls that appear at the finest granularity. In the worst-case scenario, one call is posed for every literal of every inductive clause found.
   • Inductive generalization and propagation checks are the most expensive queries in terms of average SAT solving time required.
   • Target intersection calls are very infrequent and don't take much time to be solved.
   • Blocked cube and relative induction checks are close in the number of calls and solving time.

              IV. HANDLING SAT SOLVING IN IC3

   Subsequent SAT calls in IC3 are often applied to highly different formulas. In the general case, two subsequent calls can in fact be posed in the context of different time frames, thus involving different sets of clauses. In addition, one of them may require the use of the transition relation, while the other may not, and each query can involve the use of temporary clauses/cubes that are needed only to respond to that particular query (e.g. the candidate inductive clause used to check relative inductiveness during cube blocking or inductive generalization).
   In the general case, whenever a new query is posed by IC3 to the underlying SAT solving framework, the formula currently loaded in the solver must be modified to accommodate the new query. For this reason, IC3 requires the use of SAT solvers that expose an incremental SAT interface. An incremental SAT interface for IC3 must support the following features:
   • Pushing and popping clauses to or from the formula.
   • Specifying literal assumptions.
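   As an illustration of these two features, the following sketch (ours; `core` stands for a hypothetical MiniSAT-like object offering only new_var(), add_clause() and solve(assumptions), with negative integers denoting negated variables) layers a removable-clause interface on top of plain clause addition; the activation-literal mechanism it relies on is discussed below.

class IncrementalSolver:
    """Minimal incremental interface for IC3-style use: permanent
    clauses, removable clauses simulated via activation literals,
    and queries under literal assumptions."""

    def __init__(self, core):
        self.core = core              # hypothetical MiniSAT-like solver
        self.active = []              # activation literals still enabled

    def add_clause(self, lits):
        self.core.add_clause(lits)    # permanent clause

    def push_clause(self, lits):
        """Load a removable clause as (lits OR not a), with a fresh
        activation variable a; effective only while a is assumed."""
        a = self.core.new_var()
        self.core.add_clause(lits + [-a])
        self.active.append(a)
        return a

    def pop_clause(self, a):
        """'Remove' the clause guarded by a: assert the unit not a.
        The clause stays in the database, and in BCP, until cleanup."""
        self.active.remove(a)
        self.core.add_clause([-a])

    def solve(self, assumptions):
        # Enabled activation literals travel as extra assumptions.
        return self.core.solve(assumptions + self.active)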




   Many state-of-the-art SAT solvers, like MiniSAT [10], feature an incremental interface capable of pushing new clauses into the formula and allowing literal assumptions. Removing clauses from the current formula is a more difficult task, since one has to remove not only the single clause specified, but also every learned clause that has been derived from it. Although solvers such as zChaff [11] directly support clause removal, the majority of state-of-the-art SAT solvers feature an interface like the one of MiniSAT, in which clause removal can only be simulated. This is done through the use of literal assumptions and the introduction of auxiliary variables known as activation variables, as described in [12]. In such solvers, clauses aren't actually removed from the formula but only made redundant for the purpose of determining the satisfiability of the rest of the formula. Since such clauses are not removed from the formula, they still participate in Boolean Constraint Propagation (BCP) and thus degrade the overall SAT solving performance. In order to minimize this degradation, each solver employed by IC3 must be periodically cleaned up, i.e. emptied and re-loaded with only the relevant clauses. In this work we assume the use of a SAT solver exposing an incremental interface similar to the one of MiniSAT.
   Once an efficient incremental SAT solving tool has been chosen, any implementation of IC3 must face the problem of deciding how to integrate the top-level algorithm with the underlying SAT solving layer. This problem can be divided into the following three sub-problems:
   • SAT solver allocation: decide how many different SAT solvers to employ and how to distribute the workload among them.
   • SAT solver loading: decide which clauses must be loaded in each solver to make it capable of responding correctly to the SAT queries posed by IC3.
   • SAT solver cleanup: decide when and how often the algorithm must clean up each solver, in order to avoid performance degradation.

                 V. SAT SOLVERS ALLOCATION

   We assume in this work the use of multiple SAT solvers, one for each different time frame. Using multiple solvers, we observed that performance is highly related to:
   • Solver cleanup frequency: cleaning up the solver means removal of incrementally loaded problem clauses and learned clauses.
   • Clause loading strategy: on-demand loading of transition relation clauses based on topological dependency.

A. Specialized solvers

   From the statistical results reported in Table II it's easy to see that inductive generalization and clause propagation queries are by far the most expensive ones in terms of average SAT solving time. Inductive generalization queries, in addition to being expensive, are also the most frequent type of query posed.
   The reason why inductive generalization calls are so expensive can be explained by the fact that, during the inductive generalization of a clause, at every iteration a slightly changing formula is queried for a satisfying assignment in increasingly larger subspaces. Given two subsequent queries in inductive generalization, in fact, it can be noticed that their formulas can differ only by one literal of the present-state clause try and one literal of the next-state cube ¬try′. As the subspace becomes larger, solving times for those queries increase. The average expensiveness of clause propagation calls can be intuitively motivated by noticing that they are posed once for every clause of every time frame. The innermost time frames are the ones with the largest number of clauses, and thus require the largest number of propagation calls. Unfortunately, given the high number of clauses in those time frames, the CNF formulas upon which such calls act are highly constrained and usually harder to solve. So during propagation there are, in general, more hard queries than simple queries, making the average SAT solving time for those queries high.
   In an attempt to reduce the burden on each time frame's SAT solver, we have experimented with the use of specialized solvers for handling such queries. For every time frame, a second solver is instantiated and used to respond to a particular type of query (Q4 or Q6). Table III summarizes the results of this experiment.

                     VI. SOLVER LOADING

   To minimize the size of the formula to be loaded into each solver, a common approach is to load, for every SAT call that queries the dynamic behavior of the system, only the necessary part of the transition relation.
   It is easy to observe that every SAT call that uses the transition relation involves a constraint on the next-state variables of the system in the form of a cube c′. Such queries ask if there is a state of the system satisfying some constraints on the present state which can reach, in one transition, states represented by c′. Since c′ is a cube, the desired state must have a successor p such that p corresponds to c′ for the value of every variable of c′. It's easy to see that the only present-state variables that are relevant in determining if such a state exists are those in the structural support of the next-state variables of c′. It follows that the only portions of the transition relation required to answer such queries are the logic cones of the next-state variables of c′.
   This loading strategy, known as lazy loading of the transition relation, is commonly employed in various implementations of IC3, such as the ones of PdTRAV and ABC. We observed on average a 50% reduction in the size of the CNF formula for the transition relation using lazy loading.
   We noticed that, for these queries, the portions of the transition relation loaded can be further minimized by employing a CNF encoding technique called Plaisted-Greenbaum CNF encoding [13] (henceforth simply called PG encoding). The AIG representation of the transition relation, together with the assumptions specified by the next-state cube c′, can be viewed as a constrained boolean circuit [14], i.e. a boolean circuit in which some of the outputs are constrained to assume certain values.
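   Returning briefly to lazy loading, the support computation behind it can be sketched as follows (our illustration, assuming a hypothetical AIG map from each internal node to its two fanins, with leaves absent from the map): for a query constrained by a next-state cube c′, only the clauses encoding the returned nodes need to be loaded.

def cone_nodes(aig, roots):
    """Collect the AIG nodes in the logic cones of the next-state
    functions in `roots`; `aig` maps an internal node to its
    (fanin0, fanin1) pair, while leaves (state variables and primary
    inputs) do not appear as keys."""
    cone, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in cone or node not in aig:
            continue                    # visited already, or a leaf
        cone.add(node)
        stack.extend(aig[node])
    return cone

# Sketch of its use: for a query whose cube c' mentions x3' and x7',
# load only the CNF of cone_nodes(aig, [next_fn[3], next_fn[7]]),
# skipping clauses already loaded by earlier queries.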


certain values. The Plaisted-Greenbaum encoding is a special          These heuristics clean up each solver as soon as the computed
encoding that can be applied in the translation of a constrained      estimate meets some, often static, threshold.
boolean circuit into a CNF formula.                                      Our experiments with IC3 prove that the frequency of
   Below we give an informal description of the PG encoding.          the cleanups play a crucial role in determining the overall
For a complete description refer to [13] or [14].                     verification performance. We explored the use of new cleanup
   Given an AIG representation of a circuit, a CNF encoding           heuristics based on more precise measures of the number of
first subdivides that circuit into a set of functional blocks, i.e.   irrelevant clauses loaded and able to exploit correlation among
gates or group of connected gates representing certain boolean        different SAT calls to dynamically adjust the frequency of
functions, and introduces a new CNF variable a for each of            cleanups.
these blocks. For each functional block representing a function          For SAT calls in IC3, notice that there are two sources of
f on the input variables x1 , x2 , . . . , xn , a set of clauses is   irrelevant clauses loaded in a solver:
derived translating into CNF the formula:
                                                                        1) Deactivated clauses loaded for previous inductive checks
                     a ↔ f (x1 , x2 , . . . , xn )             (1)         (Q3 and Q4 queries).
                                                                        2) Portions of logic cones loaded for previous calls query-
The final CNF formula is obtained by the conjunction of these
                                                                           ing the dynamic behavour of the system.
sets of clauses. Different CNF encodings differ in the way the
gates are grouped together to form functional blocks and in              Cleanup heuristics commonly used, such as the ones used
the way these blocks are translated into clauses. The idea of         in baseline versions of ABC and PdTRAV, typically take into
PG encoding is to start from a base CNF encoding, and then            account only the number of deactivated clauses in the solver to
use both output assumptions and topological information of            compute an estimate of the number of irrelevant clauses. We
the circuit to get rid of some of the derived clauses, while          investigated the use of a new heuristic measure taking into
still producing an equi-satisfiable CNF formula. Based on the         account the second source of irrelevant clauses, i.e. irrelevant
output constraints and the number of negations that can be            portions of previously loaded cones.
found in every path from a gate to an output node, it can be             Every time a new query requires to load a new portion of the
shown that, for some gates of the circuit, an equi-satisfiable        transition relation, to compute a measure of the number of the
encoding can be produced by only translating one of the two           irrelevant transition relation’s clauses, the following quantities
sides of the bi-implication (1). The CNF formula produced             are computed (assuming that c′ is the cube constraining the
by PG encoding will be a subset of the one produced by the            next state variables for that query):
traditional encoding.                                                   • A: the number of transition relation clauses already
   PG encoding proves to be effective in reducing the size of loaded formulas, but it is not certain whether it is more efficient for SAT solving, since it may have worse propagation behaviour [15]. We observed a 10-20% reduction in the size of the CNF formula for the transition relation.

                    VII. SOLVERS CLEANUP

   A natural question arises regarding how frequently and under what conditions SAT solver cleanups should be scheduled. Cleaning up a SAT solver, in fact, introduces a certain overhead. This is because:
   • Relevant clauses for the current query must be reloaded.
   • Relevant inferences previously derived must be derived again from those clauses.
   A heuristic cleanup strategy is needed in order to achieve a trade-off between the overhead introduced and the slowdown in BCP avoided. The purpose of that heuristic is to determine when the number of irrelevant clauses (w.r.t. the current query) loaded in a solver becomes large enough to justify a cleanup. To do so, a heuristic measure representing an estimate of the number of currently loaded irrelevant clauses is compared to a certain threshold. If that measure exceeds the threshold, then the solver is cleaned up.
   Cleanup heuristics currently used in state-of-the-art tools, like ABC and PdTRAV, rely on loose estimates of the size of irrelevant portions of the formula loaded into each solver. We investigate heuristics based on more precise measures of the number of irrelevant clauses loaded, able to exploit correlation among different SAT calls to dynamically adjust the frequency of cleanups.
   For SAT calls in IC3, notice that there are two sources of irrelevant clauses loaded in a solver:
   1) Deactivated clauses loaded for previous inductive checks (Q3 and Q4 queries).
   2) Portions of logic cones loaded for previous calls querying the dynamic behaviour of the system.
   Cleanup heuristics commonly used, such as the ones in the baseline versions of ABC and PdTRAV, typically take into account only the number of deactivated clauses in the solver to compute an estimate of the number of irrelevant clauses. We investigated the use of a new heuristic measure taking into account the second source of irrelevant clauses, i.e. irrelevant portions of previously loaded cones.
   Every time a new query requires loading a new portion of the transition relation, the following quantities are computed in order to measure the number of irrelevant transition relation clauses (assuming that c′ is the cube constraining the next state variables for that query):
   • A: the number of transition relation clauses already loaded into the solver (before loading the logic cones required by the current query);
   • S(c′): the size (number of clauses) of the logic cones required for solving the current query;
   • L(c′): the number of clauses in the required logic cones still to be loaded into the solver (equal to the size of the required logic cone minus the number of clauses that such cone shares with previously loaded cones).
   A measure of the number of irrelevant transition relation clauses loaded for c′, denoted by u(c′), can be computed as follows:

    u(c′) = A − (S(c′) − L(c′))    (2)

   Such a heuristic measure, divided by the number of clauses loaded into the solver, indicates the percentage of irrelevant clauses loaded in the solver w.r.t. the current query. In Section VIII we investigate the use of new cleanup strategies based on this measure. In order to take into account correlation between subsequent SAT calls, we consider this measure averaged over a time window of the last SAT calls.
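   To make the bookkeeping concrete, the following sketch (our own illustration with hypothetical names, not PdTRAV or ABC code) tracks the three quantities above, computes u(c′) as in (2), and averages the normalized measure over a window of recent SAT calls.

    from collections import deque

    class CleanupMonitor:
        """Sketch: fraction of irrelevant TR clauses, averaged per SAT call."""
        def __init__(self, window=1000, threshold=0.5):
            self.samples = deque(maxlen=window)   # last `window` SAT calls
            self.threshold = threshold

        def record_query(self, A, S, L, total_clauses):
            # Equation (2): u(c') = A - (S(c') - L(c')), i.e. clauses already
            # in the solver that are not part of the cone needed by this query.
            u = A - (S - L)
            # Normalize by the clauses loaded in the solver: fraction of
            # irrelevant clauses w.r.t. the current query.
            self.samples.append(u / total_clauses)

        def should_cleanup(self):
            # Schedule a cleanup once the windowed average exceeds the threshold.
            if not self.samples:
                return False
            return sum(self.samples) / len(self.samples) > self.threshold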
                 VIII. EXPERIMENTAL RESULTS

   We have compared different cleanup and clause loading heuristics in both PdTRAV and ABC. In this section, we briefly summarize the obtained results.
A. PG encoding

   A first set of experiments was done on the full set of 310 HWMCC'12 benchmarks [16], with a lower timeout of 900 seconds, in order to evaluate the use of PG encoding. Results were controversial. A run in PdTRAV, with 900 seconds timeout, showed a reduction in the number of solved instances from 79 to 63 (3 of which previously unsolved). The percentage reduction of loaded transition relation clauses was 21.32%.
   A similar run on ABC showed a more stable comparison, from 80 to 79 solved instances (3 of which previously unsolved). So a first conclusion is that PG encoding was not able to win over the standard encoding, suggesting it can indeed suffer from worse propagation behaviour. Nonetheless, it was very interesting to observe that the overall number of solved problems grew from 79 to 82 in PdTRAV and from 80 to 85 in ABC.
   The different results between the two tools in this experimentation could be partially due to the different light-weight preprocessing done by default within them. We started a more detailed comparison, in order to better understand the partially controversial results.

B. Experiments with PdTRAV

   We then focused on a subset of 70 selected circuits, for which preprocessing was done off-line, and the tools were run on the same problem instances. In the following tables, the P0 rows always represent the default setting of PdTRAV. We report the number of solved instances (out of 70) and the average execution time (time limit 900 seconds). The column labeled New, when present, shows the number of instances not solved by P0.

    Configuration     Solved [#]    New [#]    Avg Time [s]
    P0 (baseline)     64                       137.00
    P1 (Q4 spec.)     66            4          144.18
    P2 (Q6 spec.)     60            3          134.25

            Table III: Tests on specialized solvers.

   Table III shows two different implementations with specialized solvers (so two solvers per time frame): in the first one (P1) the second solver handles generalization calls, while in the second one (P2) it handles propagation calls. Overall, we see a little improvement by P1, with two more instances solved w.r.t. P0, including 4 instances not solved by P0.
   The next two tables show an evaluation of different solver cleanup heuristics. Let a be the number of activation variables in the solver, |vars| the total number of variables in the solver, |clauses| the total number of clauses in the solver, u(c′) the heuristic measure discussed in Section VII, and W(x, n) a sliding window containing the values of x for the last n SAT calls. The heuristics compared are the following:
   • H1: a > 300;
   • H2: a > (1/2)|vars|;
   • H3: avg(W(u(c′)/|clauses|, 1000)) > 0.5;
   • H4: H2 || H3.
   Heuristic H1 is the cleanup heuristic used in the baseline versions of both PdTRAV and ABC. The static threshold of 300 activation literals was determined experimentally. Heuristic H2 cleans up each solver as soon as half of its variables are used for activation literals. It can be seen as a first simple attempt to adopt a dynamic cleanup threshold. Heuristic H3 is the first heuristic proposed to take into account the second source of irrelevant clauses described in Section VII, i.e. irrelevant portions of previously loaded cones. In H3 a solver is cleaned up as soon as the percentage of irrelevant transition relation clauses loaded, averaged over a window of the last 1000 calls, reaches 50%. Heuristic H4 takes into account both sources of irrelevant clauses, combining H2 and H3.
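   Expressed as predicates over the solver state, the four heuristics are tiny; a minimal sketch of the rules above (our own formulation, with the windowed average assumed to be maintained as in the Section VII sketch):

    # Hypothetical inputs: a = activation literals, num_vars = |vars|,
    # win_avg = avg(W(u(c')/|clauses|, 1000)).
    def h1(a):                 # static threshold
        return a > 300

    def h2(a, num_vars):       # dynamic: half of the variables are activations
        return a > num_vars / 2

    def h3(win_avg):           # windowed fraction of irrelevant TR clauses
        return win_avg > 0.5

    def h4(a, num_vars, win_avg):
        return h2(a, num_vars) or h3(win_avg)   # H4: H2 || H3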
    Configuration    Solved [#]    New [#]    Avg Time [s]
    P0 (H1)          64                       137.00
    P3 (H2)          60            1          122.19
    P4 (H3)          44                       116.28
    P5 (H4)          62            3          125.94

            Table IV: Tests on cleanup heuristics.

   Table IV shows a comparison among H1 (row P0), H2, H3, and H4 (rows P3, P4, and P5, respectively). No PG encoding nor specialized solvers were used. Heuristic H1, which employs a static threshold, was able to solve the largest number of instances. Among the dynamic threshold heuristics, both H2 and H3 take into account a single source of irrelevant loaded clauses, respectively the deactivated clauses in H2 and the unused portions of the transition relation in H3. The data clearly indicates that H3 has worse performance. This suggests that deactivated clauses play a bigger role in degrading SAT solvers' performance than irrelevant transition relation clauses do; taking into account only the latter source of irrelevant clauses is not sufficient. It can be noticed that heuristic H4, which takes into account both sources, outperforms both H2 and H3. This shows that considering irrelevant clauses arising from previously loaded cones, in addition to deactivated clauses, can be beneficial. In addition, Table IV shows that the dynamic heuristics were able to solve some instances that could not be solved by the static heuristic H1, and vice versa. In terms of the overall number of solved instances, the static heuristic H1 outperforms our best dynamic heuristic H4. This can be due to the fact that the threshold parameter of H1 results from extensive experimentation, while a narrower experimentation has been performed to determine the parameters of H4 (window size and percentage thresholds).
   We then focused on H4, and benchmarked it in different setups. Results are shown in Table V. Here PG encoding was used in configurations P6 and P8, a single solver per time frame in P6, and an additional specialized solver for generalization in P7 and P8. We see that the specialized solver configuration appears to perform worse with PG encoding. Also, adding a specialized solver for generalization to the dynamic heuristic H4 does not seem to be as effective as it is when using the static heuristic H1.
    Configuration      Solved [#]    New [#]    Avg Time [s]
    P0 (baseline)      64                       137.00
    P6 (PG)            59            3          208.85
    P7 (Q4 spec)       58            1          111.59
    P8 (PG+Q4 spec)    50            1          178.56

    Table V: Tests on mixed strategies for cleanup heuristic H4.

This can be due to the fact that irrelevant clauses arising from generalization calls are not taken into account to schedule the cleanup of the main solver, which, in turn, is cleaned up less frequently.

C. Experiments with ABC

   The 70 selected circuits were also benchmarked with ABC, with the same pre-processing used in PdTRAV: Table VI reports in row A0 the default setting of ABC. Row A1 shows the variant with PG encoding, row A2 shows a run without dynamic TR clause loading. Row A3 finally shows a different period for solver cleanup (1000 variables instead of 300).

    Configuration    Solved [#]    New [#]    Avg Time [s]
    A0               64                       138.66
    A1               63            1          152.18
    A2               63            2          158.75
    A3               64                       138.45

      Table VI: Tests on ABC with different strategies.

   Overall, results show little variance among the different settings, which could suggest a lesser sensitivity of ABC to different tunings. Nonetheless, further experimentation with ABC on the full set of 310 benchmarks (with a 300 seconds time limit) showed a 14% improvement in the number of solved problems (from 71 to 81), which indicates a potential improvement for a portfolio-based tool, able to concurrently exploit multiple settings.

                      IX. CONCLUSIONS

   The paper shows a detailed analysis and characterization of the SAT queries posed by IC3. We also discuss new ideas for solver allocation/loading/restarting. The experimental evaluation done on two different state-of-the-art academic tools shows lights and shadows, as no breakthrough or clear winner emerges from the new ideas.
   PG encoding turned out to be less effective than expected. This is probably because the benefits introduced in terms of loaded formula size are overwhelmed by the supposedly worse propagation behaviour of that formula.
   The use of specialized solvers seems to be effective when a static cleanup heuristic is used, less effective when combined with PG encoding or a dynamic heuristic.
   Our experiments showed that, when a dynamic cleanup heuristic is used, IC3's performance can be improved by taking into account both deactivated clauses and irrelevant portions of previously loaded cones. Even if a parameter configuration for H4 that is able to outperform the currently used, well-rounded static heuristic H1 has not been found yet, we believe that a more extensive experimentation could lead to better results.
   Nonetheless, the results are more interesting if we consider them from the standpoint of a portfolio-based tool, since the overall coverage (by the union of all setups) is definitely higher.
   So we believe that more effort in implementation, experimentation, and detailed analysis of case studies needs to be done. We also deem that this work contributes to the discussion of new developments in the research related to IC3.

                        REFERENCES

 [1] A. R. Bradley, "SAT-based model checking without unrolling," in VMCAI, Austin, Texas, Jan. 2011, pp. 70–87.
 [2] A. R. Bradley, "Understanding IC3," in SAT, ser. Lecture Notes in Computer Science, A. Cimatti and R. Sebastiani, Eds., vol. 7317. Springer, 2012, pp. 1–14.
 [3] A. Biere, A. Cimatti, E. M. Clarke, M. Fujita, and Y. Zhu, "Symbolic Model Checking using SAT procedures instead of BDDs," in Proc. 36th Design Automation Conf. New Orleans, Louisiana: IEEE Computer Society, Jun. 1999, pp. 317–320.
 [4] M. Sheeran, S. Singh, and G. Stålmarck, "Checking Safety Properties Using Induction and a SAT Solver," in Proc. Formal Methods in Computer-Aided Design, ser. LNCS, W. A. Hunt and S. D. Johnson, Eds., vol. 1954. Austin, Texas, USA: Springer, Nov. 2000, pp. 108–125.
 [5] P. Bjesse and K. Claessen, "SAT-Based Verification without State Space Traversal," in Proc. Formal Methods in Computer-Aided Design, ser. LNCS, vol. 1954. Austin, TX, USA: Springer, 2000.
 [6] K. L. McMillan, "Interpolation and SAT-Based Model Checking," in Proc. Computer Aided Verification, ser. LNCS, vol. 2725. Boulder, CO, USA: Springer, 2003, pp. 1–13.
 [7] N. Eén, A. Mishchenko, and R. K. Brayton, "Efficient implementation of property directed reachability," in FMCAD, 2011, pp. 125–134.
 [8] A. Mishchenko, "ABC: A System for Sequential Synthesis and Verification, http://www.eecs.berkeley.edu/~alanmi/abc/," 2005.
 [9] G. Cabodi, S. Nocco, and S. Quer, "Benchmarking a model checker for algorithmic improvements and tuning for performance," Formal Methods in System Design, vol. 39, no. 2, pp. 205–227, 2011.
[10] N. Eén and N. Sörensson, "The MiniSat SAT Solver, http://minisat.se," Apr. 2009.
[11] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, "Chaff: Engineering an Efficient SAT Solver," in Proc. 38th Design Automation Conf. Las Vegas, Nevada: IEEE Computer Society, Jun. 2001.
[12] N. Eén and N. Sörensson, "Temporal induction by incremental SAT solving," Electr. Notes Theor. Comput. Sci., vol. 89, no. 4, pp. 543–560, 2003.
[13] D. A. Plaisted and S. Greenbaum, "A structure-preserving clause form translation," J. Symb. Comput., vol. 2, no. 3, pp. 293–304, 1986.
[14] M. Järvisalo, A. Biere, and M. Heule, "Simulating circuit-level simplifications on CNF," J. Autom. Reasoning, vol. 49, no. 4, pp. 583–619, 2012.
[15] N. Eén, "Practical SAT: a tutorial on applied satisfiability solving," Slides of invited talk at FMCAD, 2007.
[16] A. Biere and T. Jussila, "The Model Checking Competition Web Page, http://fmv.jku.at/hwmcc."
                   Lemmas on Demand for Lambdas
                              Mathias Preiner, Aina Niemetz, and Armin Biere
                                Institute for Formal Models and Verification
                                 Johannes Kepler University, Linz, Austria


   Abstract—We generalize the lemmas on demand decision procedure for array logic as implemented in Boolector to handle non-recursive and non-extensional lambda terms. We focus on the implementation aspects of our new approach and discuss the involved algorithms and optimizations in more detail. Further, we show how arrays, array operations and SMT-LIB v2 macros are represented as lambda terms and lazily handled with lemmas on demand. We provide experimental results that demonstrate the effect of native lambda support within an SMT solver and give an outlook on future work.

   (This work was funded by the Austrian Science Fund (FWF) under NFN Grant S11408-N23 (RiSE).)

                   I. INTRODUCTION

   The theory of arrays as axiomatized by McCarthy [14] enables us to reason about memory (components) in software and hardware verification, and is particularly important in the context of deciding satisfiability of first order formulas w.r.t. first order theories, also known as Satisfiability Modulo Theories (SMT). However, it is restricted to array operations on single array indices and lacks support for efficiently modeling operations such as memory initialization and parallel updates (memset and memcpy in the standard C library).
   In 2002, Seshia et al. [4] introduced an approach to overcome these limitations by using restricted λ-terms to model array expressions (such as memset and memcpy), ordered data structures and partially interpreted functions within the SMT solver UCLID [17]. The SMT solver UCLID employs an eager SMT solving approach and therefore eliminates all λ-terms through β-reduction, which replaces each argument variable with the corresponding argument term as a preliminary rewriting step. Other SMT solvers that employ a lazy SMT solving approach and natively support λ-terms, such as CVC4 [1] or Yices [8], also treat them eagerly, similarly to UCLID, and eliminate all occurrences of λ-terms by substituting them with their instantiated function body (cf. C-style macros). Eagerly eliminating λ-terms via β-reduction, however, may result in an exponential blow-up in the size of the formula [17]. Recently, an extension of the theory of arrays was proposed [10], which uses λ-terms similarly to UCLID. This extension provides support for modeling memset, memcpy and loop summarizations. However, it does not make use of native support of λ-terms provided by an SMT solver. Instead, it reduces instances in the theory of arrays with λ-terms to a theory combination supported by solvers such as Boolector [3] (without native support for λ-terms), CVC4, STP [12], and Z3 [6].
   In this paper, we generalize the decision procedure for the theory of arrays with bit vectors as introduced in [3] to lazily handle non-recursive and non-extensional λ-terms. We show how arrays, array operations and SMT-LIB v2 macros are represented in Boolector as λ-terms and introduce a lemmas on demand procedure for lazily handling λ-terms in Boolector in detail. We summarize an experimental evaluation, compare our results to solvers with SMT-LIB v2 macro support (CVC4, MathSAT [5], SONOLAR [13] and Z3) and, finally, give an outlook on future work.

                  II. PRELIMINARIES

   We assume the usual notions and terminology of first order logic and are mainly interested in many-sorted languages, where bit vectors of different bit width correspond to different sorts and array sorts correspond to a mapping (τi ⇒ τe) from index sort τi to element sort τe. Our approach is focused primarily on the quantifier-free first order theories of fixed size bit vectors, arrays and equality with uninterpreted functions, but is not restricted to the above.
   We call 0-arity function symbols constant symbols, and a, b, i, j, and e denote constants, where a and b are used for array constants, i and j for array indices, and e for an array value. For each bit vector of size n, the equality =n is interpreted as the identity relation over bit vectors of size n. We further interpret the if-then-else bit vector term iten as ite(⊤, t, e) =n t and ite(⊥, t, e) =n e. As a notational convention, the subscript might be omitted in the following. We identify read and write as basic operations on array elements, where read(a, i) denotes the value of array a at index i, and write(a, i, e)
denotes the modified array a overwritten at position i with value e. The theory of arrays (without extensionality) is axiomatized by the following axioms, originally introduced by McCarthy in [14]:

    i = j → read(a, i) = read(a, j)                   (A1)
    i = j → read(write(a, i, e), j) = e               (A2)
    i ≠ j → read(write(a, i, e), j) = read(a, j)      (A3)

The array congruence axiom A1 asserts that accessing array a at two equal indices i and j produces the same element. The read-over-write axioms A2 and A3 ensure a basic characteristic of arrays: A2 asserts that accessing a modification to an array a at the index it has most recently been updated (i) produces the value it has been updated with (e). A3 captures the case when a modification to an array a is accessed at an index other than the one it has most recently been updated at (j), which produces the unchanged value of the original array a at position j. Note that we assume that all variables a, i, j and e in axioms A1, A2 and A3 are universally quantified.
   From the theory of equality with uninterpreted functions we primarily focus on the following axiom:

    ∀x̄, ȳ . (x1 = y1 ∧ . . . ∧ xn = yn) → f(x̄) = f(ȳ)    (EUF)

The function congruence axiom (EUF) asserts that a function evaluates to the same value for the same argument values.
   We only consider a non-recursive λ-calculus, assuming the usual notation and terminology, including the notion of function application, currying and β-reduction. In general, we denote a λ-term λx as λx . t(x), where x is a variable bound by λx and t(x) is a term in which x may or might not occur. We interpret t(x) as defining the scope of the bound variable x. Without loss of generality, the number of bound variables per λ-term is restricted to exactly one. Functions with more than one parameter are transformed into a chain of nested λ-terms by means of currying (e.g. f(x, y) := x + y is rewritten as λx . λy . x + y). As a notational convention, we will use λx̄ as a shorthand for λx0 . . . λxk . t(x0, . . . , xk) for k ≥ 0. We identify the function application as an explicit operation on λ-terms and interpret it as instantiating a bound variable (all bound variables) of a λ-term (a curried λ-chain). We interpret β-reduction as a form of function application, where all formal parameter variables (bound variables) are substituted with their actual parameter terms. We will use λx̄[x0\a0, . . . , xn\an] to indicate β-reduction of a λ-term λx̄, where the formal parameters x0, . . . , xn are substituted with the actual argument terms a0, . . . , an.
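   Since only a non-recursive λ-calculus is considered, currying and β-reduction behave exactly like nested host-language closures; a minimal sketch (our own illustration, unrelated to Boolector's internals):

    # f(x, y) := x + y as a curried chain of unary lambdas.
    f = lambda x: lambda y: x + y

    g = f(3)       # instantiates x only: corresponds to the term lambda y . 3 + y
    print(g(4))    # 7: beta-reduction with [x\3, y\4]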
              III. λ-TERMS IN BOOLECTOR

   In contrast to λ-term handling in other SMT solvers such as e.g. UCLID or CVC4, where λ-terms are eagerly eliminated, in Boolector we provide lazy λ-term handling with lemmas on demand. We generalized the lemmas on demand decision procedure for the extensional theory of arrays introduced in [3] to handle lemmas on demand for λ-terms as follows.
   In order to provide a uniform handling of arrays and λ-terms within Boolector, we generalized all arrays (and array operations) to λ-terms (and operations on λ-terms) by representing array variables as uninterpreted functions (UF), read operations as function applications, and write and if-then-else operations on arrays as λ-terms. We further interpret macros (as provided by the command define-fun in the SMT-LIB v2 format) as (curried) λ-terms. Note that in contrast to [3], our implementation currently does not support extensionality (equality) over arrays (λ-terms).
   We represent an array as exactly one λ-term with exactly one bound variable (parameter) and define its representation as λj . t(j). Given an array of sort (τi ⇒ τe) and its λ-term representation λj . t(j), we require that the bound variable j is of index sort τi and the term t(j) is of element sort τe. Term t(j) is not required to contain j, and if it does not contain j, it represents a constant λ-term (e.g. λj . 0). In contrast to SMT-LIB v2 macros, it is not required to represent arrays with curried λ-chains, as arrays are accessed at one single index at a time (cf. read and write operations on arrays).
   We treat array variables as UF with exactly one argument and represent them as fa for array variable a.
   We interpret read operations as function applications on either UF or λ-terms with read index i as argument, and represent them as read(a, i) ≡ fa(i) and read(λj . t(j), i) ≡ (λj . t(j))(i), respectively.
   We interpret write operations as λ-terms modeling the result of the write operation on array a at index i with value e, and represent them as write(a, i, e) ≡ λj . ite(i = j, e, fa(j)). A function application on a λ-term λw representing a write operation yields value e if j is equal to the modified index i, and the unmodified value fa(j) otherwise. Note that applying β-reduction to a λ-term λw yields the same behaviour described by array axioms A2 and A3. Consider a function application λw(k), where k represents the position to be read from. If k = i (A2), β-reduction yields the written value e, whereas if k ≠ i (A3), β-reduction returns the unmodified value of array a at position k, represented by fa(k). Hence, these
axioms do not need to be explicitly checked during consistency checking. This is in essence the approach to handling arrays taken by UCLID [17].
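   The following sketch (our own illustration with ordinary closures, not Boolector's data structures) mirrors this representation and shows how β-reduction of the write λ-term reproduces the behaviour of axioms A2 and A3:

    # Array variable a as an uninterpreted function f_a (a concrete stub here).
    f_a = lambda j: 0                        # hypothetical base array, 0 everywhere

    def write(f_a, i, e):
        # write(a, i, e) == lambda j . ite(i = j, e, f_a(j))
        return lambda j: e if j == i else f_a(j)

    w = write(f_a, 5, 42)                    # function application == read
    print(w(5))   # 42 -> A2: read(write(a, i, e), i) = e
    print(w(7))   # 0  -> A3: read(write(a, i, e), j) = read(a, j) for j != i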
   We interpret if-then-else operations on arrays a and b as λ-terms, and represent them as ite(c, a, b) ≡ λj . ite(c, fa(j), fb(j)). Condition c selects either function application fa(j) or fb(j), which represent the values of arrays a and b at index j, respectively.
   In addition to the base array operations introduced above, λ-terms enable us to succinctly model array operations like e.g. memcpy and memset from the standard C library, which we previously were not able to efficiently express by means of read, write and ite operations on arrays. In particular, both memcpy and memset could only be represented by a fixed sequence of read and write operations within a constant index range, such as copying exactly 5 words etc. It was not possible to express a variable range, e.g. copying n words, where n is a symbolic (bit vector) variable.
   With λ-terms, however, we do not require a sequence of array operations, as it usually suffices to model a parallel array operation by means of exactly one λ-term. Further, the index range does not have to be fixed and can therefore be within a variable range. This type of high level modeling turned out to be useful for applications in software model checking [10]. See also [17] for more examples. For instance, memset with signature memset(a, i, n, e), which sets each element of array a within the range [i, i + n[ to value e, can be represented as λj . ite(i ≤ j ∧ j < i + n, e, fa(j)). Note that n can be symbolic, and does not have to be a constant. In the same way, memcpy with signature memcpy(a, b, i, k, n), which copies all elements of array a within the range [i, i + n[ to array b, starting from index k, is represented as λj . ite(k ≤ j ∧ j < k + n, fa(i + j − k), fb(j)). As a special case of memset, we represent array initialization operations, where all elements of an array are initialized with some (constant or symbolic) value e, as λj . e.
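   Rendering these λ-terms as closures again (our own illustration; the bound n is an ordinary value here, standing in for a symbolic bit vector):

    def memset(f_a, i, n, e):
        # lambda j . ite(i <= j /\ j < i + n, e, f_a(j))
        return lambda j: e if i <= j < i + n else f_a(j)

    def memcpy(f_a, f_b, i, k, n):
        # lambda j . ite(k <= j /\ j < k + n, f_a(i + j - k), f_b(j))
        return lambda j: f_a(i + j - k) if k <= j < k + n else f_b(j)

    a = lambda j: j                        # hypothetical source array: a[j] = j
    b = memset(lambda j: 0, 8, 4, 7)       # b[8..11] = 7, 0 elsewhere
    c = memcpy(a, b, 2, 8, 4)              # c[8..11] = a[2..5], b elsewhere
    print(b(9), c(9), c(12))               # 7 3 0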
   Introducing λ-terms does not only enable us to model arrays and array operations, but further provides support for arbitrary functions (macros) by means of currying, with the following restrictions: (1) functions may not be recursive, and (2) arguments to functions may not be functions. The first restriction keeps the implementation of λ-term handling in Boolector as simple as possible, whereas the second restriction limits λ-term handling in Boolector to non-higher-order functions. Relaxing these restrictions would make the considered λ-calculus Turing-complete and, in general, render the decision problem undecidable. As future work it might be interesting to consider some relaxations.
   In contrast to treating SMT-LIB v2 macros as C-style macros, i.e., substituting every function application with the instantiated function body, in Boolector we directly translate SMT-LIB v2 macros into λ-terms, which are then handled lazily via lemmas on demand. Formulas are represented as directed acyclic graphs (DAG) of bit vector and array expressions. Further, in this paper, we propose to treat arrays and array operations as λ-terms and operations on λ-terms, which results in an expression graph with no expressions of array sort (τi ⇒ τe). Instead, we introduce the following four additional expression types of bit vector sort:
   • a param expression serves as a placeholder variable for a variable bound by a λ-term
   • a lambda expression binds exactly one param expression, which may occur in a bit vector expression that represents the body of the λ-term
   • an args expression is a list of function arguments
   • an apply expression represents a function application that applies arguments args to a lambda expression

   Fig. 1: DAG representation of formula ψ1.

   Example 1: Consider ψ1 ≡ f(i) = f(j) ∧ i ≠ j with functions f(x) := ite(x < 0, g(x), x), g(y) := −y, as depicted in Fig. 1. Both functions are represented as λ-terms, where function g(y) returns the negation of y and is used in function f(x), which computes the absolute value of x. Dotted nodes indicate parameterized expressions, i.e., expressions that depend on param expressions, and serve as templates that are instantiated as soon as β-reduction is applied.
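   A rough sketch of these four expression types and of Example 1's term structure (our own simplification using tagged tuples; Boolector's actual expression nodes carry far more bookkeeping):

    # Expressions as tagged tuples; the four new bit vector node types.
    def param(name):   return ("param", name)          # placeholder variable
    def lam(p, body):  return ("lambda", p, body)      # binds exactly one param
    def args(*a):      return ("args",) + a            # list of function arguments
    def apply(f, a):   return ("apply", f, a)          # applies args to a lambda
    def op(kind, *c):  return ("op", kind) + c         # ordinary bit vector ops

    # Example 1: g(y) := -y, f(x) := ite(x < 0, g(x), x),
    # psi1 := f(i) = f(j) AND i != j
    y = param("y"); g = lam(y, op("neg", y))
    x = param("x")
    f = lam(x, op("ite", op("slt", x, op("const", 0)),
                  apply(g, args(x)), x))
    i, j = op("var", "i"), op("var", "j")
    psi1 = op("and", op("eq", apply(f, args(i)), apply(f, args(j))),
                  op("not", op("eq", i, j)))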
   In order to lazily evaluate λ-terms in Boolector we implemented two β-reduction approaches, which we will discuss in the next section in more detail.

                  IV. β-REDUCTION

   In this section we discuss how concepts from the λ-calculus have been adapted and implemented in
our SMT solver Boolector. We focus on reduction algorithms for the non-recursive λ-calculus, which is rather atypical for the (vast) literature on the λ-calculus. In the context of Boolector, we distinguish between full and partial β-reduction. They mainly differ in their application and the depth up to which λ-terms are expanded. In essence, given a function application λx̄(a0, . . . , an), partial β-reduction reduces only the topmost λ-term λx̄, whereas full β-reduction reduces λx̄ and every λ-term in the scope of λx̄.
   Full β-reduction of a function application on a λ-term λx̄ consists of a series of β-reductions, where the λ-term λx̄ and every λ-term λȳ within the scope of λx̄ are instantiated, substituting all formal parameters with actual parameter terms. Since we do not allow partial function applications, full β-reduction is guaranteed to yield a term which is free of λ-terms. Given a formula with λ-terms, we usually employ full β-reduction in order to eliminate all λ-terms, by substituting every function application with the term obtained by applying full β-reduction to that function application. In the worst case, full β-reduction results in an exponential blow-up. However, in practice it is often beneficial to employ full β-reduction, since it usually leads to significant simplifications through rewriting. In Boolector, we incorporate this method as an optional rewriting step. We will use λx̄[x0\a0, . . . , xn\an]f as a shorthand for applying full β-reduction to λx̄ with arguments a0, . . . , an.
   Partial β-reduction of a λ-term λx̄, on the other hand, essentially works in the same way as what is referred to as β-reduction in the λ-calculus. Given a function application λx̄(a0, . . . , an), partial β-reduction substitutes the formal parameters x0, . . . , xn with the actual argument terms a0, . . . , an without applying β-reduction to other λ-terms within the scope of λx̄. This has the effect that λ-terms are expanded function-wise, which we require for consistency checking. In the following, we use λx̄[x0\a0, . . . , xn\an]p to denote the application of partial β-reduction to λx̄ with arguments a0, . . . , an.

   (a) Original formula ψ2. (b) Formula ψ′2 after full β-reduction of ψ2.
   Fig. 2: Full β-reduction of formula ψ2.

A. Full β-reduction

   Given a function application λx̄(a0, . . . , an) and a DAG representation of λx̄, full β-reduction of λx̄ consecutively substitutes formal parameters with actual argument terms while traversing and rebuilding the DAG in depth-first-search (DFS) post-order as follows.
1) Initially, we instantiate λx̄ by assigning the arguments a0, . . . , an to the formal parameters x0, . . . , xn.
2) While traversing down, for any λ-term λȳ in the scope of λx̄, we need special handling for each function application λȳ(b0, . . . , bm) as follows:
   a) Visit the arguments b0, . . . , bm first, and obtain rebuilt arguments b′0, . . . , b′m.
   b) Assign the rebuilt arguments b′0, . . . , b′m to λȳ and apply β-reduction to λȳ(b′0, . . . , b′m).
   This ensures a bottom-up construction of the β-reduced DAG (see step 3), since all arguments b′0, . . . , b′m passed to a λ-term λȳ are β-reduced and rebuilt prior to applying β-reduction to λȳ.
3) During up-traversal of the DAG we rebuild all visited expressions bottom-up and require special handling for the following expressions:
   • param: substitute param expression yi with its current instantiation b′i
   • apply: substitute expression λȳ(b0, . . . , bm) with λȳ[y0\b′0, . . . , ym\b′m]f
We further employ the following optimizations to improve the performance of the full β-reduction algorithm.
   • Skip expressions that do not need rebuilding: given an expression e within the scope of a λ-term λx̄, if e is not parameterized and does not contain any λ-term, e does not depend on the arguments passed to λx̄ and may therefore be skipped.
   • λ-scope caching: we cache rebuilt expressions in a λ-scope to prevent rebuilding parameterized expressions several times.
   Example 2: Given a formula ψ2 ≡ f(i, j) = f(k, l) and two functions g(x) := ite(x = i, e, 2 ∗ x) and f(x, y) := ite(y < x, g(x), g(y)), as depicted in Fig. 2a. Applying full β-reduction to formula ψ2 yields formula ψ′2 as illustrated in Fig. 2b. Function application f(i, j) has been reduced to ite(j ≥ i ∧ i ≠ j, 2 ∗ j, e) and f(k, l) to ite(l < k, ite(k = i, e, 2 ∗ k), ite(l = i, e, 2 ∗ l)).
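   A condensed sketch of steps 1)–3) over the tagged-tuple representation from the Section III sketch (our own rendition; Boolector's implementation is iterative and considerably more involved). The cache realizes the λ-scope caching optimization, and non-tuple leaves are returned untouched, which also covers skipping constants that need no rebuilding:

    def full_beta(lam_node, actual_args):
        # Step 1: bind the params of the (curried) lambda chain to the arguments.
        env, node = {}, lam_node
        for a in actual_args:
            _, p, body = node            # ("lambda", param, body)
            env[p] = a
            node = body
        return _rebuild(node, env, {})

    def _rebuild(n, env, cache):
        if not isinstance(n, tuple):     # plain leaves (names, numbers)
            return n
        if n in cache:                   # lambda-scope caching
            return cache[n]
        kind = n[0]
        if kind == "param":
            out = env[n]                 # step 3, param: current instantiation
        elif kind == "apply":            # step 2: rebuild the arguments first,
            _, fn, a = n                 # then beta-reduce the nested lambda too
            out = full_beta(fn, tuple(_rebuild(c, env, cache) for c in a[1:]))
        else:                            # step 3: rebuild visited nodes bottom-up
            out = n[:2] + tuple(_rebuild(c, env, cache) for c in n[2:])
        cache[n] = out
        return out

    # g(y) := -y, f(x) := g(x) + 1, full reduction of f(a):
    y = ("param", "y"); g = ("lambda", y, ("op", "neg", y))
    x = ("param", "x")
    f = ("lambda", x, ("op", "add", ("apply", g, ("args", x)), ("op", "const", 1)))
    print(full_beta(f, (("op", "var", "a"),)))
    # ('op', 'add', ('op', 'neg', ('op', 'var', 'a')), ('op', 'const', 1))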
B. Partial β-reduction

   Given a function application λx̄(a0, ..., an) and a DAG representation of λx̄, the scope of a partial β-reduction βp(λx̄) is defined as the sub-DAG obtained by cutting off all λ-terms in the scope of λx̄. Assume that we have an assignment for arguments a0, ..., an and for all non-parameterized expressions in the scope of βp(λx̄). The partial β-reduction algorithm substitutes param expressions x0, ..., xn with a0, ..., an and rebuilds λx̄. Similar to full β-reduction, we perform a DFS post-order traversal of the DAG as follows (a sketch in C++ is given after the list).
1) Initially, we instantiate λx̄ by assigning arguments a0, ..., an to the formal parameters x0, ..., xn.
2) While traversing down the DAG, we require special handling for the following expressions:
   • function applications λȳ(b0, ..., bm)
     a) Visit arguments b0, ..., bm, obtain rebuilt arguments b′0, ..., b′m.
     b) Do not assign rebuilt arguments b′0, ..., b′m to λȳ and stop down-traversal at λȳ.
   • ite(c, t1, t2)
     Since we have an assignment for all non-parameterized expressions within the scope of βp(λx̄), we are able to evaluate condition c. Based on that we either select t1 or t2 to further traverse down (the other branch is omitted).
3) During up-traversal of the DAG we rebuild all visited expressions bottom-up and require special handling for the following expressions:
   • param: substitute param expression yi with current instantiation b′i
   • if-then-else: substitute expression ite(c, t1, t2) with t1 if c = ⊤, and t2 otherwise
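For illustration, the following C++ sketch renders steps 1)-3) as a recursive traversal. Node, Kind and the helpers below are hypothetical stand-ins for Boolector's internal data structures, which the paper does not show, and the sketch assumes that every reduced ite condition is evaluable under the current assignment σ, as required above.

    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <vector>

    // A minimal sketch; not Boolector's actual representation.
    enum class Kind { Var, Const, Param, Ite, Apply, Op };

    struct Node {
      Kind kind;
      std::vector<Node*> children;  // Ite: {cond, then, else}; Apply: {lambda, args...}
      bool parameterized = false;   // does a param occur in this sub-DAG?
      uint64_t value = 0;           // σ(n); meaningful for encoded nodes only
    };

    Node* rebuild(Kind k, std::vector<Node*> kids) {  // no hash-consing here
      return new Node{k, std::move(kids)};
    }

    // Partial β-reduction of one λ-scope: 'inst' maps each formal x_i to the
    // actual argument a_i (step 1); 'cache' implements λ-scope caching.
    Node* reduce(Node* n, std::map<Node*, Node*>& inst,
                 std::map<Node*, Node*>& cache) {
      if (!n->parameterized) return n;  // skip: cannot depend on the arguments
      if (auto it = cache.find(n); it != cache.end()) return it->second;

      Node* res;
      switch (n->kind) {
        case Kind::Param:               // up-traversal: substitute instantiation
          res = inst.at(n);
          break;
        case Kind::Apply: {             // rebuild the arguments, but stop the
          std::vector<Node*> kids{n->children[0]};  // down-traversal at the λ
          for (std::size_t i = 1; i < n->children.size(); ++i)
            kids.push_back(reduce(n->children[i], inst, cache));
          res = rebuild(Kind::Apply, std::move(kids));
          break;
        }
        case Kind::Ite: {               // evaluate c, traverse one branch only
          Node* c = reduce(n->children[0], inst, cache);
          res = reduce(c->value != 0 ? n->children[1] : n->children[2],
                       inst, cache);
          break;
        }
        default: {                      // any other operator: rebuild bottom-up
          std::vector<Node*> kids;
          for (Node* k : n->children)
            kids.push_back(reduce(k, inst, cache));
          res = rebuild(n->kind, std::move(kids));
        }
      }
      return cache[n] = res;
    }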
   For partial β-reduction, we have to modify the first of the two optimizations introduced for full β-reduction.
  • Skip expressions that do not need rebuilding
    Given an expression e in the scope of a partial β-reduction βp(λx̄), if e is not parameterized then, in the context of partial β-reduction, e does not depend on the arguments passed to λx̄ and may be skipped.

   Example 3: Consider ψ2 from Ex. 2. Applying partial β-reduction to ψ2 yields the formula depicted in Fig. 3, where function application f(i, j) has been reduced to ite(j < i, e, g(j)) and f(k, l) to ite(l < k, g(k), g(l)).

Fig. 3: Partial β-reduction of formula ψ2.

V. DECISION PROCEDURE

   The idea of lemmas on demand goes back to [7] and actually represents one extreme variant of the lazy SMT approach [16]. Around the same time, a related technique was developed in the context of bounded model checking [9], which lazily encodes all-different constraints over bit vectors (see also [2]). In constraint programming the related technique of lazy clause generation [15] is effective too.
   In this section, we introduce lemmas on demand for non-recursive λ-terms based on the algorithm introduced in [3]. A top-level view of our lemmas on demand decision procedure for λ-terms (DPλ) is illustrated in Fig. 4 and proceeds as follows. Given a formula φ, DPλ uses a bit vector skeleton of the preprocessed formula π as formula abstraction αλ(π). In each iteration, an underlying decision procedure DPB determines the satisfiability of the formula abstraction refined by formula refinement ξ, i.e., in DPB, we eagerly encode the refined formula abstraction Γ to SAT and determine

its satisfiability by means of a SAT solver. As Γ is an over-approximation of φ, we immediately conclude with unsatisfiable if Γ is unsatisfiable. If Γ is satisfiable, we have to check if the current satisfying assignment σ (as provided by procedure DPB) is consistent w.r.t. preprocessed formula π. If σ is consistent, i.e., if it can be extended to a valid satisfying assignment for the preprocessed formula π, we immediately conclude with satisfiable. Otherwise, assignment σ is spurious, consistentλ(π, σ) identifies a violation of the function congruence axiom EUF, and we generate a symbolic lemma lemmaλ(π, σ), which is added to formula refinement ξ in its abstracted form αλ(lemmaλ(π, σ)).

    procedure DPλ(φ)
        π ← preprocess(φ)
        ξ ← ⊤
        loop
            Γ ← αλ(π) ∧ ξ
            (r, σ) ← DPB(Γ)
            if r = unsatisfiable return unsatisfiable
            if consistentλ(π, σ) return satisfiable
            ξ ← ξ ∧ αλ(lemmaλ(π, σ))

Fig. 4: Lemmas on demand for λ-terms DPλ.
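Read as C++ against a hypothetical solver interface (all names below — preprocess, abstract_lambda, solve_bv, and so on — are ours, not Boolector's actual API), the loop of Fig. 4 is a plain abstraction-refinement loop:

    #include <utility>

    // Hypothetical interfaces standing in for Boolector internals.
    struct Formula {};
    struct Assignment {};
    enum class Result { Sat, Unsat };

    Formula preprocess(const Formula& phi);                  // π ← preprocess(φ)
    Formula abstract_lambda(const Formula& f);               // αλ
    Formula conjoin(const Formula& a, const Formula& b);     // ∧
    Formula top();                                           // ⊤
    std::pair<Result, Assignment> solve_bv(const Formula&);  // DPB: bit-blast + SAT
    bool consistent_lambda(const Formula& pi, const Assignment& sigma);
    Formula lemma_lambda(const Formula& pi, const Assignment& sigma);

    Result dp_lambda(const Formula& phi) {
      Formula pi = preprocess(phi);
      Formula xi = top();                      // refinement ξ, initially ⊤
      for (;;) {
        Formula gamma = conjoin(abstract_lambda(pi), xi);  // Γ ← αλ(π) ∧ ξ
        auto [r, sigma] = solve_bv(gamma);
        if (r == Result::Unsat) return Result::Unsat;  // Γ over-approximates φ
        if (consistent_lambda(pi, sigma)) return Result::Sat;
        // σ is spurious: add the abstracted symbolic lemma to ξ.
        xi = conjoin(xi, abstract_lambda(lemma_lambda(pi, sigma)));
      }
    }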
   Note that in φ, in contrast to the decision procedure introduced in [3], all array variables and array operations in the original input have been abstracted away and replaced by corresponding λ-terms and operations on λ-terms. Hence, various integral components of the original procedure (αλ, consistentλ, lemmaλ) have been adapted to handle λ-terms as follows.
VI. FORMULA ABSTRACTION

   In this section, we introduce a partial formula abstraction function αλ as a generalization of the abstraction approach presented in [3]. Analogous to [3], we replace function applications by fresh bit vector variables and generate a bit vector skeleton as formula abstraction. Given π as the preprocessed input formula φ, our abstraction function αλ traverses down the DAG structure starting from the roots, and generates an over-approximation of π as follows (a sketch in C++ is given after the list).
1) Each bit vector variable and symbolic constant is mapped to itself.
2) Each function application λx̄(a0, ..., an) is mapped to a fresh bit vector variable.
3) Each bit vector term t(y0, ..., ym) is mapped to t(αλ(y0), ..., αλ(ym)).
Note that by introducing fresh variables for function applications, we essentially cut off λ-terms and UF and therefore yield a pure bit vector skeleton, which is subsequently eagerly encoded to SAT.
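Continuing the hypothetical Node DAG and rebuild() helper sketched earlier, αλ can be written as a memoized traversal; fresh_bv_var() is again an assumed helper, and the shared memo table realizes the consecutive down-traversal from several roots:

    #include <map>
    #include <vector>

    Node* fresh_bv_var();  // assumed: creates a fresh bit vector variable

    // αλ over the Node DAG sketched above; 'memo' is shared between the
    // roots, so nodes reachable from several roots are abstracted once.
    Node* abstract_lambda(Node* n, std::map<Node*, Node*>& memo) {
      if (auto it = memo.find(n); it != memo.end()) return it->second;
      Node* res;
      switch (n->kind) {
        case Kind::Var:               // rule 1: variables and symbolic
        case Kind::Const:             // constants are mapped to themselves
          res = n;
          break;
        case Kind::Apply:             // rule 2: fresh variable; the λ-term
          res = fresh_bv_var();       // or UF below is cut off
          break;
        default: {                    // rule 3: rebuild t over αλ(children)
          std::vector<Node*> kids;
          for (Node* k : n->children)
            kids.push_back(abstract_lambda(k, memo));
          res = rebuild(n->kind, std::move(kids));
        }
      }
      return memo[n] = res;
    }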

   Example 4: Consider formula ψ1 from Ex. 1, which has two roots. The abstraction function αλ performs a consecutive down-traversal of the DAG from both roots. The resulting abstraction is a mapping of all bit vector terms encountered during traversal, according to rules 1-3 above. For function applications (e.g. applyi) fresh bit vector variables (e.g. αλ(applyi)) are introduced, where the remaining sub-DAGs are therefore cut off. The resulting abstraction αλ(ψ1) is given in Fig. 5.

Fig. 5: Formula abstraction αλ(ψ1).

VII. CONSISTENCY CHECKING

   In this section, we introduce a consistency checking algorithm consistentλ as a generalization of the consistency checking approach presented in [3]. However, in contrast to [3], we do not propagate so-called access nodes but function applications, and we further check axiom EUF (while applying partial β-reduction to evaluate function applications under a current assignment) instead of checking array axioms A1 and A2. Given a satisfiable over-approximated and refined formula Γ, procedure consistentλ determines whether a current satisfying assignment σ (as provided by the underlying decision procedure DPB) is spurious, or if it can be extended to a valid satisfying assignment for the preprocessed input formula π. Therefore, for each function application in π, we have to check both if the assignment of the corresponding abstraction variable is consistent with the value obtained by applying partial β-reduction, and if axiom EUF is violated. If consistentλ does not find any conflict, we immediately conclude that formula π is satisfiable. However, if current assignment σ is spurious w.r.t. preprocessed formula π, either axiom EUF is violated or partial β-reduction yields a conflicting value for some function application in π. In both cases, we generate a lemma as formula refinement. In the following we will equally use function symbols f, g, and h for UF symbols and λ-terms.
   In order to check axiom EUF, for each λ-term and UF symbol we maintain a hash table ρ, which maps λ-terms and UF symbols to function applications. We check consistency w.r.t. π by applying the following rules.
I: For each f(ā), if ā is not parameterized, add f(ā) to ρ(f).

C: For any pair s := g(ā), t := h(b̄) ∈ ρ(f), check

       ⋀_{i=0}^{n} σ(αλ(ai)) = σ(αλ(bi)) → σ(αλ(s)) = σ(αλ(t))

B: For any s := λȳ(a0, ..., an) ∈ ρ(λx̄) with t := λx̄[x0\a0, ..., xn\an]p, check rule P; if P fails, check eval(t) = σ(αλ(s)).

P: For any s := λȳ(a0, ..., an) ∈ ρ(λx̄) with t := g(b0, ..., bm) = λx̄[x0\a0, ..., xn\an]p: if n = m ∧ ⋀_{i=0}^{n} ai = bi, propagate s to ρ(g).
   Given a λ-term (UF symbol) f and a corresponding hash table ρ(f), rule I, the initialization rule, initializes ρ(f) with all non-parameterized function applications on f. Rule C corresponds to the function congruence axiom and is applied whenever we add a function application g(a0, ..., an) to ρ(f). Rule B is a consistency check w.r.t. the current assignment σ, i.e., for every function application s in ρ(f), we check if the assignment of σ(αλ(s)) corresponds to the assignment evaluated by the partially β-reduced term λx̄[x0\a0, ..., xn\an]p. Finally, rule P represents a crucial optimization of consistentλ, as it avoids unnecessary conflicts while checking B. If P applies, both function applications s and t have the same arguments. As function application s ∈ ρ(λx̄), rule C implies that s = λx̄(a0, ..., an). Therefore, function applications s and t must produce the same function value as t := λx̄[x0\a0, ..., xn\an]p = λȳ[x0\a0, ..., xn\an]p, i.e., function application t must be equal to the result of applying partial β-reduction to function application s. Assume we encode t and add it to the formula. If DPB guesses an assignment s.t. σ(αλ(t)) ≠ σ(αλ(s)) holds, we have a conflict and need to add a lemma. However, this conflict is unnecessary, as we know from the start that both function applications must map to the same function value in order to be consistent. We avoid this conflict by propagating s to ρ(g).
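As a concrete reading of rule C, the following sketch uses invented types: Fun identifies a λ-term or UF symbol, App bundles an application's arguments with its abstraction variable, and val() stands for σ(αλ(·)); none of this is Boolector's actual code.

    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <vector>

    struct Fun {};
    struct App { std::vector<Node*> args; Node* abstraction; };
    using Rho = std::map<Fun*, std::vector<App>>;

    uint64_t val(Node* n);  // assumed: σ(αλ(n)) for an encoded node

    // Rule C for a new application 'a' on f: equal argument values must
    // imply equal function values w.r.t. every s already in ρ(f).
    bool congruent(Rho& rho, Fun* f, const App& a) {
      for (const App& s : rho[f]) {
        bool same_args = s.args.size() == a.args.size();
        for (std::size_t i = 0; same_args && i < a.args.size(); ++i)
          same_args = val(s.args[i]) == val(a.args[i]);
        if (same_args && val(s.abstraction) != val(a.abstraction))
          return false;  // EUF violated; lemmaλ generates a lemma
      }
      return true;
    }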
   Figure 6 illustrates our consistency checking algorithm consistentλ, which takes the preprocessed input formula π and a current assignment σ as arguments, and proceeds as follows. First, we initialize stack S with all non-parameterized function applications in formula π (cf. nonparam_apps(π)) and order them top-down, according to their appearance in the DAG representation of π. The top-most function application then represents the top of stack S, which consists of tuples (g, f(a0, ..., an)), where f and g are initially equal and f(a0, ..., an) denotes the function application propagated to function g.

    procedure consistentλ(π, σ)
        S ← nonparam_apps(π)
        while S ≠ ∅
            (g, f(a0, ..., an)) ← pop(S)
            encode(f(a0, ..., an))
            /* check rule C */
            if not congruent(g, f(a0, ..., an))
                return ⊥
            add(f(a0, ..., an), ρ(g))
            if is_UF(g) continue
            encode(g)
            /* check rule B */
            t ← g[x0\a0, ..., xn\an]p
            if assigned(t)
                if σ(t) ≠ σ(αλ(f(a0, ..., an)))
                    return ⊥
            elif t = h(a0, ..., an)    /* check rule P */
                push(S, (h, f(a0, ..., an)))
                continue
            else
                apps ← fresh_apps(t)
                for a ∈ apps
                    encode(a)
                if eval(t) ≠ σ(αλ(f(a0, ..., an)))
                    return ⊥
                for h(b0, ..., bm) ∈ apps
                    push(S, (h, h(b0, ..., bm)))
        return ⊤

Fig. 6: Procedure consistentλ in pseudo-code.
   In the main consistency checking loop, we check rules C and B for each tuple as follows. First we check if f(a0, ..., an) violates the function congruence axiom EUF w.r.t. function g and return ⊥ if this is the case. Note that for checking rule C, we require an assignment for arguments a0, ..., an, hence we encode them on-the-fly. If rule C is not violated and function f is an uninterpreted function, we continue to check the next tuple on stack S. However, if f is a λ-term we still need to check rule B, i.e., we need to check if the assignment σ(αλ(f(a0, ..., an))) is consistent with the value produced by g[x0\a0, ..., xn\an]p. Therefore, we first encode all non-parameterized expressions in the scope of partial β-reduction βp(g) (cf. encode(g)) before applying partial β-reduction with arguments a0, ..., an, which yields term t. If term t has an assignment, we can immediately check if it differs from assignment σ(αλ(f(a0, ..., an))) and return ⊥ if this is the case. However, if term t does not have an assignment, which is the case when t has been instantiated from a parameterized expression, we have to compute the value for term t. Note that we could also encode term t to get an assignment σ(t), but this might add a considerable amount of superfluous clauses to the SAT solver. Before computing a value for t we check if rule P applies and propagate f(a0, ..., an) to h if applicable. Otherwise, we need to compute a value for t and check if t contains any function applications that were instantiated and not yet encoded (cf. fresh_apps(t)) and encode them if necessary. Finally, we compute the value for t (cf. eval(t)) and compare it to the assignment of αλ(f(a0, ..., an)). If the values differ, we found an inconsistency and return ⊥. Otherwise, we continue consistency checking the newly encoded function applications apps. We conclude with ⊤ if all function applications have been checked successfully and no inconsistencies have been found.

A. Lemma generation

   Following [3], we introduce a lemma generation procedure lemmaλ, which generates a symbolic lemma whenever our consistency checker detects an inconsistency. Depending on whether rule C or B was violated, we generate a symbolic lemma as follows. Assume that rule C was violated by function applications s := g(a0, ..., an), t := h(b0, ..., bn) ∈ ρ(f). We first collect all conditions that lead to the conflict as follows.
1) Find the shortest possible propagation path ps (pt) from function application s (t) to function f.
2) Collect all ite conditions c^s_0, ..., c^s_j (c^t_0, ..., c^t_l) on path ps (pt) that were ⊤ under given assignment σ.
3) Collect all ite conditions c̄^s_0, ..., c̄^s_k (c̄^t_0, ..., c̄^t_m) on path ps (pt) that were ⊥ under given assignment σ.
We generate the following (in general symbolic) lemma:

    ⋀_{i=0}^{j} c^s_i ∧ ⋀_{i=0}^{k} ¬c̄^s_i ∧ ⋀_{i=0}^{l} c^t_i ∧ ⋀_{i=0}^{m} ¬c̄^t_i ∧ ⋀_{i=0}^{n} ai = bi → s = t
   Assume that rule B was violated by a function application s := λȳ(a0, ..., an) ∈ ρ(λx̄). We obtained t := λx̄[x0\a0, ..., xn\an]p and collect all conditions that lead to the conflict as follows.
1) Collect ite conditions c^s_0, ..., c^s_j and c̄^s_0, ..., c̄^s_k for s as in steps 1-3 above.
2) Collect all ite conditions c^t_0, ..., c^t_l that evaluated to ⊤ under current assignment σ when partially β-reducing λx̄ to obtain t.
3) Collect all ite conditions c̄^t_0, ..., c̄^t_m that evaluated to ⊥ under current assignment σ when partially β-reducing λx̄ to obtain t.
We generate the following (in general symbolic) lemma:

    ⋀_{i=0}^{j} c^s_i ∧ ⋀_{i=0}^{k} ¬c̄^s_i ∧ ⋀_{i=0}^{l} c^t_i ∧ ⋀_{i=0}^{m} ¬c̄^t_i → s = t
   Example 5: Consider formula ψ1 and its preprocessed formula abstraction αλ(ψ1) from Ex. 1. For the sake of better readability, we will use λx and λy to denote functions f and g, and further use ai and aj as a shorthand for αλ(applyi) and αλ(applyj). Assume we run DPB on αλ(ψ1) and it returns a satisfying assignment σ such that σ(i) ≠ σ(j), σ(ai) = σ(aj), σ(i) < 0 and σ(ai) ≠ σ(−i). First, we check consistency for λx(i) and check rule C, which is not violated as σ(i) ≠ σ(j), and continue with checking rule B. We apply partial β-reduction and obtain term t := λx[x\i]p = λy(i) (since σ(i) < 0), for which rule P is applicable. We propagate λx(i) to λy, check if λx(i) is consistent w.r.t. λy, apply partial β-reduction, obtain t := λy[y\i]p = −i and find an inconsistency according to rule B: under σ we have σ(ai) ≠ σ(−i), but partial β-reduction requires σ(ai) = σ(−i). We generate lemma i < 0 → ai = −i. Assume that in the next iteration DPB returns a new satisfying assignment σ such that σ(i) ≠ σ(j), σ(ai) = σ(aj), σ(i) < 0, σ(ai) = σ(−i) and σ(j) > σ(−i). We first check consistency for λx(i), which is consistent due to the lemma we previously generated. Next, we check rule C for λx(j), which is not violated since σ(i) ≠ σ(j), and continue with checking rule B. We apply partial β-reduction and obtain term t := λx[x\j]p = j (since σ(j) > σ(−i) and σ(i) < 0) and find an inconsistency: σ(ai) = σ(−i), σ(ai) = σ(aj) and σ(j) > σ(−i) imply σ(aj) ≠ σ(j), but rule B requires σ(aj) = σ(j). We then generate lemma j > 0 → aj = j.

VIII. EXPERIMENTS

   We applied our lemmas on demand approach for λ-terms on three different benchmark categories: (1) crafted, (2) SMT'12, and (3) application. For the crafted category, we generated benchmarks using SMT-LIB v2 macros, where the instances of the first benchmark set (macro blow-up) tend to blow up in formula size if SMT-LIB v2 macros are treated as C-style macros. The benchmark sets fisher-yates SAT and fisher-yates UNSAT encode an incorrect and a correct but naive implementation of the Fisher-Yates shuffle algorithm [11], where the instances of fisher-yates SAT also tend to blow up in the size of the formula if SMT-LIB v2 macros are treated as C-style macros. The SMT'12 category consists of all non-extensional QF_AUFBV benchmarks used in the SMT competition 2012. For the application category, we considered the instantiation benchmarks¹ generated with LLBMC as presented in [10]. The authors also kindly provided the same benchmark family using λ-terms as arrays, which is denoted as lambda.
   We performed all experiments on 2.83GHz Intel Core 2 Quad machines with 8GB of memory running Ubuntu 12.04.2, setting a memory limit of 7GB and a time limit of 1200 seconds for the crafted and the SMT'12 benchmarks. For the application benchmarks, as in [10], we used a time limit of 60 seconds.

¹ http://llbmc.org/files/downloads/vstte-2013.tgz
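For reference, a standard (correct) Fisher-Yates shuffle is shown below in C++; the fisher-yates benchmarks encode variants of this loop in SMT-LIB v2, and this snippet only illustrates the algorithm itself, not the benchmark encoding.

    #include <cstddef>
    #include <random>
    #include <utility>
    #include <vector>

    void fisher_yates(std::vector<int>& a, std::mt19937& rng) {
      for (std::size_t i = a.size(); i > 1; --i) {
        // Pick j uniformly from [0, i-1] and swap. A classic incorrect
        // variant draws j from the full range [0, n-1] at every step,
        // which yields a biased permutation (the paper does not spell
        // out which incorrect variant its SAT instances encode).
        std::uniform_int_distribution<std::size_t> dist(0, i - 1);
        std::swap(a[i - 1], a[dist(rng)]);
      }
    }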
                        Solver          Solved   TO   MO   Time [10^3 s]   Space [GB]
  macro blow-up         Boolector         100     0    0       24.2            9.4
                        Boolectornop      100     0    0       18.2            8.4
                        Boolectorβ         28    49   23       91.5          160.0
                        CVC4               21     0   79       95.7          551.6
                        MathSAT            51     2   47       64.6          395.0
                        SONOLAR            26    74    0       90.2            1.7
                        Z3                 21     0   79       95.0          552.2
  fisher-yates SAT      Boolector           7    10    1       14.0            7.5
                        Boolectornop        4    13    1       17.3            7.0
                        Boolectorβ          6     1   11       15.0           76.4
                        CVC4                5     1   12       15.7           83.6
                        MathSAT             6    10    2       14.7           17.3
                        SONOLAR             3    14    1       18.1            6.9
                        Z3                  6    12    0       14.8            0.2
  fisher-yates UNSAT    Boolector           5    13    1       17.4            7.1
                        Boolectornop        4    14    1       18.2            6.9
                        Boolectorβ          9     0   10       12.1           72.0
                        CVC4                3     4   12       19.2           82.1
                        MathSAT             6    11    2       15.9           14.7
                        SONOLAR             3    15    1       19.2            6.8
                        Z3                 10     9    0       11.2            2.2

TABLE I: Results crafted benchmark.

            Solver          Solved   TO   MO   Time [10^3 s]   Space [GB]
  SMT'12    Boolector         139    10    0       19.9           14.8
            Boolectornop      134    15    0       26.3           14.5
            Boolectorβ        137    11    1       21.5           22.7
            Boolectorsc12     140     9    0       15.9           10.3

TABLE II: Results SMT'12 benchmark.
   We evaluated four different versions of Boolector: (1) our lemmas on demand for λ-terms approach DPλ (Boolector), (2) DPλ without optimization rule P (Boolectornop), (3) DPλ with full β-reduction (Boolectorβ), and (4) the version submitted to the SMT competition 2012 (Boolectorsc12). For comparison we used the following SMT solvers: CVC4 1.2, MathSAT 5.2.6, SONOLAR 2013-05-15, STP 1673 (svn revision), and Z3 4.3.1. Note that we limited the set of solvers to those which currently support SMT-LIB v2 macros and the theory of fixed-size bit vectors. As a consequence, we did not compare our approach to UCLID (no bit vector support) and Yices, which both have native λ-term support but lack support for the SMT-LIB v2 standard.
   As indicated in Tables I, II and III, we measured the number of solved instances (Solved), timeouts (TO), memory outs (MO), total CPU time (Time), and total memory consumption (Space) required by each solver for solving an instance. If a solver ran into a timeout, 1200 seconds (60 seconds for category application) were added to the total time as a penalty. In case of a memory out, 1200 seconds (60 seconds for application) and 7GB were added to the total CPU time and total memory consumption, respectively.
   Table I summarizes the results of the crafted benchmark category. On the macro blow-up benchmarks, Boolector and Boolectornop benefit from lazy λ-term handling and thus outperform all those solvers that try to eagerly eliminate SMT-LIB v2 macros, with very high memory consumption as a result. The only solver not having memory problems on this benchmark set is SONOLAR. However, it is not clear how SONOLAR handles SMT-LIB v2 macros. Surprisingly, on these benchmarks Boolectornop performs better than Boolector with optimization rule P, which needs further investigation. On the fisher-yates SAT benchmarks Boolector not only solves the most instances, but requires 107 seconds for the first 6 instances, for which Boolectorβ, MathSAT and Z3 need more than 300 seconds each. Boolectornop does not perform as well as Boolector due to the fact that on these benchmarks optimization rule P is heavily applied. In fact, on these benchmarks, rule P applies to approx. 90% of all propagated function applications on average. On the fisher-yates UNSAT benchmarks Z3 and Boolectorβ solve the most instances, whereas Boolector and Boolectornop do not perform so well. This is mostly due to the fact that these benchmarks can be simplified significantly when macros are eagerly eliminated, whereas partial β-reduction does not yield as many simplifications. We measured the overhead of β-reduction in Boolector on these benchmarks, and it turned out that for the macro blow-up and fisher-yates UNSAT instances the overhead is negligible (max. 3% of total run time), whereas for the fisher-yates SAT instances β-reduction requires over 50% of total run time.
   Table II summarizes the results of running all four Boolector versions on the SMT'12 benchmark set. We compared our three approaches Boolector, Boolectornop, and Boolectorβ to Boolectorsc12, which won the QF_AUFBV track in the SMT competition 2012. In comparison to Boolectorβ, Boolector solves 5 unique instances, whereas Boolectorβ solves 3 unique instances. In comparison to Boolectorsc12, both solvers combined solve 2 unique instances. Overall, on the SMT'12 benchmarks Boolectorsc12 still outperforms the other approaches. However, our results still look promising, since none of the approaches Boolector, Boolectornop and Boolectorβ is heavily optimized yet. On these benchmarks, the overhead of β-reduction in Boolector is around 7% of the total run time.
   Finally, Table III summarizes the results of the application category. We used the benchmarks obtained from the instantiation-based reduction approach presented in [10] (instantiation benchmarks) and compared our new approaches to STP, the same version of the solver that outperformed all other solvers on these benchmarks in the experimental evaluation of [10].
                   Solver          Solved   TO   MO   Time [s]   Space [MB]
  instantiation    Boolector          37     8    0     576          235
                   Boolectornop       35    10    0     673          196
                   Boolectorβ         44     1    0     138          961
                   Boolectorsc12      39     6    0     535          308
                   STP                44     1    0     141         3814
  lambda           Boolector          37     8    0     594          236
                   Boolectornop       35    10    0     709          166
                   Boolectorβ         45     0    0      52          676
                   Boolectorsc12       -     -    -       -            -
                   STP                 -     -    -       -            -

TABLE III: Results application benchmarks.
On the instantiation benchmarks Boolectorβ and STP solve the same number of instances in roughly the same time. However, Boolectorβ requires less memory for solving those instances. Boolector, Boolectornop and Boolectorsc12 did not perform so well on these benchmarks because, in contrast to Boolectorβ and STP, they do not eagerly eliminate read operations, which is beneficial on these benchmarks. The lambda benchmarks consist of the same problems as instantiation, using λ-terms for representing arrays. On these benchmarks, Boolectorβ clearly outperforms Boolector and Boolectornop and solves all 45 instances within a fraction of the time. Boolectorsc12 and STP do not support λ-terms as arrays and therefore were not able to participate on this benchmark set. By exploiting the native λ-term support for arrays in Boolectorβ, we achieve even better results than on the instantiation benchmarks. Note that on the lambda (instantiation) benchmarks, the overhead in Boolectorβ for applying full β-reduction was around 15% (less than 2%) of the total run time.
   Benchmarks, binaries of Boolector and all log files of our experiments can be found at: http://fmv.jku.at/difts-rev-13/lloddifts13.tar.gz.
IX. CONCLUSION

   In this paper, we introduced a new decision procedure for handling non-recursive and non-extensional λ-terms as a generalization of the array decision procedure presented in [3]. We showed how arrays, array operations and SMT-LIB v2 macros are represented in Boolector and evaluated our new approach with 3 different benchmark categories: crafted, SMT'12 and application. The crafted category showed the benefit of lazily handling SMT-LIB v2 macros, where eager macro elimination tends to blow up the formula in size. We further compared our new implementation to the version of Boolector that won the QF_AUFBV track in the SMT competition 2012. With the application benchmarks, we demonstrated the potential of native λ-term support within an SMT solver. Our experiments look promising even though we employ a rather naive implementation of β-reduction in Boolector and also do not incorporate any λ-term specific rewriting rules except full β-reduction.
   In future work we will address the performance bottleneck of the β-reduction implementation and will further add λ-term specific rewriting rules. We will analyze the impact of various β-reduction strategies on our lemmas on demand procedure and will further add support for extensionality over λ-terms. Finally, with the recent and ongoing discussion within the SMT-LIB community to add support for recursive functions, we consider extending our approach to recursive λ-terms.

X. ACKNOWLEDGEMENTS

   We would like to thank Stephan Falke, Florian Merz and Carsten Sinz for sharing benchmarks, and Bruno Dutertre for explaining the implementation and limits of lambdas in SMT solvers, and more specifically in Yices.

REFERENCES

[1] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds, and C. Tinelli. CVC4. In CAV, volume 6806 of LNCS, pages 171-177. Springer, 2011.
[2] A. Biere and R. Brummayer. Consistency Checking of All Different Constraints over Bit-Vectors within a SAT Solver. In FMCAD, pages 1-4. IEEE, 2008.
[3] R. Brummayer and A. Biere. Lemmas on Demand for the Extensional Theory of Arrays. JSAT, 6(1-3):165-201, 2009.
[4] R. E. Bryant, S. K. Lahiri, and S. A. Seshia. Modeling and Verifying Systems Using a Logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions. In CAV, volume 2404 of LNCS, pages 78-92. Springer, 2002.
[5] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani. The MathSAT5 SMT Solver. In TACAS, volume 7795 of LNCS, pages 93-107. Springer, 2013.
[6] L. De Moura and N. Bjørner. Z3: an efficient SMT solver. In Proc. ETAPS'08, pages 337-340, 2008.
[7] L. M. de Moura, H. Rueß, and M. Sorea. Lazy Theorem Proving for Bounded Model Checking over Infinite Domains. In CADE, volume 2392 of LNCS. Springer, 2002.
[8] B. Dutertre and L. de Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf, Aug. 2006.
[9] N. Eén and N. Sörensson. Temporal induction by incremental SAT solving. ENTCS, 89(4):543-560, 2003.
[10] S. Falke, F. Merz, and C. Sinz. Extending the Theory of Arrays: memset, memcpy, and Beyond. In Proc. VSTTE'13.
[11] R. Fisher and F. Yates. Statistical tables for biological, agricultural and medical research. Oliver and Boyd, 1953.
[12] V. Ganesh and D. L. Dill. A Decision Procedure for Bit-Vectors and Arrays. In Proc. CAV'07. Springer-Verlag, 2007.
[13] F. Lapschies, J. Peleska, E. Gorbachuk, and T. Mangels. SONOLAR SMT-Solver. System Desc. SMT-COMP'12. http://smtcomp.sourceforge.net/2012/reports/sonolar.pdf, 2012.
[14] J. McCarthy. Towards a Mathematical Science of Computation. In IFIP Congress, pages 21-28, 1962.
[15] O. Ohrimenko, P. J. Stuckey, and M. Codish. Propagation via lazy clause generation. Constraints, 14(3):357-391, 2009.
[16] R. Sebastiani. Lazy Satisfiability Modulo Theories. JSAT, 3(3-4):141-224, 2007.
[17] S. A. Seshia. Adaptive Eager Boolean Encoding for Arithmetic Reasoning in Verification. PhD thesis, CMU, 2005.
CHIMP: a Tool for Assertion-Based Dynamic Verification of SystemC Models

Sonali Dutta, Rice University, Email: Sonali.Dutta@rice.edu
Deian Tabakov, Schlumberger, Email: deian@dtabakov.com
Moshe Y. Vardi, Rice University, Email: vardi@cs.rice.edu

Work supported in part by NSF grants CNS 1049862 and CCF-1139011, by NSF Expeditions in Computing project "ExCAPE: Expeditions in Computer Augmented Program Engineering", by BSF grant 9800096, and by a gift from Intel.


   Abstract—CHIMP is a tool for assertion-based dynamic verification of SystemC models. The various features of CHIMP include automatic generation of monitors from temporal assertions, automatic instrumentation of the model-under-verification (MUV), and three-way communication among the MUV, the generated monitors, and the SystemC simulation kernel during the monitored execution of the instrumented MUV. Empirical results show that CHIMP puts minimal runtime overhead on the monitored execution of the MUV.
   A newly added path in CHIMP results in a significant (over 75%) reduction of average monitor generation and compilation time. The average size of the monitors is reduced by over 60%, without increasing runtime overhead.

I. INTRODUCTION

   SystemC (IEEE Standard 1666-2005) has emerged as a de facto standard for modeling of hardware/software systems [4], supporting different levels of abstraction, iterative model refinement, and execution of the model during each design stage. SystemC is implemented as a C++ library with macros and base classes for modeling processes, modules, channels, signals, ports, and the like, while an event-driven simulation kernel allows efficient simulation of concurrent models. Thus, on one hand, SystemC leverages the natural object-oriented encapsulation, data hiding, and well-defined inheritance mechanism of C++, and, on the other hand, it allows modeling and efficient simulation of hardware/software designs by its simulation kernel and predefined hardware-specific macros and classes. The SystemC code is available as open source, including a single-core reference simulation kernel, referred to as the OSCI kernel; see http://www.accellera.org/downloads/standards/systemc.
   The growing popularity of SystemC has motivated research aimed at assertion-based dynamic verification (ABDV) of SystemC models [9]. ABDV involves two steps: generating run-time monitors from input assertions [8], and executing the model-under-verification (MUV) while running the monitors along with the model. The monitors observe the execution of the MUV and report if the observed behavior is consistent with the specified behavior. For discussion of related work in ABDV of SystemC see [7].
   CHIMP, available as an open-source tool¹, implements the monitoring framework for temporal SystemC properties described in [9]; see discussion below. CHIMP consists of (1) off-the-shelf components, (2) modified components, and (3) an original component. CHIMP has two off-the-shelf components: (1) spot-1.1.1 [3], a C++ library used for LTL-to-Büchi conversion, and (2) AspectC++-1.1 [6], a C++ aspect compiler that is used for instrumentation of the MUV. CHIMP has two modified components: (1) a patched version of the OSCI kernel-2.2 to facilitate communication between the kernel and the monitors [9], and (2) an extension of Automaton-1.11 [5], a Java tool used for determinization and minimization of finite automata, with the ability to read automata descriptions from file. Finally, the original component is MONASGEN, a C++ tool for automatic generation of monitors from assertions [8] and for automatic generation of an aspect advice file for instrumentation [11]. Fig. 1 shows the five components of CHIMP as described above. The component Automaton-1.11 is dotted because a newly added improved path in CHIMP has been able to remove the dependency of CHIMP on Automaton-1.11 (explained in Section V).

¹ www.sourceforge.net/projects/chimp-rice

   CHIMP takes the MUV and a set of temporal assertions about the behavior of that MUV as inputs,
Fig. 1. CHIMP components

and outputs "FAILED" or "NOT FAILED" for each assertion. CHIMP performs white-box validation of user code and black-box validation of library code. (If a user wishes to do white-box validation of library code, it can be accomplished by treating the library code as part of the user code.) The two main features of CHIMP are: (1) CHIMP generates C++ monitors, one for each assertion to be verified, and (2) CHIMP automatically instruments the user model [11] to expose the user model's states and syntax to the monitors. CHIMP puts nominal overhead on the runtime of the MUV, supports a rich set of assertions, and can handle a wide array of abstractions, from statement level to system level.
   A recently added path in CHIMP from LTL to monitor can bypass the old performance bottleneck, Automaton-1.1, and improves the monitor generation and compilation time by a significant amount. It also reduces the size of the generated monitors notably. This entirely removes the dependency of CHIMP on Automaton-1.1.
   The theoretical and algorithmic foundations of CHIMP were described in [8]-[11]. In this paper we describe the architecture, usage, and recent evolution of CHIMP. The rest of the paper is organized as follows. Section II describes the syntax and semantics of assertions. Section III presents an overall picture of the usage, implementation and performance of CHIMP. Section IV describes the C++ monitor generated by CHIMP. Section V describes the new improved path in CHIMP and its improved performance. Finally, Section VI presents a summary and talks about future work.

II. ASSERTIONS: SYNTAX AND SEMANTICS

   CHIMP accepts input assertions, defined using the temporal specification framework for SystemC described in [10], where a temporal assertion consists of a linear temporal formula accompanied by a Boolean expression serving as a clock. This supports assertions written at different levels of abstraction with different temporal resolutions. The framework of [10] proposes a set of SystemC-specific Boolean variables that refer to SystemC's software features and its simulation semantics; see examples below. Input assertions in CHIMP are of the form "⟨LTL formula⟩@⟨clock expression⟩", where the LTL formula expresses a temporal property and the clock expression denotes when CHIMP should sample the execution trace of the MUV.
   Several of the additional Boolean variables proposed in [10] refer to the simulation phases. According to SystemC's formal semantics, there are 18 predefined kernel phases. CHIMP has Boolean variables that enable referring to these phases. For example, MON_DELTA_CYCLE_END denotes the end of each delta cycle and MON_THREAD_SUSPEND denotes the suspension moment of each thread. By using these variables as clock expressions, we can sample the execution trace at different temporal resolutions. (By default, CHIMP samples at each kernel phase.) Other Boolean variables refer to SystemC events, which are key elements of SystemC's event-driven simulation semantics. CHIMP is also able to sample the execution at various key locations, e.g., function calls and function returns. This gives the user the flexibility to write assertions at different levels of abstraction, from the level of individual C++ statements to the level of SystemC kernel phases.
   The Boolean primitives supported by CHIMP are summarized below; see [10] and [11] for further details:
   Function primitives: Let f() be a C++ function in the user or library code. The Boolean primitives f:call and f:return refer to locations in the source code that contain the function call, and to locations immediately after the function call, respectively. The primitives f:entry and f:exit refer to the locations immediately before the first executable statement and immediately after the last executable statement in f(), respectively. If f() is a library function then the entry and exit primitives are not supported (black-box verification model for libraries).
Section IV describes the C++ monitor generated by           supported (black-box verification model for libraries).
CHIMP. Section V describes the new improved path in            Value primitives: If a function f() has k arguments,
CHIMP and its improved performance. Finally, Section        CHIMP defines variables f : 1, . . . , f : k , where the
VI presents a summary and talks about future work.          value and type of f : i are equal to the value and type


                                                                                                               39
   Phase primitives: The user can refer to the 18 predefined kernel states [10] in the assertions to specify when the state of the MUV should be sampled. For example, the assertion G ("p == 0") @ MON_DELTA_CYCLE_END requires the value of variable p to be zero at the end of every delta cycle.
   Event primitives: For each SystemC event E, CHIMP provides a Boolean primitive E.notified that is true only when the OSCI kernel actually notifies E. For example, the assertion G (s.value_changed_event().notified) @ MON_UPDATE_PHASE_END says that the signal s changes value at the end of every update phase.
III.   U SAGE , I MPLEMENTATION AND P ERFORMANCE
Fig. 3. MUV instrumentation flow

   Running CHIMP consists of three steps: (1) In the first step, the user writes a configuration file containing all assertions to be verified, as well as other necessary information. The user can also provide inputs through command-line switches. For each LTL assertion in the configuration file, MONASGEN first generates a nondeterministic Büchi automaton on words (NBW) using SPOT, then converts that NBW to a minimal deterministic finite automaton on words (DFW), using the Automaton-1.11 tool for determinization and minimization. Then, MONASGEN generates the C++ monitors from the DFW, one monitor for each assertion (Fig. 2). (2) MONASGEN produces an aspect-advice file that is then used by AspectC++ to generate the instrumented MUV (Fig. 3). (3) Finally, the monitors and the instrumented MUV are compiled together and linked to the patched OSCI kernel, and the code is then run to execute the simulation with inputs provided by the user. The inputs can be generated using any standard stimuli-generation technique. For every assertion, CHIMP produces output indicating whether the assertion held or failed for that particular input (Fig. 4).
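   To make step (2) concrete, an aspect-advice file conceptually resembles the AspectC++ sketch below. This is our own minimal sketch under assumed names (MonitorHooks and divide are hypothetical); the advice MONASGEN actually generates is more elaborate and hands control to the observer classes described in Section IV:

#include <iostream>

// Hypothetical sketch of generated aspect advice.
aspect MonitorHooks {
  // Intercept every call to divide() (the f:call location), so the
  // observer can sample argument values and trigger monitor steps.
  advice call("% divide(...)") : before() {
    // CHIMP's generated advice would notify the observer here; this
    // sketch only records that the instrumented location was reached.
    std::cout << "divide:call reached" << std::endl;
  }
};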
   For experimental evaluation we used a SystemC model with about 3,000 LOC, implementing a system for reserving and purchasing airplane tickets. The users of the system submit requests, and the system uses a randomly generated flight database to find a direct flight or a sequence of up to three connecting flights. Those are returned to the user for approval, payment, and booking. This model is intended to run forever. It is inspired by actual request/grant subsystems currently used in hardware design.
   We used a patched version of the 2.2.0 OSCI kernel and compiled it using the default settings in the Makefile. The empirical results below were measured on a Pentium 4, 3.20GHz CPU, 1 GB RAM machine running GNU/Linux.


Fig. 4. Running instrumented MUV with monitors using patched OSCI kernel

Fig. 5. Run time overhead for monitoring Properties (1) and (2)
   To assess the runtime overhead imposed by the monitors, we measured the effect of running with different assertions and also with an increasing number of assertions [9]. In each case we first ran the model without monitors and user-code instrumentation to establish the baseline, and then ran several simulations with instrumentation and an increasing number of copies of the same monitor. The results report runtime overhead per monitor as a percentage of the baseline. (For these experiments monitors were constructed manually.)
   The first property we checked is a safety property asserting that whenever the event new_requests_nonfull is notified, the corresponding queue new_planning_requests must have space for at least one request.

Fig. 6. Instrumentation overhead per monitor call as a percentage of the baseline run time. The y axis is ((instrumentation overhead − baseline)/(baseline × number of monitors)) × 100%

 G "new_planning_requests.size() < capacity"
@ new_requests_nonfull.notified
                                                        (1)
    The second property says that the system must propagate each request through each channel (or through each module) within 5 cycles of the slow clock. This property is a conjunction of 16 bounded liveness assertions similar to the one shown here.

...
// Propagate through module within 5 clock ticks
ALWAYS (io_module_receive_transaction($1) ->
 ( within [5 slow_clock.pos()]
    io_module_send_to_master($2) & ($1 == $2)
 ) AND ...
                                                        (2)

   Fig. 5 presents the runtime overhead of monitoring the above two properties as a percentage of the baseline. Checking Property 2 is relatively expensive because it requires a lot of communication from the model to the monitor. It is shown in [1] that the finite state monitors have less runtime overhead than transaction-based monitors generated by tools like Synopsys VCS.
   We also evaluated the cost of instrumentation separately by simulating one million SystemC clock cycles with focus on the overhead of instrumentation [11]. The average wall-clock execution time of the system over 10 runs without instrumentation was 33 seconds. We call this "baseline execution". Fig. 6 shows the cost of the instrumentation per monitor call, as a percentage of the
baseline execution. The data suggest that there is a fixed
cost of the instrumentation, which, when amortized over
more and more calls, leads to lower average cost. The
average cost per call stabilizes after 300,000 calls, and is
less than 0.5 × 10⁻⁴%.
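   A simple cost model, our own illustration rather than an analysis from the paper, is consistent with this behavior: if the instrumentation incurs a one-time fixed cost c0 and a marginal cost c1 per monitor call, then after n calls the measured per-call overhead is

    overhead(n) = c0/n + c1,

which decreases toward the constant c1 as n grows, matching the flattening of the curve after roughly 300,000 calls.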
    Another testbench we used is an Adder model that implements the squaring function by repeated increment by 1. It uses all three kinds of event notifications. It is scalable, as it spawns a SystemC thread for each addition in the squaring function. We monitored several properties of the Adder model using CHIMP.

Fig. 7. The DFW generated by MONASGEN from the LTL formula G(a → Xb). All the states are accepting. The DFW rejects bad prefixes by having no available transition. State 0 is the initial state.

   To study the effect of monitor encoding on runtime overhead, Tabakov and Vardi [8] describe 33 different monitor encodings, and show that the front_det_switch encoding with state minimization and no alphabet minimization is the best in terms of runtime overhead. The downside is that this flow suffers from a rather slow monitor generation and compilation time. This led us to develop a new flow for the tool, as described below.

                IV. CHIMP MONITORS

   In CHIMP, the LTL formula in every assertion ⟨LTL formula⟩ @ ⟨clock expression⟩ is converted into a C++ monitor class. Each C++ monitor has a step() function that implements the transition function of the DFW generated from the LTL formula (as described in Fig. 9). This step() function is called at every sampling point defined by the clock expression. The monitor.h and monitor.cc files, generated by MONASGEN, contain one monitor class for every assertion and a class called local_observer that is responsible for invoking the callback function, which invokes the step() function of the appropriate monitor class at the right sampling point during the monitored simulation. Different encodings have been explored for writing the monitor's step() function. For the new flow of CHIMP (described below), Fig. 13 shows that the front_det_ifelse encoding is the best among all of them in terms of runtime overhead.
   In the front_det_ifelse encoding, the C++ monitor produced is deterministic in terms of transitions. This means that from one state, either one or no transition is possible. If no transition is possible from a state, the monitor rejects and outputs "FAILED". Else, the monitor keeps executing until the end of simulation and outputs "NOT FAILED". In the front_det_ifelse encoding, each state of the monitor is encoded as an integer, from 0 up to the total number of states. The step() function of the monitor uses an outer if-elseif statement block to determine the next state. The possible transitions from each state are encoded using an inner if-elseif block, the condition statement being the guard on a transition.
   As a running example, we show how the monitor step() function is encoded for an assertion G(a → Xb) @ clk. Here clk is some Boolean expression specifying when the step() function needs to be called during the simulation. This assertion asserts that, always, if a is true, then in the next clock cycle b has to be true. The DFW generated by CHIMP for this assertion is shown in Fig. 7.
   Listing 1 shows the step() function of the C++ monitor generated by CHIMP for the assertion G(a → Xb) @ clk. The variables current_state and next_state are member variables of the monitor class and store the current automaton state and the next automaton state, respectively. If next_state becomes −1 after the execution of the step() function, it means that no transition can be made from the current state. In this case, the monitor outputs "FAILED". The initial state 0 of the monitor is assigned to the variable next_state inside the constructor of the monitor class (and transferred to current_state on the first call to step()), as shown in Listing 2.

Listing 1. The step() function of the monitor for G(a → Xb)

/**
 * Simulate a step of the monitor.
 */
void
monitor0::step() {
  // If the property has not failed yet
  if (status == NOT_FAILED) {
    // number of steps executed so far
    num_steps++;
    // assign next state to current state
    current_state = next_state;
    // make next state invalid
    next_state = -1;
    if (current_state == 0) {
      if (!(a))
        { next_state = 0; }
      else if ((a))
        { next_state = 1; }
    } // if (current_state == 0)

    else if (current_state == 1) {
      if ((b) && !(a))
        { next_state = 0; }
      else if ((a) && (b))
        { next_state = 1; }
    } // if (current_state == 1)

    // FAILED if no transition possible
    bool not_stuck = (next_state != -1);
    if (!not_stuck) {
      property_failed();
    }
  } // if (status == NOT_FAILED)
} // step()

Listing 2. The constructor of the monitor for G(a → Xb)

/**
 * The constructor
 */
monitor0::monitor0(...) :
  sc_core::mon_prototype() {
  next_state = 0; // initial state id
  current_state = -1;
  status = NOT_FAILED;
  num_steps = 0;
  ...
} // Constructor

   The sampling points can be kernel phases, e.g., MON_DELTA_CYCLE_END, or event notifications, e.g., E.notified (where E is an event). In such cases, the OSCI kernel needs to communicate with the local_observer at the right time (when a delta cycle ends or when event E is notified) to call the step() function of the monitor. This communication is done by the patch put on the OSCI kernel. This patch contains a class called mon_observer, which communicates with the local_observer class on behalf of the OSCI kernel.
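   The dispatch chain can be pictured with the following sketch. It is our own simplification: the member names and the assumption that step() is a virtual member of sc_core::mon_prototype are ours, and the real local_observer generated by MONASGEN is considerably richer:

#include <vector>

// Sketch of the callback chain: the kernel patch (mon_observer) forwards
// each sampling point to local_observer, which invokes step() on every
// monitor registered for that sampling point.
class local_observer {
public:
  void register_delta_monitor(sc_core::mon_prototype* m) {
    delta_monitors_.push_back(m);
  }
  // Invoked by mon_observer when a delta cycle ends
  // (the MON_DELTA_CYCLE_END sampling point).
  void on_delta_cycle_end() {
    for (sc_core::mon_prototype* m : delta_monitors_)
      m->step();
  }
private:
  std::vector<sc_core::mon_prototype*> delta_monitors_;
};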
                                                                 formulas and 1200 random formulas as mentioned in [8].
                                                                 The scatter plot on Fig. 10 shows the comparison of
                V. EVOLUTION OF CHIMP

   Fig. 8 shows the old path of generating a C++ monitor from an LTL property in CHIMP. First, MONASGEN uses SPOT to convert the LTL formula to a nondeterministic Büchi automaton (NBW). Then MONASGEN prunes the NBW to remove all states that do not lead to any accepting state, and generates the corresponding nondeterministic finite automaton (NFW), see [2]. MONASGEN then uses the Automaton tool to determinize and minimize the NFW to generate a minimal deterministic finite automaton (DFW). Finally, MONASGEN converts the DFW to a C++ monitor.

Fig. 8. Old LTL to monitor path in CHIMP

Fig. 9. New LTL to monitor path in CHIMP

   The main bottleneck in this flow was the minimization and determinization of the NFW using the Automaton tool, as it consumes 90% of the total monitor generation and compilation time. Also, the Automaton tool may generate multiple edges between two states, resulting in quite large monitors. The newest version of CHIMP introduces a new path, as shown in Fig. 9, which bypasses the Automaton tool completely and uses only SPOT. So the component Automaton-1.11 in Fig. 1 is not needed by CHIMP anymore. This new path leverages the new functionality of SPOT-1.1.1 to convert an LTL formula to a minimal DFW that explicitly rejects all bad prefixes.
   After replacing the Automaton tool by SPOT to convert the NBW to a minimal DFW, the improvement in compilation time and monitor size is evident. This new flow results in a 75.93% improvement in monitor generation and compilation time. To evaluate the performance of the revised CHIMP, we use the same set of 162 pattern formulas and 1200 random formulas as mentioned in [8]. The scatter plot in Fig. 10 shows the comparison of monitor generation and compilation time of the new flow vs the old flow. Most of the points are above the line with slope 1, which indicates that for most of the monitors, the
generation and compilation time by old CHIMP is more
than by new CHIMP. Fig. 10–Fig. 14 are all scatter plots² and are interpreted in the same way.
   The new flow also merges multiple edges between
two states into one edge guarded by the disjunction of
the guards on all edges between the states. In this way
the average reduction in monitor size in bytes is 61.27%.
Fig. 11 shows the comparison of the size in bytes of the
monitors generated by the new flow vs the old flow.
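   The merge itself is mechanical. The sketch below is our own illustration of the idea; in MONASGEN the guards are formula objects over the assertion's propositions rather than strings, and the Edge type and function name here are hypothetical:

#include <map>
#include <string>
#include <utility>
#include <vector>

// Merge parallel DFW edges with the same (src, dst) pair into a single
// edge guarded by the disjunction of the individual guards.
struct Edge { int src; int dst; std::string guard; };

std::vector<Edge> mergeParallelEdges(const std::vector<Edge>& edges) {
  std::map<std::pair<int, int>, std::string> merged;
  for (const Edge& e : edges) {
    auto key = std::make_pair(e.src, e.dst);
    auto it = merged.find(key);
    if (it == merged.end())
      merged[key] = "(" + e.guard + ")";
    else
      it->second += " || (" + e.guard + ")";  // disjunction of guards
  }
  std::vector<Edge> result;
  for (const auto& kv : merged)
    result.push_back(Edge{kv.first.first, kv.first.second, kv.second});
  return result;
}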
Fig. 10. Comparison of monitor generation and compilation time of the old CHIMP vs new CHIMP

    Since the main focus of CHIMP has always been minimizing runtime overhead, we need to ensure that this new flow does not incur more runtime overhead compared to the old flow as the cost of reduced monitor generation and compilation time. So we ran the same set of 162 pattern formulas and 1200 random formulas as mentioned above, to compare the runtime overhead incurred by the new flow vs the old flow. Fig. 12 shows that the runtime overhead of CHIMP using the new flow has been reduced compared to the old flow. The reduction is 7.97% on average.
Fig. 11. Comparison of monitor size (bytes) generated by the old CHIMP vs new CHIMP

    As CHIMP has evolved to follow a different and more efficient path and the monitors have been reduced in size, it was not clear a priori which monitor encoding is the best. We conducted the same experiment as in [8] with the same set of formulas, 162 pattern formulas and 1200 random formulas, to identify the best encodings in terms of both runtime overhead and monitor generation and compilation time. We identified two new best encodings. Fig. 13 shows that the new best encoding in terms of runtime overhead is now front_det_ifelse. Fig. 14 shows that the new best encoding in terms of monitor generation and compilation time is back_ass_alpha. CHIMP now provides two configurations for monitor generation: best_runtime, which has minimum runtime overhead, and best_compiletime, which has minimum monitor generation and compilation time. Since the bigger concern is usually runtime overhead, best_runtime is the default configuration, given that its monitor generation and compilation time is very close to that of best_compiletime.

                          VI. CONCLUSION
    We present CHIMP, an Assertion-Based Dynamic Verification tool for SystemC models, and show that it puts minimal overhead on the runtime of the MUV. We show how the new path in CHIMP results in a significant reduction of monitor generation and compilation time and monitor size, as well as runtime overhead. In the future we plan to look at the possibility of verifying parametric properties, for example G(send(id) → F(receive(id))), where one can pass a parameter (here id) to the variables in the LTL formula of the assertion. The above property means that always, if the message with ID id is sent, it should eventually be received. Also, at present the user needs to know the system architecture to declare an assertion. We would like to make CHIMP work with assertions that are declared in the elaboration phase.

  ² http://en.wikipedia.org/wiki/Scatter_plot

Fig. 12. Comparison of runtime overhead incurred by old CHIMP vs new CHIMP

Fig. 13. Comparison of runtime overhead of front_det_ifelse encoding vs all other encodings

Fig. 14. Comparison of monitor generation time and compile time overhead of back_ass_alpha encoding vs all other encodings



                          REFERENCES
 [1]  R. Armoni, D. Korchemny, A. Tiemeyer, M. Vardi, and Y. Zbar. Deterministic dynamic monitors for linear-time assertions. In Proc. Workshop on Formal Approaches to Testing and Runtime Verification, volume 4262 of Lecture Notes in Computer Science. Springer, 2006.
 [2]  M. d'Amorim and G. Roşu. Efficient monitoring of ω-languages. In Proc. 17th International Conference on Computer Aided Verification, pages 364–378, 2005.
 [3]  A. Duret-Lutz and D. Poitrenaud. SPOT: An extensible model checking library using transition-based generalized Büchi automata. Modeling, Analysis, and Simulation of Computer Systems, 0:76–83, 2004.
 [4]  C. Helmstetter, F. Maraninchi, L. Maillet-Contoz, and M. Moy. Automatic generation of schedulings for improving the test coverage of Systems-on-a-Chip. In FMCAD '06: Proceedings of the Formal Methods in Computer Aided Design, pages 171–178, Washington, DC, USA, 2006. IEEE Computer Society.
 [5]  A. Møller. dk.brics.automaton. http://www.brics.dk/automaton/, 2004.
 [6]  O. Spinczyk, A. Gal, and W. Schröder-Preikschat. AspectC++: an aspect-oriented extension to the C++ programming language. In CRPIT '02: Proceedings of the Fortieth International Conference on Tools Pacific, pages 53–60, Darlinghurst, Australia, 2002. Australian Computer Society, Inc.
 [7]  D. Tabakov. Dynamic Assertion-Based Verification for SystemC. PhD thesis, Rice University, Houston, 2010.
 [8]  D. Tabakov, K. Rozier, and M. Y. Vardi. Optimized temporal monitors for SystemC. Formal Methods in System Design, 41(3):236–268, 2012.
 [9]  D. Tabakov and M. Vardi. Monitoring temporal SystemC properties. In Proc. 8th Int'l Conf. on Formal Methods and Models for Codesign, pages 123–132. IEEE, July 2010.
 [10] D. Tabakov, M. Vardi, G. Kamhi, and E. Singerman. A temporal language for SystemC. In FMCAD '08: Proc. Int. Conf. on Formal Methods in Computer-Aided Design, pages 1–9. IEEE Press, 2008.
 [11] D. Tabakov and M. Y. Vardi. Automatic aspectization of SystemC. In Proceedings of the 2012 workshop on Modularity in Systems Software, MISS '12, pages 9–14, New York, NY, USA, 2012. ACM.



         Abstraction-Based Livelock/Deadlock Checking for
                       Hardware Verification

                                              In-Ho Moon and Kevin Harer
                                                          Synopsys Inc.
                                            {mooni, kevinh}@synopsys.com


ABSTRACT
Livelock/deadlock is a well known and important problem in both hardware and software systems. In hardware verification, a livelock is a situation where the state of a design changes within only a smaller subset of the states reachable from the initial states of the design. Deadlock is a special case in which there is only one state in a livelock. However, livelock/deadlock checking has never been actively used in hardware verification in practice, mainly due to the complexity of the computation, which involves finding strongly connected components.
   This paper presents a practical abstraction-based livelock/deadlock checking algorithm for hardware verification. The proposed livelock/deadlock checking works on FSMs rather than the whole design. For each FSM, we make an abstract machine of manageable size from the cone of influence of the FSM. Once a livelock is found on an abstract machine, the livelock is justified on the concrete machine with trace concretization. Experimental results show that the proposed abstraction-based livelock checking finds real livelock errors in industrial designs.

1. INTRODUCTION
   Livelock/deadlock is a well known and important problem in both hardware and software systems. In hardware verification, a livelock is a situation where the state of a design changes within only a subset of the states reachable from the initial states of the design. In a state transition graph, a livelock is a set of states from which there is no path going to any other states that are reachable from the initial state. Since deadlock is a special case in which there is only one state in a livelock, deadlock checking can be done by livelock checking. Thus, livelock implies both livelock and deadlock in this paper. However, livelock checking¹ has never been actively used in hardware verification in practice, mainly due to the complexity of the computation, which involves finding SCCs (Strongly Connected Components). Thus, livelock checking has been on hardware designers' wish lists to verify their designs.

¹ Livelock checking is different from liveness checking; the difference will be explained in Section 2.3.

   There have been many approaches to finding SCCs [13, 27, 28, 3, 20, 12]. Among these works, Xie and Beerel proposed a symbolic method for finding terminal SCCs (in short, TSCCs) using BDDs (Binary Decision Diagrams [4]) in [27]. In a state transition graph, a TSCC is an SCC that does not have any outgoing edges to any state outside the SCC. Thus, a TSCC becomes a livelock group when the TSCC has any incoming edges to the SCC in the state transition graph representing a hardware design. However, even though the method in [27] is an improvement over its previous work [13] in symbolic approaches, it is still infeasible to apply the method to industrial designs, simply due to the capacity problem of symbolic methods. In general, any BDD-based method can handle only up to several hundred latches without any abstraction or approximation techniques, whereas there can be millions of latches in industrial designs.
   In this paper, we first present a BDD-based algorithm for finding TSCCs, improved from [27] in the following aspects. First, the initial state is taken into account in finding TSCCs. Especially, the improved algorithm handles multiple initial states efficiently. Secondly, unreachable TSCCs are distinguished from reachable TSCCs, which are more interesting to designers. Thirdly, we provide a more intuitive state classification into main, transient, and livelock groups, as opposed to the transient and recurrence classes in [27]. In our classification, the set of transient states is further classified into main and transient groups. The recurrence class in [27] is mapped into the livelock group in our classification. The main group is an SCC that contains the initial state. The transient group is a set of states that belong to neither the main nor a livelock group. There is one or zero main group in a design per initial state.
   This paper also presents a practical approach for checking livelock using abstraction techniques. The proposed livelock checking works on FSMs (Finite State Machines)² rather than the whole design. For each FSM, we make an abstract machine (by localization reduction [15]) of manageable size by the improved BDD method from the COI (Cone of Influence) of the FSM. Once a livelock is found on an abstract machine, the livelock is justified on the concrete machine with trace concretization using SAT (Satisfiability [9]) and simulation. When there is no livelock on the abstract machine, there is no guarantee that there is no livelock on the concrete machine. However, the bigger the abstract size is, the more confidence we have that no livelock exists on the concrete machine. The key benefit of this abstraction-based livelock checking is that it enables finding real livelock groups that cannot be found by tackling the whole design directly.

² FSMs are either automatically extracted [26] or any sets of sequential elements that are user-specified.

   Once an FSM is given, its COI is first computed. Then, an abstract machine is computed by finding N influential latches from the COI. Influential latches are the latches that are likely related with the FSM. N is either a pre-defined or a user-defined number of latches in the abstract machine, or
gradually increased. In general, N is up to a few hundred latches. Influential latches are computed mainly by approximate state decomposition [6]. However, in many cases, the size of the COI is too big even for approximate state decomposition. Thus, a structural abstraction using connectivity and sequential depth is applied before approximate state decomposition. This structural abstraction reduces the COI to a size manageable by approximate state decomposition.
   There is another important hardware property, called toggle deadlock. A state variable has a toggle deadlock if the state variable initially toggles and then becomes a constant after a certain number of transitions. Notice that this is not a constant variable, since it initially toggles; for example, a status bit that toggles for a few cycles after reset and then stays 0 forever exhibits a toggle deadlock. Toggle deadlock may or may not happen depending on the input stimuli in simulation. Therefore, toggle deadlock is also an important property to check with formal approaches.
   Experimental results show that the proposed abstraction-based approach finds real livelock and toggle deadlock errors in industrial designs.
   The contributions of this paper are in three aspects.
    • Improved algorithm for livelock checking
      The proposed algorithm improves the existing algorithm [27] in many aspects, such as providing a new state classification with initial states, handling multiple initial states, refining the search space efficiently with care states, early termination, and trimming out transient states.
    • Abstraction-based livelock checking
      This paper presents theories and an implementation of abstraction-based livelock checking to handle large designs in practice.
    • Toggle deadlock checking
      To the best of our knowledge, this paper presents the first method to solve the toggle deadlock problem.
   The remainder of this paper is organized as follows. Section 2 briefly recapitulates finding SCCs and TSCCs, and describes related work. Section 3 describes our improved algorithm for finding TSCCs. Section 4 explains how livelock is checked on FSMs using abstraction. Section 5 describes how to check toggle deadlocks. Experimental results are presented and discussed in Section 6. We conclude with Section 7.

2. PRELIMINARIES
2.1 Finding SCCs
   Given a graph G = (V, E), where G is a finite transition system of the Kripke structure [8], V is a finite set of states, and E ⊆ V × V is the set of edges, a strongly connected component (SCC) is a maximal set of states U ⊆ V such that for every pair (u, v) ∈ U, u and v are reachable from each other; that is, u is reachable from v and v is reachable from u [27, 28].
   Finding SCCs has a variety of applications in formal verification, such as Büchi emptiness [11], LTL model checking [25], CTL model checking with fairness constraints [10], liveness checking [16, 1], and so on.
   The traditional approach to finding SCCs is to use Tarjan's method [23]. Since this method manipulates the states of the graph explicitly, even though it runs in linear time in the size of the graph, the size of the graph grows exponentially as the number of state variables grows.
   To overcome this state explosion problem of explicit algorithms, there have been many publications on symbolic algorithms. Ravi et al. [20] provided a taxonomy of those symbolic algorithms. One kind is SCC-hull algorithms (without enumerating SCCs) [11, 14, 24], and the other is SCC enumeration algorithms [28, 3, 12, 20]. The details are in [20].

2.2 Finding TSCCs
   Even though the TSCCs are a subset of the SCCs in the states of a design, the algorithms for finding TSCCs can be significantly optimized, since not all SCCs are of interest.
   This section recapitulates the work on finding TSCCs by Xie and Beerel [27]. This algorithm classifies all states into either the recurrence or the transient class. The recurrence class is a set of TSCCs, and the rest belongs to the transient class. Let S be the set of states. With i, j ∈ S, i → j denotes that there is at least one path from i to j. Definition 1 defines the forward set and the backward set of a state.
   Definition 1. The forward set of state i ∈ S, denoted by F(i), is the set of states that have a path from i. That is, F(i) = {j ∈ S | i → j}. Similarly, the backward set of state i, denoted by B(i), is the set of states that have a path to i. That is, B(i) = {j ∈ S | j → i}.
   Lemma 1. Let i, j ∈ S. If j ∈ F(i), then F(j) ⊆ F(i). Similarly, if j ∈ B(i), then B(j) ⊆ B(i).
   Theorem 1. A state i ∈ S is recurrent if and only if F(i) ⊆ B(i). In other words, i is transient if and only if F(i) ⊈ B(i).
   Theorem 2. If state i ∈ S is transient, then the states in B(i) are all transient. If state i is recurrent, on the other hand, the states in F(i) are all recurrent. In the latter case, the set F(i) is a recurrence class, and the set B(i)\F(i) (if not empty) contains only transient states.
   Lemma 1 and Theorems 1 and 2 are from [27]. Lemma 1 shows a subset relation between two forward sets, as well as between two backward sets, when j is in either F(i) or B(i). Theorems 1 and 2 show how it is determined whether a state belongs to the recurrence or the transient class. Based on Theorems 1 and 2, all TSCCs can be found by performing forward and backward reachability iteratively. The detailed algorithm can be found in [27], and our improved algorithm is described in Section 3.2 with comparisons to the original algorithm.
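   As a concrete illustration of such symbolic reachability, the sketch below computes a forward set with CUDD's C++ BDD interface. This is our own sketch, not code from the paper: the function name and the quantCube parameter are assumptions, and the paper's actual procedure (Figure 2 in Section 3.2) additionally restricts the transition relation to care states, records onion rings, and supports early termination:

#include <vector>
#include "cuddObj.hh"  // CUDD's C++ BDD interface (assumed available)

// Compute F(seed): all states reachable from 'seed' by repeated image
// computation. 'trans' encodes Q(x, w, y); 'quantCube' is the conjunction
// of all present-state (x) and input (w) variables to quantify out.
BDD forwardSet(Cudd& mgr, const BDD& trans, const BDD& seed,
               const BDD& quantCube,
               std::vector<BDD>& xVars, std::vector<BDD>& yVars) {
  BDD reached = mgr.bddZero();
  BDD frontier = seed;
  while (frontier != mgr.bddZero()) {
    // image(y) = exists x, w . Q(x, w, y) AND frontier(x)
    BDD imageY = trans.AndAbstract(frontier, quantCube);
    // Rename next-state variables y back to present-state variables x.
    BDD imageX = imageY.SwapVariables(yVars, xVars);
    BDD newStates = imageX & !reached;  // first-time reached states
    reached |= newStates;
    frontier = newStates;
  }
  return reached;
}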
2.3 Related work
   There are two types of properties in model checking: safety and liveness properties [16]. A safety property represents "something bad never happens", whereas a liveness property represents "something good eventually happens". Liveness checking with a liveness property can be performed by finding SCCs [20]. Liveness checking can also be performed by safety checking with proper transformations [1].
   Livelock checking is different from liveness checking in the sense that liveness checking requires a liveness property to work on a design, whereas livelock checking does not require any property and works on a design directly.
   There have been many publications on finding all SCCs [24, 28, 3, 12]. Even though all TSCCs can be found by any of these approaches to finding all SCCs, it is not necessary to find all SCCs when finding all TSCCs, since we are interested in finding only the TSCCs for livelock checking.
   Hachtel et al. [13] proposed a symbolic approach to find all recurrence classes, concurrently identifying all TSCCs by computing the transitive closure [17] on the transition graph
with the Markov chain. Due to the complexity of transitive closure, this approach takes significantly more time and memory than a reachability-based approach does.
   Qadeer et al. [19] proposed an algorithm to find a single TSCC in the context of safe replacement in sequential equivalence checking [21, 22]. In this approach, multiple TSCCs are not considered.
   Xie and Beerel proposed a reachability-based algorithm to find all TSCCs iteratively [27]. This is also a symbolic approach, and it outperforms the method in [13]. However, this approach does not consider initial states.
   None of the above previous work on finding TSCCs was used on real designs in practice, due to the design sizes. Our abstraction-based approach is the first in publication to handle large designs in practice.
   Case et al. [5] proposed a method for finding transient signals using ternary simulation. A transient signal is a toggle deadlock on over-approximate reachable states. The toggle deadlock checking in this paper finds transient signals in the exact reachable states.

3. IMPROVED LIVELOCK CHECKING
3.1 State Classification
   The state classification in [27] consists of one transient class and one or more recurrence classes. However, in hardware verification, initial states are given to verify the hardware behavior only in the reachable state space. One problem of the state classification in [27] is that there is no distinction between TSCCs that are reachable from the initial states and those that are unreachable. Also, the reachable TSCCs may vary depending on the initial states.
   We propose a new state classification, shown in Figure 1, assuming that there is one single initial state. Handling multiple initial states is explained in Section 3.3.
   Definition 2. An STSCC is a sink TSCC that has incoming edges from states outside the TSCC.
   We first define sink TSCCs (in short, STSCCs) in Definition 2. The new state classification consists of the main group, the transient group, livelock groups (reachable STSCCs), and unreachable TSCCs for a given initial state. The transient class of [27] is further classified into the main group or the transient group. The main group is an SCC containing the initial state, and there exists either one or no main group. The recurrence classes of [27] are further classified into livelock groups (reachable STSCCs) and unreachable TSCCs. When there is no livelock, there exists only one SCC, which is the main group.

        Figure 1: State classification.

   In Figure 1, there are states a through o, and a is the initial state, marked with a thick circle. Among all states, the reachable states are a through i, inside the dotted rectangle. The unreachable states are j through o, outside the dotted rectangle. There are five SCCs: {a, b, c}, {e, f, g}, {h, i}, {j, k, l}, and {m, n, o}. Since a is the initial state, {a, b, c} becomes the main group. {h, i} and {m, n, o} are TSCCs, and only {h, i} is a livelock group (reachable STSCC), since it is reachable from a. {m, n, o} is called an unreachable TSCC. The remaining states, {d, e, f, g, j, k, l}, belong to the transient group, in which the states are contained in neither the main group nor the TSCCs.

3.2 Finding Livelock
   We first define the transition relation in Definition 3 to explain our algorithms for checking livelock.
   Definition 3. Let x = {x1, ..., xn}, y = {y1, ..., yn}, and w = {w1, ..., wp} be sets of variables ranging over B = {0, 1}. A (finite state) machine is a pair of Boolean functions ⟨Q(x, w, y), I(x)⟩, where Q : B^(2n+p) → B is 1 if and only if there is a transition from the state encoded by x to the state encoded by y under the input encoded by w, and I : B^n → B is 1 if the state encoded by x is an initial state. Q(x, w, y) is called the transition relation. The sets x, y, and w are called the present state, next state, and input variables, respectively.
   The procedure ComputeForwardSet in Figure 2 is a modified version of the procedure forward_set in [27]: it computes the forward set of a given state s only within the given care states careSet, and it performs early termination when stop is not ZERO (the empty BDD). ⇓ represents the restrict operator [7], which is used to minimize the transition relation with respect to careSet in Line 2. The minimized transition relation is denoted by Q̃. In Line 7, y ← x represents that the y variables are replaced by the x variables by BDD substitution. Early termination is another big difference from forward_set in [27] and is used in Figure 3. It bails out of computing the forward set as soon as any newly reached state intersects the states in stop, as in Line 11. BddIteConstant is a BDD ITE (if-then-else) operation that does not create any new BDD node. O is an array of states that stores the newly reached states at each iteration; O is called the onion rings. These onion rings are used later in Section 3.3. ComputeForwardSet returns the forward set F(s) and the onion rings O.

   ComputeForwardSet(Q, careSet, s, stop) {
   1     F(s) = ZERO;
   2     Q̃(x, w, y) = Q(x, w, y) ⇓ careSet;
   3     frontier(x) = s;
   4     Put s in O;
   5     while (frontier(x) ≠ ZERO) {
   6         image(y) = ∃x, w. Q̃(x, w, y) ∧ frontier(x);
   7         image(x) = image(y)|y←x;
   8         frontier = image(x) ∧ ¬F(s);
   9         F(s) = F(s) ∨ image(x);
   10        Put frontier in O;
   11        if (BddIteConstant(frontier, stop, ZERO) != ZERO)
   12            break;
   13    }
   14    return (F(s), O);
   }

            Figure 2: Computing forward set.

   ComputeBackwardSet is a dual procedure to ComputeForwardSet, except that it does not use stop and does not compute the onion rings O.
   Figure 3 is a procedure for finding TSCCs in a given set of states S. The procedure FindTSCCs is a modified version of the procedure State_classification in [27]. The modified procedure utilizes the care states careSet, assuming S
is not necessarily the whole state space. T is a set of transient states in S, and R is an array of TSCCs in S. PickOneState in Line 5 picks a random state from careSet as a seed state for finding a TSCC. In Line 7, early termination is used in computing the forward set F(s), by setting stop in Figure 2 to the negation of B(s). This is because, while we compute F(s) within B(s) for the state s, once any state outside B(s) is reachable from s, all states in B(s) are transient. Another big difference is the trimming of transient states in Lines 12 and 16. TrimTransient(Q, careSet, T, dir) trims the transient states out of the current care states in the given direction (dir), which is either PREFIX, SUFFIX, or BOTH. PREFIX (SUFFIX) means to trim out the lasso prefix (suffix) states. This is the same technique used in finding SCCs [20]. Finally, FindTSCCs returns R (a set of TSCCs) and T (a set of transient states).

   FindTSCCs(Q, S) {
   1     R = { };
   2     T = ZERO;
   3     careSet = S;
   4     while (careSet ≠ ZERO) {
   5         s = PickOneState(careSet);
   6         B(s) = ComputeBackwardSet(Q, careSet, s);
   7         F(s) = ComputeForwardSet(Q, careSet, s, ¬B(s));
   8         if (F(s) ⊆ B(s)) {
   9             R = R ∪ F(s);
   10            T = T ∨ (B(s) ∧ ¬F(s));
   11            careSet = careSet ∧ ¬B(s);
   12            TrimTransient(Q, careSet, T, PREFIX);
   13        } else {
   14            T = T ∨ (s ∨ B(s));
   15            careSet = careSet ∧ ¬(s ∨ B(s));
   16            TrimTransient(Q, careSet, T, BOTH);
   17        }
   18    }
   19    return (R, T);
   }

            Figure 3: Finding TSCCs.

   FindLivelock(Q, S, s) {
   1     (F(s), O) = ComputeForwardSet(Q, S, s, ZERO);
   2     B(s) = ComputeBackwardSet(Q, S, s);
   3     reached = F(s) ∨ s;
   4     if (F(s) ⊆ B(s)) {
   5         M = F(s);
   6         R = { };
   7         T = ZERO;
   8     } else {
   9         M = F(s) ∧ B(s);
   10        careSet = F(s) ∧ ¬(M ∨ s);
   11        TrimTransient(Q, careSet, T, PREFIX);
   12        (R, T_R) = FindTSCCs(Q, careSet);
   13        if (s ∉ M)
   14            T_R = T_R ∨ s;
   15        T_U = B(s) ∧ ¬M;
   16        T = T_R ∨ T_U;
   17    }
   18    return (M, R, T, reached, O);
   }

            Figure 4: Finding livelock.

   Figure 4 shows the procedure that performs our new state classification. As explained in Section 3.1, we find the main group (M), the transient group (T), and the livelock groups (R) from the given initial state (s) within the given care states (S). FindLivelock starts by computing the forward set F(s) and the backward set B(s) in Lines 1 and 2. In Line 3, reached is the set of states reached from s in S. If F(s) ⊆ B(s) in Line 4, there is no livelock in S. In this case, F(s) becomes the main group, and both R and T are set to empty in Lines 5-7. If F(s) ⊈ B(s) in Line 8, there must exist at least one livelock group. In this case, M is computed by intersecting F(s) and B(s) in Line 9. careSet is set to a subset of F(s) in Line 10. The lasso prefix states in careSet are trimmed out in Line 11. T_R represents the set of transient states that are reachable from s. In Line 12, R and T_R are computed by calling FindTSCCs with careSet. If s ∉ M (meaning that the main group is empty), s is added to T_R in Lines 13-14. T_U represents the set of transient states that are unreachable from s, and T_U is computed in Line 15. T is computed as the union of T_R and T_U in Line 16.

3.3 Multiple Initial States
   It is possible for a design to have multiple initial states when some of the state variables do not have concrete initial values. In the presence of multiple initial states, finding livelock groups has to be devised correctly to avoid false positives and redundant computations.
   Figure 5 shows an example with multiple initial states. In this example, there are six states, S = {a, b, c, d, e, f}. There are two SCCs, {a, b, c} and {d, e, f}. We can see that {d, e, f} is a TSCC. a and d are initial states, I = {a, d}, as shown with thick circles. Suppose that we compute livelock by calling FindLivelock(Q, S, I). Then we get F(I) = B(I) = M = {a, b, c, d, e, f} and R = { }, which is not correct, since there is a reachable TSCC. Now, let us try calling FindLivelock for each single initial state. First, for the initial state a, we get F(a) = {a, b, c, d, e, f} and B(a) = {a, b, c}. This gives us M_a = {a, b, c} and R_a = {d, e, f}. There is a livelock group R_a for the initial state a. Now, for the initial state d, F(d) = {d, e, f} and B(d) = {a, b, c, d, e, f}. This gives us M_d = {d, e, f}, R_d = { }, and T_U = {a, b, c}. There is no livelock group for the initial state d. Therefore, we can see that livelock checking has to be applied to each single initial state separately in the presence of multiple initial states.

            Figure 5: Multiple initial states.

   Theorem 3. When there are two initial states (i0 and i1), if i1 is included in the reached states from i0, the livelock groups from i1 are a subset of the livelock groups from i0.
   Proof. Since i1 is included in the reached states from i0, i1 is in either the main, transient, or livelock groups from i0. When i1 is in the main group, the same livelock groups are obtained from i1. When i1 is in the transient group, all or a subset of the livelock groups of i0 is obtained from i1. When i1 is in one of the livelock groups, the livelock group including i1 becomes the main group from i1, and no livelock group exists from i1, since the other livelock groups from i0 become unreachable TSCCs from i1. From the above three cases, no new livelock group is obtained from i1 compared to the ones from i0. Therefore, the livelock groups from i1 are a subset of the livelock groups from i0.
   Theorem 3 says that when there is a large number of initial states, we can skip livelock checking for any initial state that is already in the forward set of another initial state. In Figure 5, livelock checking for the initial state d can be skipped because d ∈ F(a), assuming that a is used first. However, there is an order dependency on which initial state is used first. If d is used first, we still need to run livelock checking with a. In practice, the number of calls to
   Figure 6 is the top-level procedure that checks livelock with multiple initial states. CheckLivelock takes a transition relation (Q), a set of states (S), a set of initial states (I), and a concrete machine (C) as inputs. The use of C is explained in Section 4. CheckLivelock first finds livelock groups in the reachable states in Lines 1-17 and then finds TSCCs in the unreachable states in Lines 18-23. The while loop (Lines 6-17) performs livelock checking for the current initial state s until all initial states are covered, with iteration index k. For this, remaining is initially set to I in Line 3 and updated by eliminating the newly reached states reached_k from remaining in Line 13. reached is the set of states reached from all initial states; it is initially set to ZERO in Line 1 and updated in Line 12 by adding reached_k, the states reached from s. Then, the next initial state is chosen from remaining in Line 15. T_U is the union of the unreachable transient states from each initial state; it is initially set to ZERO in Line 2 and updated in Line 14 by adding the unreachable states of T_k. For the current initial state s, FindLivelock is called in Line 7. |R_k| in Line 8 is the number of livelock groups in R_k. For each R_k^j, a trace trace_k^j is generated in Line 9 and the livelock is reported with the trace in Line 10. Generating the trace is explained in Section 4.2 and reporting the livelock is explained in Section 4.3.

   CheckLivelock(Q, S, I, C) {
1    reached = ZERO;
2    T_U = ZERO;
3    remaining = I;
4    k = 0;
5    s = PickOneState(I);
6    while (s ≠ ZERO) {
7        (M_k, R_k, T_k, reached_k, O_k) = FindLivelock(Q, S, s);
8        for (j = 0; j < |R_k|; j++) {
9            trace_k^j = GenerateTrace(C, R_k^j, s, O_k);
10           ReportLivelock(s, M_k, R_k^j, T_k, trace_k^j);
11       }
12       reached = reached ∨ reached_k;
13       remaining = remaining ∧ ¬reached_k;
14       T_U = T_U ∨ (T_k ∧ ¬reached_k);
15       s = PickOneState(remaining);
16       k++;
17   }
18   careSet = ¬(reached ∨ T_U);
19   if (careSet ≠ ZERO) {
20       R_k = FindTSCCs(Q, careSet);
21       for (j = 0; j < |R_k|; j++)
22           ReportUnreachLivelock(R_k^j);
23   }
   }

Figure 6: Checking livelock.

   Once all reachable livelock groups are found, we next find the unreachable TSCCs. We set the care states careSet to the negation of all states visited so far in Line 18, then call FindTSCCs with careSet in Line 20. If there is any unreachable TSCC, it is reported in Line 22.
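   As a companion to Figure 6, the following is a minimal explicit-state sketch of the multiple-initial-state loop, including the Theorem 3 skipping via remaining. All names and the toy machine are hypothetical; the tool performs these steps with BDD image/pre-image operations and also separates the transient states, which this sketch omits.

    def reach(trans, s):
        # Forward set of s under the transition map `trans`.
        seen, stack = {s}, [s]
        while stack:
            for t in trans.get(stack.pop(), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def invert(trans):
        # Reverse the transition map, for backward reachability.
        inv = {}
        for u, succs in trans.items():
            for t in succs:
                inv.setdefault(t, set()).add(u)
        return inv

    def check_livelock(trans, initials):
        inv, remaining, k = invert(trans), set(initials), 0
        while remaining:                       # Lines 6-17 of Figure 6
            s = remaining.pop()                # PickOneState
            reached_k = reach(trans, s)        # F(s)
            main = reached_k & reach(inv, s)   # main group: F(s) ∩ B(s)
            groups = set()
            for r in reached_k - main:
                fwd = reach(trans, r)
                scc = fwd & reach(inv, r)
                if fwd <= scc:                 # terminal SCC: a livelock group
                    groups.add(frozenset(scc))
            for g in groups:
                print(f"iteration {k}: livelock group {sorted(g)} from {s}")
            remaining -= reached_k             # Line 13: Theorem 3 skipping
            k += 1

    # Initial state 3 lies inside the TSCC {2, 3}, so from 3 that TSCC is
    # the main group; from 0 it is reported as a livelock group.
    check_livelock({0: [1], 1: [0, 2], 2: [3], 3: [2]}, {0, 3})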
4. LIVELOCK CHECKING ON FSM
   To check whether a livelock exists in a design, the checking should be done on the whole design. In practice, however, this is infeasible due to the size of the design. Thus, we propose a practical method for checking livelock on the FSMs in the design.
   Even when we check livelock on an FSM, the entire COI logic of the FSM must be considered in order to get an exact result on livelock. However, this is still computationally very expensive, or not feasible, for most real designs. Thus, we propose a framework for abstraction-based livelock checking on an abstracted COI of the FSM. Once we find a livelock on the abstract machine, we justify whether the livelock exists on the concrete machine. Notice that a livelock on the abstract machine can be mapped to more than one livelock on the concrete machine.
   Figure 7 shows how an abstract machine is obtained from the COI of an FSM. Suppose an FSM has two state variables f and g. Then, we compute the COI of the FSM; suppose that there are state variables {a, b, c, d, e} in the COI. The size of the abstract machine is predefined; let us suppose that the size is N. Then, a set of influential latches from the COI is computed from the FSM variables. The minimum abstract machine is the FSM itself and the maximum abstract machine is the concrete machine. In this example, N=4 and we get the abstract machine {f, g, d, e}.

Figure 7: Abstract machine. (The concrete machine contains the COI of the FSM, with latches {a, b, c, d, e}, and the FSM latches {f, g}; the abstract machine consists of the FSM {f, g} together with the influential latches {d, e}.)

   Theorem 4. If any state in a livelock group on an abstract machine is reachable from the initial state on the concrete machine, the livelock exists on the concrete machine.

   Proof. Since the abstraction is an over-approximation, the set of all transitions on the abstract machine is a superset of the set of all transitions on the concrete machine. Since there is no path from any state in the livelock group to any state in the main group on the abstract machine, there is still no path from any projected state of the livelock group to any projected state of the main group on the concrete machine. Now, suppose that the livelock does not exist on the concrete machine. The only way for the livelock not to exist on the concrete machine is that there is no path from the projected main group to the projected livelock group; in other words, the projected livelock group has to be unreachable from the initial state. However, this contradicts the assumption that a state of the livelock group is reachable from the initial state on the concrete machine. Therefore, the livelock group still exists on the concrete machine.

   Thanks to Theorem 4, abstraction-based livelock checking finds a livelock on a small abstract machine using a BDD-based symbolic method, and then justifies the existence of the livelock on the concrete machine by trace concretization (Section 4.2), using SAT techniques that can handle large designs. Abstraction-based livelock checking is incomplete in the sense that it does not provide a proof of the absence of livelock unless the checking is performed on the concrete machine: no livelock on an abstract machine does not guarantee no livelock on the concrete machine. However, it enables finding real livelock errors on large industrial designs.
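   The sound direction given by Theorem 4 can be sketched explicitly as follows (hypothetical code: states are (x, v) pairs, projection keeps x, and reachability is explicit search; the tool performs this justification with SAT-based BMC instead).

    def reachable(trans, init):
        # All concrete states reachable from init.
        seen, frontier = {init}, [init]
        while frontier:
            s = frontier.pop()
            for t in trans.get(s, ()):
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
        return seen

    def livelock_is_concrete(conc_trans, init, abstract_livelock, project):
        # Theorem 4: if any concrete state projecting into the abstract
        # livelock group is reachable, the livelock exists concretely.
        return any(project(s) in abstract_livelock
                   for s in reachable(conc_trans, init))

    # Concrete states are (x, v); the abstract livelock group is {x = 2}.
    conc = {(0, 0): [(1, 0)], (1, 0): [(2, 1)], (2, 1): [(2, 0)], (2, 0): [(2, 1)]}
    print(livelock_is_concrete(conc, (0, 0), {2}, lambda s: s[0]))  # True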
4.1 Causality Checking
   Let V be the set of state variables in an abstract machine for livelock checking. Suppose that R(V) is the reached states in the abstract machine and L(V) is a livelock group containing a TSCC. Also, suppose that v is a state variable in V. We are interested in whether v contributes to the livelock, as in Definition 4. This is called variable causality.

   Definition 4. When a livelock exists in the abstract machine, a variable v in V contributes to the livelock if the livelock disappears by eliminating v from the abstract machine. In other words, there is no livelock in the abstract machine composed of the variables V\v.

   Equation 1 shows the condition for the existence of a livelock.

      L(V) ⊂ R(V)    (1)

   Now, let R̃ be the quantified reached states and L̃ be the quantified livelock states with respect to a state variable v, as shown in Equations 2 and 3.

      R̃(V\v) = ∃v . R(V)    (2)
      L̃(V\v) = ∃v . L(V)    (3)

   Whether the variable v contributes to the livelock is then determined by Equation 4. Theorem 5 says that if Equation 4 holds, v contributes to the livelock.

      L̃(V\v) = R̃(V\v)    (4)

   Theorem 5. When a livelock group is found on an abstract machine (L(V) ⊂ R(V)), if L̃(V\v) = R̃(V\v) holds for a variable v, the variable v contributes to the livelock.

   Proof. Let M1 be the machine consisting of V, and suppose that a livelock group exists in M1. Let M2 be the machine consisting of V\v, obtained by eliminating v from M1. Also, let T1 (T2) be the set of transitions in M1 (M2), respectively. Since M2 is an over-approximation of M1, M2 has more transitions than M1 (T1 ⊂ T2). Let Td be the difference between T1 and T2. If any transition in Td makes a path from a state in the livelock group to a state in the main group in M2, the livelock group merges into the main group and both become a single SCC, yielding L̃(V\v) = R̃(V\v). Thus, M2 becomes a machine without the livelock. This means that v is a necessary variable for the livelock in M1. Therefore, if L̃(V\v) = R̃(V\v), v contributes to the livelock.

   This causality checking can also be applied to a set of variables, especially the FSM variables, in order to report whether the livelocks are related to the FSM. Let F be the set of variables in an FSM and C be the set of variables in the COI of the FSM. Suppose that R(F, C) is the reached states in the abstract machine and L(F, C) is a livelock group containing a TSCC. The quantified reached states and the quantified livelock states are computed in Equations 5 and 6 with respect to the FSM variables, respectively.

      R̃(C) = ∃F . R(F, C)    (5)
      L̃(C) = ∃F . L(F, C)    (6)

   Then, Equation 7 shows the causality check with the FSM variables, determining whether the FSM variables contribute to the livelock.

      L̃(C) = R̃(C)    (7)
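   A small explicit-state sketch of this quantification-based check follows (hypothetical encoding: a state is a set of variable/value pairs, and existential quantification over v simply drops v from every state; the tool quantifies on BDDs).

    def quantify(states, v):
        # Existential quantification ∃v: drop variable v from every state.
        return {frozenset((k, x) for (k, x) in s if k != v) for s in states}

    def contributes(reached, livelock, v):
        # Equation 4: v contributes iff the quantified livelock states
        # equal the quantified reached states (the livelock disappears).
        return quantify(livelock, v) == quantify(reached, v)

    # Reached states over {a, b}; the livelock group is the b = 1 half.
    R = {frozenset([("a", 0), ("b", 0)]), frozenset([("a", 1), ("b", 0)]),
         frozenset([("a", 0), ("b", 1)]), frozenset([("a", 1), ("b", 1)])}
    L = {frozenset([("a", 0), ("b", 1)]), frozenset([("a", 1), ("b", 1)])}

    print(contributes(R, L, "a"))   # False: the livelock survives without a
    print(contributes(R, L, "b"))   # True: eliminating b merges the groups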
                                                                    have any prefix behavior. This means it is not possible for
4.2 Trace Concretization
   Once a livelock group is found on an abstract machine, we need to justify whether the livelock group is reachable on the concrete machine. This is done in three steps. The first step is to pick a target state in the livelock group. The target state is chosen randomly from the livelock group, but is one of the closest states to the initial states, by using the onion rings O_k in Figure 6. The second step is to generate an abstract trace: starting from the target state, an abstract trace can be computed by applying BDD-based pre-image computation iteratively until the initial state is reached. The third step is to generate a concrete trace by forming a BMC (Bounded Model Checking [2]) problem from the abstract trace, in order to see whether the livelock group is reachable on the concrete machine. An efficient approach for concretization was proposed in [18].
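   The second step can be sketched explicitly as follows (hypothetical code: rings[i] holds the states first reached at BFS depth i, standing in for the onion rings O_k; the tool uses BDD pre-images, and the final BMC concretization step is not shown).

    def preimage(trans, targets):
        # States with at least one successor inside `targets`.
        return {s for s, succs in trans.items() if set(succs) & targets}

    def abstract_trace(trans, rings, target):
        # Walk backward from `target`, picking one state per onion ring.
        depth = next(i for i, ring in enumerate(rings) if target in ring)
        trace = [target]
        for i in range(depth - 1, -1, -1):
            # Any ring-i state that can step to the current trace head.
            prev = (preimage(trans, {trace[0]}) & rings[i]).pop()
            trace.insert(0, prev)
        return trace

    trans = {0: [1], 1: [2], 2: [3], 3: [2]}
    rings = [{0}, {1}, {2}, {3}]
    print(abstract_trace(trans, rings, 3))   # [0, 1, 2, 3]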




4.3 Reporting Livelock
   Once a concrete trace is generated for a livelock group, the livelock is real on the concrete machine. We report the livelock group with the state classification mentioned in Section 3.1. A livelock group is reported with its initial state, the main group, the transient group, and the unreachable states, in terms of the number of states and the percentage of each group on the abstract machine.
   By looking at the transient and livelock groups, we can see what fraction of the state space lies in the problematic zone. A good design is expected to have only one main group per initial state, without any transient or livelock groups, unless the design has an intended reset sequence to a normal mode.
5. TOGGLE DEADLOCK CHECKING
   There is another important design property, called toggle deadlock, that is related to livelock. A livelock may involve multiple state variables of a design, whereas a toggle deadlock occurs on a single state variable. A state variable has a toggle deadlock if the variable initially toggles but gets stuck at a constant value after a certain number of cycles.
   Figure 8 shows an example of a toggle deadlock. There are two state variables {a, b} and four states {s0, s1, s2, s3}. Provided that s0 is the initial state, the main group is {s0, s1} and the livelock group is {s2, s3}. Once the state transitions reach s2, a state in the livelock group, the value of b gets stuck at 1, whereas a still toggles. Thus, we say that b has a toggle deadlock.

Figure 8: Toggle deadlock. (Four states: s0 (a=0, b=0) and s1 (a=1, b=0) form the main group; s2 (a=1, b=1) and s3 (a=0, b=1) form the livelock group, in which b stays at 1.)

   Theorem 6. If there is no STSCC in a design, there is no toggle deadlock on any variable.

   Proof. For a toggle deadlock, a variable must toggle at some cycle and then hold its value forever from that cycle on. No STSCC implies that there is only a main group in the design. If a variable appears constant in the main group, the variable is a constant. However, the main group has no prefix behavior, so it is not possible for the variable to toggle before entering the main group. Therefore, no STSCC implies no toggle deadlock.

   Theorem 6 shows that a toggle deadlock occurs only in the presence of a livelock; it is also possible that a design with a livelock has no toggle deadlock. Thus, toggle deadlock on a state variable is checked in two steps. First, we find STSCCs on an abstract machine built from the state variable; the abstract machine is made in the same way as for livelock checking on an FSM. Second, we evaluate the value of the state variable in the livelock group if a livelock exists.

   Figure 9 shows the procedure that checks toggle deadlock on a given state variable t. CheckToggleDeadlock takes a transition relation (Q), a set of states (S), a set of initial states (I), a concrete machine (C), and the state variable (t) as inputs. CheckToggleDeadlock is similar to CheckLivelock in Figure 6. For each reachable livelock group R_k^j in Line 6, TestToggleDeadlock checks whether the value of t toggles in the livelock group and returns dlk and c in Line 7. dlk represents whether the state variable is in toggle deadlock, and c is the constant value (0 or 1) in the case of a toggle deadlock.

   CheckToggleDeadlock(Q, S, I, C, t) {
1    remaining = I;
2    k = 0;
3    s = PickOneState(I);
4    while (s ≠ ZERO) {
5        (R_k, reached_k, O_k) = FindLivelock(Q, S, s);
6        for (j = 0; j < |R_k|; j++) {
7            (dlk, c) = TestToggleDeadlock(R_k^j, t);
8            if (dlk) {
9                trace_k^j = GenerateTrace(C, R_k^j, s, O_k);
10               ReportToggleDeadlock(s, R_k^j, trace_k^j, c);
11           }
12       }
13       remaining = remaining ∧ ¬reached_k;
14       s = PickOneState(remaining);
15       k++;
16   }
   }

Figure 9: Checking toggle deadlock.
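   A minimal explicit-state sketch of the per-variable test follows (hypothetical encoding: a state is a dict of variable values; the tool's TestToggleDeadlock performs the same check symbolically on the BDD of the livelock group). The data encodes the example of Figure 8.

    def test_toggle_deadlock(livelock_group, t):
        # t is toggle-deadlocked if it holds one constant value across
        # the entire livelock group; return (dlk, constant value c).
        values = {state[t] for state in livelock_group}
        if len(values) == 1:
            return True, values.pop()
        return False, None

    # Figure 8's livelock group {s2, s3}: b is stuck at 1, a still toggles.
    s2 = {"a": 1, "b": 1}
    s3 = {"a": 0, "b": 1}
    print(test_toggle_deadlock([s2, s3], "b"))   # (True, 1)
    print(test_toggle_deadlock([s2, s3], "a"))   # (False, None)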
                                                                               difference between N ew1 and N ew2 in the table. This is be-
6. EXPERIMENTAL RESULTS
   We have implemented the proposed livelock checking and toggle deadlock checking algorithms. Table 1 shows our experimental results on livelock and toggle deadlock checking, generated on a 1.4 GHz Intel processor machine with 4 GB of memory running Red Hat Linux.

  Design     L      I    F   T    COI    N     Llk  Dlk     New1: Time / Mem / Ops       New2: Time / Mem / Ops    TraceGen: Time / Len
  D1       1163   1330   7   -    632    30      0    -      16:20 / 107.1 / 44           15:31 / 107.9 / 41              - / -
                                          60      0    -      41:11 / 136.7 / 66           41:31 / 137.7 / 66              - / -
                                          90      0    -    2:30:33 / 224.1 / 81         2:24:50 / 224.3 / 81              - / -
                                         120      -    -     time-out (> 24h)             time-out (> 24h)                - / -
  D2        385    352  25   -     68    30      0    -       0:01 / 21.0 / 12             0:01 / 21.1 / 12                - / -
                                          60      0    -       0:40 / 38.7 / 28             0:40 / 38.7 / 28                - / -
                                          68  12032    -    4:47:52 / 95.7 / 386439      0:53:58 / 96.5 / 70329            - / -
  D3-F1   32541    912   2   -  28941    30      1    -       0:49 / 140.4 / 54            0:49 / 140.5 / 55            6:37 / 66
  D3-F2   32541    912   4   -  28930     4      1    -       0:09 / 129.1 / 9             0:09 / 129.1 / 12            1:54 / 14
                         -   4  28930    30      -    1       1:31 / 132.8 / 168           1:31 / 132.8 / 172           2:11 / 14

Table 1: Experimental results.

   The first column lists the design names. The next five columns present statistics on the designs: the number of latches (L), the number of inputs (I), the number of latches in the FSM (F), the number of toggle signals to check (T), and the number of latches in the COI of either the FSM or a toggle signal (COI). The next three columns show the results of livelock and toggle deadlock checking: column N shows how many latches were in the abstract machine, column Llk shows how many livelock groups were found, and column Dlk shows how many toggle deadlocks were found. The next six columns compare the performance of the two methods (New1 and New2) in terms of time (Time), memory (Mem), and the number of image/pre-image computations (Ops). New1 is the proposed method without the trimming technique, whereas New2 is the proposed method with it. Times are given as hh:mm:ss and memory consumption in MB. The final two columns (TraceGen) show the results of trace generation on the concrete machine for the livelock or toggle deadlock found by New2: Time is the time spent on trace generation and Len is the trace length.
   We have chosen three industrial designs (D1, D2, and D3). For each design, we have run livelock or toggle deadlock checking on abstract machines of several sizes, in multiples of 30 latches. We set the maximum run time to 24 CPU hours.
   In D1, one FSM was automatically extracted. The FSM consists of 7 latches and has 632 latches in its COI. The run time increases exponentially with the size of the abstract machine; on this design, livelock checking became infeasible when N=120.
   In D2, there is one FSM, which was user-specified. The FSM consists of 25 latches and has only 68 latches in its COI. This design has a livelock group. However, the livelock was not detected when N=30 or N=60; it was detected only when all the latches in the COI were included in the abstract machine. In other words, the abstract machine is the concrete machine from the FSM's point of view. Since the livelock was found on the concrete machine, trace concretization is not required: the abstract trace of Section 4.2 is already a concrete trace.
   D2 is the only design in the table showing a significant performance difference between New1 and New2. This is because the design has many transient states as well as many livelock groups. In this case, the trimming technique reduced the number of image/pre-image operations from 386K to 70K (a 5.5X reduction), which gave a speed-up from almost 5 hours to under 1 hour (5X). This shows that the trimming technique helps performance when there are many transient states. When there are no transient states, the trimming technique becomes pure overhead, as shown in D3; however, the overhead is almost negligible in the experiments.
   In D3, there are two FSMs (F1 and F2). F1 is composed of 2 latches, and a livelock was found with N=30 within 49 seconds. The livelock was justified by trace concretization, which took 397 seconds; the trace length was 66. F2 is composed of 4 latches, and a livelock was found with N=4 (the FSM itself) in 9 seconds. This livelock was also justified by trace concretization, which took 114 seconds; the trace length was 14.
lock checking on F 2 separately from the livelock checking.           [5] M. Case, H. Mony, J. Baumgartner, and R. Kanzelman.
A toggle deadlock was found in 90 seconds and the concrete                Enhanced verification by temporal decomposition. In Formal
                                                                          Methods in Computer Aided Design, pages 37–54, 2009.
trace was generated in 131 seconds. D3 shows the value of             [6] H. Cho, G. D. Hachtel, E. Macii, M. Poncino, and F. Somenzi.
abstraction-based livelock and toggle deadlock checking.                  Automatic state space decomposition for approximate fsm
   Table 2 shows a comparison on finding all SCCs with four                traversal based on circuit analysis. IEEE Transactions on
algorithms (XB [28], Lockstep [3], Skeleton [12], IXB [20])               Computer-Aided Design, 15(12):1451–1464, Dec. 1996.
                                                                      [7] O. Coudert and J. C. Madre. A unified framework for the
on the design D2 from Table 1. In this design, the number of              formal verification of sequential circuits. In Proceedings of the
recurrent states is 2.07e8 and the number of transient states             International Conference on Computer-Aided Design, pages
is 1.2e6 that is only 0.6% of all states. However, it turned out          126–129, Nov. 1990.
                                                                      [8] O. G. E. M. Clarke and D. Peled. Model Checking. The MIT
that how to handle these transient states efficiently is the key            Press, 1999.
factor in the performance. One main difference between XB              [9] N. Een and N. Sorensson. MiniSat.
and IXB is that IXB trims out those transient states as much              http://www.cs.chalmers.se/Cs/Research/FormalMethods/
as possible. This trimming technique makes the IXB method                 MiniSat.
                                                                     [10] E. A. Emerson and C. Lei. Modalities for model checking:
outperform on this design: faster in time (more than 15X)                 Branching time logic strikes back. Science of Computer
and fewer number of image operations (more than 10X) than                 Programming, 8:275–306, 1987.
the other methods. This explains why N ew2 outperformed              [11] E. A. Emerson and C.-L. Lei. Efficient model checking in
on D2 in Table 1. Table 2 also shows why livelock checking is             fragments of the propositional mu-calculus. In Proceedings of
                                                                          the First Annual Symposium of Logic in Computer Science,
done by finding TSCCs instead of SCCs. Finding all livelock                pages 267–278, June 1986.
groups took 54 minutes, whereas finding all SCCs took 100             [12] R. Gentilini, C. Piazza, and A. Policriti. Computing strongly
minutes (2X) even with IXB.                                               connected components in a linear number of symbolic steps. In
                                                                          SODA ’03: Proceedings of the fourteenth annual ACM-SIAM
   Method         Time    Memory         Ops    SCCs     States           symposium on Discrete algorithms, pages 573–582, 2003.
   XB          84:07:56      98.2    1013333                         [13] G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Markovian
   Lockstep    45:55:53     237.3    2590724    19458    2.08e8           analysis of large finite state machines. IEEE Transactions on
   Skeleton    26:01:54     266.5    2609008                              Computer-Aided Design, 15(12):1479–1493, Dec. 1996.
   IXB          1:39:47      92.5     102990                         [14] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton.
                                                                          Efficient ω-regular language containment. In Computer Aided
           Table 2: Finding all SCCs in D2.                               Verification, pages 371–382, Montréal, Canada, June 1992.
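   The trimming idea behind IXB and New2 can be sketched explicitly as follows (hypothetical code showing one direction of trimming; the symbolic version strips prefixes with image/pre-image operations on BDDs): repeatedly remove states with no predecessor, which peels off lasso prefixes before the cycle analysis runs.

    def trim_transients(trans, states):
        # Drop states with no predecessor inside the current set; what
        # survives still contains every SCC, so cycle analysis can run
        # on a much smaller care set.
        states = set(states)
        while True:
            with_pred = {t for s in states
                           for t in trans.get(s, ()) if t in states}
            roots = states - with_pred      # no predecessor within `states`
            if not roots:
                return states
            states -= roots

    trans = {0: [1], 1: [2], 2: [3], 3: [2]}     # 0 -> 1 -> (2 <-> 3)
    print(trim_transients(trans, {0, 1, 2, 3}))  # {2, 3}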
7. CONCLUSIONS
   We have presented a framework for abstraction-based livelock and toggle deadlock checking, in order to handle large designs in practice. Since exact livelock and toggle deadlock checking directly on real designs is infeasible, our approach checks livelock and toggle deadlock on an abstract machine built from either an FSM or a toggle signal. Once we find a livelock or a toggle deadlock, we justify it on the concrete machine by concretizing the abstract trace.
   Even though the proposed approach does not prove the non-existence of livelock or toggle deadlock on a design, unless the design is small enough to handle exactly, it finds livelocks and toggle deadlocks on the design if they exist.
   To the best of our knowledge, this is the first approach to use abstraction-based livelock checking, and also the first approach for checking toggle deadlock. The experimental results showed that the abstraction-based approach finds livelock errors on real designs.
   As future work, we are interested in improving the concretization, finding more accurate influential latches, and optimizing the computations with multiple FSMs or toggle signals by considering the overlaps in their COIs.

8. REFERENCES
 [1] A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. In International Workshop on Formal Methods for Industrial Critical Systems, pages 160–177, 2002.
 [2] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In Fifth International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS'99), pages 193–207, Amsterdam, The Netherlands, Mar. 1999. LNCS 1579.
 [3] R. Bloem, H. Gabow, and F. Somenzi. An algorithm for strongly connected component analysis in n log n symbolic steps. In Formal Methods in Computer Aided Design, pages 37–54, 2000.
 [4] R. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, Aug. 1986.
 [5] M. Case, H. Mony, J. Baumgartner, and R. Kanzelman. Enhanced verification by temporal decomposition. In Formal Methods in Computer Aided Design, pages 37–54, 2009.
 [6] H. Cho, G. D. Hachtel, E. Macii, M. Poncino, and F. Somenzi. Automatic state space decomposition for approximate FSM traversal based on circuit analysis. IEEE Transactions on Computer-Aided Design, 15(12):1451–1464, Dec. 1996.
 [7] O. Coudert and J. C. Madre. A unified framework for the formal verification of sequential circuits. In Proceedings of the International Conference on Computer-Aided Design, pages 126–129, Nov. 1990.
 [8] E. M. Clarke, O. Grumberg, and D. Peled. Model Checking. The MIT Press, 1999.
 [9] N. Een and N. Sorensson. MiniSat. http://www.cs.chalmers.se/Cs/Research/FormalMethods/MiniSat.
[10] E. A. Emerson and C.-L. Lei. Modalities for model checking: Branching time logic strikes back. Science of Computer Programming, 8:275–306, 1987.
[11] E. A. Emerson and C.-L. Lei. Efficient model checking in fragments of the propositional mu-calculus. In Proceedings of the First Annual Symposium on Logic in Computer Science, pages 267–278, June 1986.
[12] R. Gentilini, C. Piazza, and A. Policriti. Computing strongly connected components in a linear number of symbolic steps. In SODA '03: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 573–582, 2003.
[13] G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Markovian analysis of large finite state machines. IEEE Transactions on Computer-Aided Design, 15(12):1479–1493, Dec. 1996.
[14] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton. Efficient ω-regular language containment. In Computer Aided Verification, pages 371–382, Montréal, Canada, June 1992.
[15] R. P. Kurshan. Computer-Aided Verification of Coordinating Processes. Princeton University Press, Princeton, NJ, 1994.
[16] L. Lamport. Proving the correctness of multiprocess programs. IEEE Transactions on Software Engineering, SE-3(2):125–143, Mar. 1977.
[17] Y. Matsunaga, P. C. McGeer, and R. K. Brayton. On computing the transitive closure of a state transition relation. In Proceedings of the Design Automation Conference, pages 260–265, June 1993.
[18] K. Nanshi and F. Somenzi. Constraints in one-to-many concretization for abstraction refinement. In Proceedings of the Design Automation Conference, pages 569–574, 2009.
[19] S. Qadeer, R. K. Brayton, V. Singhal, and C. Pixley. Latch redundancy removal without global reset. In Proceedings of the International Conference on Computer Design, pages 432–439, 1996.
[20] K. Ravi, R. Bloem, and F. Somenzi. A comparative study of symbolic algorithms for the computation of fair cycles. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 143–160. Springer-Verlag, Nov. 2000. LNCS 1954.
[21] V. Singhal. Design Replacements for Sequential Circuits. Ph.D. dissertation, University of California at Berkeley, 1996.
[22] V. Singhal, C. Pixley, A. Aziz, and R. K. Brayton. Theory of safe replacements for sequential circuits. IEEE Transactions on Computer-Aided Design, 20(2):249–265, Feb. 2001.
[23] R. Tarjan. Depth first search and linear graph algorithms. SIAM Journal of Computing, 1:146–160, 1972.
[24] H. J. Touati, R. K. Brayton, and R. P. Kurshan. Testing language containment for ω-automata using BDD's. Information and Computation, 118(1):101–109, Apr. 1995.
[25] M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Proceedings of the First Symposium on Logic in Computer Science, pages 322–331, Cambridge, UK, June 1986.
[26] T.-H. Wang and T. Edsall. Practical FSM analysis for Verilog. In IVC-VIUF '98: Proceedings of the International Verilog HDL Conference and VHDL International Users Forum, pages 52–58, 1998.
[27] A. Xie and P. A. Beerel. Efficient state classification of finite-state Markov chains. IEEE Transactions on Computer-Aided Design, 17(12):1334–1339, Dec. 1998.
[28] A. Xie and P. A. Beerel. Implicit enumeration of strongly connected components and an application to formal verification. IEEE Transactions on Computer-Aided Design, 19(10):1225–1230, Oct. 2000.


Keyword Index


          abstraction                        4
          abstraction-based                 46
          assertion                         38

          Boolector                         28

          Data-path equivalence checking     9
          deadlock checking                 46
          dynamic verification              38

          IC3                               19

          Lambda                            28
          Lemmas on Demand                  28
          livelock checking                 46

          Model Checking                    19
          modelchecking                      4

          Polynomial equivalence checking    9

          QF BV SMT solving                  9

          reparameterization                 4

          SAT solver                        19
          SMT                               28
          synthesis                          4
          systemC                           38

          verification                       4


Author Index


               Biere, Armin         28
               Brayton, Robert       9

               Cabodi, Gianpiero    19
               Case, Michael         9
               Cleaveland, Rance     1

               Dutta, Sonali        38

               Een, Niklas           4

               Fujita, Masahiro      2

               Goswami, Dhiraj       3

               Harer, Kevin         46

               Long, Jiang           9

               Mishchenko, Alan    4, 19
               Moon, In-Ho            46

               Niemetz, Aina          28

               Palena, Marco          19
               Preiner, Mathias       28

               Tabakov, Deian         38

               Vardi, Moshe Y.        38