Pr
 rogram
      m Proceeding
                 gs
                               Program Chairs


Malay K. Ganai               NEC Labs America USA


Armin Biere                  Johannes Kelpler University Austria


                         Technical Program Committee


Clark W. Barrett             New York University USA


Armin Biere                  Johannes Kelpler University Austria


Alessandro Cimatti           Fondazione Bruno Kessler Italy


Cindy Eisner                 IBM Haifa Research Lab Israel


Malay K. Ganai               NEC Labs America USA


Ganesh Gopalakrishnan        University of Utah USA


Daniel Kroening              Oxford University UK


Robert P. Kurshan            Cadence Design Systems USA


Ken Mcmillan                 Microsoft Research USA


Chao Wang                    Virginia Tech, USA


                          Local Arrangement Chairs


Sandip Ray                   UT Austin, USA


Nadia Papakonstantinou       NEC Labs America


 DIFTS 2011                          i
                                          Preface

The first DIFTS (Design and Implementation of Formal Tools and Systems) workshop was held at
Austin, Texas on Nov 3rd, 2011, co-located with Formal Methods in Computer-Aided Design
(FMCAD) Conference. The workshop emphasized the insightful experiences in tool and system
design. It provided a forum for sharing challenges and solutions that are highly original with ground
breaking results. It took a broad view of the formal tools area, and solicited contributions from
various domains including decision procedures, verification, testing, validation, diagnosis,
debugging, and synthesis. It provided a forum for discussing the engineering aspect of the tools,
and various design decisions required to put them in practical use. The workshop provided a
discussion forum for a pragmatic view of practicing formal methods.

The workshop received 9 original submissions, out of which 4 were chosen under tool category
and 2 were chosen under system category. There were also two invited talks: one given by
Andreas Kuehlmann, Sr. VP of R&D at Coverity on “The pain of making research tools into
software products”, and another by Chris Morison, Chief Architect, Real Intent Inc. on “From putty
to product: what it takes to bring a verification tool to market”.

First of all we thank FMCAD’s steering committee for their approval of the workshop. We also
thank Anna Slobodova, David Rager, and Sandip Ray for their help in local arrangements and
Nadia Papakonstantinou for her help in web updates. We sincerely thank the program committee
members and sub reviewers for selecting the papers and providing candid review feedbacks to the
authors. Last but not least, we thank all the authors for contributing to the workshop and to all the
participants of the workshop.


Malay K. Ganai and Armin Biere

Program Chairs


     DIFTS 2011                                   ii
      DIFTS’11                                                                                                  Table of Contents


      Table of Contents
      The pain of making research tools into software products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1
        Andreas Kuehlmann
      From Putty to Product: What it takes to bring a Verification Tool to Market. . . . . . . . . . . . .                         2
        Chris Morrison
      An Application of Formal Methods to Cognitive Radios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     3
       Konstantine Arkoudas, Ritu Chadha and Jason Chiang
      Data Structure Choices for On-the-Fly Model Checking of Real-Time Systems . . . . . . . . . . . . 13
       Peter Fontana and Rance Cleaveland
      metaSMT: Focus on Your Application not on Solver Integration . . . . . . . . . . . . . . . . . . . . . . . . . . 22
       Finn Haedicke, Stefan Frehse, Goerschwin Fey, Daniel Grosse and Rolf Drechsler
      A Study of Sweeping Algorithms in the Context of Model Checking . . . . . . . . . . . . . . . . . . . . . . 30
        Zyad Hassan, Yan Zhang and Fabio Somenzi
      Enhancing ABC for stabilization verification of SystemVerilog/VHDL models . . . . . . . . . . . . . 38
        Jiang Long, Sayak Ray, Baruch Sterin, Alan Mishchenko and Robert K. Brayton
      On Incremental Satisfiability and Bounded Model Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
       Siert Wieringa


                                                                                                                                   1


DIFTS 2011                                                          iii
    The Pain of Making Research Tools into Software Products
                                      Andreas Kuehlmann
                                  Sr. VP of R&D at Coverity
                President of IEEE Council on EDA (CEDA), and an IEEE fellow

Abstract: Building a research implementation of a new algorithm and validating it for a small
set of benchmarks is a key component of pushing the state of the art of design and verification
tools. Moreover, if the algorithm works, it is often also the most exciting phase of the long
journey to have a new idea used by many people. The less exciting and often more painful part
includes the productization of the research into a form that can be applied by hundreds or
thousands of users on a daily basis. In this talk we discuss some of the challenges that developers
encounter in a product development environment from prototyping to code maintenance. The
presentation will cover practical aspects of developing software in larger teams from product
planning to deployment and support. We will also elaborate on some of the opportunities for new
research to help the practical deployment and maintenance of software products.


DIFTS 2011                                       1
      From Putty to Product: What it takes to bring a
              Verification Tool to Market
                                       Dr. Chris Morrison
                                 Chief Architect, Real Intent Inc.

Abstract: For a new company interested in developing a formal verification product, the
temptation would be to believe that the majority of the time-to-market would be spent on
developing the core formal technology. As a corollary, if that core technology could be bought
off-the-shelf, the time-to-market would be greatly reduced. While that belief may have been true
for the first generation of formal verification products, it is, unfortunately, not credible today.
There are many "significant items" in state-of-the-art products that now consume considerable
resources. The input language has changed from BLIF to SystemVerilog '09 and VHDL '08.
Designs that the formal tools had to consider used to be single-clock and simple reset designs.
Now they have 10+ clocks, derived and gated clocks, complex resets and multiple functional
modes. The output is no longer a simple text file, but a hyperlinked report tying together VCD
waveforms, source RTL, and state machine and schematic viewers. Manual control with many
switches is no longer acceptable. A nearly automatic flow with very few switches is demanded.
The tool must manage to deliver very high throughput for thousands of checks on multimillion-
gate designs as well as acceptable user experience on high-latency checks. User productivity is
the combination of tool run time, piloting convenience as well as the debug experience. The
infrastructure needed to achieve these goals (which was originally simple and relatively quick to
implement) is now very complex involving an intricate flow through a variety of substantial
software packages and multiple databases. With the goal of reaching a broad class of customers,
current and future verification products must be automated, easy to use and be able to reliably
produce actionable results. Delivering on that requires attention to every part of the product
rather than just the formal verification technology.


DIFTS 2011                                       2
                                                                                                                   ∗
                      An Application of Formal Methods to Cognitive Radios


                           Konstantine Arkoudas and Ritu Chadha and Jason Chiang
                                             Telcordia Research
                                             One Telcordia Drive
                                            Piscataway, NJ 08854
                            {konstantine, chadha, chiang}@research.telcordia.com


ABSTRACT                                                                        As wireless devices continue to proliferate, demand for
We discuss the design and implementation of a formal policy                 access to RF spectrum is becoming increasingly pressing,
system regulating dynamic spectrum access (DSA) for cog-                    and increasingly difficult to achieve. After all, the (usable)
nitive radios. DSA policies are represented and manipulated                 spectrum is finite, while the demand for it continues to in-
in a proof framework based on first-order logic with arith-                 crease without bounds. However, it has been recognized [13,
metic and algebraic data types. Various algebraic operations                14] that problems of spectrum scarcity and overuse arise not
combining such policies can be easily implemented in such                   so much from physical shortage of frequencies, but mostly
a framework. Reasoning about transmission requests is for-                  as a result of the centralized and rigid way in which spectrum
mulated as a satisfiability-modulo-theories (SMT) problem.                  allocation has been managed. Spectrum has been allocated
Efficient SMT solvers are used to answer the corresponding                  in a static way: bands of the RF range are statically assigned
queries and also to analyze the policies themselves. Further,               to licenced users on a long-term basis, who then have ex-
we show how to reason about transmission requests in an op-                 clusive rights to the corresponding region. Examples here
timal way by modeling the problem as an SMT instance of                     include the 824–849 MHz, 1.85–1.91 GHz, and 1.930–1.99
weighted Max-SAT, and we demonstrate that this problem                      GHz frequency bands, which are reserved for FCC-licenced
too can be efficiently solved by cutting-edge SMT solvers.                  cellular and personal communications services (PCS). Other
We also show that additional optimal operations on trans-                   parts of the RF spectrum, by contrast, are unlicensed, and
mission requests can be cast as classic optimization prob-                  anyone can transmit in those frequencies as long as their
lems, and to that end we give an algorithm for minimizing                   power does not exceed the regulatory maximum. Conse-
integer-valued objective functions with a small number of                   quently, large portions of the RF range remain underutilized
calls to an oracle SMT solver. We present experimental re-                  for long periods of time, while other parts—such as the ISM
sults on the performance of our system, and compare it to                   bands—are overutilized.
previous work in the field.                                                     Observations like these provided the impetus behind the
                                                                            NeXt Generation (XG) Communications program, a tech-
                                                                            nology development program sponsored by DARPA [9], which
1.    INTRODUCTION                                                          proposed opportunistic spectrum access as a means of cop-
   One of the world’s most prized physical resources is the                 ing with spectrum scarcity. The underlying approach has
electromagnetic spectrum, and particularly its radio frequency              come to be known as dynamic spectrum access (DSA), and
(RF) portion, stretching roughly from 10 KHz to 300 GHz.                    has evolved into a general methodology for dealing with
The RF spectrum is so valuable that its allocation is strictly              spectrum scarcity [17]. The main idea is that a DSA net-
regulated by world governments, and these days even small                   work has two classes of users: primary and secondary. Pri-
parts of it can be sold for billions of dollars in spectrum auc-            mary users are licenced to use a particular spectrum band
tions.                                                                      and always have full access to that band whenever they need
∗The research reported in this paper was performed in connection
                                                                            it. Secondary users are allowed to use such spectrum por-
                                                                            tions but only as long as their use does not interfere with the
with contract number W15P7T-08-C-P213 with the U. S. Army
                                                                            operating needs of the primary users. The secondary users
Communications Electronics Research and Development Engineer-
ing Center (CERDEC). The views and conclusions contained in this            are typically cognitive radios [4] that can dynamically adjust
document are those of the authors and should not be interpreted as          their operating characteristics (such as waveform, power, etc.).
presenting the official policies or position, either expressed or im-       Secondary users sense the spectrum for available transmis-
plied, of the U. S. Army CERDEC, or the U. S. Government unless             sion opportunities, determine the presence of primary users,
so designated by other authorized documents. Citation of manufac-
turers or trade names does not constitute an official endorsement
                                                                            and attempt to use the spectrum in a way that interferes as
or approval of the use thereof. The U. S. Government is autho-              little as possible with the activities of the primary users.
rized to reproduce and distribute reprints for Government purposes              Policies are used for specifying how secondary radio nodes
notwithstanding any copyright notation herein.


     DIFTS 2011                                                         3
are allowed to behave. Policies consist of declarative state-            gramming language [3]. Athena’s rich logic, sort system,
ments that dictate what secondary radios may or may not                  and programming language significantly facilitated the rep-
do, without prescribing how they might do it [18]. Different             resentation and manipulation of policies. Another innova-
policies may be applicable in different regions of the world.            tion of our paper is the use of SMT solvers [15] for rea-
Even in one and the same region there may be multiple poli-              soning with and about policies. The problem of determin-
cies in place, reflecting different constraints imposed by dif-          ing whether a transmission request is permissible by a given
ferent regulatory bodies. Policy systems for cognitive radios            policy can be couched as a satisfiability problem, namely,
should be able to load new policies on the fly.                          the problem of determining whether the request is consistent
   There are two key questions in designing such a policy                with the policy. So SAT solving would seem to be a natu-
system, one representational and the other computational:                ral fit for this domain. However, policy rules in this domain
How should we represent policies, and how can we make ef-                also make heavy use of equality and numeric relations, not to
ficient use of them? Policies regulating DSA must be able to             mention symbolic values such as transmission modes. Thus,
express rules such as the following: “Allow a transmission if            viewed as an abstract syntax tree, the body of a policy rule
its frequency is in a certain range and the power does not ex-           has an arbitrarily complicated boolean structure internally,
ceed a certain limit, or if the transmission is an emergency.”           with relational atoms at the leaves, whose semantics come
Thus, for representation purposes, we need at least the full             from standard theories such as those of arithmetic, equal-
power of propositional logic, as well as numbers (both inte-             ity, and so on. That is what makes SMT solving an ideal
gers and reals), and numeric comparison relations. It would              paradigm for this problem. We will show that the problem
also be helpful to have algebraic datatypes at our disposal,             of making optimal adjustments to transmission parameters
so that we could, for example, represent the mode of a trans-            can also be formulated quite naturally in this setting as a
mission as one of a finite number of distinct symbolic val-              weighted Max-SAT problem, many instances of which can
ues, such as everyDay or emergency. A certain amount of                  be efficiently solved despite the problem’s high theoretical
quantification would also be convenient, so that we could,               complexity. Indeed, competitive SMT solvers such as CVC3
e.g., quantify over all possible transmissions. We must also             [7] and Yices [6] are highly optimized programs that can
be able to introduce arbitrary transmission parameters, so a             readily decide extremely large formulas. As a quick com-
strong type system, preferrably with subtyping, should be                parison, another recently built policy system for spectrum
available. Hence, first-order logic with arithmetic, and per-            access, Bresap, which uses BDDs as its underlying technol-
haps additional features such as algebraic datatypes and sub-            ogy, cannot handle policies with more than 20 atomic ex-
sorting, would appear to be a natural representation choice,             pressions [1, p. 83]. By contrast, our system can easily han-
provided that we can make efficient use of it, a point that we           dle policies with many thousands of atomic expressions.
will address shortly.                                                       The remainder of this paper is structured as follows. The
   So much for representation. What are the main computa-                next section describes the representation of spectrum-access
tional problems that a policy system must solve? The main                policies in Athena.1 Section 3 discusses the integration of
problem is this: determine whether an arbitrary transmission             SMT solvers with Athena, and shows how to reduce the
request should be allowed or denied, given the current set of            problem of reasoning with spectrum-access policies to SMT
policies. In fact, if this were all there was to using policies          solving. In section 4 we model the problem of making opti-
for spectrum access, things would be relatively simple. But              mal changes to transmission requests as a Max-SAT prob-
we usually want the policy engine to do more. For example,               lem, and we describe how our system models and solves
when a transmission request is not permissible, a simple “de-            such problems. We also present an optimizing algorithm
nied” answer is not very useful. The policy system should                that uses a version of binary search in order to minimize
also tell the requesting radio why the request was denied,               integer-valued objective functions with very few calls to an
and more importantly, what it would take for its request to              oracle SMT solver (at most log(n) such calls, where n is
be allowed. As a simple example, the system might tell the               the maximum value that can be attained by the objective
radio: “Your request cannot be allowed in its present form,              function). This algorithm can be used to further optimize
but it will be allowed if you only change your transmission              transmissions, e.g., by minimizing the distance between the
power to such-and-such level.” Further, in such cases the                values desired by the cognitive radio and the values allowed
policy system will usually have many choices as to which                 by the policy. Section 5 presents results evaluating the per-
parts of the transmission request to modify in order to make             formance of our policy engine. Finally, section 6 discusses
it permissible. We then want it to make an optimal choice,               related work and concludes.
i.e., a choice that satisfies the original request to the greatest
possible extent.                                                         2.   POLICY REPRESENTATION
   In this paper we present an implementation of a policy
system in Athena, an interactive proof system for polymor-               1
                                                                           While some Athena code is presented, knowledge of the language
phic multi-sorted first-order logic with equality, algebraic             is not necessary. Any reader familiar with formal methods and
data types, subsorting, and a higher-order functional pro-               functional programming languages should be able to follow the
                                                                         code.


     DIFTS 2011                                                      4
   The set of radio transmissions is treated as an abstract data           normally be where the ellipses now appear in (1) (or in (2)).
type with fields such as frequency, power, mode, etc. The                  Then the result of the said application is the sentence ob-
abstract domain of transmissions is introduced in Athena as                tained from body by replacing every free occurrence of the
follows:                                                                   quantified variable ?x:Transmission by the term t. In
                                                                           Athena code, this procedure is defined as follows:
domain Transmission
                                                                           d e f i n e apply :=
A transmission field (or “parameter”) can then be introduced                  lambda (C t)
as a function from transmissions to values of some appro-                          match C {
                                                                                      (forall x ((_ x) <==> body)) =>
priate sort. For illustration purposes, we demonstrate the                              (replace-var x t body) }
declaration of the following parameters: frequency, power,
mode, and time. Additional parameters can be introduced as                 The built-in ternary Athena procedure replace-var is
needed.                                                                    such that (replace-var x t p) produces the result ob-
                                                                           tained from the sentence p by replacing every free occur-
declare     frequency: [Transmission] -> Int                               rence of variable x by the term p. As a concrete example,
declare     power: [Transmission] -> Real                                  here is a sample AC:
declare     mode: [Transmission] -> Mode
declare     hours, minutes: [Transmission] -> Int                          d e f i n e AC1 :=
                                                                            (forall ?x .
Here Int and Real are built-in Athena sorts represent-                           allow ?x <==>
                                                                                   frequency ?x in [5000 5700] &
ing the sets of integers and real numbers, respectively, with                      power ?x < 30.0 &
Int a subsort of Real. The sort Mode is a finite algebraic                         mode ?x = everyDay & hours ?x <= 11)
datatype:
                                                                              It says that a transmission is (provisionally) permissible iff
d a t a t y p e Mode := emergency | everyDay | ...
                                                                           its frequency is in the 5000 . . . 5700 MHz range; its power is
   There are two distinguished unary predicates on the set of              less than 30.0 dBm; its mode is everyDay; and the trans-
all possible transmissions: allow and deny. These predi-                   mission’s time is no later than 11 in the morning. Here in is
cates are used by policies to specify which transmissions are              a binary procedure defined as follows:
considered permissible and which are not. More concretely,
a policy is represented as a pair of sentences: an allow con-              d e f i n e in :=
dition and a deny condition. An allow condition (or AC for                    lambda (x range)
short) is of the form: 2                                                             match range {
                                                                                       [a b] => (a <= x & x <= b)}
   (forall ?x:Transmission.           allow ?x <==> · · · )      (1)

while a deny condition (DC) is of the form:                                (Note that variables do not need to be explicitly annotated
                                                                           with their sorts. Athena has polymorphic sort inference, so
   (forall ?x:Transmission.            deny ?x <==> · · · )      (2)       a Hindley-Milner algorithm will recover the most general
 Either condition may be absent, i.e., a policy may specify                possible sorts of all variable occurrences.) We can now con-
only an allow condition but no deny condition, or only a                   struct a policy with the above AC and with no DC as follows:
deny condition but no allow condition. An absent condition                 d e f i n e policy-1 := (make-policy AC1 ())
is represented by the unit value, (). The constructor for
policies is therefore a procedure that takes two conditions                Our second policy example has both an AC and a DC:
and simply puts them together in a two-element list:                       d e f i n e policy-2 :=
d e f i n e make-policy :=                                                    (make-policy
   lambda (AC DC) [AC DC]                                                            (forall ?x .
                                                                                       allow ?x <==> mode ?x = emergency)
                                                                                     (forall ?x .
There are two destructor or selector procedures, one for re-                            deny ?x <==> power ?x > 25.0))
trieving each component of the policy:
d e f i n e [get-AC get-DC] := [first second]
                                                                           A key operation on policies is that of merging. New policies
                                                                           may be downloaded into the policy database at any given
   An important operation is the application of a given pol-               time. A newly acquired policy must be integrated with the
icy condition C of the form (1) or (2) to a term t of sort                 existing policy that is in place in order to produce a single
Transmission. Let body be whatever sentence would                          merged policy with a new AC and a new DC. Merging is
2
                                                                           therefore a binary operation. The current definition of merg-
  A more accurate term instead of allow would be “allow provi-             ing two policies p1 and p2 is simple: it is disjunction on
sionally,” because, as we will see, for a transmission request to be
granted, the allow condition must hold and the deny condition must         both the AC and DC. Specifically, the AC of the merged pol-
not hold. With this caveat in mind, we will continue to use the term       icy allows a transmission t iff p1 allows t or p2 allows it;
allow in the interest of simplicity.                                       and it denies t iff p1 denies it or p2 denies it. Merging is


     DIFTS 2011                                                        5
implemented as an Athena procedure. Other alternative def-           a yes/no answer from the policy system. But in general the
initions of merging are possible, and these could be straight-       policy system needs to solve more interesting versions of this
forwardly implemented in the same way. As an example,                problem, namely: If the transmission request is granted, find
here is the result of merging policy-1 and policy-2,                 appropriate values for any missing parameters, in case the
as these were defined above:                                         request was incomplete; if the request is not granted, find ap-
d e f i n e merged-policy :=                                         propriate values for the transmission parameters that would
   (merge-policies policy-1 policy-2)                                render the transmission permissible. Moreover, in the sec-
                                                                     ond case, we often want to compute values in a way that is
>(get-AC merged-policy)                                              optimal w.r.t. to the given request, e.g., so that the original
                                                                     transmission request is disrupted as little as possible. We
Sentence:
  (forall ?v4:Transmission                                           discuss such optimality issues in section 4.
   (iff (allow ?v4)                                                     The above problems are naturally formulated as satisfia-
        (or (and (and (<= 5000                                       bility problems. However, the most appropriate satisfiability
                          (frequency ?v4))                           framework here is not that of straight propositional logic, but
                      (<= (frequency ?v4)                            rather that of satisfiability modulo theories, or SMT for short
                           5700))
                 (and (< (power ?v4)                                 [15]. In the SMT paradigm, the solver is given an arbitrary
                          30.0)                                      formula of first-order logic with equality and its task is to de-
                      (and (= (mode ?v4)                             termine whether or not the formula is satisfiable with respect
                                everyDay)                            to certain background theories. Typical background theories
                            (<= (hours ?v4)                          of interest are integer and/or real arithmetic (typically linear,
                                 11))))
            (= (mode ?v4)                                            but not necessarily), inductive data types (free algebras), the
               emergency))))                                         theory of uninterpreted functions, as well as theories of lists,
                                                                     arrays, bit vectors, etc. Hence, most of the function symbols
>(get-DC merged-policy)                                              and predicates that appear in the input formula have fixed in-
                                                                     terpretations, given by these background theories. An SMT
Sentence: (forall ?v5:Transmission
            (iff (deny ?v5)                                          solver will not only determine whether a sentence p is satisfi-
                 (> (power ?v5)                                      able; if it is, it will also provide appropriate satisfying values
                    25.0)))                                          for the free variables and/or constants that occur in p.
                                                                        Athena is integrated with a number of SMT solvers, such
(Note that while Athena accepts input in either infix or pre-        as CVC3 and Yices; the one used for this project is Yices [6],
fix form, sentences are output in prefix for indentation pur-        mostly because it can solve Max-SAT problems. The main
poses.)                                                              interface is given by a unary Athena procedure smt-solve,
   We define a transmission request as a set of constraints          which takes an arbitrary first-order sentence p and attempts
over some object of sort Transmission, typically just                to determine its satisfiability with respect to the appropri-
a free variable ranging over Transmission. These con-                ate theories. If successful, smt-solve will return a list of
straints, which are usually desired values for some—or all—          values for all free variables and/or constants that occur in p.
of the transmission parameters, can be expressed simply as           Some examples:
a sentence. A sample transmission request:
                                                                     d a t a t y p e Day :=
d e f i n e request-1 := (frequency ?t = 150 &
                                                                        Mon | Tue | Wed | Thu | Fri | Sat | Sun
                          mode ?t = everyDay &
                          hours ?t = 16)
                                                                     >(smt-solve ?x = 2 & ?y = 3.4 |
                                                                                 ?x = 5 & ?d = Mon)
This is a conjunction specifying that the frequency of the
desired transmission (represented by the free variable ?t)           List: [(= ?y 3.4) (= ?x 2) (= ?d Mon)]
should be 150, its mode should be everyDay, and its time
should be between 4:00 and 4:59 (inclusive) in the evening.          The input to smt-solve here was a disjunction of two
A transmission request may be incomplete, i.e., it may not           conjunctions. The free variable ?d ranges over the datatype
specify desired values for every transmission parameter. The         Day; while ?x and ?y range over Int and Real, respec-
preceding request was incomplete, since it did not specify           tively. The output is a list of identities providing values for
values for the power and minutes parameters.                         each free variable that render the input satisfiable. Because
                                                                     the argument to smt-solve is a sentence, i.e., a native
3.   POLICY REASONING                                                Athena data value, this provides a very flexible and high-
   The main task of the policy system is this: determine             bandwidth programmable interface to SMT solvers. This in-
whether the policies currently in place allow or deny a given        terface proved extremely useful for our system.
transmission request. This is in fact the simplest formula-             The policy reasoning problems described earlier have a
tion of the core reasoning problem, in that it only requires         natural formulation in SMT. Specifically, consider an arbi-


     DIFTS 2011                                                  6
trary transmission request tr, i.e., a first-order sentence with        with the desired value vi of each transmission parameter pi .
a free variable x ranging over Transmission (there may                  The weight wi reflects the importance that the radio attaches
be other free variables as well). To determine whether tr               to pi taking the value vi . The SMT solver will then try to
is allowed by the current policy: (1) We construct a longer             find a satisfying assignment for the request that has maxi-
request, tr0 , which is the conjunction of (a) tr; (b) the appli-       mal total weight. In that case, the request is not just a list
cation of the current policy’s AC to x; and (c) the application         of constraints [c1 · · · cn ] but a list of pairs of constraints
of the negation of the current policy’s DC to x:                        and weights [[c1 w1 ] · · · [cn wn ]]. A sample trans-
tr & (apply (get-total-AC) x) &                                         mission request might then be:
     ˜ (apply (get-total-DC) x)                                         d e f i n e weighted-request :=
                                                                           [[(frequency ?t = 8000) 10]
The procedure get-total-AC returns the AC of the cur-                         [(power ?t = 35.0)       15]
rent (“total”) policy; likewise for get-total-DC. So tr0                      [(hours ?t = 13)         30]]
represents all the constraints imposed on the requested trans-
mission, both by the original transmission request, tr, and             indicating that the relative importance of the frequency being
by the policy itself, whose AC must hold for the transmis-              8000 is 10, that of the power being 35.0 is 15, and that of
sion variable x while its DC must not. (2) We then run                  hours being 13 is 30. Suppose further that the policy in
smt-solve on tr’. If the result is a satisfying assignment,             place has no DC and a disjunctive AC:
then the request is granted, and all we need to do is report
                                                                        d e f i n e AC1 :=
values for any missing parameters (parameters that did not                 (forall ?t . allow ?t <==>
figure in the given request). If, by contrast, smt-solve                          (frequency ?t in [5000 7000] &
determines that tr’ is unsatisfiable, then tr in its given form                    power ?t <= 30.0 & hours ?t > 12) |
must be rejected. In that case we make a blank request con-                       (frequency ?t in [6000 9000] &
                                                                                   power ?t <= 40.0 & hours ?t <= 8))
sisting of the application of the policy’s AC to x conjoined
with the application of the negation of the policy’s DC to              d e f i n e policy := (make-policy AC1 ())
x, and run smt-solve on it. This blank request therefore
imposes no constraints at all on the transmission apart from            Now consider evaluating weighted-request with re-
those imposed by the policy. If the result is a satisfying as-          spect to this policy. Clearly, the request cannot be allowed
signment, we provide it to the user, otherwise we report that           as is. We can make it permissible in more than one way:
the current policy is unsatisfiable.                                    (a) we could demand a change of frequency and power in
   The Athena code for this algorithm is expressed in a pro-            accordance with the values prescribed by the first branch of
cedure evaluate-request, which accepts an arbitrary                     AC1, while keeping the hours parameter to the requested
request and processes it as described above. Here is the out-           value of 13; or (b) we could demand a change of the hours
put for the example of the previous section:                            parameter only, in accordance with the second disjunctive
>(evaluate-request request-1)                                           branch of AC, while keeping the desired frequency and power
                                                                        values; or (c) we could demand that all three parameter val-
Transmission allowed. Appropriate values                                ues change. From these possibilities, only (a) is optimal, in
for missing parameters:                                                 that it honors the original request to the greatest extent pos-
[(= (power ?t1) 20.0)]
                                                                        sible (as determined by the given weights). Running this
                                                                        example in our system results in the following output:
4.   COMPUTING OPTIMAL ADJUSTMENTS                                      >(evaluate-request weighted-request)
     TO TRANSMISSION PARAMETERS
                                                                        The following parts of the request
   When a transmission request is denied, there are usually
                                                                        could not be satisfied:
many different ways of modifying it so as to make it per-
missible. For instance, the solver could change the desired             (= (power ?t) 35.0) (= (frequency ?t) 8000)
transmission’s time; or it could change its frequency; or it
could change its power and mode; or it could change all of              Here is a maximally satisfying assignment:
the above. Some of these actions may be preferable to oth-
                                                                        [(= (mode ?t) everyDay)
ers. For instance, the radio might be less willing to change             (= (frequency ?t) 5999)
the time of the transmission, or its power level, rather than            (= (power ?t) 30) (= (hours ?t) 13)]
the frequency. In such cases we want the policy system to
return a satisfying assignment that is optimal in the sense of          By contrast, if we had changed the weight of the hours
respecting as many such preferences of the radio as possible.           parameter from 30 to 20, the result would then change the
   Our system achieves this in a flexible way by formulating            hours parameter instead of the frequency and power, since
the problem as an (SMT) instance of Max-SAT. In its trans-              retaining the values of the two latter parameters would result
mission request, the radio can provide a weight wi along                in a maximum weight of 25.


     DIFTS 2011                                                     7
   Note that weights can be arbitrary integers or a special              z in [720 800]) |
“infinite” token, indicating a hard constraint that must be             (x in [500 600] & y in [30 40] &
satisfied. In fact this infinite weight is the one that Athena           z in [920 925])
attaches to the constraints obtained from the AC (and the
negation of the DC) of the current policy. But radios can             Suppose also that the desired values are x = 13, y = 15,
also use infinite weights in their requests.                          z = 922. The task is to find values for these variables that
   Occasionally there may be additional requirements on the           satisfy c while diverging from the desired values as little as
assignments returned by the policy system, beyond honor-              possible. We can readily model this problem in a form that
ing the original request to the greatest extent possible. For         is amenable to smt-solve-and-minimize as follows.
example, if a parameter value must change, we may want                First we define the objective-function term t, as the sum of
the new value to be as close as possible to the original re-          the three differences:
quested value. If we are not allowed to transmit at the exact         d e f i n e t := (?d-x:Int + ?d-y:Int + ?d-z:Int)
desired time, for instance, we may want to transmit as close
to it as possible (as can be allowed by the currently installed       with the difference terms defined as follows:
policies). To meet such requirements we generally need the
                                                                      d e f i n e d-x-def :=
ability to perform optimization by minimizing some objec-                (ite (x > 13) (?d-x = x - 13)
tive function, in this case the absolute difference between de-                            (?d-x = 13 - x))
sired and permissible parameter values. Most SMT solvers              d e f i n e d-y-def :=
do not perform optimization (apart from Max-SAT, in the                  (ite (y > 15) (?d-y = y - 15)
case of Yices, though see below), but we can efficiently im-                               (?d-y = 15 - y))
                                                                      d e f i n e d-z-def :=
plement an integer optimizer in terms of SMT solving. The                (ite (z > 922) (?d-z = z - 922)
idea is to use binary search in order to discover the opti-                                  (?d-z = 922 - z))
mal cost with as few calls to the SMT solver as possible: at
most O(log n) calls, where n is the maximum value that the            Here ite is a simple if-then-else procedure:
objective function can attain. Specifically, let c be an arbi-
trary constraint that we wish to satisfy in such a way that           d e f i n e ite :=
                                                                         lambda (c x y) ((c ==> x) &              (˜ c ==> y))
the value of some “cost” term t is minimized, where max is
the maximum value that can be attained by the cost function           Assume that we do not know the exact maximum value that
(represented by t). (Tf this value is not known a priori, it          t can attain, but we know that it cannot be more than 106 .
can be taken to be the greatest positive integer that can be          We can then solve the problem with the following call:
represented on the computer.) The algorithm is the follow-
ing: We first try to satisfy c conjoined with the constraint          d e f i n e diff-clauses :=
that the cost term t is between 0 and half of the maximum                (d-x-def & d-y-def & d-z-def)
                                                                      d e f i n e query := (c & diff-clauses)
possible value: 0 ≤ t ≤ (max div 2). If we fail, we recur-
sively call the algorithm and try to satisfy c augmented with         >(smt-solve-and-minimize query t 1000000)
the constraint (max div 2) + 1 ≤ t ≤ max. Whereas, if
we succeed, we recursively call the algorithm and try to sat-         List: [(= ?x 13) (= ?y 15) (= ?z 800)
isfy c augmented with the constraint 0 ≤ t ≤ (max div 4).                    (= ?d-x 0) (= ?d-y 0) (= ?d-z 122)]
Some care must be taken to get the bounds right on each
call, but this algorithm is guaranteed to converge to the min-        This solution was found by a total of 8 calls to the SMT
imum value of t for which c is satisfied, provided of course          solver (for a total time of about 100 milliseconds). Why
that the original constraint c is satisfiable for some value of       were only 8 calls required when we started the binary search
t. This algorithm is implemented by a ternary Athena proce-           with a maximum of 106 ? One would expect about log 106
dure smt-solve-and-minimize, such that                                calls to the smt solver, i.e., roughly 20 such calls. However,
                                                                      our implementation uses the information returned by each
      (smt-solve-and-minimize c t max)                                call to the SMT solver to speed up the search. That often
returns a satisfying assignment for c that minimizes t (whose         results in drastic shortcuts, cutting down the total number of
maximum value is max).                                                iterations by more than a factor of 2.
   For example, suppose that x, y, and z are integer variables           This procedure enables our system to optimize any quan-
to be solved for:                                                     tity that can be given an integer metric. Moreover, unlike
                                                                      independent branch-and-bound algorithms for integer pro-
d e f i n e [x y z] := [?x:Int ?y:Int ?z:Int]                         gramming, it has the advantage that it allows not just for nu-
                                                                      meric constraints, but for arbitrary boolean structure as well,
subject to the following disjunctive constraint:                      along with constraints from other theories such as reals, lists,
d e f i n e c :=                                                      arrays, bit vectors, inductive datatypes, etc. While more so-
   (x in [10 20] & y in [1 20] &                                      phisticated approaches to optimization in the SMT paradigm


    DIFTS 2011                                                    8
are beginning to emerge (e.g., [2]), the implementation de-
scribed here has been quite adequate for our purposes.

5.     PERFORMANCE
   In order to test our system we wrote an Athena program
that generates test instances, with two independently con-
trollable variables: parameter number and policy number.
In particular, we defined a procedure make-policy-set with
two parameters, param-num and policy-num. The output is
a list of policy-num policies, where each policy involves
param-num transmission parameters. The latter came from
                                                                        Figure 1: SMT solving times for processing regular
a pre-declared pool of 100 transmission parameters, half of
                                                                        transmission requests.
them integer-valued and the other half real-valued. We dis-
tinguished the following types of policies:
     • permissive policies, which have only an AC;                      among all the preceding types. Specifically, 20% of the
                                                                        policies in L are ordering/N policies. About 1/3 of these
     • prohibitive policies, which have only a DC and no AC;            ordering/N policies are permissive, 1/3 are prohibitive, and
                                                                        1/3 mixed. The next 20% of L consists of equational/N poli-
     • mixed policies, which have both.
                                                                        cies, and these are again evenly split between permissive-,
     • Ordering/N policies. These policies are parameter-               prohibitive-, and mixed-policy subsets. The third 20% con-
       ized over N > 0. Specifically, an ordering/N policy is           tains inequational/N policies, and so forth. Once a list L of
       one whose AC and/or DC is of the following form:                 policies is thereby obtained, we combine them all into a sin-
                                                                        gle policy by folding the merge-policies operator over
            (forall ?t:Transmission . D ?t <==>                         L, with the empty-policy as the identity element. It is
           (f i1 ?t R1 ci1 ) & ... & (f ik ?t Rk cik ))                 with respect to this merged policy that we tested transmis-
                                                                        sion requests. The requests were generated randomly.
       where D is either allow or deny, and ∀ j ∈ {1, . . . , k}:          Figure 1 shows the SMT-solving times for processing plain
       f ij is a transmission parameter; Rj ∈ {<, >, <=, >=};           transmission requests (i.e., without weights attached to the
       and cij is a constant number of the appropriate sort. We         various transmission parameters), for various combinations
       refer to the ordering constraints (f ij ?t Rj cij ) as           of policy-set sizes and transmission parameter numbers, where
       the atoms of the AC (or DC). An ordering/N policy                the policy sets are evenly mixed as described above. Fig-
       may be permissive, prohibitive, or mixed, but its total          ure 2 shows the corresponding times for optimal processing
       number of atoms (i.e., the number of the AC’s atoms              of transmission requests, i.e., requests that attach weights to
       plus the number of the DC’s atoms) must equal N .                each of the transmission parameters and are solved as Max-
                                                                        SAT problems. In order to stress-test the implementation
     • Equational/N policies. The AC and/or DC of such a
                                                                        further, we repeated the experiments with sets of policies
       policy is of the same form as shown above, except that
                                                                        that were more structurally complex, doing away with sim-
       each Ri is the identity relation. These identities are the
                                                                        ple ordering, equational, and inequational policies, and us-
       atoms of the condition. The total number of atoms (of
                                                                        ing instead only range and disjunctive-conjunctive policies.
       the AC and DC together) must be N .
                                                                        The corresponding results are shown in Figure 3 and in Fig-
     • Inequational/N policies. Same as above, except that
       the relation in question here is inequality.
     • Range/N policies. The AC and/or DC of such a policy
       is of the same form as that of an ordering/N policy, ex-
       cept that each Ri is the relation in, and each constant
       ci is a range [a b].
     • Disjunctive-Conjunctive/N policies. The AC and/or
       DC of such a policy is a disjunction of conjunctions,
       where each atom is of the form (f x ?t in [· · · ]).
       The number of atoms are required to add up to N .
   A call of the form (make-policy-set N P) returns
a list L containing P policies, each of which involves N trans-         Figure 2: Max-SAT SMT solving times for optimized
mission parameters. This list is roughly equally partitioned            processing of transmission requests.


      DIFTS 2011                                                    9
                                                                        to get the timing data. In most cases, the reported system
                                                                        time (the sys entry of time’s output) was 0.000, i.e., too
                                                                        small to be reliably measured, and hence the time reported
                                                                        here is real (wall clock) time, which includes time seg-
                                                                        ments spent by other processes and times during which the
                                                                        SMT solver was blocked (e.g., waiting for I/O to complete).
                                                                        Therefore, the times reported here are overly conservative;
                                                                        the actual times spent on SMT solving are smaller.

                                                                        6.   RELATED WORK & CONCLUSIONS
Figure 3: Solving times for processing of transmission                     We have presented an implementation of a policy system
requests with structurally complex policies.                            for dynamic spectrum access. Policies are represented and
                                                                        manipulated in Athena, a formal proof system for first-order
                                                                        logic with polymorphism, subsorting, inductive datatypes,
ure 4. With these sort of structurally complex policies in              arbitrary computation, and definitional extension. Most of
place, the highest increase occurs in optimal processing of             these features have proven useful in the engine’s implemen-
transmission requests with more than 50 policies and param-             tation; others would become useful if the engine were to be
eters.                                                                  extended so that it used, e.g., structured ontologies instead
   The graphs show only the time spent on SMT solving,                  of flat data types.
which is the bulk of the work that needs to be done when pro-              Previous work in this area includes BBN’s XGPLF [8] and
cessing transmission requests. We do not include the time               SRI’s Coral [10, 11]. XGPLF does not have a formal se-
spent on translating from Athena to SMT notation and back.              mantics and is limited in what it can express (it is based on
This translation is a straightforward linear-time algorithm             OWL [19]). Moreover, XGPLF cannot model inheritance.
(linear in the size of the formula to be translated, on average,        To the best of our knowledge, the only implementation of
since it uses hash tables for the necessary mappings between            a policy reasoning engine based on XGPLF is the one built
Athena and SMT namespaces), and the time spent on it in                 by the Shared Spectrum Company [12]. They used SWI-
most cases is neglibible. For huge policies containing hun-             Prolog as the underlying reasoning engine. In field tests us-
dreds of thousands of nodes, the translation does take longer,          ing resource-limited radio devices, SSC replaced the Prolog
though still typically a fraction of a second. Virtually all of         reasoner with “a simplified reasoner developed in C/C++”
the translation cost is incurred when translating the policy,           [12, p. 504]; no details are provided regarding the design,
since the transmission requests are tiny by comparison. Ob-             implementation, or correctness of this simplified reasoner.
serve, however, that there is no reason to be translating the           SSC’s engine can only return yes/no answers to transmission
entire policy anew each time the engine needs to process a              requests.
new request. Instead, the (merged) policy can be translated                Coral (Cognitive Radio Policy Language) is a new lan-
off-line, once, and subsequently cached in SMT notation in              guage specifically designed for expressing policies govern-
some file. Thus, the translation time is an one-time, off-line          ing the behavior of cognitive radios. Like Athena, Coral of-
cost.                                                                   fers a rich and extensible first-order logical language that
   All experiments were performed on a 2.4 GHz Intel Core               includes arithmetic and allows for the introduction and def-
i3-370M dual-core processor with 4GB RAM, running Win-                  inition of new function symbols. It also features subtyping
dows 7. The time command of the Cygwin shell was used                   and can therefore express inheritance and ontologies. Some
                                                                        notable differences from Athena include: (a) Unlike Coral,
                                                                        Athena has polymorphism. Thus, e.g., it is not necessary
                                                                        to introduce lists for integers and lists for booleans sepa-
                                                                        rately; a generic parameterized definition is sufficient. (b)
                                                                        Athena has automatic sort inference based on the Hindley-
                                                                        Milner algorithm, which is convenient in practice because
                                                                        it allows for shorter—and usually cleaner—specifications.
                                                                        (c) Athena has a general-purpose formally defined program-
                                                                        ming language that can seamlessly interact with its logical
                                                                        layer, and which can be used to dynamically construct and
                                                                        manipulate logical formulas very succinctly. This program-
                                                                        ming language offers procedural abstraction, as well as side
                                                                        effects through mutation, including reference cells and vec-
Figure 4: Max-SAT SMT solving times for transmission                    tors, features which are often important for efficiency. The
requests with structurally complex policies.                            ability to compute with terms and sentences as primitive data


    DIFTS 2011                                                     10
values was very useful. While Coral can express some com-           system that was recently implemented at Virginia Tech [1,
putations via universally quantified identities which can then      5]. Unlike the other systems discussed above, Bresap is lim-
be interpreted as executable rewrite rules by a tool such as        ited to policies that can be adequately represented as finite
Maude [16], it does not offer procedural abstraction, i.e., it      Boolean functions, which it encodes as binary decision di-
is not possible to define arbitrary procedures.                     agrams (BDDs). Therefore, unlike Coral and our system,
   Two policy engines based on Coral have been implemented          Bresap is not suitable as an expressive language for specify-
[10, 11]. The initial implementation used Prolog as the un-         ing policies. For instance, it has no facilities for introducing
derlying reasoning engine, and was only able to return yes/no       new policy concepts and/or rules. Indeed, Bresap is not a
answers to transmission requests. The second (and current)          language. It is a system that reads certain types of policies
implementation uses a custom-made proof system to rea-              expressed in XML and converts them to BDDs. This is done
son about transmission requests. The system has four kinds          by assigning a distinct Boolean variable to each atomic ex-
of proof rules: forward chaining, backward chaining, par-           pression that appears in the policy. For instance, a policy
tial evaluation based on conditional rewriting, and constraint      such as “allow transmission in the frequency range a . . . b
propagation and simplification. The proof system is im-             if the power does not exceed m” would result in the intro-
plemented in the rewriting engine Maude. One issue with             duction of two distinct Boolean variables x1 and x2 , with
this implementation is that a positive answer is given only         x1 corresponding to the atom a ≤ f ≤ b (where f stands
if the transmission request is an exact match of the opera-         for the transmission’s frequency); and with x2 correspond-
tive policy conditions. But that is not likely to be the case       ing to the atomic expression p ≤ m (where p stands for the
in practice. Most transmission requests are likely to be in-        transmission’s power). The accepting condition could then
complete or to diverge from the policy, at least in some small      be represented by the boolean function x1 ∧ x2 . Graph al-
degree. In such cases, the Coral implementation does not re-        gorithms are used to merge different policy BDDs. With a
turn values for the relevant transmission parameters. Rather,       single BDD in place representing the result of merging all
it returns an arbitrarily complicated first-order formula rep-      active policies, Bresap can then accept an incoming trans-
resenting all the possible “opportunity constraints” that the       mission request and evaluate it. A transmission request must
radio could use to modify its request so as to make it per-         be an attribute-value list of the form a1 = v1 , . . . , an = vn ,
missible. But that is not likely to be of much use to the           where ai is a transmission parameter (such as mode or fre-
requesting party. It is not realistic to require that whoever       quency) and vi is the corresponding desired value. Notably,
made the transmission request should be able to understand          Bresap is capable of handling underspecified (incomplete)
and reason about arbitrarily complicated logical formulas in        requests. It is also capable of attaching costs to transmission
order to understand the policy system’s output.                     parameters (where a given cost indicates the penalty paid for
   A second issue with this implementation is efficiency. The       changing the value of that parameter), and then modifying
Coral team has not published precise benchmarks describing          the parameter values of a transmission request in a way that
their engine’s performance for variable numbers and types of        minimizes the overall cost.
policies (and for variable numbers of transmission parame-             A drawback of Bresap is that the underlying policy repre-
ters), but they have released a demo of their implementation        sentation framework, that of BDDs (or propositional logic,
that can be used to evaluate transmission requests with re-         more generally), lacks semantics for the numeric constraints
spect to policies that have a fixed set of transmission param-      that pervade spectrum access policies. To put it simply, BDDs
eters, namely: location (latitude and longitude); time (hours,      do not know arithmetic, and thus do not have the where-
minutes, and seconds); sensed power; emissions; network             withal to “understand” that < is transitive, that 1 + x is
id; and role (slave/master). Even with this fairly small set of     greater than x, that 10 and 14 − 4 are the same object, and
transmission parameters, and with only 11 active policies in        so on. Whereas an SMT solver immediately realizes that
place, it has been reported [1, p. 18] that evaluating a sin-       f < 10 conjoined with f ≥ 30 is contradictory, because
gle transmission request took the Coral engine 58 seconds,          it uses a dedicated reasoning procedure for arithmetic, in
and resulted in an output formula comprising 75 constraints.        the approach of Bresap these two atoms would result in the
In the preceding section we saw that our implementation can         introduction of two Boolean variables, x1 and x2 , the for-
evaluate transmission requests with several dozens of param-        mer representing f < 10 and the latter representing f ≥
eters and with over 50 complex policies in place in a fraction      30. Thus, the conjunction of x1 and x2 is represented by
of a second; this is so even when the requests are processed        the innocuous-looking Boolean function x1 ∧ x2 , a perfectly
optimally. Moreover, the Coral engine has no notion of opti-        consistent condition. To represent the fact that in the present
mality. It does not allow the requesting party to specify that      context this is actually inconsistent, one has to append to it
some parameters are more important than others, or to ask           ad hoc clauses such as C = (¬x1 ∨ ¬x2 ) ∧ (x1 ∨ x2 ). This
that the request should be satisfied to the greatest extent pos-    extra information, C, conjoined with x1 ∧ x2 , would then al-
sible, or with as little divergence from the requested values       low us to conclude that the condition is unsatisfiable. That is,
as possible.                                                        in fact, what Bresap does: It “semantically analyzes” the var-
   Another policy engine for spectrum access is Bresap, a           ious atomic expressions in the policy in order to generate ad-


    DIFTS 2011                                                 11
ditional constraints—so-called “auxiliary rules” [1, section                        Verification (CAV ’07), volume 4590 of Lecture Notes
3.5]—that prevent the engine from taking “illogical” paths                          in Computer Science, pages 298–302.
in the BDD. This approach has some disadvantages.3 First,                           Springer-Verlag, 2007. Berlin, Germany.
it is an essentially ad hoc encoding of arithmetic facts, the                   [8] BBN Technologies. XG Working Group, XG Policy
generation of which should not be part of the policy engine’s                       Language Framework, Request for Comments, version
trusted computing base. Second, the additional encoded in-                          1.0. www.ir.bbn.com/projects/xmac/pollang.html,
formation, even if it is polynomial in size, can significantly                      2004.
blow up the resulting BDD. Other efficiency issues in Bre-                      [9] Darpa. News Release for Next Generation (XG)
sap include the conversion algorithm that produces the BDD                          Communications Program Request for Comments.
from the atomic Boolean expressions of the policy. This al-                         www.darpa.mil/news/2004/xg_jun_04.pdf.
gorithm has exponential time complexity in the number of                       [10] G. Denker, D. Elenius, R. Senanayake, M.-O. Stehr,
expression variables. As a result, Bresap is not currently                          and D. Wilkins. A Policy Engine for Spectrum
able to handle policies with more than 20 atomic Boolean                            Sharing. In New Frontiers in Dynamic Spectrum
expressions [1, p. 83].                                                             Access Networks, 2007. DySPAN 2007. 2nd IEEE
   In conclusion, we believe that the combination of a pro-                         International Symposium on, pages 55 –65, April
grammable and expressive formal framework such as Athena                            2007.
with an efficient SMT solver is a highly suitable implemen-                    [11] D. Elenius, G. Denker, M. O. Stehr, R. Senanayake,
tation vehicle for spectrum access policy engines. It com-                          C. Talcott, and D. Wilkins. CoRaL–Policy Language
bines rich expressivity with efficient performance, a consid-                       and Reasoning Techniques for Spectrum Policies.
eration that is likely to be crucial for cognitive radios. To                       Policies for Distributed Systems and Networks, IEEE
the best of our knowledge, SMT solvers have not been used                           International Workshop on, pages 261–265, 2007.
before for reasoning about policies, although we believe that                  [12] F. Perich. Policy-Based Network Management for
they are ideal for this task. Indeed, we believe that there                         NeXt Generation Spectrum Access Control. In New
is not much that is special about spectrum access, and that                         Frontiers in Dynamic Spectrum Access Networks,
the same approach we have introduced here could be used to                          2007. DySPAN 2007. 2nd IEEE International
represent and reason about policies in other domains.                               Symposium on, pages 496 –506, April 2007.
                                                                               [13] FCC Spectrum Policy Task Force. Report of the
7.    REFERENCES                                                                    spectrum efficiency working group. Available from
                                                                                    www.fcc.gov/sptf/files/SEWGFinalReport_1.pdf.
 [1] A. A. Deshpande.
                                                                               [14] I. F. Akyildiz and W.-Y. Lee and M. C. Vuran and S.
     Policy Reasoning for Spectrum Agile Radios. Masters
                                                                                    Mohanty. NeXt Generation / Dynamic Spectrum
     thesis, Electrical and Computer Engineering
                                                                                    Access / Cognitive Radio Wireless Networks: A
     Department, Virginia Tech, 2010.
                                                                                    Survey. Computer Networks Journal (Elsevier),
 [2] A. Cimatti and A. Franzn and A. Griggio and R.
                                                                                    50:2127–2159, September 2006.
     Sebastiani and C. Stenico. Satisfiability Modulo the
                                                                               [15] L. de Moura and B. Dutertre and N. Shankar. A
     Theory of Costs: Foundations and Applications. In
                                                                                    Tutorial on Satisfiability Modulo Theories. In
     TACAS 2010, pages 99–113, 2010.
                                                                                    Computer Aided Verification, volume 4590 of LNCS,
 [3] K. Arkoudas. Athena.
                                                                                    pages 20–36. Springer, 2007.
     http://www.pac.csail.mit.edu/athena.
                                                                               [16] M. Clavel and F. Durán and S. Eker and P. Lincoln and
 [4] B. A. Fette. Cognitive Radio Technology. Academic
                                                                                    N. Martı́-Oliet and J. Meseguer and C. Talcott. The
     Press, 2nd edition, 2009.
                                                                                    Maude 2.0 System. In R. Nieuwenhuis, editor,
 [5] B. Bahrak and A. A. Deshpande and M. Whitaker and
                                                                                    Rewriting Techniques and Applications (RTA 2003),
     J. M. Park. BRESAP: A Policy Reasoner for
                                                                                    number 2706 in LNCS, pages 76–87. Springer, 2003.
     Processing Spectrum Access Policies Represented by
                                                                               [17] Q. Zhao. A Survey of Dynamic Spectrum Access.
     Binary Decision Diagrams. In New Frontiers in
                                                                                    IEEE Signal Processing Magazine, 24(3):79–89,
     Dynamic Spectrum, 2010 IEEE Symposium on, pages
                                                                                    2007.
     1 –12, April 2010.
                                                                               [18] R. Chadha and L. Kant. Policy-Driven Mobile Ad hoc
 [6] B. Dutertre and L. de Moura. The Yices SMT Solver.
                                                                                    Network Management. Wiley-IEEE Press, 2007.
     Tool paper, available online from
                                                                               [19] World Wide Web Consortium. OWL 2 Web Ontology
     yices.csl.sri.com/tool-paper.pdf.
                                                                                    Language Document Overview. Available from
 [7] C. Barrett and C. Tinelli. CVC3. In W. Damm and
                                                                                    www.w3.org/TR/owl2-overview/, 2009.
     H. Hermanns, editors, Proceedings of the 19th
     International Conference on Computer Aided
3
  Note that this criticism is not peculiar to Bresap. It applies to all
policy engines that represent semantically rich policies (requiring
at least linear arithmetic) as Boolean functions.


     DIFTS 2011                                                           12
       Data Structure Choices for On-the-Fly Model
             Checking of Real-Time Systems
                          Peter Fontana                                             Rance Cleaveland
               Department of Computer Science                              Department of Computer Science
              University of Maryland, College Park                        University of Maryland, College Park
                 Email: pfontana@cs.umd.edu                                    Email: rance@cs.umd.edu


   Abstract—This paper studies the performance of sparse-            checking based on proof search: a formula correspond-
matrix-based data structures to represent clock zones (con-          ing to the assertion that a timed automaton satisfies a
vex sets of clock values) in an on-the-fly predicate equation        mu-calculus property can be checked in a goal-driven
system model checker for timed automata. We analyze the              fashion to determine its truth. Zhang and Cleaveland [6]
impact of replacing the dense difference bound matrix
                                                                     demonstrated the efficiency of this approach vis à vis
(DBM) with both the linked-list CRDZone and array-list
CRDArray data structure. From analysis on the paired-
                                                                     other real-time model-checking approaches.
example-by-example differences in time performance, we                  In this paper we consider the special model check-
infer the DBM is either competitive with or slightly faster          ing case of timed automata and timed modal equation
than the CRDZone, and both perform faster than the                   systems representing safety properties (also studied in
CRDArray. Using similar analysis on space performance,               [6]), for which there are still many opportunities for
we infer the CRDZone takes the least space, and the DBM              performance improvements. One component of such a
takes less space than the CRDArray.                                  model checker that has a noticeable influence on perfor-
                                                                     mance is the data structure for the sets of clock values.
                    I. I NTRODUCTION                                 When analyzing safety properties, each desired set of
   Automatic verification of real-time systems is under-             clock values forms a convex set of clock values, or clock
taken using notations for verifiable programs and check-             zone (see Definition 3). The conventional way to store
able specifications (see [1]–[6]). One common program                a clock zone is as a difference bound matrix (DBM)
notation is timed automata [7]. There are specification              (see Definition 4) [15], which stores the constraints as
notations such as timed computation tree logic (TCTL)                a matrix. This approach is used by UPPAAL [16] and
[1], [8] and timed extensions of a modal mu-calculus,                described in [17]. To potentially save space and time,
including one in [3] and another given in [5]. Specifica-            instead of representing the set of constraints as a matrix,
tions in a timed modal mu-calculus can be written as lists           one can represent the set as an ordered linked path of
of equations, known as timed modal equation systems                  constraints where any clock difference not on the path
[5], [6], [9]. For information on the untimed modal-mu               has the implicit constraint < ∞. If we generalize this
calculi, see [10]–[12], and see [10], [11] for information           to allow for a union of zones to be represented by a
on modal equation systems.                                           directed graph of constraints (representing a tree of paths
   One approach to model checking timed automata                     as opposed to a single path), we get a clock restriction
with timed modal mu-calculus specifications is to use                diagram (CRD) [18]. If we compress the nodes to have
predicate equation systems (PES), which were invented                them represent upper and lower bound constraints as well
independently by Groote and Willemse (as parameter-                  as explicitly encoding both valid and invalid paths, we
ized boolean equation systems) [13] and by Zhang and                 get a clock difference diagram (CDD) [2]. These two
Cleaveland [6], [9]. Predicate equation systems provide              structures are extensions of binary decision diagrams
a general framework for program models including para-               (BDDs) (see [19] for information).
metric timed automata [6] and Presburger systems [14].                  To improve performance, we take the above idea of
They also admit a natural on-the-fly approach to model               a linked implementation and incorporate the sparseness
                                                                     of the implementation of CRDs while simplifying (or
  Research supported by NSF Grant CCF-0820072.                       shrinking) the structure to only support a single clock


     DIFTS 2011                                                 13
zone (CRDs and CDDs in general can encode unions of                  • I : L −→ Φ(CX) gives a clock constraint for each
clock zones). This simplified structure is a sparse sorted             location l. I(l) is called the invariant of l.
linked-list implementation of a DBM, the CRDZone (see                • E ⊆ L × Σ × Φ(CX) × 2CX × L is the set of edges.
Definition 5). We also implement an array-list version                 In an edge e = (l, a, φ, Y, l0 ) from l to l0 with action
of the CRDZone, the CRDArray (see Definition 6). A                     a, φ ∈ Φ(CX) is the guard of e, and Y represents
CRDZone may be seen as a sparse sorted linked-list                     the set of clocks to reset to 0.
implementation of a DBM, and the CRDArray a sparse
array-list implementation of the CRDZone. We examine                 Some sources [6], [23] and our PES checker allow
the time and space performance of all three clock zone            clock assignments (x1 := x2 ) in addition to clock resets
implementations: the matrix DBM, linked-list CRDZone              on edges, other sources [17] allow constraints on clock
and array-list CRDArray.                                          differences and other sources [1] allow states to be
   The contributions of this paper are:                           labelled with atomic propositions that each state satisfies.
                                                                     Timed automata use clock valuations ν ∈ V (V =
   • We run experiments comparing time and space
                                                                  CX −→ R≥0 is the set of all clock valuations), which
     performance of a model checker (on safety (reach-
                                                                  at any moment stores a non-negative real value for each
     ability) properties) with the DBM, CRDZone and
                                                                  clock x ∈ CX . The semantics of a timed automaton are
     CRDArray data structure implementations.
                                                                  described as an infinite-state machine, where each state
   • We formalize and extend the analysis style per-
                                                                  is a location-valuation pair (l, ν). Transitions represent
     formed in the model checking experiments of [2],
                                                                  either time advances or edge executions (performing an
     [6], [9], [18], [20], [21] by utilizing paired data
                                                                  action). For a formal definition of the semantics of a
     (each implementation checked the same examples)
                                                                  timed automaton, see [7].
     and applying descriptive statistics on the paired
     example-by-example differences on time and space
     consumption. See Section VI for details on the
                                                                       Real-Time Model Checking
                                                                  Example 1 (Example of a timed automaton). Consider
                                                                  the timed automaton in Figure 1, which models a train
     statistics and Section VI-B for the analysis.                in the generalized railroad crossing (GRC) protocol.
   After analyzing the experimental results, for time per-
                                                                                   exit, x1 > 1
formance we infer the DBM is either competitive with or
slightly faster than the CRDZone and both perform faster
                                                                         0: far                       1: near                  2: in
than the CRDArray for the examples in this experiment.                            approach, x1 := 0   x1 < 4    in, x1 = 4,   x1 < 15
In terms of space, we infer the CRDZone takes up the                                                            x1 := 0

least space, and the DBM and takes less space than the
                                                                    TCTL
                                                                  Fig.        (Invalid):
                                                                       1. Timed   automatonAF
                                                                                            TA<∞ [near ∨in]
                                                                                               1 , a model of a train in the generalized
CRDArray for the examples in this experiment.
                                                                  railroad crossing (GRC) protocol.
                                                                    TCTL (Valid): AG<∞[near ! AF!TP+TDU[far]]
     II. P ROGRAM M ODEL AND S PECIFICATIONS
                                                                     There are three locations—0: far (initial location),
A. Timed Automata                                                 1: near and 2: in, with one clock x1 . There are the actions
                                                                                                                             2

  A timed automaton encodes the behavior of a real-time           approach, in and exit in Σ. Here, location 1 has the
system [7], [22].                                                 invariant x1 ≤ 4 while 0 has no invariant. The edge
                                                                  (1: near, in, x1 = 4, {x1 },2: in) has the guard x1 = 4
Definition 1 (Clock constraint φ ∈ Φ(CX)). Given a set            and resets x1 to 0.
of clocks CX , a clock constraint φ is constructed with
the following grammar, where xi is a clock and c ∈ Z:             B. Modal Equation Systems (MES)
                                                                     We use a modal equation system (MES) to represent
    φ ::= xi < c | xi ≤ c | xi > c | xi ≥ c | φ ∧ φ
                                                                  real-time temporal properties that timed automata can
Φ(CX) is the set of all possible clock constraints.               possess. A MES is an ordered list of equations with
                                                                  variables on the left hand side and basic timed temporal
Definition 2 (Timed automaton). A timed automaton                 logical formulae on the right. Each equation involves a
T A = (L, L0 , Σ, CX, I, E) is a tuple where:                     variable X , a basic formula φ and a greatest fixpoint (ν )
  • L is a finite set of locations with the initial set of        or a least fixpoint (µ), and the equation is labeled with
                                                                                   ν          µ
    locations L0 ⊆ L.                                             the fixpoint (X = φ or X = φ). For a formal definition
  • Σ is the set of actions and CX is the set of clocks.          of MES syntax and semantics, see [6], [9].


     DIFTS 2011                                              14
Example 2 (Continuation of Example 1). Again consider                   While converting to a canonical form takes a consid-
the timed automaton in Figure 1. The MES                             erable amount of time, it is needed to simplify and stan-
                   ν                                                 dardize the algorithms for the zone operations includ-
              X1 = far ∧ ∀([ − ](X1 ))                    (1)        ing time successor (succ(z)) computations and subset
represents the safety property “the train is always in               checks. For time successor, having the zone in canonical
state 0: far”, read as “the variable X1 is the greatest              form allows the time elapse operation to simply set all
fixpoint of being in state 0: far and for all time advances          single-clock upper bound constraints to < ∞.
(∀), for all next actions ([ − ]), X1 is true.” This is an           B. Clock Zone Data Structures: Difference Bound Ma-
invalid specification for the timed automaton because the            trix (DBM), CRDZone and CRDArray
execution
                                                                       One way to represent a clock zone is a difference
                       2.5                                           bound matrix (DBM), described in Definition 4. See [15],
  (0: far, [xi = 0]) −→ (0: far, [xi = 2.5])
                  approach                        2
                                                                     [17] for a more thorough description.
                   −→        (1: near, [xi = 0]) −→ . . . (2)
                                                                     Definition 4 (Difference bound matrix (DBM)). Let
reaches location 1: near and thus violates the property.             n − 1 be the number of clocks. A DBM is an n × n
                                                                     matrix where entry (i, j) is the upper bound of the clock
  III. DATA S TRUCTURES FOR C LOCK VALUE S ETS                       constraint xi − xj , represented as xi − xj ≤ uij or
A. Clock Zones                                                       xi − xj < uij . The 0th index is reserved for a dummy
                                                                     clock x0 , which is always 0, allowing bounds on single
  This definition of a clock zone is taken from [7], [19].
                                                                     clocks to be represented by the clock differences xi − x0
Definition 3 (Clock zone). A clock zone is a convex                  and x0 − xj . See Figure 2 for a picture of the DBM
combination of single-clock inequalities. Each clock                 structure and Example 4 for a concrete example.
zone can be constructed using the following grammar,
where xi and xj are arbitrary clocks and c ∈ Z:                      Definition 5 (CRDZone). A CRDZone is a sparse sorted
                                                                     linked-list representation of a clock zone. Each constraint
  Z ::= xi < c | xi ≤ c | xi > c | xi ≥ c                            is encoded like a constraint in a DBM, as an upper bound
              | xi − xj < c | xi − xj ≤ c | Z ∧ Z (3)                constraint on xi − xj , labeled as (i, j), with x0 always
                                                                     being 0. The CRDZone has these properties:
                                                                        1) Nodes are in lexicographical order of clock con-
   Clock zones extend clock constraints with inequalities                  straint: (i1 , j1 ) ≺ (i2 , j2 ) iff i1 < i2 or (i1 = i2
of clock differences. These inequalities are used for                      and j1 < j2 ).
model checking even though clock difference inequali-                   2) The (0, 0) node is always given to ensure a univer-
ties are not used in timed automata. Moreover, in general,                 sal head node with an initial value of x0 − x0 ≤ 0.
the representation of a clock zone is not unique.                       3) If a CRDZone node (i, j) is missing then the zone
                                                                           has an implicit constraint: (i, i) : xi − xi ≤ 0 and
Example 3. Let z = 1 ≤ x1 < 3 ∧ 0 ≤ x2 ≤ 5.                                (i, j), i 6= j : xi − xj < ∞.
There is the implicitly encoded constraint x2 − x1 ≤ 4.                 See Figure 3 for a picture of the CRDZone structure
To see this, consider the longer path of constraints (x0             and Example 4 for a concrete example.
is a dummy clock that always has value 0):
                                                                       This lexicographical ordering is the same ordering
                  x2 − x0 ≤ 5         (x2 ≤ 5)
                                                                     used in CDDs and CRDs [2], [18]. While the ordering
             +    x0 − x1 ≤ −1        (1 ≤ x1 )
                  x2 − x1 ≤ 4
                                                                                                        j
   To provide a standardized, or canonical, form for clock                            z              }|           {
                                                                                     (. . . . . .       ...
zone representations, we use shortest path closure [17].
This form makes every implicit constraint explicit. This                            i . . . . . .       ...      
can be implemented in O(n3 ) time using Floyd-Warshall                                  . . . . . . xi − xj ≤ uij
all-pairs shortest path algorithm, described in [24], [25].
Other standard forms exist [18], [20].                               Fig. 2. DBM: a matrix with constraint xi − xj ≤ uij in entry (i, j).


     DIFTS 2011                                                 15
is conjectured to influence performance, this is the only                  the number of nodes looked at during a full traversal.
ordering we implemented. Likewise, having different                        This can speed up traversal-based algorithms such as
implicit constraints, such as making xi − x0 ≤ 0 (for                      intersect and subset check. However, algorithms like
all i) implicit, is conjectured to influence performance.                  clock reset, emptiness check and canonical form use
                                                                           O(1) access of middle nodes in DBMs (the CRDZone
Definition 6 (CRDArray). The CRDArray is an array list
                                                                           and CRDArray do not have O(1) access for all nodes),
implementation of the CRDZone. Thus, instead of using
                                                                           resulting in a performance slowdown for those CRDZone
linked nodes, we use an array to store the nodes with
                                                                           and CRDArray methods. For space, the CRDZone and
the 0th element being the head. We statically allocate
                                                                           CRDArray can store fewer nodes but must store the
the array to hold the maximum number of elements and
                                                                           explicit indices, resulting in more space per node.
store a back-pointer to the back of the array list. See
Figure 4 for a picture of the CRDArray structure and                       IV. O N -T HE -F LY M ODEL C HECKING : C ONVERTING
Example 4 for a concrete example.                                                     TO A PES AND P ROOF S EARCH
                                                      Our model checker takes in a predicate equation
  Using a dynamic allocation instead of our static allo-
                                                   system (PES) (taken from [9], [13]), which is a general
cation for the CRDArray array list is conjectured to save
space at the expense of time.                      framework representing logical expressions that involve
                                                   fixpoints and first order quantifiers. We take a timed
Example 4 (Clock zone in various representations). automaton and a MES and convert it to a PES. Currently
Consider the clock zone from Example 3, which is the PES model checker can only check safety properties,
z = 1 ≤ x1 < 3 ∧ 0 ≤ x2 ≤ 5 ∧ x2 − x1 ≤ 4.         which involve only greatest fixpoints in both the PES
  DBM representation of z :                        and the input MES. For more information on a PES,
                                              
      x0 − x0 ≤ 0 x0 − x1 ≤ −1 x0 − x2 ≤ 0         including its syntax, semantics and how to convert a
    x1 − x0 < 3 x1 − x1 ≤ 0 x1 − x2 < ∞          timed automaton and a MES to a PES, see [6], [9].
      x2 − x0 ≤ 5 x2 − x1 ≤ 4      x2 − x2 ≤ 0        The model checker takes the conclusion sequent (the
  CRDZone representation of z :                    sequent   we wish to prove) and applies proof rules in
                                                   a recursive goal-driven tree-like fashion on the premise
   x0 − x0 ≤ 0 −→ x0 − x1 ≤ −1 −→ x0 − x2 ≤ 0 sequents, trying to prove each premise sequent until it
                                                   reaches a proof rule with no premise (called a leaf) or
                                                   a circularity (a previously seen premise sequent). When
 −→ x1 − x0 < 3 −→ x2 − x0 ≤ 5 −→ x2 − x1 ≤ 4 checking a proof, we will often encounter circularity.
  CRDArray representation of z :                   In general, when the circularity reached is a greatest
                                                   fixpoint, we can stop and declare the proof branch valid.
       [x0 − x0 ≤ 0|x0 − x1 ≤ −1|x0 − x2 ≤ 0       For the formal conditions for circularity and the proof
        |x1 − x0 < 3|x2 − x0 ≤ 5|x2 − x1 ≤ 4]      rules, see [6], [9].

Remark 1 (On DBM vs. CRDZone and CRDArray                                     V. E XPERIMENTS : VARIOUS DATA S TRUCTURE
methods). Due to the sparse implementation and removal                                    I MPLEMENTATIONS
of implicit nodes, the CRDZone and CRDArray can                               We compare the DBM implementation to the CRD-
improve time by reducing the number of nodes, and thus                     Zone and CRDArray implementations. Each implemen-
                                                                           tation uses shortest path closure to compute canonical
                                                                           form. The only difference in the DBM, CRDZone and
            −→ . . . −→ xi − xj ≤ uij −→                     ...           CRDArray versions is the data structure implementation.
                                                                           Remark 2 (On our analysis approach). We ask: what does
Fig. 3. CRDZone: A linked list with nodes in lexicographical order         it mean for an implementation to perform better than
of constraint xi − xj ≤ uij .                                              another? We consider consider better to be measured
                                                                           in the number (or percentage) of examples that one
            [      |      | . . . |xi − xj ≤ uij | . . .]                  system outperforms another in. The larger aim is for
                                                                           any implementation, if we were to know all the examples
Fig. 4. CRDArray: An array list with nodes in lexicographical order        that it would run (including and beyond the experiment
of constraint xi − xj ≤ uij .                                              examples), we would like one implementation to perform


      DIFTS 2011                                                      16
                                                                                               TABLE I
(strictly) better for at least 51% of this hypothetical set.             D ESCRIPTIVE S TATISTICS FOR PAIRED DBM − ( MINUS )
This influences our analysis.                                            CRDZ ONE EXAMPLES , FOR TIME ( S ) AND SPACE (MB).
   Given our meaning of better in Remark 2, we con-
                                                                      Statistic            DBM − CRD-         DBM − CRD-
sider the median, 25% and 75% percentile values as                                         Zone (Time)        Zone (Space)
insights into typical examples and use the histograms                 Mean                 -67.55             34.96
to get a bigger picture of the sample distribution of the             Standard Deviation   428.35             212.65
performance differences for the experiment, and weigh                 25% Percentile       -1.24              0.00
these more heavily than the mean and standard deviation               Median               0.00               1.85
                                                                      75% Percentile       0.06               25.70
values. The mean and the standard deviation provide us
with an alternative picture of the overall performance and
                                                                                              TABLE II
give hints of either a unusual sample distribution (since               D ESCRIPTIVE S TATISTICS FOR PAIRED DBM − ( MINUS )
in a symmetric distribution the mean equals the median)                 CRDA RRAY EXAMPLES , FOR TIME ( S ) AND SPACE (MB).
or the presence of potential outliers.
   The benchmark choice was modeled off of [6], with                  Statistic            DBM − CRDAr-       DBM − CRDAr-
the addition of a model of the generalized railroad                                        ray (Time)         ray (Space)
crossing (GRC) protocol [26]. We also used all the                    Mean                 -112.95            -47.75
                                                                      Standard Deviation   655.65             235.63
protocols in [6], which are the Carrier Sense, Multi-                 25% Percentile       -3.16              -20.54
ple Access with Collision Detection (CSMA/CD), the                    Median               -0.29              -2.81
Fiber Distributed Data Interface (FDDI), Fischer’s Mu-                75% Percentile       0.00               -0.01
tual Exclusion (FISCHER), the Leader Election protocol
(LEADER and LBOUND) and the PATHO Operating                                                    TABLE III
System (PATHOS) protocol, where each of these proto-                   D ESCRIPTIVE S TATISTICS FOR PAIRED CRDZ ONE − ( MINUS )
                                                                         CRDA RRAY EXAMPLES , FOR TIME ( S ) AND SPACE (MB).
cols is described some in [6]. There are 53 benchmarks
that ran on each implementation.                                      Statistic            CRDZone − CR-      CRDZone − CR-
   Experiments were run on a Linux machine with a                                          DArray (Time)      DArray (Space)
3.4 GHz Intel Pentium 4 Dual Processor (each a single                 Mean                 -45.40             -82.71
core) with 4 GB RAM. Time and space measurements                      Standard Deviation   229.06             160.91
(maximum space used) were made using the memtime                      25% Percentile       -2.02              -52.67
                                                                      Median               -0.21              -19.35
(http://www.update.uu.se/∼johanb/memtime/) tool (using                75% Percentile       -0.03              -1.63
time elapsed and Max VSize). The data tables are
in the Appendix.
     VI. S TATISTICS , A NALYSIS AND D ISCUSSION                     the DBM, CRDZone and CRDArray. Figures 5, 6 and
A. Histograms and Descriptive Statistics                             7 have histograms that plot the overall time and space
   Running the different data structure implementations              differences between the DBM, CRDZone and CRDArray
with the same examples yields paired data. Hence,                    implementations. Bin colors and are changed to help
we can take the two implementations and pair them                    more easily find the -0.001 to 0.001 (equal performance,
example-by-example on their time and space differences               since our measurement precision is 0.01 units), and -0.25
to analyze their performance. When we pair the DBM −                 to -0.001 and 0.001 to 0.25 bins (slight differences).
CRDZone samples, we take the DBM measurement and                        We do not use 95% confidence intervals, paired two-
subtract the CRDZone measurement for the same exam-                  sample hypothesis (z ) tests or ANOVA (Analysis of
ple to get a DBM − CRDZone paired data point. For                    Variance) because the independence assumption of the
instance, the MUX-5-a paired point is -0.92s, 1.94MB,                samples (the example benchmarks) does not hold. Fur-
since the DBM point is 1.22s, 14.67MB, and the CRD-                  thermore, we do not use a Wilcoxon signed-rank test
Zone point is 2.14s, 12.73MB. Pairings are likewise done             for the median because the symmetry assumption of the
to obtain the paired samples for DBM − CRDArray and                  distribution is not believed to hold, and thus we cannot
CRDZone − CRDArray. For more information, see a                      analyze the hypothetical benchmark distribution referred
Statistics text such as [27].                                        to in Remark 2. We do use paired sampling since we have
   Tables I, II and III contain descriptive statistics on the        its only requirement—perfect correlation of the samples.
paired difference in example-by-example performance of               More information is in [27].


     DIFTS 2011                                                 17
                                               DBM – CRDZone Time (s)	

                                                                                                                CRDZone – CRDArray Time (s)	

                  16	

                                                        15	

                                                                             16	

                                                15	

                  14	

                                                                                                                                          14	

                                                        12	

       Count	

   12	

                                                                                                                                          12	


                                                                                                                                                      Count	

                  10	

                         8	

                                                                                                             10	

                                 8	

    8	

                   8	

                                         6	

                   6	

                                                                       8	

                         6	

                   6	

                                                4	

                          5	

                                                         6	

                   4	

                 2	

            3	

                                  2	

                                                                4	

                  3	

                   2	

   0	

   1	

                                                                        1	

    0	

   0	

   0	

                           2	

    0	

   1	

                                                 0	

   0	

   0	

   0	

   0	

   0	

   0	

                   0	

                                                                                                                                           0	

                       -5 o - 	

                            To 0	


                              To 	

                             0 T 	

                   -0 To -1	

                  -0 To 25	


                     0. o 0 	

                              o 	

                             25 	


                             1 T 1	


                             To 	

                                   0	


                                   e	


                                                                                                                                                                      -5 o - 	

                                                                                                                                                                           To 0	


                                                                                                                                                                             To 	

                                                                                                                                                                                   	
                                                                                                                                                                  -0 To 1	

                                                                                                                                                                 -0 To 25	


                                                                                                                                                                    0. o 0 	

                                                                                                                                                                             o 	

                                                                                                                                                                            25 	


                                                                                                                                                                            1 T 1	


                                                                                                                                                                            To 	

                                                                                                                                                                                  0	


                                                                                                                                                                              M 	

                                                                                                                                                                                  e	

                          0 T 00


                          00 00

                           -1 -10


                           T 01
                           1 T 01

                           0. .25


                          10 10


                          0 T 00

                                  00


                                                                                                                                                                         0 T 00


                                                                                                                                                                         00 00

                                                                                                                                                                          -1 -10


                                                                                                                                                                          T 01
                                                                                                                                                                          1 T 01

                                                                                                                                                                          0. .25


                                                                                                                                                                         10 10


                                                                                                                                                                         0 T 00

                                                                                                                                                                                 00
                                 or


                                                                                                                                                                                or
                         00 50


                         10 10


                                                                                                                                                                        00 50


                                                                                                                                                                        -1 To -


                                                                                                                                                                        10 10
                                To


                                                                                                                                                                               To
                        00 50


                        -1 -1


                         5 -0.
                        01 .0

                        00 .0


                       50 To 5

                               50


                                                                                                                                                                       00 50


                                                                                                                                                                       -1 -1


                                                                                                                                                                        5 -0.
                                                                                                                                                                       01 .0

                                                                                                                                                                       00 .0


                                                                                                                                                                      50 To 5

                                                                                                                                                                              50
                         -1 o


                                0


                                                                                                                                                                               0
                                o


                                                                                                                                                                               o
                               M
                     .0 -0


                                                                                                                                                                    .0 -0
                    -5 o -


                                                                                                                                                                   -5 o -


                                                                                                                                                                            0
                             o


                                                                                                                                                                            o
                           0


                                                                                                                                                                          0
                         pT


                                                                                                                                                                        pT
              U


                                                                                                                                                             U
                      .2


                                                                                                                                                                     .2
                                          DBM – CRDZone Space (MB)	

                                                                                                               CRDZone – CRDArray Space (MB)	

                                                                       12	

                                                                                                                   18	

                  12	

                                                                              11	

                                                                                             16	

                                                                                                             10	

                  10	

                                                                                                                                          15	

       Count	


                                                                                                                                                      Count	

                   8	

                                                                                                                                                                                                9	

                   6	

                                 5	

                                                                5	

                                 10	

           7	

                                                 4	

                   4	

                                                                       2	

                   2	

                                 1	

   1	

                                                                                                                       5	

                 2	

                   2	

   0	

                                  0	

            0	

   0	

                                        0	

                                   0	

                                 1	

            0	

   0	

   0	

   0	

   0	

   0	

   0	

   0	

                   0	

                                                                                                                                            0	

                       -2 o - 	

                            To 6	


                              To 	

                             6 T 	

                   -0 To -1	

                  -0 To .25	


                     0. o 0 	

                              o 	

                             25 	


                             1 T 1	


                             To 	

                                   4	


                                   e	

                          8 T 48


                          24 24

                           -1 -16


                           T 01
                           1 T 01

                           0. .25


                          16 16


                          6 T 56

                                  48


                                                                                                                                                                      -2 o - 	

                                                                                                                                                                           To 6	


                                                                                                                                                                             To 	

                                                                                                                                                                                   	
                                                                                                                                                                  -0 To 1	

                                                                                                                                                                 -0 To .25	


                                                                                                                                                                    0. o 0 1	

                                                                                                                                                                             o 	

                                                                                                                                                                            25 	

                                                                                                                                                                            1 T 1	


                                                                                                                                                                      25 To 2 	

                                                                                                                                                                                    	
                                                                                                                                                                              M 	

                                                                                                                                                                                  e	

                                                                                                                                                                         8 T 48


                                                                                                                                                                         24 24

                                                                                                                                                                          -1 -16


                                                                                                                                                                          1 T 01
                                                                                                                                                                          0. .25


                                                                                                                                                                         16 o 16

                                                                                                                                                                                  4
                                                                                                                                                                         6 T 56

                                                                                                                                                                                 48
                                 or
                         56 25


                         12 12


                                                                                                                                                                                or
                                To


                                                                                                                                                                        56 25


                                                                                                                                                                        -1 To -


                                                                                                                                                                          T 0


                                                                                                                                                                        12 o 12
                        04 20


                        -1 -1


                        01 .0

                        00 .0


                       25 To 2

                               20


                                                                                                                                                                               To
                         -1 o
                         5 -0


                                0


                                o


                                                                                                                                                                       04 20


                                                                                                                                                                       -1 -1


                                                                                                                                                                       01 .0
                                                                                                                                                                       00 .0


                                                                                                                                                                              20
                               M


                                                                                                                                                                        5 -0


                                                                                                                                                                               0
                     .0 -0
                    -2 o -


                                                                                                                                                                    .0 -0
                             o


                                                                                                                                                                   -2 o -


                                                                                                                                                                            T
                                                                                                                                                                            6


                                                                                                                                                                            o
                           4
                         pT


                                                                                                                                                                          4
                                                                                                                                                                        pT
              U


                      .2


                                                                                                                                                             U


                                                                                                                                                                     .2
Fig. 5. Histograms comparing the DBM − (minus) CRDZone time                                                                                    Fig. 7. Histograms comparing the CRDZone − (minus) CRDArray
(s) (top) and space (MB) (bottom) differences.                                                                                                 time (s) (top) and space (MB) (bottom) differences.

                                               DBM – CRDArray Time (s)	

                                                                               12	

                  12	

                  10	

                                         9	

                                                                           0.25MB less space for 28 such examples and more than
       Count	


                   8	

                         7	

    7	

           7	

                   6	

                                                                              4	

                                                                                                                                               0.25MB space for only 11 examples. Thus (even though
                   4	

                 3	

                   2	

   0	

                                 1	

                                                  1	

   1	

           1	

                                                                                                                     0	

   0	

   0	

        57% is not a large majority), we infer the CRDZone
                   0	

                                                                                                                                               takes less space overall for this benchmark.
                       -5 o - 	

                            To 0	


                              To 	

                             0 T 	

                   -0 To -1	

                  -0 To 25	


                     0. o 0 	

                              o 	

                             25 	


                             1 T 1	


                             To 	

                                   0	


                                   e	

                          0 T 00


                          00 00

                           -1 -10


                           T 01
                           1 T 01

                           0. .25


                          10 10


                          0 T 00

                                  00
                                 or
                         00 50


                         10 10
                                To
                        00 50


                        -1 -1


                         5 - 0.
                        01 . 0

                        00 . 0


                       50 To 5

                               50
                         -1 o


                                0


                                o


                               M
                     .0 -0


                                                                                                                                                  2) DBM vs. CRDArray: The CRDArray performs
                    -5 o -


                             o
                           0
                         pT
              U


                      .2


                                                                                                                                               slower for 64% of the tested examples (at least as slow
                                         DBM – CRDArray Space (MB)	

                                                        14	

                                                                                                                                               for 89%) with a median difference of 0.29s slower and
                  14	

                  12	

                         11	

                                                                       9	

                                                                                                                                               a mean difference of 112.95s slower. Thus we infer the
       Count	


                  10	

                   8	

                   6	

                                         4	

                                                                                                             6	

                              CRDArray performs slower overall for this benchmark.
                   4	

                 3	

                                                         3	

                   2	

                   0	

                          0	

                                 1	

                                                                               0	

                                                                                       1	

                                                                                              0	

                                                                                                                     1	

                                                                                                                            0	

   0	

           The CRDArray takes more space for 77% (at least as
                                                                                                                                               much space for 77%) of the examples with a median
                       -2 o - 	

                            To 6	


                              To 	

                             6 T 	

                   -0 To -1	

                  -0 To 25	


                     0. o 0 	

                              o 	

                             25 	


                             1 T 1	


                             To 	

                                   4	


                                   e	

                          8 T 48


                          24 24

                           -1 -16


                           T 01
                           1 T 01

                           0. .25


                          16 16


                          6 T 56

                                  48
                                 or
                         56 25


                         12 12
                                To
                        04 20


                        -1 -1


                         5 - 0.
                        01 . 0

                        00 . 0


                       25 To 2

                               20
                         -1 o


                                0


                                o


                               M
                     .0 -0
                    -2 o -


                                                                                                                                               amount of 2.81MB more space and mean amount of
                             o
                           4
                         pT
              U


                      .2


                                                                                                                                               47.75MB more. Thus we infer the CRDArray takes more
                                                                                                                                               space overall for this benchmark.
Fig. 6. Histograms comparing the DBM − (minus) CRDArray time
(s) (top) and space (MB) (bottom) differences.                                                                                                    3) CRDZone vs. CRDArray: The CRDArray performs
                                                                                                                                               slower for 77% of the tested examples (at least as slow
                                                                                                                                               for 100%) with a median difference of 0.21s slower and
B. Analysis of Results                                                                                                                         a mean difference of 45.40s slower. Thus we infer the
                                                                                                                                               CRDArray is slower overall for this benchmark.
   1) DBM vs. CRDZone: The CRDZone performs
slower for 45% of the tested examples (at least as slow                                                                                           The CRDArray takes more space for 100% of the
for 74%) with a median difference of 0.00s slower, while                                                                                       examples with a median amount of 19.35MB more space
the CRDZone has a mean difference of 67.55s slower.                                                                                            and a mean amount of 82.71MB more. Thus we infer the
Thus, we infer the CRDZone is either slightly slower or                                                                                        CRDArray takes more space overall for this benchmark.
competitive to the DBM for this benchmark, but due
to insufficient evidence (45% of the examples is not                                                                                           C. Discussion of Results
enough) do not infer that the DBM performs strictly                                                                                              The CRDZone and CRDArray method that converts
faster than the CRDZone.                                                                                                                       zones to canonical form was implemented using array-
   The CRDZone takes less space for 57% of the tested                                                                                          based algorithms, where it temporarily converts the clock
examples (at most as much space for 57%) with a median                                                                                         zone to and from a matrix (the algorithm is similar to
amount of 1.85MB less space and a mean amount                                                                                                  a DBM algorithm for those methods). It is possible that
of 34.96MB less space. The CRDZone takes at least                                                                                              performance can be improved by trying an algorithm that


     DIFTS 2011                                                                                                                           18
does not require copying to and from a matrix. All time          [21], but this experiment compares data structures for
vs. space tradeoffs were taken to save time.                     non-convex sets of clock valuations (unions of clock
   Furthermore, we ran a CRDZone execution twice (first          zones), and the comparison is across different tools with
execution reported) and compared the distribution of             different model checking approaches, and not the same
their differences to get an idea of noise and/or mea-            tool with different data structures.
surement error. The differences in the histograms are
                                                                        VII. C ONCLUSIONS AND F UTURE W ORK
larger than the differences of the noisy implementation,
so thus we suspect the differences in performances are              Here are our conclusions from the experiment:
due to more than just uncertain measurement/noisy data.             1) Time: (DBM ≤t CRDZone) <t CRDArray). For
However, slight differences (≤ 0.25s or ≤ 0.25MB)                       this benchmark, we infer that the DBM is either
may be due to execution noise or examples that require                  competitive with or slightly faster than the CRD-
very little time and/or space. The PATHOS-7-b is one                    Zone and both perform faster than the CRDArray.
such resource-light example, taking at most 0.10s and                   There is insufficient evidence to conclude that the
2.89MB for each implementation. In contrast, the MUX-                   DBM is strictly faster.
7-a example is resource-heavy, being the only example               2) Space: (CRDZone <s DBM) <s CRDArray. For
to takes more than 2000s for each implementation.                       this benchmark, we infer that the CRDZone takes
   When profiling the implementations using gprof                       the least amount of space and the DBM takes less
(http://www.gnu.org/s/binutils/), CRDZones take more                    space than the CRDArray for this experiment.
time than DBMs for clock resets and emptiness checks,               For potential reasons for performance differences and
but the invariant checking method (mostly clock zone             analysis of some theoretical differences, see Remark 1.
intersections) takes less time for CRDZones than DBMs.              Future work is to model check least fixpoint (µ)
   From the data (see the Appendix), we notice that the          equations, to implement all of the proof rules given in
data structure implementation choice has a noticeable            [6], [9] and to model check any PES. Comparing lists of
influence on model checking performance for specific             DBMs to a CRD or CDD to improve the performance of
examples. To see this, consider the examples MUX-7-a,            the expanded implementation is future work, as well as
where the DBM finished 3118.94 seconds earlier than the          comparing different kinds of standard forms and different
CRDZone, and LEADER-100-c, where the CRDZone                     CRDZone node orderings.
checked the example in 55% of the time required for
                                                                                    ACKNOWLEDGEMENTS
the DBM. There are also noticeable differences relative
to the CRDArray.                                                    We thank Dezhuang Zhang for providing the code base
                                                                 [6] and for his insights.
D. Related Work
                                                                                          R EFERENCES
   A different way of modeling programs using discrete-
                                                                  [1] R. Alur, C. Courcoubetis, and D. Dill, “Model-Checking in
time is discussed in [4]. PES are in [9], and they have               Dense Real-Time,” Inf. Comput., vol. 104, pp. 2–34, 1993.
been used to model check various systems in [6], [9],             [2] G. Behrmann, K. G. Larsen, J. Pearson, C. Weise, and W. Yi,
[14]. Difference bound matrices originated from [15]                  “Efficient Timed Reachability Analysis Using Clock Difference
and are used in various studies such as in those in [17],             Diagrams,” in CAV ’99, vol. 1633. Springer Berlin / Heidel-
                                                                      berg, 1999.
[23]. Other tools that can model check timed automata             [3] T. Henzinger, X. Nicollin, J. Sifakis, and S. Yovine, “Symbolic
(safety and liveness properties) include KRONOS [23],                 Model Checking for Real-time Systems,” Inf. Comput., vol. 111,
UPPAAL [16], RED [18] and Rabbit [28]. We built                       pp. 193–244, 1994.
                                                                  [4] L. Lamport, “Real-Time Model Checking Is Really Simple,”
upon Zhang’s implementation using predicate equation                  Correct Hardware Design and Verification Methods, pp. 162
systems in [6], [9], which supports safety properties. A              –175, 2005.
similar experiment involving using a reduced canonical            [5] O. V. Sokolsky and S. A. Smolka, “Local model checking for
form is in [20], which focuses on the influence of                    real-time systems,” in CAV ’95. Springer-Verlag, 1995, pp.
                                                                      211–224.
different standard clock zone forms instead of comparing          [6] D. Zhang and R. Cleaveland, “Fast Generic Model-Checking
list vs. array implementations.                                       for Data-Based Systems,” in FORTE, F. Wang, Ed., vol. 3731.
   A remark on a sparse DBM representation saving                     Springer, 2005, pp. 83–97.
                                                                  [7] R. Alur, “Timed Automata,” in CAV ’99.            London, UK:
space, with neither a mention on time nor experimental
                                                                      Springer-Verlag, 1999, pp. 8–22.
data, is given in [17]. There is an experiment comparing          [8] C. Baier and J.-P. Katoen, Principles of Model Checking.
CDDs to another BDD-like structure used by Rabbit in                  Cambridge, MA: The MIT Press, 2008.


     DIFTS 2011                                             19
                                                                                                    TABLE IV
 [9] D. Zhang and R. Cleaveland, “Fast On-the-Fly Parametric Real-            E XPERIMENT R ESULTS —A E XAMPLES —T IME ( S ): CORRECT
     Time Model Checking,” in RTSS ’05. Washington, DC, USA:                             SYSTEM , CORRECT SPECIFICATION .
     IEEE Computer Society, 2005, pp. 157–166.
[10] J. Bradfield and C. Stirling, “Local Model Checking for Infinite         Example         DBM          CRDZone          CRDArray
     State Spaces,” Theor. Comput. Sci., vol. 96, no. 1, pp. 157–174,
     1992.                                                                    CSMACD-3-a       0.10      0.20 (200%)      0.20 (200%)
                                                                              CSMACD-4-a       3.16      4.48 (142%)      6.50 (206%)
[11] R. Cleaveland and B. Steffen, “A Linear-Time Model-Checking
     Algorithm for the Alternation-Free Modal Mu-Calculus,” Form.             FDDI-20-a        2.04      3.03 (149%)      4.66 (228%)
     Methods Syst. Des., vol. 2, no. 2, pp. 121–147, 1993.                    FDDI-40-a       58.49      79.2 (135%)    126.82 (217%)
[12] E. A. Emerson and C. L. Lei, “Efficient Model Checking in                FDDI-50-a      169.66     230.7 (136%)    370.71 (219%)
     Fragments of the Propositional Mu-Calculus,” in LICS ’86.                MUX-5-a          1.22      2.14 (175%)      2.75 (225%)
     IEEE Computer Society Press, 1986, pp. 267–278.                          MUX-6-a         35.49     74.44 (210%)     98.08 (276%)
[13] J. F. Groote and T. A. Willemse, “Parameterised Boolean                  MUX-7-a       2623.61   5742.55 (219%)   7383.73 (281%)
     Equation Systems,” Theoretical Computer Science, vol. 343,               LEADER-6-a       0.41      0.71 (173%)      0.92 (224%)
     no. 3, pp. 332 – 369, 2005.                                              LEADER-7-a      12.99     25.89 (199%)     34.22 (263%)
[14] D. Zhang and R. Cleaveland, “Efficient Temporal-Logic Query              LBOUND-6-a       0.51      1.02 (200%)      1.32 (259%)
     Checking for Presburger Systems,” in ASE ’05. New York,
                                                                              LBOUND-7-a      17.36     37.07 (214%)     49.64 (286%)
     NY, USA: ACM, 2005, pp. 24–33.
[15] D. L. Dill, “Timing Assumptions and Verification of Finite-              PATHOS-4-a       13.7     35.23 (257%)     50.58 (369%)
     State Concurrent Systems,” in Proceedings of the International           GRC-3-a          0.92      1.63 (177%)      2.12 (230%)
     Workshop on Automatic Verification Methods for Finite State              GRC-4-a        252.05    431.63 (171%)    748.01 (297%)
     Systems. London, UK: Springer-Verlag, 1990, pp. 197–212.
[16] G. Behrmann, A. David, and K. G. Larsen, “A Tutorial on                                         TABLE V
     Uppaal,” in Formal Methods for the Design of Real-Time                  E XPERIMENT R ESULTS —A E XAMPLES —S PACE (MB): CORRECT
     Systems, vol. 3185. Springer Berlin / Heidelberg, 2004, pp.                          SYSTEM , CORRECT SPECIFICATION .
     200–236.
[17] J. Bengtsson and W. Yi, “Timed Automata: Semantics, Al-                  Example        DBM           CRDZone          CRDArray
     gorithms,” in Lecture Notes in Computer Science, vol. 3098.              CSMACD-3-a      2.88       7.55 (262%)     11.02 (382%)
     Springer, 2004, pp. 87–124.                                              CSMACD-4-a    209.97      104.47 (50%)     179.53 (86%)
[18] F. Wang, “Efficient Verification of Timed Automata with BDD-             FDDI-20-a       5.96       9.00 (151%)     13.57 (227%)
     Like Data-Structures,” in VMCAI 2003. London, UK: Springer-
                                                                              FDDI-40-a      27.55      57.24 (208%)    100.30 (364%)
     Verlag, 2003, pp. 189–205.
                                                                              FDDI-50-a      53.91     116.79 (217%)    209.29 (388%)
[19] E. M. Clarke, O. Grumberg, and D. A. Peled, Model Checking.
     The MIT Press, 1999.                                                     MUX-5-a        14.57       12.73 (87%)     18.55 (127%)
                                                                              MUX-6-a        84.05     116.35 (138%)    168.38 (200%)
[20] K. Larsen, F. Larsson, P. Pettersson, and W. Yi, “Efficient
     Verification of Real-Time Systems: Compact Data Structure                MUX-7-a       625.42    1667.94 (267%)   2302.39 (368%)
     and State-Space Reduction,” in RTSS 1997. IEEE Computer                  LEADER-6-a      3.57       6.59 (185%)      7.82 (219%)
     Society, December 1997, pp. 14–24.                                       LEADER-7-a     20.98     104.02 (496%)    133.39 (636%)
[21] D. Beyer and A. Noack, “Can decision diagrams overcome state             LBOUND-6-a      3.93       8.66 (220%)     10.39 (264%)
     space explosion in real-time verification?” in FORTE 2003.               LBOUND-7-a     27.89     157.54 (565%)    199.99 (717%)
     Springer Berlin / Heidelberg, 2003, vol. 2767, pp. 193–208.              PATHOS-4-a     40.73       38.11 (94%)     57.45 (141%)
[22] E.-R. Olderog and H. Dierks, Real-Time Systems: Formal Spec-
                                                                              GRC-3-a        10.48        7.87 (75%)     11.23 (107%)
     ification and Automatic Verification. Cambridge University
                                                                              GRC-4-a       318.22      220.64 (69%)    355.02 (112%)
     Press, 2008.
[23] S. Yovine, “Model Checking Timed Automata,” in Lectures on
     Embedded Systems, European Educational Forum, School on
     Embedded Systems. London, UK: Springer-Verlag, 1998, pp.                                     A PPENDIX
     114–152.
                                                                                             E XPERIMENTAL DATA
[24] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows:
     Theory, Algorithms and Applications. Prentice-Hall, 1993.                 For the experiments, we use three kinds of examples:
[25] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein,                • Valid A Examples (in Tables IV and V): Correct
     Introduction to Algorithms, 2nd ed. MIT Press, 2001.
                                                                                 system implementations with valid safety specifica-
[26] C. L. Heitmeyer, B. G. Labaw, and R. D. Jeffords, “A Bench-
     mark for Comparing Different Approaches for Specifying and                  tions.
     Verifying Real-Time Systems,” Naval Research Laboratory,                  • Invalid B Examples (in Tables VI and VII): A
     Tech. Rep. ADA462244, 1993.                                                 examples with invalid specifications.
[27] J. L. Devore, Probability and Statistics for Engineering and the
                                                                               • Invalid C Examples (in Tables VIII and IX):
     Sciences, 6th ed. Duxbury Press, 2003.
[28] D. Beyer, C. Lewerentz, and A. Noack, “Rabbit: A tool for bdd-              A examples with buggy implementations of the
     based verification of real-time systems,” in CAV ’03. Springer              systems that do not satisfy the A specifications.
     Berlin / Heidelberg, 2003, vol. 2725, pp. 122–125.


      DIFTS 2011                                                        20
                        TABLE VI                                                        TABLE VIII
 E XPERIMENT R ESULTS —B E XAMPLES —T IME ( S ): CORRECT           E XPERIMENT R ESULTS —C E XAMPLES —T IME ( S ): BUGGY
             SYSTEM , INVALID SPECIFICATION .                                SYSTEM , CORRECT SPECIFICATION .


   Example         DBM         CRDZone         CRDArray            Example         DBM          CRDZone         CRDArray
   CSMACD-4-b       0.10     0.10 (100%)     0.20 (200%)           CSMACD-6-c       0.51       0.41 (80%)     0.51 (100%)
   CSMACD-5-b       0.51     0.51 (100%)     0.71 (139%)           CSMACD-7-c       2.03       1.82 (90%)     2.03 (100%)
   CSMACD-6-b       3.35      2.73 (81%)     3.97 (119%)           CSMACD-8-c       9.55       8.42 (88%)     9.55 (100%)
   FDDI-30-b        1.53     1.53 (100%)     2.23 (146%)           FDDI-30-c        0.51       0.41 (80%)      0.41 (80%)
   FDDI-40-b        4.66      4.58 (98%)     6.60 (142%)           FDDI-40-c        1.52       0.92 (61%)      1.01 (66%)
   FDDI-60-b        8.64      5.07 (59%)      5.48 (63%)           FDDI-60-c        6.71       3.98 (59%)      4.17 (62%)
   MUX-20-b         0.41     0.41 (100%)     0.51 (124%)           MUX-6-c        139.02    258.32 (186%)   401.84 (289%)
   MUX-30-b         0.92      0.91 (99%)     1.21 (132%)           LEADER-60-c      6.81       3.96 (58%)      4.06 (60%)
   MUX-40-b         1.93      1.73 (90%)     2.23 (116%)           LEADER-70-c     14.42       8.12 (56%)      8.13 (56%)
   LEADER-10-b      0.10     0.10 (100%)     0.10 (100%)           LEADER-100-c    82.94      45.78 (55%)     45.88 (55%)
   LEADER-20-b      0.10     0.20 (200%)     0.20 (200%)           LBOUND-6-c       0.10      0.10 (100%)     0.20 (200%)
   LBOUND-10-b      0.10     0.10 (100%)     0.20 (200%)           LBOUND-7-c       0.61      0.81 (133%)     1.12 (184%)
   LBOUND-40-b      6.82    17.46 (256%)    29.54 (433%)           LBOUND-8-c      12.48     32.00 (256%)     52.9 (424%)
   PATHOS-7-b       0.10     0.10 (100%)     0.10 (100%)           PATHOS-5-c       0.10      0.10 (100%)     0.10 (100%)
   PATHOS-8-b       0.10     0.10 (100%)     0.10 (100%)           PATHOS-6-c       0.10      0.10 (100%)     0.10 (100%)
   PATHOS-9-b       0.10     0.10 (100%)     0.10 (100%)           PATHOS-7-c       0.10      0.10 (100%)     0.10 (100%)
   GRC-3-b          0.10     0.10 (100%)     0.10 (100%)           GRC-3-c          0.10      0.10 (100%)     0.10 (100%)
   GRC-4-b          0.51     0.61 (120%)     0.82 (161%)           GRC-4-c          0.51      0.81 (159%)     1.02 (200%)
   GRC-5-b          9.75     13.4 (137%)       19 (195%)           GRC-5-c          9.65     13.31 (138%)    18.88 (196%)

                                                                                         TABLE IX
                        TABLE VII
                                                                  E XPERIMENT R ESULTS —C E XAMPLES —S PACE (MB): BUGGY
E XPERIMENT R ESULTS —B E XAMPLES —S PACE (MB): CORRECT
                                                                              SYSTEM , CORRECT SPECIFICATION .
             SYSTEM , INVALID SPECIFICATION .

                                                                  Example           DBM         CRDZone          CRDArray
  Example           DBM         CRDZone         CRDArray
                                                                  CSMACD-6-c        85.32     18.53 (22%)      79.30 (93%)
  CSMACD-4-b         2.88     2.89 (100%)    13.67 (474%)
                                                                  CSMACD-7-c       337.43    191.84 (57%)     320.89 (95%)
  CSMACD-5-b       144.14     72.38 (50%)    123.52 (86%)
                                                                  CSMACD-8-c      1369.07    787.75 (58%)    1338.62 (98%)
  CSMACD-6-b      1134.30    553.21 (49%)    961.90 (85%)
                                                                  FDDI-30-c          4.88      4.60 (94%)      9.93 (203%)
  FDDI-30-b          9.60      9.06 (94%)    19.08 (199%)
                                                                  FDDI-40-c          9.55      6.41 (67%)     19.63 (206%)
  FDDI-40-b         17.19     16.03 (93%)    39.00 (227%)
                                                                  FDDI-60-c         24.07     14.24 (59%)     55.57 (231%)
  FDDI-60-b         27.85     14.53 (52%)    63.52 (228%)
                                                                  MUX-6-c         1607.73   1047.64 (65%)   1723.25 (107%)
  MUX-20-b          19.37     11.15 (58%)     16.96 (88%)
                                                                  LEADER-60-c       29.24     10.79 (37%)     63.66 (218%)
  MUX-30-b          28.28     16.87 (60%)    28.58 (101%)
                                                                  LEADER-70-c       51.18     15.30 (30%)    108.80 (213%)
  MUX-40-b          43.01     21.85 (51%)    42.81 (100%)
                                                                  LEADER-100-c     203.89     40.65 (20%)    431.71 (212%)
  LEADER-10-b        2.88     2.89 (100%)     2.89 (100%)
                                                                  LBOUND-6-c         2.88     2.89 (100%)      4.48 (155%)
  LEADER-20-b        2.88     4.59 (159%)     5.69 (197%)
                                                                  LBOUND-7-c        12.52     10.34 (83%)     14.42 (115%)
  LBOUND-10-b        2.88     2.89 (100%)     3.38 (117%)         LBOUND-8-c        75.66     59.23 (78%)     90.48 (120%)
  LBOUND-40-b       18.29     15.23 (83%)    30.73 (168%)
                                                                  PATHOS-5-c         2.88     2.89 (100%)      2.89 (100%)
  PATHOS-7-b         2.88     2.89 (100%)     2.89 (100%)         PATHOS-6-c         2.88     2.89 (100%)      2.89 (100%)
  PATHOS-8-b         2.88     2.89 (100%)     2.89 (100%)         PATHOS-7-c         2.88     2.89 (100%)      2.89 (100%)
  PATHOS-9-b         2.88     2.89 (100%)     2.89 (100%)
                                                                  GRC-3-c            2.88     2.89 (100%)      2.89 (100%)
  GRC-3-b            2.88     2.89 (100%)     2.89 (100%)         GRC-4-c           58.74     35.94 (61%)     59.47 (101%)
  GRC-4-b           58.74     32.08 (55%)     53.42 (91%)         GRC-5-c          717.29    379.35 (53%)     647.94 (90%)
  GRC-5-b          717.21    379.44 (53%)    648.00 (90%)


                                                                 in parenthesis. Time data is given to the nearest 0.01s
   The experimental data for the 53 example benchmarks           (second) and space data is given to the nearest 0.01MB
is provided in Tables IV, V, VI, VII, VIII and IX,               (Megabyte). Given the percentage rounding, sometimes
with the best entry(ies) in each row bolded and per-             an example with slightly different performance may still
centage change relative to the DBM, to the nearest %,            have a 100% value.


    DIFTS 2011                                              21
      metaSMT: Focus On Your Application Not On
                  Solver Integration
  Finn Haedicke              Stefan Frehse          Görschwin Fey        Daniel Große                                Rolf Drechsler
                                            Institute of Computer Science
                                  University of Bremen, 28359 Bremen, Germany
                           {finn,sfrehse,fey,grosse,drechsle}@informatik.uni-bremen.de
   Abstract—Decision procedures are used as core tech-     check for satisfiability. Furthermore, advanced features
nique in many applications today. In this context, auto-   are available, e.g., computing interpolants which are
mated reasoning based on Satisfiability Modulo Theories    utilized in SAT-based Model Checking [2]. Moreover
(SMT) is very effective. However, developers have to decidelearnt information generated while reasoning can be
which concrete engine to use and how to integrate the      reused very efficiently in consecutive reasoning processes
engine into the application. Even if file formats like SMT-
LIB standardize the input of many engines, advanced        to prune the search space [12]. Usually this can only be
features remain unused and the integration of the engine   done by calling native functions which access the learnt
is left to the programmer.                                 information.
   This work presents metaSMT, a framework that inte-         This work presents metaSMT a publicly available, easy
grates advanced reasoning engines into the program code    to use and powerful tool1 which provides an integration
of the respective application. metaSMT provides an easy to of the native Application Programming Interface (API) of
use language that allows engine independent programming    modern reasoning engines into C++ code. The advantages
while gaining from high performance reasoning engines.     of metaSMT are: (1) engine independence through effi-
   State-of-the-art solvers for satisfiability and other theo-
ries are available for the user via metaSMT with minimal   cient abstraction layers (2) simple use of various decision
programming effort. For two examples we show how           procedures (3) extensibility in terms of input language
metaSMT is used in current research projects.              and reasoning engines (4) customizability in terms of
                                                           optimization and infrastructure (5) translation of the input
                     I. I NTRODUCTION                      language into native engine calls at compile time.
   In recent years, formal methods have become attractive     The remaining work is structured as follows: Section II
to solve complex computational hard problems. Decision     gives a basic introduction into SMT and the programming
procedures are applied in many applications, like e.g.,    methods    used in metaSMT. Afterwards an example
Model Checking [1], [2], Synthesis [3], [4] and, Automatic of a metaSMT-based application is given before in
Test Pattern Generation (ATPG) [5].                        Section IV the architecture of metaSMT is described.
   Despite the successful research and application of Section V describes how this architecture is implemented
decision procedures, the increasing complexity of soft- in metaSMT and Section VI gives an empirical evaluation
ware and hardware systems demands for more effective of metaSMT including its use in current research projects.
reasoning engines to overcome complexity issues. In The work closes with conclusions.
the last years, solvers for Satisfiability Modulo Theo-                        II. P RELIMINARIES
ries (SMT) have been developed. Different theories are
combined to formulate the problem. Various works gave         This section provides background information. How-
empirical evidence that SMT reasoning engines increase     ever, basic knowledge of C++ is assumed.
the efficiency of formal methods [6], [7], [8].            A. Satisfiability Modulo Theories
   The performance of SMT reasoning engines remains an
active research topic. Annual SMT competitions [9], [10]      Boolean satisfiability is a decision problem, also known
show their advances. However, SMT reasoning engines as the SAT problem. The problem asks whether there
have different strengths on different problem instances. exists an assignment of Boolean variables such that the
Therefore, evaluating different engines with respect to a Boolean function evaluates to true. The problem has
given problem instance allows to find the best performing been proven NP-complete [13]. In spite of the huge
engine.                                                    complexity of the problem, sophisticated algorithms
   When using SMT in a concrete algorithm, the most and clever heuristics help to solve instances with many
common way is to generate a problem instance in thousands variables and clauses very efficiently. Usually,
SMT-LIB format [11]. Taking a user created SMT- SAT solvers work on a Conjunctive Normal Form (CNF)
LIB file as input, an SMT solver decides whether the of a Boolean function that is a disjunction of conjunctions
instance is satisfiable or unsatisfiable. However, many of literals, where each literal is variable or its negation.
solvers additionally have custom native interfaces. These     Satisfiability Modulo Theories is also a decision
interfaces are used to pass the instance to the engine and problem    but with more complex theories rather than
                                                           only propositional logic. A detailed introduction is given
  This work was supported in part by the German Federal Ministry
                                                                          1
of Education and Research (BMBF) within the project SANITAS                 Available online at http://www.informatik.uni-bremen.de/agra/eng/
under contract no. 01M3088.                                             metasmt.php


     DIFTS 2011                                                    22
      Listing 1. SMT instance for a · b = 21466342967             B. Boost.Proto
  (benchmark factorization.smt                                       The Boost project [16] is a collection of libraries that
    :logic QF_BV
    :extrafuns ((a BitVec[32]))                                   cover many features not included in the C++ standard
    :extrafuns ((b BitVec[32]))                                   library. In particular, Boost.Proto [17] provides tools to
                                                                  integrate Domain Specific Embedded Languages (DSEL)
      :assumption (not (= a bv1[32]))                             into C++. Given a programming language, a DSEL in
      :assumption (not (= b bv1[32]))
                                                                  that language is dedicated to a domain, e.g., parsing [18]
      :formula (=                                                 or vector arithmetic [19]. The DSEL provides a syntax
         bv21466342967[64]                                        designed for this domain and, therefore, is easier to handle
         (bvmul
           (zero_extend[32] a)                                    than the original language. Due to space constraints an
           (zero_extend[32] b)                                    in-depth description of Boost.Proto is omitted, the reader
      ))                                                          is referred to the Boost documentation.
  )                                                                  Technically, this work uses Boost.Proto to implement
                                                                  a domain specific language for SMT logics in metaSMT.
in [14]. Already available SMT-solvers handle complex
formulas. In addition to the logics the SMT-LIB standard                           III. M OTIVATING E XAMPLE
also specifies a textual format that is commonly used for              Before the subsequent sections give a complete descrip-
input files of the solvers. Listing 1 shows an example of          tion of metaSMT this section demonstrates how metaSMT
such an SMT file: Two variables, a, b are declared as bit-         is used in a complete example application.
vectors and constrained to be the two factors of a product             The usual integration of reasoning engines iteratively
resulting in the 64 bit number 21, 466, 342, 967. When             calls API functions to construct the problem instance.
called with this input, an SMT-QF BV solver outputs                However, this reduces the readability of the source code
satisfiable and e.g., the assignment a = 740, 218, 723 and         and makes it difficult to understand the program. A typical
b = 29.                                                            example can be constructed using the Boolector API for
   Moreover, in addition to the SMT-LIB format many                the C programming language. Figure 1 (a) shows the
solvers provide an Application Programming Inter-                  code to construct the simple constraint c = a · b using the
face (API) that exposes features like incremental solving,         Boolector C API (not including memory management).
where the SMT instance can be changed after the                   The same expression, written quite concise in the SMT-
satisfiability check. Learnt information about the instance        LIB format is shown in Figure 1 (b). The goal of metaSMT
is kept and reused. Consequently, using the API may                is to allow this compact syntax to be used in C++
increase the overall performance.                                  programs. Due to the limitations of C++ this requires
                                                                   adaptions. Most notably, the expressions cannot easily be
   An SMT solver can be utilized within an application in         written in a symbolic expression (S-expression) syntax as
different ways. Each of the following options has certain          in the SMT-LIB format, where each function is enclosed
advantages and disadvantages:                                      in parenthesis. The syntax for calling functions is used
 1) Generate an instance file according to SMT-LIB spec-           instead. Moreover, the solver is passed into the expression
    ification and call the solver. Different SMT solvers           using the context e.g., as btor_ctx in Figure 1 (c).
    can easily be evaluated. However, the instance                     In order to illustrate the programming interface of
    generation and result retrieval requires handling of           metaSMT an example written in C++ code is presented
    files and text within the application.                         in Listing 2. The listing shows the factorization of an
 2) Use the solver specific API to call an SMT solver’s            integer into two integers. More precisely, given an integer
    functions directly. In particular, incremental satis-          in bit-vector representation ~c ∈ B2n , compute two integers
    fiability can be exploited. But the application is            ~a, ~b ∈ Bn , such that ~a · ~b = ~c. To enforce non-trivial
    restricted to a specific solver.                               factorization, both integers a and b may not be 1. As
 3) Introduce an abstraction layer specific to the appli-          the bit-vectors ~a and ~b are zero-extended, no overflows
    cation. This technique combines high performance               are possible and as a result a valid assignment to the
    using incremental satisfiability with the ability to           constraint either gives a valid factorization of ~c or proves
    evaluate different solvers. However a custom layer             that it is prime. The integer value of ~c is randomly
    is not portable to different applications.                     chosen and changed in each iteration of a loop for
                                                                  10,000 iterations. A similar example is reconsidered in
   This work separates the programming model from                  the empirical evaluation.
the reasoning engine. With metaSMT the application                     The first line in Listing 2 defines the solver context,
specifies the instance in a simple, common notation. Many         which is not explained here but described later in this
reasoning engines are available without modifications of          work. The context specifies which reasoning engine to use
the algorithm.                                                     and how the input is handled by metaSMT. The next line
   The Java package jSMTLIB [15] provides an interface             is a user parameter, which defines the bit-vector width of
for the usage of different SMT solvers. The package reads,         the operands. The algorithm is therefore scalable to an
checks and generates SMT-LIB files and executes the                abitrary bit-width. In lines 4-6 the bit-vectors a, b and
respective SMT-solver executables. However, jSMTLIB                c are declared and initialized.
currently does not provide an embedded language for                    Lines 8 and 9 constrain the operands to be different
instance generation directly from the application.                 from 1 in bit-vector representation. In line 11 the


      DIFTS 2011                                             23
         boolector_assume( btor,                   :assumption                              assumption( btor_ctx,
           boolector_eq(btor, c,                       (= c (bvmul a b))                        equal(c, bvmul(a, b)) )
            boolector_mult(btor, a, b)
           ))
              (a) Boolector API calls                    (b) SMT-Lib format                       (c) metaSMT C++ Code
                                               Fig. 1.   Examples for c = a · b


        Listing 2. metaSMT factorization and prime test                            F RONTEND (C++)
 1 Context ctx;                                                               QF BV      Core      Array
 2 const unsigned width = <parameter>;
 3
 4 bitvector a = new_bitvector(width);
 5 bitvector b = new_bitvector(width);
 6 bitvector c = new_bitvector(width);                                                       M IDDLE - END
 7
 8 assertion(ctx, nequal(a, bvuint(1,width)) );
 9 assertion(ctx, nequal(b, bvuint(1,width)) );                           DirectSolver            BitBlast    GraphSolver
10
11 assertion( ctx, equal( zero_extend(width, c)                           SAT Clause              SAT Aiger      Groups
       , bvmul( zero_extend(width, a),
       zero_extend(width, b)) ));
12
13 for (unsigned i=0; i < 10000; ++i) {
14   unsigned r = random_number ( 2, 2ˆwidth -                                                 BACKEND
          1 );
15   assumption( ctx, equal(c, bvuint(r, 2*                               SWORD              Z3      MiniSAT       CUDD
          width)) );
16
17   if( solve( ctx ) ) {                                                      Boolector              PicoSAT     AIGER
18     unsigned a_value = read_value( ctx, a );
19     unsigned b_value = read_value( ctx, b );
20
21     printf("factorized %d into %d * %d\n", r                                                Solver API
             , a_value, b_value);
22   } else {
23     printf("%d is prime.", r);                                                 Fig. 2.    metaSMT layer Architecture
24   }
25 }                                                               the input languages, defined in the SMT-LIB format
                                                                   (e.g., Core Boolean logic, QF BV). The middle-end
multiplication ~a · ~b = ~c is constrained. However the            layer provides translations, intermediate representation
multiplication is done in double width to avoid overflows.         and optimizations of the input expressions. Optionally,
   These constraints are identical for each iteration of           expressions can directly be passed to the the backend
the loop starting in line 13, therefore they are declared          layer where the solvers are integrated via their native API.
outside the loop as an assertion, which is permanent.              Various configurations of middle-ends with backends are
   Inside the loop, in lines 14-15 c is set equal to a random      possible. The frontends allow to combine each translation
number from 2 to 2width − 1 using an assumption,                   middle-end with any compatible backend. However, not
which is only valid for the next satisfiability check of the       every backend supports every logic. Therefore, some
solver, i.e., for one loop iteration.                              middle-ends supply emulation or translations to other
   After setting up the SMT instance, the satisfiability           logics, e.g, a bit-vector expression can be translated into
check is performed in line 17. If the instance is sat-             a set of Boolean expressions.
isfiable, the values of a and b are determined using                  The frontends are independent from the underlying
read_value. Both operands are printed out in line 21.              two layers and have no semantics attached. To evaluate
Otherwise the instance is unsatisfiable, the else branch           frontend expressions, a context is used that defines their
is executed, which outputs c is prime.                             meaning. The context is the combination of at least
                                                                   one middle-end and one backend, where the middle-end
                    IV. A RCHITECTURE                              defines how the input language is mapped to function
   In the following sections the architecture of metaSMT           calls of the backend.
is described. At first the basic layers are introduced. Then       B. Frontends
each layer is described in detail. The terms frontend for
the input languages, middle-end for the intermediate layer            The frontends define the input languages for metaSMT.
and backend for the solvers are taken from compiler                This includes Core Boolean logic and SMT QF BV as
design to denote the metaSMT layers.                               well as a version of Array logic over bit-vectors. Each
                                                                   frontend defines its own set of available functions as well
A. metaSMT Layers                                                  as public datatypes.
   metaSMT consists of three basic layers depicted in                 The Core Boolean logic defines the public datatype
Figure 2. The frontend layer provides primitives of                predicate which describes propositional variables.


     DIFTS 2011                                               24
Furthermore, Boolean constants are available, i.e., true           4) BitBlast: This emulation of a QF BV bit-vector
and false. This logic also defines primitive Boolean            backend uses only Core Boolean logic operations to allow
functions, e.g., Or, And. The frontend creates a static         the transparent use of SAT or BDD solvers with bit-vector
syntax tree for the expression described in the code. This      expressions. The translation is performed in a standard
syntax tree is passed to the middle-end.                        way: Given only the Core Boolean logic, each bit-vector
C. Middle-ends                                                  is transformed into a vector of Boolean variables. The
                                                                bitwise operations can be applied easily, e.g., an exclusive-
   The core of metaSMT are basic optimizations and              or over two bit-vectors is a bitwise exclusive-or for
translations from the frontend to the backend. While the        each pair of Boolean variables. The bit-vector predicates
frontends provide languages and the backends provide            (equal, less-than, etc.) are mapped to a sequence
solver integrations, the middle-ends allow the user to          of Boolean predicates, e.g., a conjunction of exclusive-
customize metaSMT, i.e., on how the input language is           nors for equal. Arithmetic operations are reduced to an
mapped to the backend. Even in the middle-end itself,           equivalent Boolean expression.
several modules can be combined.
   1) DirectSolver: To enable a low-overhead translation        D. Backends
from a frontend to a backend the DirectSolver                      The respective solvers and other constraint solving
is provided. All the elements of the input expression           techniques are integrated as backends. For each reasoning
are directly evaluated in the backend. Variables are            engine a dedicated backend is created that maps from the
guaranteed to be constructed only once and are stored           internal metaSMT API to the API of the engine. Backends
in a lookup table. For example, given a multiplication          do not have an explicit interface to inherit from. They
operation in QF BV logic directly corresponds to a              implement the required methods for the languages they
multiplication operation in the SMT solver Boolector.           support using C++ template mechanisms to integrate
   The direct middle-end is very lightweight and allows         them into a context. This allows the compiler to optimize
the compiler to inline all function calls. For a modern         the code and, in the case of DirectSolver, produces
compiler the resulting executable should perform equally        code that is close to a hand-coded implementation using
well to a hand-written application using the same backend.      the same API.
   2) GraphSolver: Instead of adding the frontend ex-              This section gives an overview of the backends inte-
pressions directly to the solver, they are first inserted       grated into metaSMT. They are grouped by the input
into a directed graph. The graph models the explicit            language they support. The compatibility of the solvers
syntax tree of the expression as a Directed Acyclic             is also summarized in Table I.
Graph (DAG). Formally a node in the graph is a tuple               1) Core Boolean logic backends: Several core logic
(Operation, Static Arguments) where the SMT command             backends as well as higher level backends are available.
and its static arguments are captured (e.g. extract and         Core logic is directly supported by backends that accept
the range to extract). The edges point from an operation        all common Boolean operations. For example, the Binary
to the SMT expressions used in this command. A label            Decision Diagram (BDD) package CUDD [20] supports
on the edges stores the position of the subexpression in        all Boolean operations and is integrated in metaSMT.
the command. Each time a new expression is evaluated            Furthermore, with some minor transformations based on
it is first searched in a lookup table before a new node is     De-Morgan And-Inverter-Graphs (AIGs) are also able to
created, when the node is not found. When the instance          handle Boolean operations. Those AIGs are internally
is checked for satisfiability, the graph is traversed and       represented by the AIGER package [21]. SAT solvers
evaluated in the backend.                                       can receive Boolean logic expressions either via the
   The graph-based translation provides a way to auto-          SAT Clause adapter that creates one or more clauses
matically detect common subexpressions and efficiently          per logic operation or via the SAT Aiger adapter, that
handle them to create smaller SMT instances which po-           builds an AIG for the expression using the AIGER
tentially increases performance of the reasoning process.       backend. Afterwards, the AIG is translated into a set
This is especially useful if the user wants to automate this    of clauses. This infrastructure allow the usage of any
process, but either does not want to manually optimize the      SAT solver supporting CNF as input language either by
SMT instance or does not know the instance in advance           an API or externally through files. PicoSAT [22] as well
because it is created dynamically inside the program.           as MiniSAT [23] are directly supported as Core logic
   3) Groups: This middle-end provides an interface and         backends via their APIs. Other solvers are supported by
implementation of constraint groups for solvers that do         generating CNF files and calling the executable of the
not have native support for groups. A group is a set            SAT solvers.
of constraints that belong together. The user can create           Furthermore, all SMT QF BV backends natively sup-
groups, add expressions to them and delete them at any          port Core logic as a subset of the language.
time. The solver will then disable all expressions in              2) SMT QF BV backends: Native SMT bit-vector
the group. Groups are emulated using guard variables            solvers like Boolector [24], SWORD [25] and Z3 [26]
and temporary assumptions, e.g., the expression x ∧ y           are directly connected through their API for QF BV
in group 1 is transformed to g1 → (x ∧ y) using the             support. Furthermore, the BitBlast middle-end provides an
guard variable g1 and an implication. Depending on the          emulation for QF BV using only basic logic operations.
solver deleting a group can either lead to the removal          This emulation permits using QF BV expressions in
of the constraints or to the constraint just being disabled     solvers that do not support them natively but support
permanently.                                                    Core Boolean logic e.g., CUDD or SAT-solvers.


     DIFTS 2011                                            25
                        TABLE I
                  BACKEND COMPATIBILITY                               DirectSolver                                 Boolector
 BACKEND            C ORE     QF BV       A RRAY (BV)   SAT
 AIGER [21]          yes      emulated         no        no
 Boolector [24]      yes        yes           yes        no
 CUDD [20]           yes      emulated         no        no
 MiniSAT [23]      emulated   emulated         no       yes           GraphSolver                                  Boolector
 PicoSAT [22]      emulated   emulated         no       yes
 SWORD [25]          yes        yes            no        no
 Z3 [26]             yes        yes           yes        no
 BitBlast            yes        yes            no        no
 SAT Aiger           yes      emulated         no        no
 SAT Clause          yes      emulated         no        no           DirectSolver BitBlast SAT Clause MiniSAT

                     equal


     qf_bv_var&                                                       GraphSolver BitBlast SAT Aiger MiniSAT
     {c,width}                  bvmul


                    qf_bv_var&           qf_bv_var&
                    {a,width}            {b,width}                            Fig. 4.   Data flow in different contexts

    Fig. 3.   Syntax Tree for equal(c, bvmul(a, b)).
                                                                           Listing 3. metaSMT command grammar
  3) SMT QF ABV backends: In addition to Core                       command      ::= assert_cmd | assume_cmd
Boolean logic and bit-vector logic the Boolector and                                | eval_cmd
Z3 backends also support arrays in the form of QF QBV                               | solve_cmd | result_cmd
                                                                    assert_cmd ::= ’assertion(’ context ’,’
logic. Therefore metaSMT supports declaring and work-                                  expression ’);’
ing with arrays over bit-vectors.                                   assume_cmd ::= ’assumption(’ context ’,’
                                                                                       expression ’);’
                                                                    eval_cmd     ::= ’evaluate(’     context ’,’
                  V. I MPLEMENTATION                                                   expression ’);’
                                                                    solve_cmd ::= ’solve(’ context ’);’
  This section describes how the architecture is imple-             result_cmd ::= ’read_value(’ context ’,’
mented in metaSMT and how metaSMT is integrated in                                     variable ’);’
C++ programs.                                                       variable      ::= boolean_variable
                                                                                     | bitvector_variable
                                                                    expression ::= <expression in metaSMT DSEL>
A. Syntax and Semantics
   For the evaluation of metaSMT expressions a context               Figure 4 gives some example contexts and visualizes
is used which defines syntax and semantics. The context           the data flow inside. This illustrates how different
concept and different kinds of contexts are described in          contexts can change the evaluation of a constraint. The
this section.                                                     first context defines a solver using Boolector without
   The syntax component is provided by Boost.Proto. An            intermediate representation (DirectSolver). The context
expression like equal(c, bvmul(a, b)) is created                  directly supports Core Boolean logic and QF BV. In
from the custom Boost.Proto functions equal and                   contrast, in the last example, MiniSAT is used. QF BV
bvmul as well as the variables a, b and c. From the               as well as Core Boolean logic are emulated for this clause
expression the syntax tree in Figure 3 is created. The            based backend. Furthermore this context uses a graph
nodes are labeled with the C++ type and strings inside            and an AIG as intermediate representations.
the curly braces denote the content of the respective                The GraphSolver-based and AIGER-based context first
nodes. For metaSMT the tree is used as static type of the         create an internal representation and pass the complete ex-
expression. The expression and the syntax tree are data,          pression directly before solving. When using approaches
i.e., they neither have semantics attached nor trigger any        without intermediate representation, the requests are
actions.                                                          forwarded to the next layer until they reach the backend.
   The semantics for the expression is introduced by              Only explicit transformations are applied before passing
the metaSMT context, that defines how the syntax tree             the expression (e.g., BitBlast, SAT Clause).
is evaluated and transformed for a specific solver. The
evaluation of Boost.Proto-based expressions is performed          B. Usage and API
in the metaSMT translation middle-end (e.g., GraphSolver
or DirectSolver) so that the backends do not need to                The example from Listing 2 contains most of the
handle Boost.Proto expressions directly. This reduces the         core commands of metaSMT. These are summarized in
overhead to implement new backends.                               Listing 3.


     DIFTS 2011                                              26
Listing 4. Programmatic constraint construction using temporary          Listing 5. Using a shared graph for different contexts
Variables                                                           1 GraphSolver_Context<SWORD> sword;
1 bitvector x = new_bitvector(8);                                   2 GraphSolver_Context<Boolector> btor(sword);
2 for( ... ) {                                                      3
3     bitvector tmp = x;                                            4 GraphSolver_Context<SWORD>::result_type x =
4     x = new_bitvector(8);                                               evaluate(sword, bvuint(0, 8));
5     assertion(ctx, equal( x, bvmul(tmp, ...)));                   5
6 }                                                                 6 for( ... ) {
7 ...                                                               7   x = evaluate(sword, equal( x, bvmul(x,
8 solve(ctx)                                                                  ...)));
                                                                    8 }
                                                                    9 assertion(sword, x);
   The first three functions accept frontend expressions,          10 assertion(btor, x);
however they have different effects. The functions                 11 solve(sword) == solve(btor);
assertion and assumption create the constraint
instance where the first adds a constraint permanently to              When a GraphSolver is constructed using the copy
the (incremental) solver context while the latter adds the             constructor, a shared graph is internally used by the
constraint for the next call of the solver only. In both               contexts. The newly created solver also copies all
cases the expression needs to have a Boolean result. The               assertions and assumptions, so that both solvers have
third function evaluate does not change the instance                   the same internal state. In this setup the results of
but returns a context specific representation of the input             evaluate can be shared among the solvers. Each
expression only.                                                       backend will only evaluate the parts of the graph that
   To query a context for satisfiability, the solve                    are required as parts of assertions or assumptions. The
function is provided. The result is a Boolean value directly           application of evaluate is demonstrated in Listing 5.
representing SAT (true) or UNSAT (false). After a call to              This can be used for example when building multiple
solve the assumptions are discarded while the assertions               instances from the same base. At a specific point the
are still valid for the subsequent calls.                              context can be copied and from there both contexts can
   Getting a SAT result for solve(ctx), i.e., the                      diverge into seperate instances.
instance is satisfiable, a model is generated. The model
can be retrieved with the read_value function. The                                 VI. E MPERICAL E VALUATION
function takes a context and a variable and returns the                  This section presents two different applications of
assignment of this variable in the given context. The                  metaSMT. Furthermore, a comparison of metaSMT with
result of read_value is automatically convertible to                   the native API of an SMT solver is presented. The
many C++ datatypes, including strings, bit-vectors (vector             experiments have been performed on a system with AMD
of bool, tribool, bitset) or integers.                                 Opteron 2222 SE processor, 32 GB RAM and a Linux
   In addition to these core commands, custom middle-                  operating system. In the following, the run times are
ends may provide additional extensions. The Group                      always given in CPU seconds.
middle-end for example provides functions to add groups,
change the active group and delete groups. These func-        A. Exact Synthesis
tions cannot be used in any other context.                       This section presents examples from exact synthesis of
                                                              reversible circuits [28], [29]. A reversible circuit computes
C. Expression Creation                                        a bijective function, i.e., there is a unique input and output
   Typically it is necessary to create the metaSMT ex- mapping. Those circuits purely consist of reversible gates
pression at run time, e.g., in a loop. As metaSMT syntax – here, the universal Toffoli gate and basic quantum gates
trees are statically typed, an extension of the syntax tree are considered. The synthesis problem searches for a
is not possible. To work around this limitation, metaSMT reversible circuit consisting of reversible gates which
provides two options. The first option is to create a partial computes the function specified by a truth table. There
expression and constrain equality to a temporary variable are several exact approaches to synthesize those circuits.
that is later reused to create the complete expression. We considered the approach from [28], which translates
This would allow strict grammar checking but introduces the problem into a decision problem and asks for a circuit
a temporary variable and a constraint, see Listing 4.         realization for a given number of gates. The size of the
   The second option is the use of the evaluate(Ctx, problem instances grows exponentially with number of
Expr) function and the context’s result_type. The input variables, because the entire truth table has to be
function takes a context and a frontend expression taken into account. This usually results in hard problem
and returns the context specific representation of the instances even for small truth tables.
expression. The result of the evaluation is of the backend       The underlying problem formulation has been encoded
specific type Ctx::result_type. This expression in QF BV and an incremental algorithm searches for
can be stored and later be used in other expressions. a valid circuit implementation. Using metaSMT six-
Note however that the return value is solver specific and teen different configurations have been evaluated. The
therefore not portable or reusable in other contexts, not configurations consist of four internal API backend
even contexts of the same type.                               solvers, i.e. Boolector, Z3, MiniSAT, and PicoSAT.
   A powerful exception to this rule is the Additionally, metaSMT is used to generated CNF files
result_type of a GraphSolver-based context, to run the external solvers PicoSAT, MiniSAT, Pre-
where the result is a node in the internal graph. coSAT, and Plingeling [30]. All eight backends are used


     DIFTS 2011                                                   27
                                   Direct<Boolector>
                                          Direct<Z3>
                                   Graph<Boolector>
               10000                      Graph<Z3>
                                      Dir<MiniSAT>
                                       Dir<PicoSAT>
                                    Graph<MiniSAT>
               1000                 Graph<PicoSAT>
                                  Direct<Plingeling*>
                                  Graph<Plingeling*>
                                  Direct<MiniSAT*>
   time [s]


                 100              Graph<MiniSAT*>
                                   Direct<PicoSAT*>
                                   Graph<PicoSAT*>
                                   Direct<PreoSAT*>
                                   Graph<PicoSAT*>
                  10


                   1


                 0.1
                                 dec                    per                      tof                      4m                         gra                          fre                   alu                  gra                  mi                          tof                  hw
                                    od                     e   s_c                  fol                        od                          yco                       dk                    -v0                  yco                  lle                         fol                 b4
                                      24                          o                       i_c                    5-v                             de6                    in_                   _in                  de6                      r_c                     i_d                     _co
                                        -v0                           mp                     om                        0_                              _co                 com                   com                  _co                         om                   ou                         mp
                                            _in                         let                     ple                      inc                                                  ple                                                                   ple                   ble                       let
                                               com                         e_4                     te_                      om                            mp                     te_                ple                  mp                            te_                   _co                       e_2
                                                   ple                                                1                        ple                           let                    3                  te_                  le   te_                      5                     mp                        0
                                                                                                                                  te_                            e_19                                     10                           19                                          let
                                                         te_                                                                            8                                                                                                                                              e_
                                                               15                                                                                                                                                                                                                        2

                       Fig. 5.     Run times of reversible circuit synthesis for hard instances using a metaSMT-based version of RevKit [27]


                                                                                                                                                                                               TABLE II
with the DirectSolver middle-end as well as the                                                                                                                         R ESULT FOR MUTATION TESTING USING DIFFERENT CONTEXTS
GraphSolver middle-end.
                                                                                                                                                                              Name                                                Time [s]
   For the synthesis 11 specifications were used which are                                                                                                                                             Direct                  Graph        Direct                                       Graph
publicly available from [31]. A timeout of 3,600 seconds                                                                                                                                            Boolector               Boolector MiniSAT                                          MiniSAT
was applied. The results are presented in Figure 5. On                                                                                                                        isl                        0.05                    0.22        0.34                                          0.47
the x-axis the respective benchmark is shown, whereas                                                                                                                         min                        0.34                    0.48        0.82                                          0.65
the y-axis denotes the run time in seconds for each                                                                                                                           mid                        4.73                    5.07        7.87                                          4.36
                                                                                                                                                                              fmin3                      5.02                    5.85        4.45                                          2.55
configuration in logarithmic scale. Externally called                                                                                                                         fmin5                    21.08                   25.96        14.92                                          9.56
solvers are marked with ∗.                                                                                                                                                    fmin10                  310.52                  260.38       997.65                                       550.94
   From Figure 5 it is clear that no single solver                                                                                                                            tri                     207.69                  193.99     1596.99                                        652.64
dominates the benchmarks. For example, for the
benchmarks from decode24-v0_incomple_5 to                                                                                                                         Boolector contexts and the graph-based MiniSAT perform
graycode6_complete_19 Boolector performs much                                                                                                                     better. However, for difficult instances with a run time
better than any other solver. But for the benchmarks from                                                                                                         over 1 minute, the graph-based Boolector context is
miller_complete_5, Boolector is outperformed by                                                                                                                   fastest while MiniSAT-based contexts require significantly
the SAT solvers. MiniSAT as well as PicoSAT are evalu-                                                                                                            more time. For these harder instances the GraphSolver
ated as internal and external versions. The accumulated                                                                                                           middle-end outperforms the direct variant of the same
run times of the solvers MiniSAT and PicoSAT over all                                                                                                             backend. This effect is most likely due to the removal
benchmarks are 18,790 seconds for the internal version                                                                                                            of redundancies in the graph. For MiniSAT this amounts
and 8,989 seconds for the external version. Surprisingly,                                                                                                         to run time reductions of around 50%. With metaSMT
the externally called solvers are much better here than the                                                                                                       as abstraction layer it is easy to evaluate the effects of
internal called solvers, though the internal solvers may                                                                                                          different contexts or optimizations. When changes are
benefit from learnt information. Overall, the externally                                                                                                          only in the abstraction layer no application code needs
called PrecoSAT solver needs 326.24 seconds over all                                                                                                              to be changed and only little effort is required.
benchmarks and Boolector needs 3,285.05 seconds.
   To summarize, all the presented results can be achieved                                                                                                        C. Comparison with direct API
very easily by using metaSMT. Only one parameter needs         For the factorization algorithm from Listing 2 a hand
to be switched to run a different solver.                    coded implementation using the Boolector C API is com-
                                                             pared to a metaSMT DirectSolver-based implementation
B. Mutation Testing                                          with the Boolector backend. The resulting application is
  Given a program, by mutation several faulty variants available as part of the metaSMT package.
of the program are created and combined into a single          The experiment has the following setup: The algorithm
program called meta-mutant. Using metaSMT the meta- from Listing 2 was changed to work on sequences instead
mutant is encoded into a set of constraints. Each satisfying of generating random numbers. A sequence of 10,000
assignment yields a test case that discovers at least one random 64 bit numbers is generated and the algorithm it
fault. Experiments of [32] where executed on a set of applied to it 10 times. The same sequence is used for both
metaSMT contexts. The results are shown in Table II. the hand coded and the metaSMT-based implementation
Comparing the Boolector backend with the MiniSAT of the algorithm. The complete experiment was repeated 5
backend using DirectSolver as well as GraphSolver times with Boolector being executed first and 5 times with
middle-ends.                                                 metaSMT being executed first. Altogether each solver was
  The results significantly vary with the difficulty of forced to solve 1,000,000 constraints of 64 bit numbers
the instance. For easy instances the directly connected factorized into two 32 bit numbers.


              DIFTS 2011                                                                                                                                     28
   The results showed no significant difference caused by [11] S. Ranise and C. Tinelli, “The Satisfiability Modulo Theories
the metaSMT abstraction layer: 1, 736s for plain Boolec-       Library (SMT-LIB),” http://www.smtlib.org, 2006.
                                                          [12] O. Shtrichman, “Pruning techniques for the SAT-based bounded
tor compared to 1, 729s for metaSMT with Boolector,            model checking problem,” in CHARME, ser. LNCS, vol. 2144,
i.e., 1.7 seconds for 10,000 factorizations.                   2001, pp. 58–70.

D. Other applications                                      [13] S. Cook, “The complexity of theorem proving procedures,” in 3.
                                                                ACM Symposium on Theory of Computing, 1971, pp. 151–158.
   In addition to the aforementioned projects metaSMT is [14] C. Barrett, R. Sebastiani, S. A. Seshia, and C. Tinelli, Satisfia-
also used in a Constraint Random Stimuli generation Li-         bility Modulo Theories, ser. Frontiers in Artificial Intelligence
                                                                and Applications. IOS Press, February 2009, vol. 185, ch. 26,
brary. In the library elements of the SystemC Verification      pp. 825–885.
Library [33] and techniques from [34] are combined.        [15] D. R. Cok, “jSMTLIB: Tutorial, validation and adapter tools
                                                                               for smt-libv2,” in NASA Formal Methods, ser. LNCS, 2011, vol.
                    VII. C ONCLUSIONS                                          6617, pp. 480–486.
                                                                         [16] “Boost C++ libraries,” http://www.boost.org/.
   metaSMT is a library that abstracts details of reasoning              [17] E. Niebler, “Proto: A compiler construction toolkit for DSELs,”
engines. Based on metaSMT very little programming                              in Proceedings of the 2007 Symposium on Library-Centric
                                                                               Software Design, ser. LCSD ’07. New York, NY, USA: ACM,
effort is required to integrate formal methods into a user‘s                   2007, pp. 42–51.
application. Once this has been done, a wide range of                    [18] J. de Guzman and D. Nuffer, “The Spirit library: Inline parsing
solvers as well as optimization techniques can be selected.                    in C++,” C/C++ User Journal, vol. 21, no. 9, 2003.
   Various research projects already integrate metaSMT.                  [19] T. L. Veldhuizen, “Arrays in Blitz++,” in Proceedings of the
                                                                               Second International Symposium on Computing in Object-
Future work on metaSMT includes the development                                Oriented Parallel Environments, 1998, pp. 223–230.
of the following features: New frontend logics will                      [20] F. Somenzi, CUDD: CU Decision Diagram Package Release
complete the support for SMT logics (e.g. uninterpreted                        2.4.1. University of Colorado at Boulder, 2009.
functions, integer arithmetic), while new middle-ends                    [21] “Aiger,” http://fmv.jku.at/aiger/.
                                                                         [22] A. Biere, “Picosat essentials,” JSAT, vol. 4, no. 2-4, pp. 75–97,
will increase solving performance (e.g. portfolio or multi-                    2008.
threaded contexts) and new backends will provide access                  [23] N. Eén and N. Sörensson, “An extensible sat-solver,” in SAT,
to additional SMT solvers.                                                     ser. LNCS, E. Giunchiglia and A. Tacchella, Eds., vol. 2919.
                                                                               Springer, 2003, pp. 502–518.
                                                                         [24] R. Brummayer and A. Biere, “Boolector: An efficient SMT
              VIII. ACKNOWLEDGMENTS                                            solver for bit-vectors and arrays,” in Tools and Algorithms for
  We would like to thank Hoang M. Le, Heinz Riener                             the Construction and Analysis of Systems, 2009, pp. 174–177.
                                                                         [25] R. Wille, G. Fey, D. Große, S. Eggersglüß, and R. Drechsler,
and Fereshta Yazdani for the helpful discussions, their                       “Sword: A SAT like prover using word level information,” in
proposals for improvements, testing and contributions to                       VLSI of System-on-Chip, 2007, pp. 88–93.
metaSMT.                                                                 [26] L. M. de Moura and N. Bjørner, “Z3: An efficient smt solver,”
                                                                               in TACAS, ser. LNCS, C. R. Ramakrishnan and J. Rehof, Eds.,
                                                                               vol. 4963. Springer, 2008, pp. 337–340.
                         R EFERENCES                                     [27] M. Soeken, S. Frehse, R. Wille, and R. Drechsler, “RevKit: A
 [1] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu, “Symbolic model              toolkit for reversible circuit design,” in Workshop on Reversible
     checking without BDDs,” in Tools and Algorithms for the                   Computation, 2010, pp. 69 – 72.
     Construction and Analysis of Systems, ser. LNCS, vol. 1579.         [28] D. Große, R. Wille, G. Dueck, and R. Drechsler, “Exact
     Springer Verlag, 1999, pp. 193–207.                                       multiple-control toffoli network synthesis with SAT techniques,”
 [2] K. McMillan, “Interpolation and SAT-based model checking,”                Computer-Aided Design of Integrated Circuits and Systems, IEEE
     in Computer Aided Verification, ser. LNCS. Springer Berlin /              Transactions on, vol. 28, no. 5, pp. 703 –715, May 2009.
     Heidelberg, 2003, vol. 2725, pp. 1–13.                              [29] R. Wille, D. Große, M. Soeken, and R. Drechler, “Using higher
 [3] E. Arbel, O. Rokhlenko, and K. Yorav, “SAT-based synthesis                levels of abstraction for solving optimization problems by
     of clock gating functions using 3-valued abstraction,” in Formal          boolean satisfiability,” in Symposium on VLSI, 2008. ISVLSI
     Methods in Computer-Aided Design, 2009, 2009, pp. 198 –204.              ’08. IEEE Computer Society Annual, 2008, pp. 411 –416.
 [4] F. Haedicke, B. Alizadeh, G. Fey, M. Fujita, and R. Drechsler,      [30] A. Biere, “Lingeling, plingeling, picosat and precosat at
     “Polynomial datapath optimization using constraint solving and            SAT race 2010,” Tech. Rep., 2010. [Online]. Available:
     formal modelling,” in Computer-Aided Design (ICCAD), 2010                 http://fmv.jku.at/papers/Biere-FMV-TR-10-1.pdf
     IEEE/ACM International Conference on, 2010, pp. 756 –761.           [31] R. Wille, D. Große, L. Teuber, G. W. Dueck, and R. Drech-
 [5] R. Drechsler, S. Eggersgluss, G. Fey, A. Glowatz, F. Hapke,               sler, “RevLib: An online resource for reversible functions and
     J. Schloeffel, and D. Tille, “On acceleration of SAT-based ATPG           reversible circuits,” in Int’l Symp. on Multi-Valued Logic, 2008,
     for industrial designs,” Computer-Aided Design of Integrated              pp. 220–225, RevLib is available at http://www.revlib.org.
     Circuits and Systems, IEEE Transactions on, vol. 27, no. 7, pp.     [32] H. Riener, R. Bloem, and G. Fey, “Test case generation from
     1329 –1333, 2008.                                                         mutants using model checking techniques,” in International
 [6] M. K. Ganai and A. Gupta, “Accelerating high-level bounded                Conference on Software Testing, Verification and Validation
     model checking,” in International Conference on Computer-                 Workshops, 2011, pp. 388 – 397.
     aided design. New York, NY, USA: ACM, 2006, pp. 794–801.            [33] SystemC Verification Standard Specification Version 1.0e, Sys-
 [7] P. Bjesse, “A practical approach to word level model checking             temC Verification Working Group.
     of industrial netlists,” in International Conference on Computer    [34] R. Wille, D. Große, F. Haedicke, and R. Drechsler, “SMT-
     Aided Verification, 2008, pp. 446–458.                                    based stimuli generation in the SystemC verification library,”
 [8] A. Armando, J. Mantovani, and L. Platania, “Bounded model                 in Advances in Design Methods from Modeling Languages
     checking of software using SMT solvers instead of SAT solvers,”           for Embedded Systems and SoC’s: Selected Contributions on
     Int. J. Softw. Tools Technol. Transf., vol. 11, pp. 69–83, 2009.          Specification, Design, and Verification from FDL 2009, ser.
 [9] “SMT-COMP 2009,” http://www.smtcomp.org/2009, 2009.                       Lecture Notes in Electrical Engineering, D. Borrione, Ed.
[10] “SMT-COMP 2010,” http://www.smtcomp.org/2010, 2010.                       Springer Netherlands, 2010, vol. 63, pp. 227–244.


      DIFTS 2011                                                    29
A Study of Sweeping Algorithms in the Context of
               Model Checking
                                 Zyad Hassan, Yan Zhang, and Fabio Somenzi
                               Dept. of Electrical, Computer, and Energy Engineering
                                         University of Colorado at Boulder
                                                 Boulder, CO 80309


   Abstract—Combinational simplification techniques have           niques benefit interpolation considerably. The recently
proved their usefulness in both industrial and academic            introduced ic3 [1] incrementally discovers invariants
model checkers. Several combinational simplification al-           that hold relative to stepwise reachability information.
gorithms have been proposed in the past that vary in               Designing a model checking flow that involves ic3
efficiency and effectiveness. In this paper, we report our
                                                                   requires understanding how combinational simplification
experience with three algorithms that fall in the com-
binational equivalence checking (sweeping) category. We
                                                                   algorithms affect it.
propose an improvement to one of these algorithms. We                 This paper makes the following contributions:
have conducted an empirical study to identify the strengths           • We carry out a comparative study of the different
and weaknesses of each of the algorithms and how they                   sweeping methods.
can be synergistically combined, as well as to understand             • We propose a BDD-based cut sweeping method that
how they interact with ic3 [1].
                                                                        is more effective than the original cut sweeping.
                                                                      • We propose a combined sweeping approach in
                   I. I NTRODUCTION
                                                                        which more than one sweeping method is applied.
   Combinational simplification eliminates redundancies                 We show that the combined approach can achieve
and increases sharing of logic in a design. It has been                 more simplification than any of the methods can
successfully employed in logic synthesis, equivalence                   achieve individually.
checking, and model checking.                                         • We perform an empirical study of the effect of
   In the model checking context, combinational sim-                    sweeping on ic3.
plification often dramatically improves the performance
                                                                      The rest of the paper is organized as follows. Section
of the proof engines. This has made it into a primary
                                                                   II contains preliminaries. In Section III, we introduce
component in both industrial [2] and academic [3],
                                                                   the BDD-based cut sweeping method. In Section IV,
[4] model checkers. Several combinational simplification
                                                                   we explain the rationale behind the combined sweeping
algorithms have been proposed in the past, such as
                                                                   approach. In Section V we present the experimental
DAG-aware circuit compression [5], [6] and sweeping
                                                                   results and in Section VI we conclude.
methods [7]–[9]. Sweeping methods merge functionally
equivalent nodes. They include BDD sweeping [7], SAT                                  II. P RELIMINARIES
sweeping [8], [10], and cut sweeping [9].
   When designing a model checker, the strengths and               A. AND-inverter-graph
weaknesses of each of the sweeping methods should                     The input and output of our sweeping algorithms
be taken into account. To the best of our knowledge,               are AND-inverter-graphs (AIGs). An AIG is a directed
no studies have been carried out so far to evaluate                acyclic graph that has four types of nodes: source nodes,
and compare the different sweeping methods, with the               sink nodes, internal nodes and the constant TRUE node.
exception of a limited study reported in [9].                      A primary input (PI) is a source node in an AIG. A
   The effect of combinational simplification on several           primary output (PO) is a sink node and has exactly one
model checking engines has been studied in the past.               predecessor. The internal nodes represent 2-input AND
In [8], SAT sweeping has been shown to positively                  gates. A node v is a fanin of v0 if there is an edge (v, v0 );
affect bounded model checking (BMC) [11]. In [12], it              it is a fanout of v0 if there is an edge (v0 , v). Left(v)
is shown that combinational redundancy removal tech-               and Right(v) refer to the left and right predecessors of v .


     DIFTS 2011                                               30
Fanin(v) and Fanout (v) denote the fanins and fanouts               point is that the original BDD is kept for equivalence
of node v . An edge in an AIG may have the INVERTED                 checking, but is not used to produce new BDDs. The
attribute to model an inverter. The Boolean function of             algorithm is complete if the threshold is so large that
a PI is a unique Boolean variable. For an edge, it is               no auxiliary variable is introduced. In practice, however,
the function of its source node if the edge does not have           this can be prohibitive.
the INVERTED attribute, and the complement otherwise.
For an internal node, it is the conjunction of the functions        C. SAT Sweeping
of its incoming edges. The Boolean function of a PO is                 The advancements in SAT solver technology over the
that of its incoming edge.                                          past two decades has proliferated SAT-based reasoning
   A path from node a to b is a sequence of nodes                   methods. SAT sweeping is one such method proposed
hv0 , v1 , v2 , . . . , vn i, such that v0 = a, vn = b and          by Kuehlmann [8] for combinational redundancy identi-
vi ∈ Fanin(vi+1 ), 0 ≤ i < n. The height of a node v ,              fication. SAT sweeping queries the SAT solver to prove
h(v), is the length of the longest path from a PI to v . The        or refute the equivalence of two nodes in the circuit.
fanin (fanout) cone of node v is the set of nodes that have         The basic SAT sweeping algorithm works as follows.
a path to (from) v . Two nodes are functionally equivalent          First, the circuit is simulated with random inputs, and
(complementary) if they represent the same (comple-                 candidate equivalence classes are formed, where nodes
mentary) Boolean function(s). Functionally equivalent               having the same simulation values are placed together.
(complementary) nodes can be merged transferring the                Next, for each two nodes belonging to the same class,
fanouts of the redundant nodes to their representative.             the SAT solver is queried for their equivalence. If the
To simplify our presentation, in what follows we delib-             SAT solver reports they are equivalent, one of them is
erately ignore detection of complementary functions.                merged into the other. Otherwise, the counterexample
                                                                    provided is simulated to break this equivalence class,
B. BDD Sweeping                                                     and possibly rule out other candidate equivalences as
   BDD sweeping identifies two nodes as equivalent if               well. This process is repeated until a resource limit is
they have the same BDD representation. The original                 reached, or until all classes become singletons, indicating
algorithm of Kuehlmann and Krohm [7] works in two                   the absence of further equivalences.
stages: equivalence checking and false negative detec-                 In our implementation of SAT sweeping, several
tion. In the first stage, it builds BDDs for each node              heuristics were applied. We mention each of them briefly.
and merges nodes having the same BDD. The algorithm                    1) Simulating Distance-1 Vectors: This heuristic was
introduces an auxiliary BDD variable (cut point) for each           proposed in [13]. Instead of just simulating the coun-
node that has been merged, which potentially leads to               terexample to equivalence derived by the SAT solver,
false negatives. In the second stage, it takes the BDDs             all distance-1 vectors, that have a single bit flipped, are
of the output functions and for each of them, replaces              simulated as well. Only the bits corresponding to the
the auxiliary variables with their driving functions. The           inputs that are in the fanin cone of the two nodes being
algorithm is complete in the sense that it can find all             checked for equivalence are flipped. We have found this
the equivalences in the circuit given a sufficiently large          heuristic to be very effective in practice. In [13], this
limit on the BDD sizes. However, a large limit often hurts          process is repeated for vectors that were successful in
efficiency. In this paper, we intend to use BDD sweeping            breaking up equivalence classes until convergence. In our
in conjunction with SAT sweeping which is complete                  implementation, we only simulate the distance-1 vectors
and avoids inherent BDD inefficiencies [8]. For that, we            for the original counterexample: for the benchmark suite
employ a version of BDD sweeping that is incomplete                 we experimented with, recurring on successful vectors is
yet faster than the original.                                       too expensive for the number of refinements it achieves.
   The algorithm we adopt iterates the following steps                 2) Clustering: Simulating distance-1 vectors often
on each node v of the AIG in some topological order.                results in considerable refinement of the equivalence
It builds a BDD for v and checks if there exists another            classes. This is desirable, since an equivalence class
node that has the same BDD. If so, it merges these two              is often broken up more cheaply by simulation than
nodes and continues. Otherwise, if the BDD size exceeds             by the SAT solver. Moreover, we have observed that
a given threshold, the algorithm introduces an auxiliary            with distance-1 vector simulation, it becomes very likely
BDD variables for v to be used in place of the original             that nodes remaining in an equivalence class are indeed
BDD when dealing with the fanouts of v . An important               equivalent. Therefore, rather than checking the equiva-


     DIFTS 2011                                                31
lence of two nodes at a time, we check the equivalence              in Φ(v) is associated with a truth table. Next, it searches
of all nodes in an equivalence class using a single SAT             for a node equivalent to v by looking for a cut equivalent
query. If they are all indeed equivalent, we find that using        to some cut in Φ(v). If it succeeds, the two nodes are
a single SAT query rather than n − 1 queries where n is             merged. Otherwise, a heuristic is applied to prune Φ(v)
the number of nodes in the class.                                   to at most N cuts. After pruning, the algorithm stores
   3) Height-Based Merging: When two nodes are                      Φ(v) as the cut-set of v and builds a cross-reference
proved equivalent, we merge the node with a larger                  between each of its cuts and v .
height into the one with a smaller height, instead of                  The heuristic for pruning, which we call the quality
merging based on a topological order as in [13]. The                heuristic, computes a value q for each cut:
intuition being that a node having a larger height often                                    X         1
                                                                                  q(C) =                       .            (3)
has a larger fanin cone, which suggests that merging it                                         | Fanout (n)|
                                                                                           n∈C
would lead to a larger reduction. Nodes coming later in a
                                                                    The cuts with the smallest values of q are kept. The
topological order do not necessarily have a larger height
                                                                    intuition of the quality heuristic is two-fold. First, it tries
than nodes coming earlier.
                                                                    to localize the equivalence and thus favors smaller cuts.
D. Cut Sweeping                                                     Second, it normalizes cuts by attempting to express them
                                                                    with the same set of variables. The chosen variables are
   Cut sweeping [9] is a fast yet incomplete approach
                                                                    those that have a large fanouts, i.e., that are shared by
for combinational equivalence checking. It iteratively
                                                                    many other nodes.
computes cuts for each AIG node and compares the
                                                                       A good truth-table implementation is critical to the
functions associated to the cuts.
                                                                    performance of cut sweeping. In [9], truth tables are
   A cut is defined with respect to an AIG node, called
                                                                    implemented as bit vectors. An advantage of bit vectors
root. A cut C(v) of root v is a set of nodes, called leaves,
                                                                    is the constant-time cost of Boolean operations. On the
such that any path from a PI to v intersects C(v). A cut-
                                                                    other hand, bit interleaving is required to extend the bit
set Φ(v) consists of several cuts of v . For cut-sets Φ(v1 )
                                                                    vectors to the same length so that the corresponding bits
and Φ(v2 ), the merge operator ⊗ is defined as
                                                                    represent the same minterm1 .
 Φ(v1 ) ⊗ Φ(v2 ) = {C1 ∪ C2 | C1 ∈ Φ(v1 ), C2 ∈ Φ(v2 )} .               III. BDD- BASED C UT S WEEPING
                                                        (1)
Assume k ≥ 1. A k-feasible cut is a cut that contains           Representing functions having a small number of
at most k leaves. A k-feasible cut-set is a cut-set that    inputs   using bit vectors is very efficient. However, the
contains only k-feasible cuts. The k-merge operator, ⊗k ,   number    of bits required grows exponentially with the
creates only k-feasible cuts. Cut enumeration recursively number of variables, which can easily lead to memory
computes all k-feasible cuts for an AIG. It computes the blow-up. As an alternative, BDDs, which are more suit-
k -feasible cut-set for a node v as follows:                able for large functions, can also be used to represent cut
         (                                                  functions. Furthermore, the strong canonicity of BDDs
           {{v}}                                 v ∈ PI     makes it trivial to check for equivalence. The use of
Φ(v) =
           {{v}} ∪ Φ (Left (v)) ⊗k Φ (Right (v)) v ∈ AND , BDDs also enables a heuristic which we describe below.
                                                        (2)     The proposed algorithm differs from the original one
where PI and AND refer to the set of PIs and 2-input in two aspects. First, we introduce a new parameter s, the
AND gates respectively. Note that cuts are not built for maximum size of a BDD, to replace k. That is, instead of
POs because they are never merged.                          k -feasible cuts, we keep cuts whose functions contain at
   The function of a cut is the Boolean function of the most s BDD nodes. Node count, as opposed to number
root in terms of the leaves. It can be represented in of inputs, is a more natural characterization of BDD size.
different ways, for instance, using bit vectors or BDDs.        The second difference comes from the pruning heuris-
Two cuts are equivalent if their cut functions are equal. tic. We define the height h of a cut C as the average
Hence, two nodes are functionally equivalent if their cut- height of its nodes:
sets contain equivalent cuts.                                                               X h(v)
   Cut sweeping is parametrized by k and N , the maxi-                           h(C) =                 .                (4)
                                                                                           v∈C
                                                                                                  |C|
mum cut size and the maximum cut-set size, respectively.
For each node v in some topological order of the AIG,          1
                                                                 A good reference of bit-interleaving can be found at http://
the algorithm builds a k-feasible cut-set Φ(v). Each cut graphics.stanford.edu/∼seander/bithacks.html.


     DIFTS 2011                                                32
A smaller h indicates that the leaves in the cut are closer          cut sweeping only keeps those below the threshold. The
to the PIs. The height heuristic keeps at most N cuts                quality heuristic tries to attract cuts on certain nodes,
choosing the ones with smallest values of h.                         which is similar to the placement of auxiliary variables in
                                                                     BDD sweeping. Nevertheless, the number of “attractors”
     a
                p
                                     a
                                            r
                                                                     in the quality heuristic tends to be much larger than in
     b                               c
                                                      g
                                                                     BDD sweeping.
                            f
     c                               b
                q                           s
     d                               d                                       IV. C OMBINING S WEEPING M ETHODS
      Fig. 1.       Two implementations of a 4-input AND gate           The idea of combining several simplification algo-
                                                                     rithms is not new. Many existing model checkers iterate
    A motivating example for the new heuristic is in                 several simplification algorithms before the problem is
Figure 1, which shows two different 4-input AND gates.               passed to the model checking engines. However, we are
Nodes a, b, c, and d are PIs. Nodes p, q , r , and s are             unaware of any studies that have been carried out to
internal nodes. Nodes f and g can only be merged if                  identify the best way they could be combined. In this
their cut-sets both contain {a, b, c, d}. However, if the            section we attempt to give general guidelines to which
internal nodes have many more fanouts than the PIs, the              a combined approach should adhere. We support our
quality heuristic may select cuts containing the internal            claims by empirical evidence collected in the experi-
nodes instead, causing the merge to be missed.                       ments reported in Section V.
    As mentioned before, the quality heuristic tries to                 The problem we address is as follows: given a time
normalize the cuts on certain “attractors.” This reduces             budget for combinational simplification, how should it
the possibility that equivalent functions are represented            be allotted to the different algorithms? The sweeping
differently. However, this might also lead to the loss               algorithms discussed in previous sections vary in their
of the opportunity to find equivalences that cannot be               completeness and speed, with cut sweeping being the
expressed by those “attractors,” as in Figure 1.                     most incomplete method yet the fastest of the three meth-
    On the other hand, the height heuristic tries to push the        ods, SAT sweeping being a complete, yet the slowest,
cut boundary as far as possible. A supporting argument               and BDD sweeping lying in between, both in terms of
is that, if a node is employed in equivalent cuts, then              completeness and speed.
replacing it with its predecessors preserve equivalence.                Possible solutions include allocating the whole time
Furthermore, new merges that are otherwise undiscover-               budget to a single algorithm, or dividing it among two or
able (consider other equivalences that require a and b in            more algorithms. The fact that some methods are better
the above example) may be found. The height heuristic                in approaching certain problems than others, suggests
does not attract cuts to certain nodes, which may result             that more than one method should be applied. If two or
in different cuts for equivalent nodes. As shown in the              more methods are to be applied in sequence, the intuition
experiments, the effectiveness of the height heuristic               suggests that the lower effort methods should precede
reduces as the height of nodes increases.                            the higher effort ones. The advantages of doing so are
    The two heuristics have their own strengths and weak-            two-fold. First, although the higher-effort methods are
nesses. A natural question is whether it is possible to              likely to discover the merges that a lower-effort method
combine them to benefit from their individual strengths.             discovers, in general, it will take them more time to do
We can choose a few cuts with each heuristic. This may               so. Second, preceding higher-effort methods by lower-
lead to more merges but may also worsen the efficiency               effort methods is beneficial in having them present a
if it significantly increases the number of cuts. To prevent         smaller problem that is easier to handle.
such an increase, a combined heuristic only records                     Finally, the percentages of total time that should be
height cuts for the lower nodes, while it keeps both types           allotted to each of the methods to yield the maximum
of cuts for the others.                                              possible reduction is studied in Section V.
    There is some connection between cut sweeping with
each of the two heuristics and BDD sweeping. With                                          V. R ESULTS
the height heuristic, cut sweeping tries to build cuts as
large as possible, as BDD sweeping does. However, BDD                  The experiments are divided into three parts. The first
sweeping can store cuts that exceed the threshold while              part compares different variations of the cut sweeping


     DIFTS 2011                                                 33
algorithm. The second part shows the results of the com-                     proved significantly with s. This means that with s, each
bined sweeping methods, and the third part is concerned                      cut has a larger chance of resulting in a merge.
with the effect of sweeping on ic32 .                                           Second, while “k-quality” and “k-combined-1” have
   We use the benchmark set from HWMCC’10 [14], a                            very close sweeping times, the latter achieves 19.8%
set of 818 benchmarks in total. The experiments are run                      more merges. Furthermore, the decrease in the “Height”
on a machine with a 2.8GHz Quad-Core CPU and 9GB                             column reveals that the height cuts indeed lead to
of memory. We use CUDD [15] as our BDD package                               merges. Although “s-quality” is more effective than the
for all the BDD-related algorithms.                                          two above methods, it is less efficient due to the larger
                                                                             cut sizes.
A. BDD-based Cut Sweeping                                                       For the methods with s (excluding “s-quality”), we
   Variations of cut sweeping are applied to the                             observe that “s-height N = 1” is the fastest and produces
HWMCC’10 benchmark set. The differences between                              a good number of merges. Increasing the number of
variations are two-fold. First, either the number of vari-                   height cuts to two triples the run time without gaining
ables, k, or the number of BDD nodes, s, is used to                          many more merges. Comparing it with “s-combined-1”,
drop over-sized cuts. Second, we experiment with several                     an improvement on the merges is shown by the latter.
heuristics for pruning cut-sets: the quality heuristic,                      This indicates that maintaining one height cut and one
the height heuristic, and two combined heuristics. The                       quality cut works better than two height cuts. For “s-
naive combined heuristic (“combined-1”) chooses one                          combined-2”, the number of merges is between the two
cut based on the height heuristic and the others based                       above methods, but with lower run time. Furthermore,
on the quality heuristic. The other heuristic (“combined-                    the numbers of generated cuts and kept cuts are even
2”) sets a threshold on the node height (350 in our                          comparable to “s-height N = 1”. That is, even though
experiments). For nodes that are below the threshold,                        we keep three cuts for those nodes with height larger
it only keeps a height cut. For higher nodes, it produces                    than 350, on average we compute only a few percent
cut-sets consisting of a height cut and two quality cuts.                    more cuts than we do in the case of one cut per node.
We denote a method by a k or an s followed by the the                           The “Height” values of the three methods confirm the
heuristic name. All the variations use BDDs to represent                     assumption made in Section III: most merges produced
the cut functions.                                                           by the height heuristic come from cuts close to the
   The results are shown in Table I; they are aggregated                     PIs. When the two heuristics are combined, a significant
over the 818 benchmarks. Based on experiments, both                          increase on the “Height” value is observed. In Figure
the threshold of BDD sweeping and s in BDD-based                             2, we show the number of merges found by “s-height
cut sweeping are set to 250. The total number of AIG                         N = 1” and “s-combined-2” on nodes within different
nodes before sweeping is 7.22M. “Final” is the size of                       height ranges. The plot is normalized to “k-quality” and
AIGs after sweeping. “Generated” and “Kept” are the                          has bin size of 50, i.e., a point at (2, 1) indicates that the
number of cuts generated and kept by the corresponding                       method finds the same number of merges as “k-quality”
methods. For an individual benchmark, its “height” is the                    for nodes with height from 100 to 149. Obviously, the
average height of all merged cuts. The “Height” column                       height heuristic works better on smaller height nodes,
is computed by taking the average of the “height” of                         while the quality heuristic catches more on larger height
all the benchmarks. A smaller value indicates that more                      ones. The combined heuristic takes the advantage of
merges are found by cuts that are close to the PIs. Note                     both and produces an even better profile on nodes with
that since we use BDDs, the results in terms of efficiency                   larger heights. Note that although the height heuristic
of bit-vectors based methods may not be as good as in                        works worse on the nodes with larger heights, it can
[9]. Therefore, when dealing with them, we just compare                      still get more merges. This may be due to the fact that
the effectiveness.                                                           in this benchmark set, a large percent of equivalences
   Results indicate that the resulting AIGs are con-                         are located at lower heights.
sistently smaller with s than with k. There are                                 In our setup, cut sweeping is intended for usage as a
a few interesting observations. First, the ratios                            fast method. Thus we consider “s-height N = 1” and
GeneratedCuts /Merge and KeptCuts/Merge are im-                              “s-combined-2” to be the best variants. Compared to
                                                                             BDD sweeping, those two variants are faster because
  2
    Detailed results for first and second parts can be found at http:        they create fewer BDD nodes than BDD sweeping, but
//eces.colorado.edu/∼hassanz/sweeping-study                                  are less effective since BDD sweeping may keep BDDs


      DIFTS 2011                                                        34
                                                     TABLE I
   R ESULTS OF CUT SWEEPING , BDD SWEEPING AND SAT SWEEPING ON THE HWMCC’10 SET. B Y DEFAULT, k = 8 AND s = 250.

                                                       Method              Final (×106 )             Merges (×105 )        Time (s)   Generated (×107 )   Kept (×107 )   Height
                                                  k-quality N = 5              6.82                      2.62               123.26          5.83             1.92        10.20
                                               k-combined-1 N = 5              6.75                      3.14               129.64          5.84             1.90         8.57
                                                  s-quality N = 5              6.71                      3.31               536.75          7.63             2.32        12.11
                                                  s-height N = 1               6.55                      4.07                58.99          1.07             0.54         3.19
                                                  s-height N = 2               6.51                      4.20               181.52          2.18             0.99         2.94
                                               s-combined-1 N = 2              6.48                      4.42               181.21          2.29             1.02        12.86
                                                    s-combined-2               6.52                      4.28                74.64          1.10             0.54        12.72
                                                  BDD Sweeping                 6.34                      5.61               112.74            –                –           –
                                                   SAT Sweeping                6.10                      6.37               2149.4            –                –           –


                                      2
                                     10                                                                                    spent in sweeping. Data are aggregated over the 818
                                                                                        Height Heuristic
                                                                                        Combined Heuristic                 benchmarks. The base case for our comparisons is the
                                                                                                                           pure SAT sweeping case in which SAT sweeping gets
      Ratio over Quality Heuristic


                                      1
                                                                                                                           the whole budget. The time budget used in our study is
                                     10
                                                                                                                           10 seconds.
                                                                                                                              We consider the following policies: (a) allocating
                                                                                                                           the budget to two methods, (b) allocating it to three
                                      0
                                     10                                                                                    methods, and (c) allocating the whole budget to SAT
                                                                                                                           sweeping. For (a) and (b), we consider all the different
                                                                                                                           permutations of assigning integer time values to each
                                      −1
                                                                                                                           method, such that they sum up to 10 seconds. Note that if
                                     10
                                           0        20    40          60
                                                               Node Height (/50)
                                                                                   80          100           120           a sweeping algorithm times out, what it has achieved thus
                                                                                                                           far is used in what follows. In all cases, a set of light-
Fig. 2.                               Number of merges on nodes within different height ranges
                                                                                                                           weight sequential simplification algorithms are applied
                                                                                                                           before sweeping. This set of algorithms includes cone-
                                                                                                                           of-influence reduction, stuck-at-constant latch detection,
that exceed the threshold.                                                                                                 and equivalent latch detection. The total number of
                                                                                                                           AIG nodes for all 818 benchmarks measured after the
B. Combined Sweeping Methods                                                                                               sequential simplification step is 6.1M.
   In this section, we show experimental evidence that                                                                        Results are presented in Table II. The first column lists
supports the guidelines of combining sweeping methods                                                                      the methods, where the number before each sweeping
presented in Section IV. In particular, we try several pos-                                                                method indicates the number of seconds given to it.
sibilities of allotting the budget to the different sweeping                                                               The second column shows the number of AIG nodes
algorithms with the purpose of identifying empirically if                                                                  removed. The third column shows the total time spent in
they should be combined, and if so, in which way. In                                                                       sweeping. The methods are listed in order of decreasing
what follows we use the “s-height N = 1” variant of                                                                        reduction. The last row is for pure SAT sweeping. We
cut-sweeping since it is the fastest, and we simply refer                                                                  only show the best three setups in terms of reduction for
to it as cut sweeping.                                                                                                     each of the possible orders of the method sequences.
   In our combined approach, SAT sweeping is always                                                                           Several observations can be made. First, when it
run last since it is the only complete method of the three,                                                                comes to running two methods in sequence, BDD
and should thus be given whatever time is left to find                                                                     sweeping combined with SAT sweeping outperforms cut
equivalences not discovered by the other methods. Also,                                                                    sweeping combined with SAT sweeping. The method
the time not used by any of the preceding methods is                                                                       that achieves maximum reduction (8 seconds of BDD
passed to SAT sweeping. For instance, if cut sweeping                                                                      sweeping followed by 2 seconds of SAT sweeping)
is given a time budget of 4 seconds and only uses 3 of                                                                     removes 56K more nodes than pure SAT sweeping (7.7%
them, SAT sweeping gets to run for one extra second.                                                                       more reduction). Second, more reduction is achievable
   We compare the reduction measured in terms of the                                                                       by running three methods in sequence. As suggested in
number of AIG nodes removed, and the total time                                                                            Section IV, ordering the methods by increasing effort


          DIFTS 2011                                                                                                  35
                        TABLE II                                                             TABLE III
      E FFECT OF BUDGET ALLOCATION ON REDUCTION .                           E FFECT OF SWEEPING ON ic3’ S PERFORMANCE .

           Method         Reduction   Total Sweeping Time (s)                      Solves      Solves     Runtime (s)   Runtime (s)
    4 Cut, 5 BDD, 1 SAT    801,932              518                       Seed       (No        (With        (No          (With
    2 Cut, 5 BDD, 3 SAT    801,137              516                      Index    Sweeping)   Sweeping)   Sweeping)     Sweeping)
    6 Cut, 3 BDD, 1 SAT    801,119              522                         1        693         698        96,297        91,762
    4 BDD, 1 Cut, 5 SAT    794,052              517                         2        689         699        95,629        90,341
    8 BDD, 1 Cut, 1 SAT    793,921              515                         3        691         699        95,050        92,714
    7 BDD, 2 Cut, 1 SAT    793,814              519                         4        696         697        93,691        91,141
       8 BDD, 2 SAT        793,226              500                         5        693         698        95,007        89,656
       7 BDD, 3 SAT        793,068              503                         6        690         695        96,270        91,559
       5 BDD, 5 SAT        792,797              508                         7        693         699        94,784        92,056
        1 Cut, 9 SAT       772,563              512                         8        690         701        94,351        90,837
        6 Cut, 4 SAT       771,070              513                         9        693         693        95,491        92,847
        3 Cut, 7 SAT       769,483              511                        10        690         693        95,124        93,048
           10 SAT          736,594              619                     Average     691.8       697.2       95,169        91,596


(or equivalently by increasing degree of completeness)               and after sweeping) in the sweeping case. We compare
achieves more reduction than otherwise. Here, the best               the number of solves, and the aggregate runtime among
method (4 seconds, 5 seconds, and 1 second for cut,                  all benchmarks.
BDD and SAT sweeping, respectively), has an edge                        It is important to note that the ic3 algorithm has a
of 65K nodes over pure SAT sweeping (about 8.9%                      random flavor. In particular, the order by which gener-
more reduction). Third, in terms of sweeping time, it                alization (dropping literals) is attempted is randomized.
is clear that a large drop occurs (> 100 seconds) when               Also, since the algorithm is SAT-based, randomization
two or three methods are combined versus pure SAT                    occurs in the SAT solver decisions. To have reliable
sweeping, which is due to the often smaller time needed              experimental results, each experiment is repeated with
by BDD and cut sweeping to discover equivalences than                10 different seeds, and the results are averaged over the
SAT sweeping. Given an overall model checking budget,                different seeds.
smaller sweeping time allows more time for the model                    Results are shown in Table III. The first column shows
checking engine, which is desirable.                                 the seed index, the second and third columns show the
   The question of whether such difference has a consid-             number of solves without and with sweeping, and the
erable effect on the performance of the model checking               fourth and fifth columns show the aggregate runtime
engine is answered in the next section.                              without and with sweeping.
                                                                        The results confirm a positive effect of sweeping on
C. Effect on ic3
                                                                     the performance of ic3. On average, five more solves
   The recently developed model checking algorithm,                  are achieved with sweeping, and the aggregate runtime
ic3 [1], has been regarded as the best standalone model              drops by 3.8%.
checking algorithm developed up till now [16]. As the                   The enhancement in the performance of ic3 in pres-
interaction of combinational simplification methods with             ence of sweeping can be attributed to two factors. First,
different model checking algorithms has been studied in              reduction in the number of gates caused by sweeping can
the past, we here aim to study how they interact with                result in the reduction in the SAT solver time. Second,
ic3. In particular, we would like to empirically find                simplification often results in dropping of latches (e.g.,
out if ic3 benefits from preprocessing the netlist with              if it merges the next-state functions of two latches). For
a simplification algorithm or not, and if it does, how               example, for the benchmark set used in our experiments,
sensitive it is to the magnitude of reductions achieved              sweeping reduces the aggregate number of latches from
through simplification.                                              279,161 to 269,091 (3.6% decrease). This reduces the
   In the first experiment, we compare two runs of ic3,              amount of work done by ic3 in generalization of
one that is preceded by SAT sweeping, and one that                   counterexamples to induction.
is not. The experimental setup is as follows. A total                   We now repeat the previous experiment, but this time
timeout of 10 minutes is used. The budget for SAT                    we compare the number of solves and the aggregate
sweeping is 10 seconds. The light-weight sequential                  runtime between pure SAT sweeping and the empirically
simplification algorithms referred to in Section V-B are             optimum combined sweeping scheme of Section V-B.
applied once in the no-sweeping case, and twice (before              The purpose of this experiment is to understand how


     DIFTS 2011                                                 36
                      TABLE IV
     O PTIMUM SWEEPING VERSUS PURE SAT SWEEPING .                    reviewers for their insightful comments regarding cut
                                                                     sweeping’s pruning heuristics that prompted us to try
              Solves       Solves   Runtime (s)   Runtime (s)        the “combined-2” heuristic.
               (Pure     (Optimum      (Pure       (Optimum
     Seed       SAT      Sweeping       SAT        Sweeping
    Index    Sweeping)    Scheme)   Sweeping)       Scheme)                                    R EFERENCES
       1        698         696       91,762         91,567
       2        699         696       90,341         91,113
                                                                      [1] A. R. Bradley, “SAT-based model checking without unrolling,”
       3        699         697       92,714         92,373               in Proc. Int. Conf. on Verification, model checking, and abstract
       4        697         702       91,141         90,478               interpretation, 2011, pp. 70–87.
       5        698         697       89,656         90,327           [2] H. Mony, J. Baumgartner, V. Paruthi, R. Kanzelman, and
       6        695         699       91,559         90,606               A. Kuehlmann, “Scalable automated verification via expert-
       7        699         697       92,056         91,498               system guided transformations,” in Formal Methods in
       8        701         699       90,837         92,228
       9        693         696       92,847         91,252
                                                                          Computer-Aided Design, 2004, pp. 159–173.
      10        693         697       93,048         92,663           [3] R. Brayton and A. Mishchenko, “ABC: An academic industrial-
   Average     697.2       697.6      91,596         91,411               strength verification tool,” in Computer Aided Verification,
                                                                          2010, pp. 24–40.
                                                                      [4] G. Cabodi, S. Nocco, and S. Quer, “Benchmarking a model
                                                                          checker for algorithmic improvements and tuning for perfor-
sensitive ic3 is to the magnitude of reductions.                          mance,” in Proc. Hardware Verification Workshop, 2010.
                                                                      [5] P. Bjesse and A. Boralv, “DAG-aware circuit compression for
   Results are shown in Table IV, where the second and                    formal verification,” in Proc. Int. Conf. on Computer-aided
third columns compare the number of solves for pure                       design, 2004, pp. 42–49.
SAT sweeping and the optimum sweeping scheme, and                     [6] A. Mishchenko, S. Chatterjee, and R. Brayton, “DAG-aware
                                                                          AIG rewriting a fresh look at combinational logic synthesis,”
the fourth and fifth columns compare the total runtime.
                                                                          in Proc. Design Automation Conference, 2006, pp. 532–535.
   As the results indicate, ic3 does not benefit much                 [7] A. Kuehlmann and F. Krohm, “Equivalence checking using cuts
from the better reduction achieved by the combined                        and heaps,” in Proc. Design Automation Conference, 1997, pp.
sweeping scheme. The lack of performance enhancement                      263–268.
                                                                      [8] A. Kuehlmann, “Dynamic transition relation simplification for
on ic3 can be attributed to the small improvement                         bounded property checking,” in Proc. Int. Conf. on Computer-
in reduction the combined sweeping approach achieves                      aided design, 2004, pp. 50–57.
over pure SAT sweeping. In particular, while pure SAT                 [9] N. Eén, “Cut sweeping,” Cadence Design Systems, Tech. Rep.,
sweeping removes 737K nodes out of the total 6.1M                         2007.
                                                                     [10] A. Mishchenko, S. Chatterjee, and R. Brayton, “FRAIGs: A
nodes in the 818 benchmarks (12.1% reduction), the                        unifying representation for logic synthesis and verification,”
combined approach removes 802K nodes (13.2% reduc-                        EECS Dept., UC Berkeley, ERL, Tech. Rep., Mar 2005.
tion); a mere 1.1% improvement. This suggests that ic3               [11] A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu, “Symbolic
                                                                          model checking without BDDs,” in TACAS, 1999, pp. 193–207.
has a small sensitivity to the magnitude of reduction.
                                                                     [12] G. Cabodi, M. Murciano, S. Nocco, and S. Quer, “Stepping
                    VI. C ONCLUSION                                       forward with interpolants in unbounded model checking,” in
                                                                          Proc. Int. Conf. on Computer-aided design, 2006, pp. 772–778.
  In this paper, we presented an empirical study of                  [13] A. Mishchenko, S. Chatterjee, R. Brayton, and N. Een, “Im-
the different sweeping methods proposed in the past.                      provements to combinational equivalence checking,” in Proc.
                                                                          Int. Conference on Computer-aided design, 2006, pp. 836–843.
We have shown that a combined sweeping approach                      [14] A. Biere and K. Claessen, “Hardware model checking compe-
outperforms any of the sweeping methods alone. We                         tition,” in Hardware Verification Workshop, 2010.
have also proposed a BDD-based cut sweeping method                   [15] F. Somenzi, CUDD: CU Decision Diagram Package, University
that is more effective than the original cut sweeping.                    of Colorado at Boulder, ftp://vlsi.colorado.edu/pub/.
                                                                     [16] R. Brayton, N. Een, and A. Mishchenko, “Continued relevance
Finally, we have studied the effect of sweeping on the                    of bit-level verification research,” in Proc. Usable Verification,
new model checking algorithm, ic3, and investigated                       Nov. 2010, pp. 15–16.
the causes of the better performance it experiences with
sweeping. The goal of this analysis is to help designers
of model checkers to make decisions regarding the
incorporation of sweeping methods and to provide a
deeper understanding of how sweeping methods interact
with ic3.
                  ACKNOWLEDGEMENTS
  We thank Aaron Bradley who motivated this work
and contributed to many discussions. We also thank the


     DIFTS 2011                                                 37
                                                                                                                                           1


  Enhancing ABC for LTL Stabilization Verification
          of SystemVerilog/VHDL Models
                       Jiang Long, Sayak Ray, Baruch Sterin, Alan Mishchenko, Robert Brayton
                            Berkeley Verification and Synthesis Research Center (BVSRC)
                               Department of EECS, University of California, Berkeley
                               {jlong, sayak, sterin, alanmi, brayton}@eecs.berkeley.edu


   Abstract—We describe a tool which combines a commercial               (stabilization properties, to be precise). In this paper, we detail
front-end with a version of the model checker, ABC, enhanced to          its liveness capabilities.
handle a subset of LTL properties. Our tool, VeriABC, provides              After reading in a design, VeriABC bit-blasts it into a bit-
a solution at the RTL level and produces models for synthesis
and formal verification purposes. We use Verific (a commercial           level netlist and converts the SVA stabilization properties into
software) as the generic parser platform for SystemVerilog and           an intermediate LTL representation. Then the LTL properties
VHDL designs. VeriABC traverses the Verific netlist database             are folded into the bit-level netlist in an appropriate way
structure and produces a formal model in the AIGER format.               (using an extended liveness-to-safety conversion, explained
LTL can be specified using SVA 2009 constructs that are                  later in Section IV). The resulting bit-level netlist represents
processed by Verific. VeriABC traverses the resulting SVA parse
trees and produces equivalent LTL formulae using the F,G, Until          a formal model of the design, represented as an and-inverter
and X operators. The model checker in ABC has been enhanced              graph (AIG). And-inverter graphs are concise representations
to handles LTL stabilization properties, an important subset of          of finite state machines. The AIGER[2] format is a prevalent
LTL. The paper presents VeriABC’s implementation strategy,               format for AIG representation. supported by various academic
software architecture, tool flow, environment setup for formal           tools and used in annual hardware model checking competi-
verification, use model, the specification of properties in SVA
and translation into LTL. Finally the properties are translated          tions. Since our liveness verification is based on liveness-to-
into safety properties that can be verified by the ABC model             safety conversion, eventually the safety verification backend in
checker.                                                                 ABC[1] is called which works on the bit-level netlist produced
                                                                         by the VeriABC front-end. We have used this methodology
                                                                         to verify liveness in the context of compositional verification
                      I. I NTRODUCTION                                   of deadlock freedom of micro-architectural communication
   We present an integrated tool flow for liveness model check-          fabrics. Preliminary experimental results are encouraging.
ing using industry hardware description languages (HDLs) and
SystemVerilog Assertions: (i) VeriABC: a front-end to read in            A. Related work
hardware models expressed in HDLs, and (ii) capability of                   Although parsing and elaborating RTL languages are a stan-
model checking a subset of liveness properties. VeriABC is               dard practice for commercial EDA products, it is a daunting
able to read in hardware models expressed in SystemVerilog               task for academics due to language complexity and continuous
or VHDL. SystemVerilog and VHDL languages are the most                   language evolution over the years. Although, vl2mv[12] was an
popular HDLs being used in industry today for digital designs.           academic tool that attempted to treat a significant subset of the
VeriABC generates a formal model in the AIGER[2] format                  Verilog language for synthesis and verification purposes, it was
and relies on a commerical front-end, Verific, to build a generic        not maintained and language support was not complete. Tools
parser platform for HDLs. This allows down-stream tool flows             like ABC[1], VIS[9] parse subsets of Verilog language too
in synthesis and verification. A version of ABC was enhanced             strict and not applicable in a broad setting. Freely accessible
from a safety-only verification engine to allow both safety and          tools like icarus[3] contain Verilog languages front-end but are
liveness verification. Our current version supports a particular         not up-to-date with newer SystemVerilog features.
subset of liveness properties called stabilization properties or            Our choice of a commercial and stable front-end Verific,
generalized fairness constraints (defined in Section IV).                allows academics to get around the language barrier to access
   In a typical use model, a user will develop a hardware                real-world designs.
design in SystemVerilog or VHDL, and specify its correctness                Liveness-to-safety conversion was first proposed in [7],
requirements in the property specification language SystemVer-           [24]. They demonstrated that verification problems for any
ilog Assertion (SVA). SVA has been adopted into IEEE                     ω-regular property can be converted into a verification prob-
SystemVerilog standard and is supported by major commer-                 lem of an equisatisfiable safety problem. Their algorithm
cial tools in simulation, synthesis and verification. The SVA            has been deployed successfully in industrial setups and used
language in SystemVerilog 2009 standard contains liveness                to verify liveness properties of microprocessor designs [6].
constructs that allow full specification of liveness properties          Our liveness-to-safety conversion algorithm for stabilization
as those defined in LTL formulas. In our framework, a user               is essentially an extension of the algorithm proposed in [24]
can specify both safety properties and liveness properties               and is broader than discussed in [6].
     DIFTS 2011                                                     38
B. Contribution
   As illustrated in Figure 1, we combine a commercial front-
end, Verific, with a version of our model checker, ABC,
enhanced to handle a subset of LTL properties. Our tool
processes the Verific output, conducts various modeling pro-
cedures on the design, compiles SVA into LTL formulas, then
the enhanced ABC processes the LTL formulas for liveness
model checking. We detail the software architecture, tool flow,
formal model construction, SVA compilation and downstream
LTL modeling checking.


                                    RTL+SVA

                                                      Verific Parser Platform
                            Verific Netlist Databse

                                                      VeriABC
                 DEBUG
                                AIGER + LTL Formula

                                                      L2S
                                      AIGER

                                                      ABC Backend Engines
                                              Error
                            Proven
                                              Trace


                                                                                     Fig. 2.   Verific Parser Flow

Fig. 1.    Complete Tool Flow
                                                                                     VHDL IEEE 1076-1993, and Verilog IEEE 1364-1995/2001
                                                                                     languages, without any restrictions. The resulting parse tree
C. Organization of the Paper                                                         comes with an extensive API.
   We first discuss the capabilities of the Verific parser plat-                        Elaborate supports both static elaboration and RTL elab-
form. In Section III we describe the architecture, formal                            oration. Static elaboration elaborates the entire language, and
modeling of VeriABC and translation of SVA into LTL. In                              specifically binds instances to modules, resolves library ref-
Section IV the stabilization properties are described in further                     erences, propagates defparams, unrolls generate statements,
detail. Section V describes the liveness-to-safety conversion                        and checks all hierarchical names and function/task calls.
for stabilization properties. Experimental results are presented                     The result after static elaboration is an elaborated parse tree,
in Section VI.                                                                       appropriate for simulation like applications. RTL elaboration is
                                                                                     limited to the synthesizable subset of the language. In addition
          II. BACKGROUND : V ERIFIC PARSER P LATFORM                                 to the static elaboration tasks for this subset, it generates
                                                                                     sequential networks through flipflop and latch detection, and
   Verific Design Automation[4] builds SystemVerilog and
                                                                                     Boolean extraction. The result after RTL elaboration is a netlist
VHDL Parser Platforms which enable its customers to develop
                                                                                     database, appropriate to applications such as logic synthesis
advanced EDA products quickly and at low cost. Verific’s
                                                                                     and formal verification. This netlist database is the starting
Parser Platforms are distributed as C++ source code or library
                                                                                     point of VeriABC and we utilize Verific provided C++ APIs
and build on all 32 and 64 bit Unix, Linux, and Windows
                                                                                     to access the database.
operating systems. Applications vary from formal verification
to synthesis, simulation, emulation, virtual prototyping, in
circuit debug, and design-for-test. We chose Verific as our                          A. Verific Netlist Database Structure
front-end for its commercial success and stability in supporting                        In this Section, we use Verilog terminology to present
the latest language constructs in SystemVerilog.                                     Verific’s netlist database structures. The netlist database is
   Figure 2 shows the architecture of the Verific parser front-                      rather intuitive and adheres to the module definitions. Shown
end. The main commands we use in Verific library are analyze                         in Table I, there is a one-to-one correspondence between the
and elaborate. Analyze creates parse-trees and performs                              C++ API class definitions and Verilog constructs.
type-inferencing to resolve the meaning of identifiers. The                             A N etlist corresponds to module definitions in Verilog
Parser/Analyzer supports the entire SystemVerilog IEEE 1800,                         while an Instance object corresponds to module instantiation,
          DIFTS 2011                                                            39
      Verific Database C++ API Class    Verilog RTL Objects
                   Netlist                Module definition
                                                                             As shown in Figure 3, we employ a two phase approach.
                  Instance              Module instantiation              First we construct an intermediate netlist graph, a DAG with
                    Port               Module port declarations           extra annotated node types representing logic structure and
                     Net                   wire/reg/assign                connectivity of the flattened design. In addition to the simple
                   PortRef             Port to Net connectivity
                                                                          node types in and-inverter graphs, extra node types contain
                              TABLE I                                     annotations for language constructs such as tri states, flipflops
                     V ERIFIC N ETLIST O BJECTS
                                                                          and latches etc. For example, flipflops contain set/reset pins
                                                                          and driving d-pins; latches contain additional gated-clock defi-
after the module’s parameters have been characterized. An                 nitions. By language definition, a design can specify any signal
Instance is a thin copy of the N etlist plus a pointer to its             for the clock and reset signals. Design behavior is defined
parent netlist. A N etlist contains a set of P orts, N ets and            by event-driven semantics. Further analysis needs be done to
Instances for its internal logic structure. A P ort corresponds           determine if there is an AIG representation that can capture the
to the Verilog port definitions which can be input, output                original design semantics. The intermediate netlist is a DAG
or inout. A N et is a named component, intuitively a wire.                for which various traversal algorithms can be conducted for
P ortRef contains the connectivity between a P ort and a                  the later analysis. For this step, VeriABC only traverses the
N et. The direction of the P ortRef can be in, out, or inout              design hierarchy and hyper-graph in the Verific netlist database
depending on the type of P ort it contains. Using these C++               to gather information and construct the intermediate DAG
objects, the Verific netlist database defines a directed hyper-           representation without conducting any design style checking
graph and encapsulates the following types of information:                or transformation.
   Design Hierarchy
          Design hierarchy is captured as an instance tree by
          the parent pointers in the Instance with a top-level
          netlist as the root.                                                                         Verific Netlist
                                                                                                        Database
   Unique Hierarchical Name
          Following the design hierarchy through the instance
          tree, each internal object is assigned a unique hier-
          archical name.
   Connectivity                                                                                        Intermediate
          A directed hyper-graph is defined through P ort, N et                                        Netlist Graph
          and P ortRef : P ort being the node, N et being the
          edge, and P ortRef containing the connectivity and
          direction information between pairs of a P ort and a
          N et. As an edge in the hyper-graph, a N et can be
          referenced in multiple P ortRef objects.                                                          AIG
   Logic Definition
          At the leaf of the design hierarchy, a N etlist of
          primitive operator types such as and, or, adder,
                                                                          Fig. 3.   VeriABC Architecture
          flipflop, latches etc defines the basic logic operators.
   Recursively, the functional behavior of the design is cap-
tured through the directed hierarchical hyper-graph with basic            B. Formal Modeling
logic operators at its leaf level.                                           The end result of VeriABC is an AIG model that is
                                                                          consistent with the original RTL design semantics. AIG is
                         III. V ERI ABC                                   recognized as a finite state machine model. Compared to
   VeriABC traverses the above netlist database and transforms            the event-driven semantics in HDL language, the execution
it into a monolithic AIG which can be treated as a directed               semantics of an AIG is synchronous with an implicit universal
acyclic graph (DAG). The AIG contains primary input, pri-                 clock that ticks at every step of the execution. The register
mary output, register nodes and and-inverter nodes. Each                  loads in its driver value at the beginning of each step. The
named P ort and N et in the Verific netlist has a mapped node             semantics inferred from the Verific netlist structure is more
in the AIG graph. Additional book-keeping information such                complex, such as its flip-flop can have arbitrary reset logic and
as hash tables are created that map the hierarchical name to              clock network. The task in formal modeling is to transform the
the corresponding AIG node. The down-stream model checker                 above design components into AIG registers with additional
ABC then reads in this AIG to conduct formal analysis.                    glue logic so that it maintains the consistency. In its simplest
                                                                          form, the following check determines if the design can be
                                                                          readily represented by an AIG.
A. Architecture                                                              • All registers are clocked by the same primary input signal
  A hyper-graph is rather hard to traverse and conduct anal-                   which does not fanout to other nodes.
ysis/transformation at the same time.                                        • Reset/Set signals are primary inputs

      DIFTS 2011                                                     40
  •   No combinational loops                                             Figure 4 are the templates for matching the basic LTL opera-
  •   No multiple drivers per node                                       tors used in the set of stabilization properties. For stabilization
   More complex design modeling and transformations can                  properties, in our current implementation, we restrict the p and
be achieved by identifying certain patterns by traversing the            q to be Boolean expressions which seems to be sufficient in
intermediate netlist graph. Our current implementation sup-              practice. The SVA verification directive assume is used to
ports the above form and produces a design style summary                 specify a fairness constraint, while assert is used to specify
for debugging purposes. Though capacity and performance                  the target LTL properties.
depends on the type of individual transformation and anal-
ysis, for the above ones, the transformation and design style               property Until(p,q);
checking is very fast, as it conducts only a few traversals of                p s_until q;
the intermediate netlist graph. After design style checking, a              endproperty
final traversal of the intermediate netlist graph generates an
AIGER file representing the formal model.                                   property GF(p);
                                                                              always (s_eventually p);
                                                                            endproperty
C. Commands Implemented
   VeriABC implementation utilizes the Tcl command inter-                   property FG(p);
face shipped with the Verific library distribution. The following             s_eventually (always p);
is the list of commands implemented to manage the environ-                  endproperty
ment setup for formal verification.
   veriabc analyze                                                          property X(p);
           This command constructs the intermediate netlist                   s_nexttime (p);
           graph in Figure 3 and conducts design style checking.            endproperty
   veriabc set reset
           Although a reset signal can be automatically de-
           tected in certain situations, this command provides           Fig. 4.   SVA Liveness Template
           the user with the option to specify the length of
           the reset sequence and phase of the reset condition.             Verific also processes SVA constructs into the netlist
           A user can also specify the initial value of the              database structure, essentially a corresponding parse tree is
           registers through a VCD waveform or textual file.             integrated into the netlist. We conduct traversals of the parse
           In the generated formal model, the initial value of           tree, identify specific liveness constructs and map them into
           the registers will be valued at the end of the reset          the corresponding LTL formula. At the end of the procedure,
           sequence and the reset condition will be disabled             along with the AIGER file generated, a separate file containing
           after reset.                                                  the LTL formulas are generated indicating target liveness
   veriabc set clock                                                     assertions and fairness constraints. The support signals referred
           A user can specify the clock periods and relation-            to in the LTL formulas are named output signals in the
           ships in the situation when multi-clocks are in the           AIGER file. Although we currently only support stabilization
           design. A phase-counter network will be created in            properties in LTL, the full LTL language using X, F , G,
           the formal model to generate the corresponding clock          U operators can be specified fully and translated through
           signals.                                                      SVA constructs. In doing so, this completes the formal model
   veriabc set cutpoint                                                  generation and SVA compilation at the RTL level.
           This command will prune the cone-of-influence at
                                                                                IV. L IVENESS M ODEL C HECKING IN V ERI ABC
           cutpoint signal and treat it as free input.
   veriabc set constant                                                     In the FSM modeling formalism, the most intuitive notion
           This command sets either an input or a cut-point              of stabilization states that the system will always reach a
           signal to a constant value.                                   particular state and stay there forever, no matter which state the
   veriabc set assume                                                    system started from or which path it took. Relaxing this notion
           This constrains the design signal to be a constant.           a bit, stabilization means that the system will eventually reach
   veriabc write                                                         and stay within a given subset of states. Also, stabilization may
           write out the final formal model in AIGER format.             denote conditions on the input and output signals of a system
                                                                         when it attains a stable state. Applications of stabilization
The above commands give the user flexibility to model the
                                                                         properties have been demonstrated in [14] and [18], to name
environment with constraints and conduct design abstractions
                                                                         a few. We review some basic temporal logic terminology and
during verification.
                                                                         formally define stabilization properties using LTL below.

D. SVA to LTL Compilation                                                A. LTL, model checking and stabilization property
   In SystemVerilg 2009 standard, a rich set of LTL operators               Familiarity is assumed with LTL, basic model checking
are added into SVA language. The SVA properties shown in                 algorithms, and related terminology like Kripke structures and
      DIFTS 2011                                                    41
Büchi automata. For further details, see [13]. In our current           properties beyond the context of stabilization verification (eg.
context, we use LTL properties GFp and FGp, and thus                     GFp ⇒ GFq).
overview their semantics here: let π be a path in some Kripke               The class of LTL properties defined as stabilization prop-
structure K; π |=K Gp means property p will hold on every                erties in this paper is a very important class of temporal
state along π; π |=K Fp means the property p will hold                   properties extensively studied in the literature. It is related
eventually on some state along π; π |=K GFp means p will                 to so-called fairness specifications. Operators GF and FG
hold along π infinitely often, and π |=K FGp means p will                are often called infinitary operators [19] and symbols F ∞
hold eventually on π forever. Since temporal operators F and             and G∞ are used respectively instead [15]. The class itself
G are dual (i.e. Fp ≡ ¬G¬p), operators FG and GF are also                (i.e. Boolean combination of GF-atoms) has been called
dual (i.e. FGp ≡ ¬GF¬p).                                                 general fairness constraints [16], [19]. As shown in [16],
   Definition 1 (GF-atom): Any LTL formula of the form                   various notions of fairness like impartiality [21], weak fairness
GFp or FGp, where p is some atomic proposition or some                   [20](also called justice [21]), strong fairness [20] (also called
Boolean formula involving atomic propositions only, will be              compassion [21]), generalized fairness [17], state fairness [22]
called a ‘GF-atom’.                                                      (also known as fair choice from states [23]), limited looping
   Stabilization properties are defined as the family of LTL for-        fairness [5], and fair reachability of predicate [23] can be
mulas that are Boolean combinations of GF-atoms. Formally:               expressed by stabilization properties. These properties are used
   Definition 2 (Stabilization Property): The set of stabiliza-          to exclude “unfair” counterexamples in liveness verification
tion properties is the syntactic subset of LTL defined as                in both linear time and (fair) branching time paradigms. For
follows:                                                                 liveness verification, we usually have a liveness property (the
   • any GF-atom is a stabilization property                             actual proof obligation) along with a set of fairness constraints.
   • if φ is a stabilization property, then so is ¬φ                     Liveness properties may not be stabilization properties. In that
   • if φ and ψ are stabilization properties, then so are φ ∧ ψ          case we may need to construct the product of the system
      and φ ∨ ψ                                                          and the Büchi automaton of the (negation of the) liveness
   Example 1: FGp, GFp ⇒ GFq, FGp ∧ FGq ⇒ FGr,                           property before performing the L2S conversion. Interestingly,
and FGp ⇒ FGq ∨ (FGr ∧ GFs) are stabilization proper-                    for many interesting applications as in [14] and [18], the
ties where p, q, r, and s are atomic propositions or Boolean             liveness verification obligations fall entirely in the family of
formulas involving atomic propositions only and a ⇒ b is the             stabilization properties. For these applications, the simple L2S
usual shorthand for ¬a ∨ b. However, G(r ⇒ Fg) is an LTL                 scheme proposed in this paper works. Note that some liveness
liveness property but not a stabilization property.                      properties like G(request ⇒ Fgrant) are not stabilization
   Needless to say, these are all liveness properties. But not           properties, but also have a direct L2S conversion [24]. It is,
all of them specify so-called system stabilization directly.             therefore, an interesting question that under what more general
Properties like FGp and FGp ∧ FGq ⇒ FGr (or its general-                 conditions there exists a direct L2S conversion.
ization ∧ki=1 FGpi ⇒ FGq) are perhaps the most elementary
stabilization properties. FGp means that the system eventually              V. L2S C ONVERSION FOR STABILIZATION PROPERTIES
will reach a state from where p will always hold, i.e. the                  It is important to understand that any counterexample to
system will eventually ‘stabilize’ at p. FGp ∧ FGq ⇒ FGr                 a liveness property (which must be an infinite trace) can be
means that if the system stabilizes at p and also at q (at               seen as a “lasso” like configuration with a finite handle and
perhaps some other time), then it will stabilize eventually              a finite loop. Therefore a liveness counter-example is a lasso
at r. Hence, semantics of these properties are close to the              which does not satisfy the property on the loop but satisfies
intuitive notion of stabilization. [14] demonstrates the use             all imposed fairness constraints on the loop.
and significance of stabilization properties in the context                 In general, a liveness problem is converted to a safety prob-
of biological system analysis. However, our definition of                lem by adding a loop-detection logic and property-detection
stabilization captures a broader family of specifications. It            logic on top of the product of the FSM of the original system
includes FGp ⇒ FGq ∨ (FGr ∧ GFs) which may look                          and the Büchi automaton of the property to be verified. The
contrived, but for example, [18] uses many such complicated              loop-detection logic consists of a set of shadow registers,
stabilization properties for compositional deadlock analysis             comparator logic, and an ‘oracle’. The oracle saves the system
of micro-architectural communication fabrics. On the other               state in the shadow registers at a non-deterministically chosen
hand, our definition includes many properties not intended to            time. In all subsequent time frames, the current state of the
specify so-called stabilization behavior. For example, GFp or            system is compared to the state in the shadow registers.
GFp ⇒ GFq.                                                               Whenever these two states match, the system has completed a
   The main motivation behind considering this broader subset            loop. The non-deterministic nature of the oracle allows all such
of LTL is that we offer a short-cut L2S conversion, avoid-               loops to be explored. The property verification logic checks if
ing Büchi automaton construction, in a uniform way (due                 any of the liveness conditions are violated in any such loop
to the duality between FG and GF operators). The most                    and all fairness conditions always hold in the loop. This check
significant applications of this class that we have encountered          is done as a safety obligation. For a more detailed exposition,
is “stabilization verification”, and hence the name is coined            see [24].
for the family. (This name was inspired by [14]). Thus the                  As mentioned, for some simple properties L2S conversion
L2S conversion proposed here may be applied for proving                  can be achieved while avoiding explicit Büchi automata con-
     DIFTS 2011                                                     42
struction. This is done by adding more functionality to the                 Theorem 1: For any stabilization property, the given proce-
property detection logic. As presented in [24], these properties         dure finds one counter-example if one exists.
are Fp, GFp, FGp, pUq, , G(r ⇒ Fq), and F(p∧Xq) (Table                      (Proof Sketch) Any stabilization property can be trans-
1 of [24]). This approach, reviewed in Figure 5, depicts an L2S          formed into another stabilization property with GF operators
converted circuit for verifying the LTL property Fp.                     only. Let f be the Boolean structure in the negation of the
   In the next paragraph, we describe how this construction              given stabilization property. The procedure described above
verifies Fp. In Section V-A we explain how to extend the ideas           will create a monitor that will search for a lasso-loop where
of Figure 5 for stabilization properties. Instead of presenting          f is violated inside the loop. Since the procedure implicitly
the liveness-to-safety conversion through Kripke structure-              enumerates all possible cycles in the state space, it will detect
based representations (i.e. through explicit state machines              a violating cycle if one such exists.
based representations), we present the idea in terms of an ac-
tual circuit construction (i.e. through symbolic representation                           VI. E XPERIMENTAL R ESULTS
of the state space). Also, although we do not discuss it further,           We implemented our L2S scheme for general stabilization
the same mechanism handles fairness constraints, which are               properties in ABC and experimented with several designs of
always stabilization properties, so they just entail adding              communication fabrics from industry. Our objective was to
additional logic to the circuit for the monitor. For Kripke              verify all stabilization properties defined for every structural
structure-based descriptions of liveness-to-safety conversion,           primitive of the X MAS framework [18]. The properties,
see [24].                                                                though local to each component, are verified in the context
   In Figure 5, save represents an additional primary input              of the whole design in order to avoid explicit environmental
added to the circuit. This plays the role of the ‘oracle’. When          modeling. BLIF models of the communication fabrics were
save is asserted for the first time, the current state of the            generated by the X MAS compiler [11] from high-level C++
circuit is saved in the set of shadow registers, and register            models. The L2S monitor logic was then created by ABC
saved is set. saved thus remembers that input save has                   on these BLIF models. The X MAS compiler also generates
been asserted and allows any further activity on save to                 SMV models from C++ models so that the LTL encoding of
be ignored. For subsequent time frames, saved enables the                the stabilization properties can be verified directly on the SMV
equality detection between the current state of the circuit and          models using the NuSMV model checker.
the state in the shadow registers. Clearly, signal looped is                We found that the ABC based L2S implementation has
asserted iff the system has completed a loop. Signal live                much better scalability than NuSMV. NuSMV can solve only
remembers if the signal p has ever been asserted. The safety             toy designs while on the large designs of interest, it fails
property that the circuit verifies is, therefore, looped ⇒               to produce a result. On the other hand, our tool works well
live. (In general this would be looped & fair ⇒ live.)                   even on large designs. For most cases, it produces a result
The block marked with “⇑” represents this logical implication            almost immediately. For a few cases, initial trials could not
- the direction of the arrow distinguishes the antecedent signal         produce a proof, but with the latest version of ABC using
from the consequent signal of the implication.                           simplification, abstraction, speculative reduction, and property
                                                                         directed reachability (PDR) analysis [8], the proofs were
                                                                         completed. This observation supports the premise that the use
A. L2S for stabilization properties
                                                                         of highly developed safety techniques can pay off for liveness
   In [24], the authors show how to do the L2S conversion for            verification.
GFp and FGp, which are GF-atoms. We demonstrate how to                      Experimental results are shown below. Among all the local
extend this to any Boolean combination of GF-atoms using an              properties that the X MAS compiler generated, we provide
example, omitting a formal proof of correctness.                         results for the most challenging one. Call this property ψ; it
   Consider a simple stabilization property φ of the form                is defined for a FIFO buffer, and has the following LTL form
FGa ⇒ FGb + FGc. An L2S converted circuit for this
                                                                                      ψ := FG(¬a) ⇒ FG(¬b) ∨ FG(c)
is shown in Figure 6. (For simplicity, we do not show any
fairness constraints in the example.) Note that, signal live             where a, b, and c are appropriate design signals (i.e. interface
in Figure 5 monitors if signal p has ever been asserted from             signals of a FIFO buffer). Table 1, 2, and 3 compare the
the very initial time frame. But for verifying GFp, we need to           performance of ABC with NuSMV on small examples. These
monitor whether signal p has been asserted between the time              examples are instances of communication fabrics or sub-
when saved is set and the time when looped becomes true.                 modules thereof, and are explained in full detail in [10].
Using this fact, the duality between FG and GF operators,                simple credit and simple vc (Table 1 and 2, respectively)
and the Boolean structure Xa ⇒ Xb +Xc of the given formula,              are designs corresponding to Figure 4 and 5 of [10], and
we can derive the circuit of Figure 6. Logic that captures the           simple ms (Table 3) is a much simpler version of the design
Boolean structure of φ is marked with a dotted triangle in               shown in Figure 6 of [10]. Note from the table, how perfor-
Figure 6. Hence, for any arbitrary stabilization property, we            mance of NuSMV degrades even for small designs. For large
need to create monitors for individual GF-atoms and a crown              designs, NuSMV could not finish for any single instance of
of combinational logic on top of these monitors that captures            ψ.
the Boolean structure of the property. We can formulate the                 Since ψ is defined for a FIFO buffer and the X MAS
following theorem.                                                       compiler created one instance of ψ for each FIFO buffer, the
     DIFTS 2011                                                     43
                                                                      p

                                                                                                                                Live
                                         Circuit Under
                                  .      Verification
                                  .
                                  .
                                                                           .
                                                              state bits   .
                                                                           .

                                                                                                                                           Looped
                                                                                             0

                                                                                             1
                                                                                                                       =


                                  Save                                                       0
                                                         Saved                               1


                                                                                                         Shadow Registers


Fig. 5.    Liveness-to-safety transformation for Fp


                                                          a


                                                          b


                                 Circuit Under            c
                     .
                                 Verification
                     ..
                      .
                                                                 .                                                                                  Looped
                                                    state bits   ..
                                                                  .

                                                                                                     =   .
                                                                                       0                 ..
                                                                                                          .
                                                                                       1


                                                                                                     =

                          Save                                                         0
                                                 Saved                                 1


                                                                                                  Shadow Registers


Fig. 6.    L2S for stabilization property F Ga ⇒ F Gb + F Gc


                                                                            ABC     NuSMV
                                                              Prop #
                                                                            (sec)    (sec)                                  ABC        NuSMV
             ABC             NuSMV                                                                            Prop #
 Prop #                                                          0          0.09     33.23                                  (sec)       (sec)
             (sec)            (sec)
                                                                 1          0.07     31.8                       0           0.03        431.5
    0        0.25             0.115
                                                                 2          0.06     39.57                      1           0.12       379.59
    1        0.05             0.14
                                                                 3          0.03     16.46                      2            0.8       471.36
    2        0.02             0.09
                                                                 4           0.5     41.37                      3            0.8       385.67
             TABLE II                                            5          0.03     16.89
                                                                                                                           TABLE IV
          SIMPLE CREDIT
                                                                           TABLE III                                       SIMPLE MS
                                                                           SIMPLE VC


          DIFTS 2011                                                                         44
number of ψ instances is the same as the number of FIFO                                                 R EFERENCES
buffers. For example, the designs corresponding to Table 1, 2,            [1] ABC - a system for sequential synthesis and verification. Berkeley
and 3 above have 3, 6, and 4 FIFO buffers, respectively.                      Verification and Synthesis Research Center, http://www.bvsrc.org.
   We also experimented on two large communication fabrics                [2] Aiger, http://fmv.jku.at/aiger/.
                                                                          [3] Icarus verilog, http://iverilog.icarus.com.
of practical interest [10], [18]. One has 20 buffers and the              [4] Verific Design Automation: http://www.verific.com.
other has 24 buffers. 19 out of 20 of the first design and 23             [5] K. Abrahamson. Decidability and expressiveness of logics of processes.
out of 24 from the second design were proved by ABC by a                      In PhD Thesis, University of Washington, 1980.
                                                                          [6] J. Baumgartner and H. Mony. Scalable liveness checking via property-
light-weight interpolation engine within a worst case time of                 preserving transformations. In DATE, pages 1680–1685, 2009.
5.83 seconds (most were proved in less than a second). Light-             [7] A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety
weight interpolation could not prove one instance from each                   checking. In In FMICS02: Formal Methods for Industrial Critical
                                                                              Systems, volume 66(2) of ENTCS. Elsevier, 2002.
design. These were proved using advanced techniques from                  [8] A. Bradley. Sat-based model checking without unrolling. 2011.
ABC’s arsenal of safety verification algorithms. For example,             [9] R. Brayton, G. D. Hachtel, A. Sangiovanni-vincentelli, F. Somenzi,
ABC took a total of 217.2 seconds to prove one of these harder                A. Aziz, S. tsung Cheng, and S. Edwards. Vis : A system for verification
                                                                              and synthesis. pages 428–432. Springer-Verlag, 1996.
properties. In this time span, ABC first did some preliminary            [10] S. Chatterjee and M. Kishinevsky. Automatic generation of inductive
simplification, then it tried interpolation, BMC, simulation                  invariants from high-level microarchitectural models of communication
and PDR in parallel for a time budget of 20 seconds. But                      fabrics. In CAV, pages 321–338, 2010.
                                                                         [11] S. Chatterjee, M. Kishinevsky, and U. Ogras. Modeling communication
this attempt failed and it moved on to further simplification                 micro-architectures (with one hand tied behind your back). Intel
by reducing the design using localization abstraction and                     Technical Report, 2009.
speculation. It ran interpolation, BMC, simulation, BDD-based            [12] S.-T. Cheng and R. K. Brayton. Compiling verilog into automata, 1994.
                                                                         [13] E. Clarke, O. Grumburg, and D. Peled. Model checking. 2000.
reachability and PDR engines in parallel both after abstraction          [14] B. Cook, J. Fisher, E. Krepska, and N. Piterman. Proving stabilization
and speculation, using an elevated time budget of 100 seconds                 of biological systems. 2011.
and 49 seconds respectively. The iteration after abstraction             [15] E. A. Emerson. Handbook of theoretical computer science (vol.
                                                                              b). chapter Temporal and modal logic, pages 995–1072. MIT Press,
could not prove the property, but the iteration after speculation             Cambridge, MA, USA, 1990.
managed to prove it with the PDR engine, which produced the              [16] E. A. Emerson and C.-L. Lei. Modalities for model checking: branching
final proof in 7 seconds.                                                     time logic strikes back. Sci. Comput. Program., 8:275–306, June 1987.
                                                                         [17] N. Francez and D. Kozen. Generalized fair termination. In Proceedings
                                                                              of the 11th ACM SIGACT-SIGPLAN symposium on Principles of pro-
            VII. C ONCLUSION & F UTURE W ORK                                  gramming languages, POPL ’84, pages 46–53, New York, NY, USA,
   We have developed a tool, VeriABC, which allows us to                      1984. ACM.
                                                                         [18] A. Gotmanov, S. Chatterjee, and M. Kishinevsky. Verifying deadlock-
access real industrial designs written in SystemVerilog or                    freedom of communication fabrics. 2011.
VHDL and to process them into the AIGER format. The result               [19] R. Hojati, R. K. Brayton, and R. P. Kurshan. Bdd-based debugging of
can be used for synthesis and verification using a tool like                  design using language containment and fair ctl. In Proceedings of the
                                                                              5th International Conference on Computer Aided Verification, CAV ’93,
ABC. We described how the RTL processing is done using                        pages 41–58, London, UK, 1993. Springer-Verlag.
the commercial front-end, Verific. SVA assertions are also               [20] L. Lamport. ”sometime” is sometimes ”not never”: on the temporal
processed by Verific, and VeriABC creates a separate file of                  logic of programs. In Proceedings of the 7th ACM SIGPLAN-SIGACT
                                                                              symposium on Principles of programming languages, POPL ’80, pages
equivalent LTL formulas. We showed an application of this                     174–185, New York, NY, USA, 1980. ACM.
to property checking, where ABC was enhanced to convert a                [21] D. J. Lehmann, A. Pnueli, and J. Stavi. Impartiality, justice and
subset of LTL into a circuit structure, thus effectively allowing             fairness: The ethics of concurrent termination. In Proceedings of the 8th
                                                                              Colloquium on Automata, Languages and Programming, pages 264–277,
liveness checking in ABC.                                                     London, UK, 1981. Springer-Verlag.
   The use of a stable, supported and complete language                  [22] A. Pnueli. On the extremely fair treatment of probabilistic algorithms.
processing tool like Verific, allows academics access to real in-             In Proceedings of the fifteenth annual ACM symposium on Theory of
                                                                              computing, STOC ’83, pages 278–290, New York, NY, USA, 1983.
dustrial designs, without going through the hassle and daunting               ACM.
task of building their own equivalent tool. Liveness property            [23] J. P. Queille and J. Sifakis. Fairness and related properties in transition
checking is a growing interest in industry, and our enhanced                  systems a temporal logic to deal with fairness. In Acta Informat, pages
                                                                              195–220, 1983.
ABC with a front end that automatically converts to a circuit            [24] V. Schuppan and A. Biere. Efficient reduction of finite state model
structure for liveness checking, can use the advanced safety                  checking to reachability analysis. Int. J. Softw. Tools Technol. Transf.,
property methods of ABC.                                                      5(2):185–204, 2004.
   In the future, the development of VeriABC will allow us
to extract higher level constructs from SystemVerilog and
VHDL by accessing Verific’s parse trees. These constructs
can be passed on using an extended AIGER format to an
enhanced ABC, which will use this information in synthesis
and verification.

                  VIII. ACKNOWLEDGMENT
  This work has been supported in part by SRC contract
2057.001 and our industrial sponsors: Abound Logic, Actel,
Altera, Atrenta, IBM, Intel, Jasper, Magma, Oasys, Real Intent,
Synopsys, Tabula, and Verific.
     DIFTS 2011                                                     45
      On Incremental Satisfiability and Bounded Model Checking
                                           Siert Wieringa∗
                                      Aalto University, Finland
                                       siert.wieringa@aalto.fi


Abstract                                                    1   Introduction
                                                     Model checking is a formal verification technique
Bounded Model Checking (BMC) is a symbolic           revolving around proving temporal properties of
model checking technique in which the existence      systems modelled as finite state machines. A
of a counterexample of a bounded length is rep-      property holds for the model if it holds in all pos-
resented by the satisfiability of a propositional    sible execution paths. If the property does not
logic formula. Although solving a single instance    hold this can be witnessed by a counterexample,
of the satisfiability problem (SAT) is sufficient to which is a valid execution path for the model in
decide on the existence of a counterexample for      which the property does not hold. Because the
any arbitrary bound typically one starts from        model has a finite number of states any infinite
bound zero and solves the sequence of formu-         execution of the system includes a loop, and can
las for all consecutive bounds until a satisfiable   thus be represented by a finite sequence of exe-
formula is found. This is especially efficient in    cution steps. Bounded Model Checking (BMC)
the presence of incremental SAT-solvers, which       [1] is a symbolic model checking technique in
solve sequences of incrementally encoded formu-      which the existence of a counterexample con-
las. In this article we analyze empirical results    sisting of a bounded number of execution steps
that demonstrate the difference in run time be-      is represented by the satisfiability of a proposi-
havior between incremental and non-incremental       tional logic formula. It thus allows the use of
SAT-solvers. We show a relation between the ob-      decision procedures for the propositional satisfi-
served run time behavior and the way in which        ability problem (SAT) for model checking. De-
the activity of variables inside the solver propa-   spite the theoretical hardness of SAT [7] such de-
gates across bounds. This observation has not        cision procedures, called SAT-solvers [9, 14, 17],
been previously presented and is particularly        have become extremely efficient. BMC is pop-
useful for designing solving strategies for paral-   ular as a technique for refuting properties, and
lelized model checkers.                              although BMC based techniques can be used for
                                                     proving properties we do not consider such tech-
                                                     niques here.
   ∗
     Financially supported by the Academy of Finland    A typical BMC encoding will have semantics
project 139402                                       such that if there exists a counterexample of


DIFTS 2011                                             46
length k then there also exists a counterexam-              ier to solve than the largest unsatisfiable ones.
ple of any length greater than k. Thus in prin-             A more opportunistic search strategy may re-
ciple solving a single propositional logic formula          duce the total time required to find a satisfiable
is sufficient to decide on the existence of a coun-         formula by skipping over hard instances. Such
terexample for any arbitrary finite bound. How-             strategies are natural for environments in which
ever, one typically starts to solve the formula             multiple computing nodes are available in paral-
corresponding to bound zero and then solves se-             lel, where one may define some nodes to use a
quentially each consecutive bound until a coun-             more opportunistic strategy than others.
terexample is found. We will refer to this as the              The observation on the empirical hardness of
standard sequential search strategy. This strat-            the smallest satisfiable formulas compared to the
egy has the nice property that it always finds a            largest unsatisfiable ones can also be made for
counterexample of minimal length. As with ev-               BMC. We attempted to implement opportunis-
ery bound the representing formula grows larger             tic strategies in our parallelized BMC framework
it also avoids solving unnecessarily large formu-           Tarmo [18]. This however turned out to be less
las. Importantly, the performance of this strat-            efficient then we would have expected, with per-
egy benefits greatly from the availability of incre-        formance degrading for many benchmarks. In
mental SAT-solvers. Incremental SAT-solvers                 this article we evaluate the performance of the
can solve sequences of formulas that share large            incremental solver and compare it against that
parts in common efficiently in a single solver pro-         of solving each bound separately. The purpose
cess, allowing reuse of information between for-            of this study is not to illustrate that incremen-
mulas.                                                      tal solvers are more effective for BMC than non-
                                                            incremental ones, as that is well known, but to
                                                            understand when and how opportunistic strate-
2    Motivation                                             gies can be applied. This is done by compar-
                                                            ing against non-incremental solver run times be-
Automated SAT based planning is a problem                   cause solvers applying opportunistic strategies
closely related to BMC. It deals with the same              benefit less from the incremental interface of the
sequences of parameterized formulas, except                 solver, as the problem is no longer introduced
that the satisfiability of a formula now corre-             one bound at a time.
sponds to the existence of a plan of a bounded
length. In [15] evaluation strategies for plan-
ning were suggested that are more opportunis-               3    Preliminaries
tic than the standard sequential search strategy.
They suggest to spend some amount of the to-                The majority of modern SAT-solvers are based
tal solving effort at attempting to solve formu-            on the Davis Putnam Logemann Loveland
las for bounds ahead of the currently smallest              (DPLL) procedure [8]. The DPLL-procedure is
unsolved one. It was inspired by the empirical              a backtracking search procedure for SAT that
observation that if a plan exists then amongst              builds a partial assignment by iteratively decid-
the smallest satisfiable formulas in the sequence           ing on a branching variable to be assigned a
there are typically formulas that are much eas-             value in the partial assignment. When the par-


DIFTS 2011                                             47
tition of the search space defined by the partial            out the need for integrating them into one appli-
assignment is without solutions the algorithm                cation. The input sequences used were the same
backtracks. In addition to this modern SAT-                  as the benchmarks used for experimental results
solvers typically employ conflict clause learning            presented in [18]. These sequences were gener-
[17]. Such solvers record a new conflict clause              ated with the current state-of-the-art encoding
whenever they are forced to backtrack. They                  for model checking of linear time temporal logic
then backtrack non-chronologically to a decision             properties with past (PLTL) [2] as implemented
point at which the conflict clause was still satis-          in the model checker NuSMV 2.4.3 [6].
fied.
   The performance of the DPLL-procedure de-
pends heavily on its branching variable decisions.           4    Run time
A commonly used decision heuristic for clause
learning SAT-solvers is the Variable State Inde-    In our experiments we have studied the behav-
pendent Decaying Sum (VSIDS) heuristic first        ior of SAT-solvers on problem sequences from
presented in the solver Chaff [14]. The idea of     BMC regarding both the run time and variable
the heuristic is to favor variables that are in-    activity. A selection of the results is presented in
cluded in recently derived conflict clauses. For    the figures in this article. For each benchmark
each variable an initially zero value called the ac-there are two subfigures, a run time graph la-
tivity is maintained. Whenever a conflict clause    beled (a) and a variable activity graph labeled
                                                    (b). In this section we will focus only on the run
is learnt the activity of all variables that occur in
the clause is increased. Periodically the activity  time graphs, which have bounds on the x-axis
of all variables is divided by a constant.          and time on the y-axis. For each bound a ver-
   All results presented in this article were ob-   tical bar displays the time it took to solve the
tained using the SAT-solver MiniSAT 2.2.0 [9]1 .    formula corresponding to that single bound us-
The solver core was not modified but a num-         ing the SAT-solver in non-incremental fashion.
ber of small modifications2 were made in aux-       If the solver found unsatisfiable a solid red bar
iliary routines such as the file parser in order    is drawn, if the solver found satisfiable only the
to read incremental SAT sequences from disk.        outline of the bar is drawn in green.
When employing BMC typically the SAT-solver            The incremental solver solves using the stan-
will not read the formula sequence from disk but    dard sequential strategy and whenever it com-
it will rather be integrated into a BMC engine      pletes a bound it reports the run time up to
that is generating formulas for new bounds on       that point. The points in the graphs marked
the fly. We use sequences stored on disk as this    with crosses (x) and connected by the thick line
is convenient for testing the performance of the    represent these run times.
SAT-solver independently. As our sequences are         The thin dotted line connecting the plusses
streamable one can also use them as an interface    (+) is representing the cumulative run time of
between a BMC engine and a SAT-solver with-         the non-incremental solver, i.e. for each bound
                                                    k the value displayed is the sum of all run times
  1
    Available from http://www.minisat.se            of the non-incremental solver tests from bound
  2
    Available from http://users.ics.tkk.fi/swiering 0 to k. This demonstrates the time required


DIFTS 2011                                              48
           80                                                                                                                      1600


           70                                                                                                                      1400


           60                                                                                                                      1200


                                                                                                           hyperactive variables
           50                                                                                                                      1000
time (s)


           40                                                                                                                       800


           30                                                                                                                       600


           20                                                                                                                       400


           10                                                                                                                       200


            0                                                                                                                         0
                0        10            20             30          40              50            60                                        0            10             20               30       40          50            60
                                                      bound                                                                                                                            bound
                                                  (a)                                                                                                                            (b)
                                                                 Figure 1: Benchmark irst.dme6 from [3]
           60                                                                                                                      10000


                                                                                                                                    9000

           50
                                                                                                                                    8000


                                                                                                                                    7000
           40
                                                                                                           hyperactive variables


                                                                                                                                    6000
time (s)


           30                                                                                                                       5000


                                                                                                                                    4000

           20
                                                                                                                                    3000


                                                                                                                                    2000
           10

                                                                                                                                    1000


            0                                                                                                                             0
                0            20             40             60                80         100                                                   0             20             40              60          80         100
                                                      bound                                                                                                                            bound
                                                  (a)                                                                                                                            (b)
                                                        Figure 2: Benchmark bc57sensors.p2neg from [3]
           100                                                                                                                     3500


                                                                                                                                   3000
            80

                                                                                                                                   2500
                                                                                                          hyperactive variables


            60
                                                                                                                                   2000
time (s)


                                                                                                                                   1500
            40


                                                                                                                                   1000

            20
                                                                                                                                   500


                0                                                                                                                    0
                    0   20        40             60       80           100        120         140                                         0       20             40             60         80    100        120         140
                                                      bound                                                                                                                            bound
                                                  (a)                                                                                                                           (b)
                                                                Figure 3: Benchmark eijk.S1238.S from [3]


DIFTS 2011                                                                                           49
for the standard sequential strategy using a non-           a way that is easy to update when the bound on
incremental solver. This line is intentionally not          the length of the execution paths is extended.
influencing the range of the y-axis, as it typically        The solver can be thought of as having tuned it-
grows so large that it would make the other re-             self towards verifying the property holds in the
sults hard to see.                                          exact same way over and over.
   From Fig. 1(a) it can be seen that the shortest
counterexample for benchmark irst.dme6 is of
                                                               Another way to look at the result presented in
length 53. The run times of the non-incremental
                                                            Fig. 3(a) is that by using the standard sequen-
SAT-solver clearly show the behavior that in-
                                                            tial strategy we are aiding the solver in proving
spired the opportunistic strategies of [15], i.e.
                                                            the unsatisfiability of the formula corresponding
the run time of the non-incremental solver for
                                                            to the counterexample of length 150, the largest
small satisfiable formulas is much smaller than
                                                            bound tested here. By forcing it through the se-
that of the largest unsatisfiable ones. It may
                                                            quence of formulas we force a direction on the
be observed from Fig. 2(a) that for benchmark
                                                            search that is natural to our problem descrip-
bc57sensors.p2neg the run times for the two
                                                            tion, and apparently this is helpful for the SAT-
smallest satisfiable formulas corresponding to
                                                            solver. For benchmarks with this kind of run
bounds 104 and 105 are relatively large. An easy
                                                            time behavior there is clearly no hope for any
satisfiable formula can however be found a little
                                                            opportunistic strategies.
further ahead at bound 106, thus the use of an
opportunistic strategy could possibly be benefi-
cial.                                                          An incremental solver can be started from
   Note that all results presented in this arti-            any arbitrary bound, and it is possible to pro-
cle demonstrate that the use of the incremental             ceed by increasing the bound by more than one
solver is crucial when performing the standard              every time a formula is solved. Using bound
sequential strategy. Fig. 3(a) presents run time            increments larger than one is one of the sim-
behavior for the benchmark eijk.S1238.S which               ple strategies we have tried in our experiments.
is the encoding of model checking a property that           This strategy should still be considered oppor-
holds on all execution paths of the model. This             tunistic because of the “missing information” it
implies that the formulas are unsatisfiable for all         causes for the solver, leading it further away from
bounds. Here, the crucial role that incremen-               the efficiency of incremental solving, and further
tal SAT solving often plays in solving BMC is               towards non-incremental behavior. Given the
even clearer. Whereas a non-incremental solver              small margin of error available for opportunis-
would take about 100 seconds to find that bound             tic approaches for satisfiable benchmarks, and
150 alone is unsatisfiable, the incremental solver          no chance of any performance improvement for
finds this result for all bounds from 0 to 150 se-          many unsatisfiable benchmarks, we need to be
quentially in half that time. This is typical be-           careful when applying these approaches. They
havior for many benchmarks corresponding to                 are however amongst the most natural ways of
model checking a property that holds. It seems              diversifying search strategies amongst nodes in
that in these cases the solver learns that the              an environment with parallel computation re-
property holds for all short execution paths in             sources.


DIFTS 2011                                             50
5    Parallel SAT solving                            seems that the interrupted solver is missing more
                                                     information than just conflict clauses. This was
There are two common architectures for parallel one of the reasons to look at the way the activ-
SAT-solvers [11]. The first is the classic divide- ity of variables propagates across bounds on the
and-conquer approach in which the formula is incremental SAT-solver runs.
split into multiple disjoint subformulas each of
which are then solved on a different comput-
ing node [4, 19]. The second approach is the 6 Variable activity
so called portfolio approach [10]. The basic idea
is that every computing node is running a SAT- To obtain data on the activity of variables the
solver that is attempting to solve the same for- SAT-solver was modified to print the activity of
mula. As modern SAT-solvers make some deci- all variables after each bound it completed. For
sions randomly their run time varies greatly be- each bound we are interested in which variables
tween runs. This makes the portfolio approach are the most active, and especially in whether
surprisingly efficient as it is able to decide the this activity remains high across several bounds.
satisfiability of the formula as soon as the fastest We consider a variable hyperactive if its activity
solver finishes. Further diversification may be is within the highest 2% of variables with non-
achieved by using different parameters on differ- zero activity.
ent computation nodes. Obviously opportunistic          The graphs labeled (b) in this article visual-
search strategies provide means for diversifica- ize the hyperactive variables. All variables that
tion when we are considering solving incremen- are hyperactive for at least one bound are rep-
tally encoded SAT formulas in a parallel envi- resented by an integer value on the y-axis of the
ronment.                                             graph. The variables are sorted on the y-axis
   The current implementation of our paral- by their index such that if we define y(v) as the
lelized BMC framework Tarmo can be seen as integer on the y-axis corresponding to the vari-
                                                                                        0
a parallelized incremental SAT-solver using the able with index v then for any v > v we have
                                                         0
portfolio approach. Each computing node is run- y(v ) > y(v).
ning an incremental SAT-solver in the conven-           Just like in the run time graphs the values on
tional sequential fashion. The novelty of Tarmo      the   x-axis of the graph represent bounds. If a
is that it allows sharing of conflict clauses be- variable was hyperactive starting from bound k
tween SAT-solvers even if they are working on up to but not including bound k 0 > k then a hor-
different bounds. The solvers operate otherwise izontal line was drawn in the graph from bound
independently, i.e. if one solver solves a formula k to k 0 at the y position corresponding to that
this does not stop the other solvers from at- variable. In other words for all variables v and all
tempting to solve that same formula. This choice bound intervals [k, k 0 ) on which v is hyperactive
was made after observing that interrupting a a line was drawn from (k, y(v)) to (k 0 , y(v)).
solver to make it “catch-up” with another breaks        One may observe that generally variables with
its ability to benefit from incremental SAT to the larger indices become active later. This is be-
full extent. As we made this observation in an cause in the solver each newly introduced vari-
environment where clause sharing takes place it able is given a larger index than all existing vari-


DIFTS 2011                                        51
           20                                                                                2500


                                                                                             2000
           15


                                                                     hyperactive variables
                                                                                             1500
time (s)


           10


                                                                                             1000


            5
                                                                                             500


            0                                                                                  0
                0   20   40   60         80   100   120   140                                       0   20   40   60        80   100   120   140
                                     bound                                                                              bound
                                   (a)                                                                            (b)
                                         Figure 4: Benchmark abp4.ptimoneg from [2]

ables, and for each bound a set of new variables is bound local.
created. For each bound a subset of the new vari-      We have generated graphs like the ones pre-
ables becomes hyperactive quickly, as the solver sented in this paper for a large set of bench-
runs into conflicts on the newly added clauses.     marks3 . We observe that on benchmarks with
   The hyperactive variables in the satisfiable a bound global variable activity the run time of
benchmarks irst.dme6 and bc57sensors.p2neg the non-incremental SAT-solver for the largest
are displayed in Fig. 1(b) and Fig. 2(b). Note bound solved is smaller than the time spent for
that although for each bound some of the new the incremental solver to get to the same bound
variables become hyperactive all these vari- and solve it. For benchmarks with a bound lo-
ables tend to remain hyperactive throughout the cal variable activity this is never the case and
whole process. This means that whenever a thus a opportunistic heuristic will not improve
bound is added the solver still runs into new con- performance.
flicts regarding variables that represent the state    Although we expect all hard satisfiable bench-
at smaller timepoints. We say that the activity     marks   to have a bound global variable activity it
of variables is bound global.                       is not the case that all unsatisfiable benchmarks
                                                    have a bound local variable activity. The bench-
   For the benchmark eijk.S1238.S the activity mark abp4.ptimoneg represented in Fig. 6 is an
graph looks very different. For each bound the example of an unsatisfiable benchmark with a
solver creates conflict clauses including the new bound global variable activity. Apparently the
variables, thus creating a new set of hyperactive correctness of the property is not implied within
variables, but there are only very few variables a short number of execution steps here, and the
for which hyperactivity is maintained. This is in incremental solver needs to evaluate large por-
line with observation on the run time behavior of tions of the search space for every bound. Note
this benchmark which also indicate that hardly also that for this benchmark an opportunistic ap-
any work has to be performed to find unsatisfi-
                                                       3
ability. We say that the activity of variables is        Available from http://users.ics.tkk.fi/swiering


DIFTS 2011                                                      52
proach may help to find unsatisfiable formulas at to the solver. Tolerance to bad choices for such
larger bounds faster.                             an incremental encoding could be achieved by
                                                  doing this in a parallel environment with some
                                                  opportunistic nodes.
7    Conclusions
In this article we have shown a relation between           References
the run time of the standard sequential strat-
                                                           [1] Armin Biere, Alessandro Cimatti, Ed-
egy for bounded model checking and the activ-
                                                               mund M. Clarke, and Yunshan Zhu. Sym-
ity of decision variables in solvers employing this
                                                               bolic model checking without BDDs. In
strategy. We can use this observation in a SAT-
                                                               Rance Cleaveland, editor, TACAS, volume
solver to predict during the search whether a
                                                               1579 of Lecture Notes in Computer Science,
more opportunistic strategy could be beneficial
                                                               pages 193–207. Springer, 1999.
for the search. This is especially useful for par-
allel solvers in which different threads may be            [2] Armin Biere, Keijo Heljanko, Tommi A.
executing different strategies.                                Junttila, Timo Latvala, and Viktor Schup-
   It is also easy to envision how these techniques            pan. Linear encodings of bounded LTL
could be useful for model checkers that use a                  model checking. Logical Methods in Com-
combination of truly different model checking                  puter Science, 2(5), 2006.
techniques such as PdTrav [5]. One could eas-
ily engineer a system which would do BMC for               [3] Armin Biere and Toni Jussila.      Hard-
some amount of time, after which the variable                  ware model checking competition 2007
activity could play a role in the decision on how              (HWMCC07). Organized as a satellite event
to continue. If the variable activity appears to               to CAV 2007, Berlin, Germany, July 3-7,
be bound local then the property is likely to hold             2007.
for the model and thus we may want to start do-
ing a complete model checking technique based              [4] Max Böhm and Ewald Speckenmeyer. A
on for example k-induction [16], Craig interpola-              fast parallel SAT-solver - efficient workload
tion [13] or BDDs [12] to prove this.                          balancing. Ann. Math. Artif. Intell., 17(3-
                                                               4):381–400, 1996.
   Another observation we made is that for de-
ciding the satisfiability of the last formula in an        [5] Gianpiero Cabodi, Paolo Camurati, Luz
incrementally encoded sequence of formulas it                  Garcia, Marco Murciano, Sergio Nocco,
can sometimes be faster to solve all formulas in               and Stefano Quer. Trading-off SAT search
the sequence. This raises the question whether                 and variable quantifications for effective un-
other applications of SAT solving that currently               bounded model checking. In Alessandro
rely on solving a single SAT formula could ben-                Cimatti and Robert B. Jones, editors, FM-
efit from the use of incremental problem encod-                CAD, pages 1–8. IEEE, 2008.
ings. Such encodings allow enforcing a search
direction on the SAT-solver that is natural to             [6] Alessandro Cimatti, Edmund M. Clarke,
the application and therefore possibly beneficial              Enrico Giunchiglia, Fausto Giunchiglia,


DIFTS 2011                                            53
     Marco Pistore, Marco Roveri, Roberto Se-    in Computer Science, pages 1–12. Springer,
     bastiani, and Armando Tacchella. NuSMV      2005.
     2: An opensource tool for symbolic model
     checking. In Ed Brinksma and Kim Guld- [14] Matthew W. Moskewicz, Conor F. Madi-
     strand Larsen, editors, CAV, volume 2404 of gan, Ying Zhao, Lintao Zhang, and Sharad
     Lecture Notes in Computer Science, pages    Malik. Chaff: Engineering an efficient SAT
     359–364. Springer, 2002.                    solver. In DAC, pages 530–535. ACM, 2001.

 [7] Stephen A. Cook.        The complexity of [15] Jussi Rintanen, Keijo Heljanko, and Ilkka
     theorem-proving procedures. In STOC,              Niemelä. Planning as satisfiability: parallel
     pages 151–158. ACM, 1971.                         plans and algorithms for plan search. Artif.
                                                       Intell., 170(12-13):1031–1080, 2006.
 [8] Martin Davis, George Logemann, and Don-
                                                  [16] Mary Sheeran, Satnam Singh, and Gunnar
     ald Loveland. A machine program for
                                                       Stålmarck. Checking safety properties us-
     theorem-proving. Commun. ACM, 5:394–
                                                       ing induction and a SAT-solver. In Warren
     397, July 1962.
                                                       A. Hunt Jr. and Steven D. Johnson, editors,
 [9] Niklas Eén and Niklas Sörensson. An exten-      FMCAD, volume 1954 of Lecture Notes in
     sible SAT-solver. In Enrico Giunchiglia and       Computer Science, pages 108–125. Springer,
     Armando Tacchella, editors, SAT, volume           2000.
     2919 of Lecture Notes in Computer Science,
                                                  [17] João P. Marques Silva and Karem A.
     pages 502–518. Springer, 2003.
                                                       Sakallah. GRASP: A search algorithm for
[10] Youssef Hamadi, Saı̈d Jabbour, and                propositional satisfiability. IEEE Trans.
     Lakhdar Sais. ManySAT: A parallel SAT             Computers, 48(5):506–521, 1999.
     solver. JSAT, 6(4):245–262, 2009.
                                                      [18] Siert Wieringa, Matti Niemenmaa, and
[11] Antti Eero Johannes Hyvärinen, Tommi A.              Keijo Heljanko. Tarmo: A framework for
     Junttila, and Ilkka Niemelä. Partitioning            parallelized bounded model checking. In
     search spaces of a randomized search. In              Lubos Brim and Jaco van de Pol, editors,
     Roberto Serra and Rita Cucchiara, editors,            PDMC, volume 14 of EPTCS, pages 62–76,
     AI*IA, volume 5883 of Lecture Notes in                2009.
     Computer Science, pages 243–252. Springer, [19] Hantao Zhang, Maria Paola Bonacina,
     2009.                                           and Jieh Hsiang. PSATO: A distributed
[12] Kenneth L. McMillan. Symbolic model             propositional prover and its application to
     checking. Kluwer, 1993.                         quasigroup problems. J. Symb. Comput.,
                                                     21(4):543–560, 1996.
[13] Kenneth L. McMillan. Applications of Craig
     interpolants in model checking. In Nico-
     las Halbwachs and Lenore D. Zuck, edi-
     tors, TACAS, volume 3440 of Lecture Notes


DIFTS 2011                                       54
DIFTS’11                                              Keyword Index


Keyword Index


           BMC                                          46

           combinational simplification                 30

           Data structure                               13
           Domain Specific Embedded Language (DSEL)     22
           Dynamic Spectrum Access                       3

           empirical results                            46
           Engine Independence                          22

           front-end engineering                        38

           liveness to safety                           38

           model checking                               30
           Model checking                               13

           On-the-fly                                   13

           Policies                                      3

           Real-time                                    13
           Reasoning                                     3

           SAT                                          46
           Satisfiability Modulo Theories (SMT)         22
           SMT solving                                   3
           solving strategies                           46
           stabilization                                38
           sweeping                                     30

           Timed automata                               13


                                                                 1
DIFTS’11                                Author Index


Author Index


           Arkoudas, Konstantine    3

           Brayton, Robert K.      38

           Chadha, Ritu             3
           Chiang, Jason            3
           Cleaveland, Rance       13

           Drechsler, Rolf         22

           Fey, Goerschwin         22
           Fontana, Peter          13
           Frehse, Stefan          22

           Grosse, Daniel          22

           Haedicke, Finn          22
           Hassan, Zyad            30

           Kuehlmann, Andreas      1

           Long, Jiang             38

           Mishchenko, Alan        38
           Morrison, Chris          2

           Ray, Sayak              38

           Somenzi, Fabio          30
           Sterin, Baruch          38

           Wieringa, Siert         46

           Zhang, Yan              30


                                                  1