DIFTS 2011 Program Proceedings

Program Chairs
Malay K. Ganai, NEC Labs America, USA
Armin Biere, Johannes Kepler University, Austria

Technical Program Committee
Clark W. Barrett, New York University, USA
Armin Biere, Johannes Kepler University, Austria
Alessandro Cimatti, Fondazione Bruno Kessler, Italy
Cindy Eisner, IBM Haifa Research Lab, Israel
Malay K. Ganai, NEC Labs America, USA
Ganesh Gopalakrishnan, University of Utah, USA
Daniel Kroening, Oxford University, UK
Robert P. Kurshan, Cadence Design Systems, USA
Ken McMillan, Microsoft Research, USA
Chao Wang, Virginia Tech, USA

Local Arrangement Chairs
Sandip Ray, UT Austin, USA
Nadia Papakonstantinou, NEC Labs America

Preface

The first DIFTS (Design and Implementation of Formal Tools and Systems) workshop was held in Austin, Texas on November 3, 2011, co-located with the Formal Methods in Computer-Aided Design (FMCAD) conference. The workshop emphasized insightful experiences in tool and system design. It provided a forum for sharing challenges and highly original solutions with groundbreaking results. It took a broad view of the formal tools area and solicited contributions from various domains, including decision procedures, verification, testing, validation, diagnosis, debugging, and synthesis. It provided a forum for discussing the engineering aspects of tools and the various design decisions required to put them to practical use, offering a pragmatic view of practicing formal methods.

The workshop received 9 original submissions, of which 4 were chosen in the tool category and 2 in the system category. There were also two invited talks: one by Andreas Kuehlmann, Sr. VP of R&D at Coverity, on "The pain of making research tools into software products", and another by Chris Morrison, Chief Architect at Real Intent Inc., on "From putty to product: what it takes to bring a verification tool to market".
First of all, we thank FMCAD's steering committee for approving the workshop. We also thank Anna Slobodova, David Rager, and Sandip Ray for their help with local arrangements, and Nadia Papakonstantinou for her help with web updates. We sincerely thank the program committee members and sub-reviewers for selecting the papers and providing candid review feedback to the authors. Last but not least, we thank all the authors for contributing to the workshop, and all the participants.

Malay K. Ganai and Armin Biere
Program Chairs

Table of Contents

The Pain of Making Research Tools into Software Products . . . 1
    Andreas Kuehlmann
From Putty to Product: What It Takes to Bring a Verification Tool to Market . . . 2
    Chris Morrison
An Application of Formal Methods to Cognitive Radios . . . 3
    Konstantine Arkoudas, Ritu Chadha and Jason Chiang
Data Structure Choices for On-the-Fly Model Checking of Real-Time Systems . . . 13
    Peter Fontana and Rance Cleaveland
metaSMT: Focus on Your Application not on Solver Integration . . . 22
    Finn Haedicke, Stefan Frehse, Goerschwin Fey, Daniel Grosse and Rolf Drechsler
A Study of Sweeping Algorithms in the Context of Model Checking . . . 30
    Zyad Hassan, Yan Zhang and Fabio Somenzi
Enhancing ABC for stabilization verification of SystemVerilog/VHDL models . . . 38
    Jiang Long, Sayak Ray, Baruch Sterin, Alan Mishchenko and Robert K. Brayton
On Incremental Satisfiability and Bounded Model Checking . . . 46
    Siert Wieringa

The Pain of Making Research Tools into Software Products

Andreas Kuehlmann, Sr.
VP of R&D at Coverity, President of the IEEE Council on EDA (CEDA), and IEEE Fellow

Abstract: Building a research implementation of a new algorithm and validating it on a small set of benchmarks is a key component of pushing the state of the art of design and verification tools. Moreover, if the algorithm works, it is often also the most exciting phase of the long journey toward having a new idea used by many people. The less exciting, and often more painful, part is the productization of the research into a form that can be applied by hundreds or thousands of users on a daily basis. In this talk we discuss some of the challenges that developers encounter in a product development environment, from prototyping to code maintenance. The presentation will cover practical aspects of developing software in larger teams, from product planning to deployment and support. We will also elaborate on some of the opportunities for new research to help the practical deployment and maintenance of software products.

From Putty to Product: What It Takes to Bring a Verification Tool to Market

Dr. Chris Morrison
Chief Architect, Real Intent Inc.

Abstract: For a new company interested in developing a formal verification product, the temptation would be to believe that the majority of the time-to-market would be spent on developing the core formal technology. As a corollary, if that core technology could be bought off the shelf, the time-to-market would be greatly reduced. While that belief may have been true for the first generation of formal verification products, it is, unfortunately, not credible today. There are many "significant items" in state-of-the-art products that now consume considerable resources. The input language has changed from BLIF to SystemVerilog '09 and VHDL '08. The designs that formal tools must handle used to have a single clock and simple resets; now they have 10+ clocks, derived and gated clocks, complex resets, and multiple functional modes.
The output is no longer a simple text file, but a hyperlinked report tying together VCD waveforms, source RTL, and state machine and schematic viewers. Manual control with many switches is no longer acceptable. A nearly automatic flow with very few switches is demanded. The tool must manage to deliver very high throughput for thousands of checks on multimillion- gate designs as well as acceptable user experience on high-latency checks. User productivity is the combination of tool run time, piloting convenience as well as the debug experience. The infrastructure needed to achieve these goals (which was originally simple and relatively quick to implement) is now very complex involving an intricate flow through a variety of substantial software packages and multiple databases. With the goal of reaching a broad class of customers, current and future verification products must be automated, easy to use and be able to reliably produce actionable results. Delivering on that requires attention to every part of the product rather than just the formal verification technology. DIFTS 2011 2 ∗ An Application of Formal Methods to Cognitive Radios Konstantine Arkoudas and Ritu Chadha and Jason Chiang Telcordia Research One Telcordia Drive Piscataway, NJ 08854 {konstantine, chadha, chiang}@research.telcordia.com ABSTRACT As wireless devices continue to proliferate, demand for We discuss the design and implementation of a formal policy access to RF spectrum is becoming increasingly pressing, system regulating dynamic spectrum access (DSA) for cog- and increasingly difficult to achieve. After all, the (usable) nitive radios. DSA policies are represented and manipulated spectrum is finite, while the demand for it continues to in- in a proof framework based on first-order logic with arith- crease without bounds. However, it has been recognized [13, metic and algebraic data types. 
Various algebraic operations 14] that problems of spectrum scarcity and overuse arise not combining such policies can be easily implemented in such so much from physical shortage of frequencies, but mostly a framework. Reasoning about transmission requests is for- as a result of the centralized and rigid way in which spectrum mulated as a satisfiability-modulo-theories (SMT) problem. allocation has been managed. Spectrum has been allocated Efficient SMT solvers are used to answer the corresponding in a static way: bands of the RF range are statically assigned queries and also to analyze the policies themselves. Further, to licenced users on a long-term basis, who then have ex- we show how to reason about transmission requests in an op- clusive rights to the corresponding region. Examples here timal way by modeling the problem as an SMT instance of include the 824–849 MHz, 1.85–1.91 GHz, and 1.930–1.99 weighted Max-SAT, and we demonstrate that this problem GHz frequency bands, which are reserved for FCC-licenced too can be efficiently solved by cutting-edge SMT solvers. cellular and personal communications services (PCS). Other We also show that additional optimal operations on trans- parts of the RF spectrum, by contrast, are unlicensed, and mission requests can be cast as classic optimization prob- anyone can transmit in those frequencies as long as their lems, and to that end we give an algorithm for minimizing power does not exceed the regulatory maximum. Conse- integer-valued objective functions with a small number of quently, large portions of the RF range remain underutilized calls to an oracle SMT solver. We present experimental re- for long periods of time, while other parts—such as the ISM sults on the performance of our system, and compare it to bands—are overutilized. previous work in the field. 
Observations like these provided the impetus behind the NeXt Generation (XG) Communications program, a tech- nology development program sponsored by DARPA [9], which 1. INTRODUCTION proposed opportunistic spectrum access as a means of cop- One of the world’s most prized physical resources is the ing with spectrum scarcity. The underlying approach has electromagnetic spectrum, and particularly its radio frequency come to be known as dynamic spectrum access (DSA), and (RF) portion, stretching roughly from 10 KHz to 300 GHz. has evolved into a general methodology for dealing with The RF spectrum is so valuable that its allocation is strictly spectrum scarcity [17]. The main idea is that a DSA net- regulated by world governments, and these days even small work has two classes of users: primary and secondary. Pri- parts of it can be sold for billions of dollars in spectrum auc- mary users are licenced to use a particular spectrum band tions. and always have full access to that band whenever they need ∗The research reported in this paper was performed in connection it. Secondary users are allowed to use such spectrum por- tions but only as long as their use does not interfere with the with contract number W15P7T-08-C-P213 with the U. S. Army operating needs of the primary users. The secondary users Communications Electronics Research and Development Engineer- ing Center (CERDEC). The views and conclusions contained in this are typically cognitive radios [4] that can dynamically adjust document are those of the authors and should not be interpreted as their operating characteristics (such as waveform, power, etc.). presenting the official policies or position, either expressed or im- Secondary users sense the spectrum for available transmis- plied, of the U. S. Army CERDEC, or the U. S. Government unless sion opportunities, determine the presence of primary users, so designated by other authorized documents. 
Citation of manufac- turers or trade names does not constitute an official endorsement and attempt to use the spectrum in a way that interferes as or approval of the use thereof. The U. S. Government is autho- little as possible with the activities of the primary users. rized to reproduce and distribute reprints for Government purposes Policies are used for specifying how secondary radio nodes notwithstanding any copyright notation herein. DIFTS 2011 3 are allowed to behave. Policies consist of declarative state- gramming language [3]. Athena’s rich logic, sort system, ments that dictate what secondary radios may or may not and programming language significantly facilitated the rep- do, without prescribing how they might do it [18]. Different resentation and manipulation of policies. Another innova- policies may be applicable in different regions of the world. tion of our paper is the use of SMT solvers [15] for rea- Even in one and the same region there may be multiple poli- soning with and about policies. The problem of determin- cies in place, reflecting different constraints imposed by dif- ing whether a transmission request is permissible by a given ferent regulatory bodies. Policy systems for cognitive radios policy can be couched as a satisfiability problem, namely, should be able to load new policies on the fly. the problem of determining whether the request is consistent There are two key questions in designing such a policy with the policy. So SAT solving would seem to be a natu- system, one representational and the other computational: ral fit for this domain. However, policy rules in this domain How should we represent policies, and how can we make ef- also make heavy use of equality and numeric relations, not to ficient use of them? Policies regulating DSA must be able to mention symbolic values such as transmission modes. 
Thus, express rules such as the following: “Allow a transmission if viewed as an abstract syntax tree, the body of a policy rule its frequency is in a certain range and the power does not ex- has an arbitrarily complicated boolean structure internally, ceed a certain limit, or if the transmission is an emergency.” with relational atoms at the leaves, whose semantics come Thus, for representation purposes, we need at least the full from standard theories such as those of arithmetic, equal- power of propositional logic, as well as numbers (both inte- ity, and so on. That is what makes SMT solving an ideal gers and reals), and numeric comparison relations. It would paradigm for this problem. We will show that the problem also be helpful to have algebraic datatypes at our disposal, of making optimal adjustments to transmission parameters so that we could, for example, represent the mode of a trans- can also be formulated quite naturally in this setting as a mission as one of a finite number of distinct symbolic val- weighted Max-SAT problem, many instances of which can ues, such as everyDay or emergency. A certain amount of be efficiently solved despite the problem’s high theoretical quantification would also be convenient, so that we could, complexity. Indeed, competitive SMT solvers such as CVC3 e.g., quantify over all possible transmissions. We must also [7] and Yices [6] are highly optimized programs that can be able to introduce arbitrary transmission parameters, so a readily decide extremely large formulas. As a quick com- strong type system, preferrably with subtyping, should be parison, another recently built policy system for spectrum available. Hence, first-order logic with arithmetic, and per- access, Bresap, which uses BDDs as its underlying technol- haps additional features such as algebraic datatypes and sub- ogy, cannot handle policies with more than 20 atomic ex- sorting, would appear to be a natural representation choice, pressions [1, p. 83]. 
By contrast, our system can easily han- provided that we can make efficient use of it, a point that we dle policies with many thousands of atomic expressions. will address shortly. The remainder of this paper is structured as follows. The So much for representation. What are the main computa- next section describes the representation of spectrum-access tional problems that a policy system must solve? The main policies in Athena.1 Section 3 discusses the integration of problem is this: determine whether an arbitrary transmission SMT solvers with Athena, and shows how to reduce the request should be allowed or denied, given the current set of problem of reasoning with spectrum-access policies to SMT policies. In fact, if this were all there was to using policies solving. In section 4 we model the problem of making opti- for spectrum access, things would be relatively simple. But mal changes to transmission requests as a Max-SAT prob- we usually want the policy engine to do more. For example, lem, and we describe how our system models and solves when a transmission request is not permissible, a simple “de- such problems. We also present an optimizing algorithm nied” answer is not very useful. The policy system should that uses a version of binary search in order to minimize also tell the requesting radio why the request was denied, integer-valued objective functions with very few calls to an and more importantly, what it would take for its request to oracle SMT solver (at most log(n) such calls, where n is be allowed. As a simple example, the system might tell the the maximum value that can be attained by the objective radio: “Your request cannot be allowed in its present form, function). 
This algorithm can be used to further optimize but it will be allowed if you only change your transmission transmissions, e.g., by minimizing the distance between the power to such-and-such level.” Further, in such cases the values desired by the cognitive radio and the values allowed policy system will usually have many choices as to which by the policy. Section 5 presents results evaluating the per- parts of the transmission request to modify in order to make formance of our policy engine. Finally, section 6 discusses it permissible. We then want it to make an optimal choice, related work and concludes. i.e., a choice that satisfies the original request to the greatest possible extent. 2. POLICY REPRESENTATION In this paper we present an implementation of a policy system in Athena, an interactive proof system for polymor- 1 While some Athena code is presented, knowledge of the language phic multi-sorted first-order logic with equality, algebraic is not necessary. Any reader familiar with formal methods and data types, subsorting, and a higher-order functional pro- functional programming languages should be able to follow the code. DIFTS 2011 4 The set of radio transmissions is treated as an abstract data normally be where the ellipses now appear in (1) (or in (2)). type with fields such as frequency, power, mode, etc. The Then the result of the said application is the sentence ob- abstract domain of transmissions is introduced in Athena as tained from body by replacing every free occurrence of the follows: quantified variable ?x:Transmission by the term t. In Athena code, this procedure is defined as follows: domain Transmission d e f i n e apply := A transmission field (or “parameter”) can then be introduced lambda (C t) as a function from transmissions to values of some appro- match C { (forall x ((_ x) <==> body)) => priate sort. 
For illustration purposes, we demonstrate the (replace-var x t body) } declaration of the following parameters: frequency, power, mode, and time. Additional parameters can be introduced as The built-in ternary Athena procedure replace-var is needed. such that (replace-var x t p) produces the result ob- tained from the sentence p by replacing every free occur- declare frequency: [Transmission] -> Int rence of variable x by the term p. As a concrete example, declare power: [Transmission] -> Real here is a sample AC: declare mode: [Transmission] -> Mode declare hours, minutes: [Transmission] -> Int d e f i n e AC1 := (forall ?x . Here Int and Real are built-in Athena sorts represent- allow ?x <==> frequency ?x in [5000 5700] & ing the sets of integers and real numbers, respectively, with power ?x < 30.0 & Int a subsort of Real. The sort Mode is a finite algebraic mode ?x = everyDay & hours ?x <= 11) datatype: It says that a transmission is (provisionally) permissible iff d a t a t y p e Mode := emergency | everyDay | ... its frequency is in the 5000 . . . 5700 MHz range; its power is There are two distinguished unary predicates on the set of less than 30.0 dBm; its mode is everyDay; and the trans- all possible transmissions: allow and deny. These predi- mission’s time is no later than 11 in the morning. Here in is cates are used by policies to specify which transmissions are a binary procedure defined as follows: considered permissible and which are not. More concretely, a policy is represented as a pair of sentences: an allow con- d e f i n e in := dition and a deny condition. An allow condition (or AC for lambda (x range) short) is of the form: 2 match range { [a b] => (a <= x & x <= b)} (forall ?x:Transmission. allow ?x <==> · · · ) (1) while a deny condition (DC) is of the form: (Note that variables do not need to be explicitly annotated with their sorts. Athena has polymorphic sort inference, so (forall ?x:Transmission. 
deny ?x <==> · · · ) (2) a Hindley-Milner algorithm will recover the most general Either condition may be absent, i.e., a policy may specify possible sorts of all variable occurrences.) We can now con- only an allow condition but no deny condition, or only a struct a policy with the above AC and with no DC as follows: deny condition but no allow condition. An absent condition d e f i n e policy-1 := (make-policy AC1 ()) is represented by the unit value, (). The constructor for policies is therefore a procedure that takes two conditions Our second policy example has both an AC and a DC: and simply puts them together in a two-element list: d e f i n e policy-2 := d e f i n e make-policy := (make-policy lambda (AC DC) [AC DC] (forall ?x . allow ?x <==> mode ?x = emergency) (forall ?x . There are two destructor or selector procedures, one for re- deny ?x <==> power ?x > 25.0)) trieving each component of the policy: d e f i n e [get-AC get-DC] := [first second] A key operation on policies is that of merging. New policies may be downloaded into the policy database at any given An important operation is the application of a given pol- time. A newly acquired policy must be integrated with the icy condition C of the form (1) or (2) to a term t of sort existing policy that is in place in order to produce a single Transmission. Let body be whatever sentence would merged policy with a new AC and a new DC. Merging is 2 therefore a binary operation. The current definition of merg- A more accurate term instead of allow would be “allow provi- ing two policies p1 and p2 is simple: it is disjunction on sionally,” because, as we will see, for a transmission request to be granted, the allow condition must hold and the deny condition must both the AC and DC. Specifically, the AC of the merged pol- not hold. With this caveat in mind, we will continue to use the term icy allows a transmission t iff p1 allows t or p2 allows it; allow in the interest of simplicity. 
and it denies t iff p1 denies it or p2 denies it. Merging is DIFTS 2011 5 implemented as an Athena procedure. Other alternative def- a yes/no answer from the policy system. But in general the initions of merging are possible, and these could be straight- policy system needs to solve more interesting versions of this forwardly implemented in the same way. As an example, problem, namely: If the transmission request is granted, find here is the result of merging policy-1 and policy-2, appropriate values for any missing parameters, in case the as these were defined above: request was incomplete; if the request is not granted, find ap- d e f i n e merged-policy := propriate values for the transmission parameters that would (merge-policies policy-1 policy-2) render the transmission permissible. Moreover, in the sec- ond case, we often want to compute values in a way that is >(get-AC merged-policy) optimal w.r.t. to the given request, e.g., so that the original transmission request is disrupted as little as possible. We Sentence: (forall ?v4:Transmission discuss such optimality issues in section 4. (iff (allow ?v4) The above problems are naturally formulated as satisfia- (or (and (and (<= 5000 bility problems. However, the most appropriate satisfiability (frequency ?v4)) framework here is not that of straight propositional logic, but (<= (frequency ?v4) rather that of satisfiability modulo theories, or SMT for short 5700)) (and (< (power ?v4) [15]. In the SMT paradigm, the solver is given an arbitrary 30.0) formula of first-order logic with equality and its task is to de- (and (= (mode ?v4) termine whether or not the formula is satisfiable with respect everyDay) to certain background theories. 
Typical background theories (<= (hours ?v4) of interest are integer and/or real arithmetic (typically linear, 11)))) (= (mode ?v4) but not necessarily), inductive data types (free algebras), the emergency)))) theory of uninterpreted functions, as well as theories of lists, arrays, bit vectors, etc. Hence, most of the function symbols >(get-DC merged-policy) and predicates that appear in the input formula have fixed in- terpretations, given by these background theories. An SMT Sentence: (forall ?v5:Transmission (iff (deny ?v5) solver will not only determine whether a sentence p is satisfi- (> (power ?v5) able; if it is, it will also provide appropriate satisfying values 25.0))) for the free variables and/or constants that occur in p. Athena is integrated with a number of SMT solvers, such (Note that while Athena accepts input in either infix or pre- as CVC3 and Yices; the one used for this project is Yices [6], fix form, sentences are output in prefix for indentation pur- mostly because it can solve Max-SAT problems. The main poses.) interface is given by a unary Athena procedure smt-solve, We define a transmission request as a set of constraints which takes an arbitrary first-order sentence p and attempts over some object of sort Transmission, typically just to determine its satisfiability with respect to the appropri- a free variable ranging over Transmission. These con- ate theories. If successful, smt-solve will return a list of straints, which are usually desired values for some—or all— values for all free variables and/or constants that occur in p. of the transmission parameters, can be expressed simply as Some examples: a sentence. 
A sample transmission request: d a t a t y p e Day := d e f i n e request-1 := (frequency ?t = 150 & Mon | Tue | Wed | Thu | Fri | Sat | Sun mode ?t = everyDay & hours ?t = 16) >(smt-solve ?x = 2 & ?y = 3.4 | ?x = 5 & ?d = Mon) This is a conjunction specifying that the frequency of the desired transmission (represented by the free variable ?t) List: [(= ?y 3.4) (= ?x 2) (= ?d Mon)] should be 150, its mode should be everyDay, and its time should be between 4:00 and 4:59 (inclusive) in the evening. The input to smt-solve here was a disjunction of two A transmission request may be incomplete, i.e., it may not conjunctions. The free variable ?d ranges over the datatype specify desired values for every transmission parameter. The Day; while ?x and ?y range over Int and Real, respec- preceding request was incomplete, since it did not specify tively. The output is a list of identities providing values for values for the power and minutes parameters. each free variable that render the input satisfiable. Because the argument to smt-solve is a sentence, i.e., a native 3. POLICY REASONING Athena data value, this provides a very flexible and high- The main task of the policy system is this: determine bandwidth programmable interface to SMT solvers. This in- whether the policies currently in place allow or deny a given terface proved extremely useful for our system. transmission request. This is in fact the simplest formula- The policy reasoning problems described earlier have a tion of the core reasoning problem, in that it only requires natural formulation in SMT. Specifically, consider an arbi- DIFTS 2011 6 trary transmission request tr, i.e., a first-order sentence with with the desired value vi of each transmission parameter pi . a free variable x ranging over Transmission (there may The weight wi reflects the importance that the radio attaches be other free variables as well). To determine whether tr to pi taking the value vi . 
The SMT solver will then try to is allowed by the current policy: (1) We construct a longer find a satisfying assignment for the request that has maxi- request, tr0 , which is the conjunction of (a) tr; (b) the appli- mal total weight. In that case, the request is not just a list cation of the current policy’s AC to x; and (c) the application of constraints [c1 · · · cn ] but a list of pairs of constraints of the negation of the current policy’s DC to x: and weights [[c1 w1 ] · · · [cn wn ]]. A sample trans- tr & (apply (get-total-AC) x) & mission request might then be: ˜ (apply (get-total-DC) x) d e f i n e weighted-request := [[(frequency ?t = 8000) 10] The procedure get-total-AC returns the AC of the cur- [(power ?t = 35.0) 15] rent (“total”) policy; likewise for get-total-DC. So tr0 [(hours ?t = 13) 30]] represents all the constraints imposed on the requested trans- mission, both by the original transmission request, tr, and indicating that the relative importance of the frequency being by the policy itself, whose AC must hold for the transmis- 8000 is 10, that of the power being 35.0 is 15, and that of sion variable x while its DC must not. (2) We then run hours being 13 is 30. Suppose further that the policy in smt-solve on tr’. If the result is a satisfying assignment, place has no DC and a disjunctive AC: then the request is granted, and all we need to do is report d e f i n e AC1 := values for any missing parameters (parameters that did not (forall ?t . allow ?t <==> figure in the given request). If, by contrast, smt-solve (frequency ?t in [5000 7000] & determines that tr’ is unsatisfiable, then tr in its given form power ?t <= 30.0 & hours ?t > 12) | must be rejected. 
In that case we make a blank request con- (frequency ?t in [6000 9000] & power ?t <= 40.0 & hours ?t <= 8)) sisting of the application of the policy’s AC to x conjoined with the application of the negation of the policy’s DC to d e f i n e policy := (make-policy AC1 ()) x, and run smt-solve on it. This blank request therefore imposes no constraints at all on the transmission apart from Now consider evaluating weighted-request with re- those imposed by the policy. If the result is a satisfying as- spect to this policy. Clearly, the request cannot be allowed signment, we provide it to the user, otherwise we report that as is. We can make it permissible in more than one way: the current policy is unsatisfiable. (a) we could demand a change of frequency and power in The Athena code for this algorithm is expressed in a pro- accordance with the values prescribed by the first branch of cedure evaluate-request, which accepts an arbitrary AC1, while keeping the hours parameter to the requested request and processes it as described above. Here is the out- value of 13; or (b) we could demand a change of the hours put for the example of the previous section: parameter only, in accordance with the second disjunctive >(evaluate-request request-1) branch of AC, while keeping the desired frequency and power values; or (c) we could demand that all three parameter val- Transmission allowed. Appropriate values ues change. From these possibilities, only (a) is optimal, in for missing parameters: that it honors the original request to the greatest extent pos- [(= (power ?t1) 20.0)] sible (as determined by the given weights). Running this example in our system results in the following output: 4. COMPUTING OPTIMAL ADJUSTMENTS >(evaluate-request weighted-request) TO TRANSMISSION PARAMETERS The following parts of the request When a transmission request is denied, there are usually could not be satisfied: many different ways of modifying it so as to make it per- missible. 
For instance, the solver could change the desired (= (power ?t) 35.0) (= (frequency ?t) 8000) transmission’s time; or it could change its frequency; or it could change its power and mode; or it could change all of Here is a maximally satisfying assignment: the above. Some of these actions may be preferable to oth- [(= (mode ?t) everyDay) ers. For instance, the radio might be less willing to change (= (frequency ?t) 5999) the time of the transmission, or its power level, rather than (= (power ?t) 30) (= (hours ?t) 13)] the frequency. In such cases we want the policy system to return a satisfying assignment that is optimal in the sense of By contrast, if we had changed the weight of the hours respecting as many such preferences of the radio as possible. parameter from 30 to 20, the result would then change the Our system achieves this in a flexible way by formulating hours parameter instead of the frequency and power, since the problem as an (SMT) instance of Max-SAT. In its trans- retaining the values of the two latter parameters would result mission request, the radio can provide a weight wi along in a maximum weight of 25. DIFTS 2011 7 Note that weights can be arbitrary integers or a special z in [720 800]) | “infinite” token, indicating a hard constraint that must be (x in [500 600] & y in [30 40] & satisfied. In fact this infinite weight is the one that Athena z in [920 925]) attaches to the constraints obtained from the AC (and the negation of the DC) of the current policy. But radios can Suppose also that the desired values are x = 13, y = 15, also use infinite weights in their requests. z = 922. The task is to find values for these variables that Occasionally there may be additional requirements on the satisfy c while diverging from the desired values as little as assignments returned by the policy system, beyond honor- possible. We can readily model this problem in a form that ing the original request to the greatest extent possible. 
For example, if a parameter value must change, we may want the new value to be as close as possible to the original requested value. If we are not allowed to transmit at the exact desired time, for instance, we may want to transmit as close to it as possible (as allowed by the currently installed policies). To meet such requirements we generally need the ability to perform optimization by minimizing some objective function, in this case the absolute difference between desired and permissible parameter values. Most SMT solvers do not perform optimization (apart from Max-SAT, in the case of Yices, though see below), but we can efficiently implement an integer optimizer in terms of SMT solving. The idea is to use binary search in order to discover the optimal cost with as few calls to the SMT solver as possible: at most O(log n) calls, where n is the maximum value that the objective function can attain. Specifically, let c be an arbitrary constraint that we wish to satisfy in such a way that the value of some "cost" term t is minimized, and let max be the maximum value that can be attained by the cost function (represented by t). (If this value is not known a priori, it can be taken to be the greatest positive integer that can be represented on the computer.) The algorithm is the following: We first try to satisfy c conjoined with the constraint that the cost term t is between 0 and half of the maximum possible value: 0 ≤ t ≤ (max div 2). If we fail, we recursively call the algorithm and try to satisfy c augmented with the constraint (max div 2) + 1 ≤ t ≤ max. Whereas, if we succeed, we recursively call the algorithm and try to satisfy c augmented with the constraint 0 ≤ t ≤ (max div 4). Some care must be taken to get the bounds right on each call, but this algorithm is guaranteed to converge to the minimum value of t for which c is satisfied, provided of course that the original constraint c is satisfiable for some value of t. This algorithm is implemented by a ternary Athena procedure smt-solve-and-minimize, such that

  (smt-solve-and-minimize c t max)

returns a satisfying assignment for c that minimizes t (whose maximum value is max).

For example, suppose that x, y, and z are integer variables to be solved for:

  define [x y z] := [?x:Int ?y:Int ?z:Int]

subject to the following disjunctive constraint:

  define c :=
    (x in [10 20] & y in [1 20] & z in [720 800]) |
    (x in [500 600] & y in [30 40] & z in [920 925])

Suppose also that the desired values are x = 13, y = 15, z = 922. The task is to find values for these variables that satisfy c while diverging from the desired values as little as possible. We can readily model this problem in a form that is amenable to smt-solve-and-minimize as follows. First we define the objective-function term t as the sum of the three differences:

  define t := (?d-x:Int + ?d-y:Int + ?d-z:Int)

with the difference terms defined as follows:

  define d-x-def :=
    (ite (x > 13) (?d-x = x - 13)
                  (?d-x = 13 - x))
  define d-y-def :=
    (ite (y > 15) (?d-y = y - 15)
                  (?d-y = 15 - y))
  define d-z-def :=
    (ite (z > 922) (?d-z = z - 922)
                   (?d-z = 922 - z))

Here ite is a simple if-then-else procedure:

  define ite :=
    lambda (c x y) ((c ==> x) & (~ c ==> y))

Assume that we do not know the exact maximum value that t can attain, but we know that it cannot be more than 10^6. We can then solve the problem with the following call:

  define diff-clauses :=
    (d-x-def & d-y-def & d-z-def)
  define query := (c & diff-clauses)

  >(smt-solve-and-minimize query t 1000000)

  List: [(= ?x 13) (= ?y 15) (= ?z 800)
         (= ?d-x 0) (= ?d-y 0) (= ?d-z 122)]

This solution was found by a total of 8 calls to the SMT solver (for a total time of about 100 milliseconds). Why were only 8 calls required when we started the binary search with a maximum of 10^6? One would expect about log 10^6 calls to the SMT solver, i.e., roughly 20 such calls. However, our implementation uses the information returned by each call to the SMT solver to speed up the search. That often results in drastic shortcuts, cutting down the total number of iterations by more than a factor of 2.

This procedure enables our system to optimize any quantity that can be given an integer metric. Moreover, unlike independent branch-and-bound algorithms for integer programming, it has the advantage that it allows not just for numeric constraints, but for arbitrary boolean structure as well, along with constraints from other theories such as reals, lists, arrays, bit vectors, inductive datatypes, etc. While more sophisticated approaches to optimization in the SMT paradigm are beginning to emerge (e.g., [2]), the implementation described here has been quite adequate for our purposes.
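The binary-search minimization behind smt-solve-and-minimize can be sketched as follows. Here `feasible(lo, hi)` is a hypothetical stand-in for one SMT call that conjoins lo <= t <= hi onto the constraint and, if satisfiable, reports the cost of the model it found; the fixed set of achievable costs merely fakes a solver. Searching below each returned model's cost mirrors the shortcut described above, where information from every solver answer shrinks the remaining range.

```python
def minimize(feasible, max_cost):
    # Binary search for the minimal achievable cost in [0, max_cost].
    lo, hi, best = 0, max_cost, None
    while lo <= hi:
        mid = (lo + hi) // 2
        witness = feasible(lo, mid)
        if witness is not None:
            best = witness       # a model with cost <= mid exists
            hi = witness - 1     # only strictly smaller costs remain interesting
        else:
            lo = mid + 1         # no model in [lo, mid]: look above mid
    return best                  # None if no cost in [0, max_cost] is achievable

# Stand-in oracle: pretend these are the objective values attainable by
# models of the constraint (122 matches the worked example's optimum).
achievable = {122, 300, 907}

def oracle(lo, hi):
    in_range = [c for c in achievable if lo <= c <= hi]
    return max(in_range) if in_range else None  # some model, not necessarily the best

print(minimize(oracle, 10**6))  # converges to the minimum achievable cost, 122
```

With an exact-bisection oracle the call count is logarithmic in max_cost; exploiting each model's cost, as above, often cuts it well below that bound.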
5. PERFORMANCE

In order to test our system we wrote an Athena program that generates test instances, with two independently controllable variables: parameter number and policy number. In particular, we defined a procedure make-policy-set with two parameters, param-num and policy-num. The output is a list of policy-num policies, where each policy involves param-num transmission parameters. The latter came from a pre-declared pool of 100 transmission parameters, half of them integer-valued and the other half real-valued. We distinguished the following types of policies:

• permissive policies, which have only an AC;

• prohibitive policies, which have only a DC and no AC;

• mixed policies, which have both.

• Ordering/N policies. These policies are parameterized over N > 0. Specifically, an ordering/N policy is one whose AC and/or DC is of the following form:

    (forall ?t:Transmission .
      D ?t <==> (f_i1 ?t R_1 c_i1) & ... & (f_ik ?t R_k c_ik))

  where D is either allow or deny, and ∀j ∈ {1, ..., k}: f_ij is a transmission parameter; R_j ∈ {<, >, <=, >=}; and c_ij is a constant number of the appropriate sort. We refer to the ordering constraints (f_ij ?t R_j c_ij) as the atoms of the AC (or DC). An ordering/N policy may be permissive, prohibitive, or mixed, but its total number of atoms (i.e., the number of the AC's atoms plus the number of the DC's atoms) must equal N.

• Equational/N policies. The AC and/or DC of such a policy is of the same form as shown above, except that each R_i is the identity relation. These identities are the atoms of the condition. The total number of atoms (of the AC and DC together) must be N.

• Inequational/N policies. Same as above, except that the relation in question here is inequality.

• Range/N policies. The AC and/or DC of such a policy is of the same form as that of an ordering/N policy, except that each R_i is the relation in, and each constant c_i is a range [a b].

• Disjunctive-Conjunctive/N policies. The AC and/or DC of such a policy is a disjunction of conjunctions, where each atom is of the form (f_x ?t in [...]). The number of atoms is required to add up to N.

A call of the form (make-policy-set N P) returns a list L containing P policies, each of which involves N transmission parameters. This list is roughly equally partitioned among all the preceding types. Specifically, 20% of the policies in L are ordering/N policies. About 1/3 of these ordering/N policies are permissive, 1/3 are prohibitive, and 1/3 mixed. The next 20% of L consists of equational/N policies, and these are again evenly split between permissive-, prohibitive-, and mixed-policy subsets. The third 20% contains inequational/N policies, and so forth. Once a list L of policies is thereby obtained, we combine them all into a single policy by folding the merge-policies operator over L, with the empty-policy as the identity element. It is with respect to this merged policy that we tested transmission requests. The requests were generated randomly.

Figure 1 shows the SMT-solving times for processing plain transmission requests (i.e., without weights attached to the various transmission parameters), for various combinations of policy-set sizes and transmission parameter numbers, where the policy sets are evenly mixed as described above. Figure 2 shows the corresponding times for optimal processing of transmission requests, i.e., requests that attach weights to each of the transmission parameters and are solved as Max-SAT problems. In order to stress-test the implementation further, we repeated the experiments with sets of policies that were more structurally complex, doing away with simple ordering, equational, and inequational policies, and using instead only range and disjunctive-conjunctive policies. The corresponding results are shown in Figure 3 and in Figure 4. With these sorts of structurally complex policies in place, the highest increase occurs in optimal processing of transmission requests with more than 50 policies and parameters.

Figure 1: SMT solving times for processing regular transmission requests.
Figure 2: Max-SAT SMT solving times for optimized processing of transmission requests.
Figure 3: Solving times for processing of transmission requests with structurally complex policies.
Figure 4: Max-SAT SMT solving times for transmission requests with structurally complex policies.

The graphs show only the time spent on SMT solving, which is the bulk of the work that needs to be done when processing transmission requests. We do not include the time spent on translating from Athena to SMT notation and back. This translation is a straightforward linear-time algorithm (linear in the size of the formula to be translated, on average, since it uses hash tables for the necessary mappings between Athena and SMT namespaces), and the time spent on it in most cases is negligible. For huge policies containing hundreds of thousands of nodes, the translation does take longer, though still typically a fraction of a second. Virtually all of the translation cost is incurred when translating the policy, since the transmission requests are tiny by comparison. Observe, however, that there is no reason to be translating the entire policy anew each time the engine needs to process a new request. Instead, the (merged) policy can be translated off-line, once, and subsequently cached in SMT notation in some file. Thus, the translation time is a one-time, off-line cost.

All experiments were performed on a 2.4 GHz Intel Core i3-370M dual-core processor with 4GB RAM, running Windows 7. The time command of the Cygwin shell was used to get the timing data. In most cases, the reported system time (the sys entry of time's output) was 0.000, i.e., too small to be reliably measured, and hence the time reported here is real (wall clock) time, which includes time segments spent by other processes and times during which the SMT solver was blocked (e.g., waiting for I/O to complete). Therefore, the times reported here are overly conservative; the actual times spent on SMT solving are smaller.

6. RELATED WORK & CONCLUSIONS

We have presented an implementation of a policy system for dynamic spectrum access. Policies are represented and manipulated in Athena, a formal proof system for first-order logic with polymorphism, subsorting, inductive datatypes, arbitrary computation, and definitional extension. Most of these features have proven useful in the engine's implementation; others would become useful if the engine were to be extended so that it used, e.g., structured ontologies instead of flat data types.

Previous work in this area includes BBN's XGPLF [8] and SRI's Coral [10, 11]. XGPLF does not have a formal semantics and is limited in what it can express (it is based on OWL [19]). Moreover, XGPLF cannot model inheritance. To the best of our knowledge, the only implementation of a policy reasoning engine based on XGPLF is the one built by the Shared Spectrum Company [12]. They used SWI-Prolog as the underlying reasoning engine. In field tests using resource-limited radio devices, SSC replaced the Prolog reasoner with "a simplified reasoner developed in C/C++" [12, p. 504]; no details are provided regarding the design, implementation, or correctness of this simplified reasoner. SSC's engine can only return yes/no answers to transmission requests.

Coral (Cognitive Radio Policy Language) is a new language specifically designed for expressing policies governing the behavior of cognitive radios. Like Athena, Coral offers a rich and extensible first-order logical language that includes arithmetic and allows for the introduction and definition of new function symbols. It also features subtyping and can therefore express inheritance and ontologies. Some notable differences from Athena include: (a) Unlike Coral, Athena has polymorphism. Thus, e.g., it is not necessary to introduce lists for integers and lists for booleans separately; a generic parameterized definition is sufficient. (b) Athena has automatic sort inference based on the Hindley-Milner algorithm, which is convenient in practice because it allows for shorter (and usually cleaner) specifications. (c) Athena has a general-purpose, formally defined programming language that can seamlessly interact with its logical layer, and which can be used to dynamically construct and manipulate logical formulas very succinctly. This programming language offers procedural abstraction, as well as side effects through mutation, including reference cells and vectors, features which are often important for efficiency. The ability to compute with terms and sentences as primitive data values was very useful. While Coral can express some computations via universally quantified identities, which can then be interpreted as executable rewrite rules by a tool such as Maude [16], it does not offer procedural abstraction, i.e., it is not possible to define arbitrary procedures.

Two policy engines based on Coral have been implemented [10, 11]. The initial implementation used Prolog as the underlying reasoning engine, and was only able to return yes/no answers to transmission requests. The second (and current) implementation uses a custom-made proof system to reason about transmission requests. The system has four kinds of proof rules: forward chaining, backward chaining, partial evaluation based on conditional rewriting, and constraint propagation and simplification. The proof system is implemented in the rewriting engine Maude. One issue with this implementation is that a positive answer is given only if the transmission request is an exact match of the operative policy conditions. But that is not likely to be the case in practice. Most transmission requests are likely to be incomplete or to diverge from the policy, at least in some small degree. In such cases, the Coral implementation does not return values for the relevant transmission parameters. Rather, it returns an arbitrarily complicated first-order formula representing all the possible "opportunity constraints" that the radio could use to modify its request so as to make it permissible. But that is not likely to be of much use to the requesting party. It is not realistic to require that whoever made the transmission request should be able to understand and reason about arbitrarily complicated logical formulas in order to understand the policy system's output.

A second issue with this implementation is efficiency. The Coral team has not published precise benchmarks describing their engine's performance for variable numbers and types of policies (and for variable numbers of transmission parameters), but they have released a demo of their implementation that can be used to evaluate transmission requests with respect to policies that have a fixed set of transmission parameters, namely: location (latitude and longitude); time (hours, minutes, and seconds); sensed power; emissions; network id; and role (slave/master). Even with this fairly small set of transmission parameters, and with only 11 active policies in place, it has been reported [1, p. 18] that evaluating a single transmission request took the Coral engine 58 seconds, and resulted in an output formula comprising 75 constraints. In the preceding section we saw that our implementation can evaluate transmission requests with several dozens of parameters and with over 50 complex policies in place in a fraction of a second; this is so even when the requests are processed optimally. Moreover, the Coral engine has no notion of optimality. It does not allow the requesting party to specify that some parameters are more important than others, or to ask that the request should be satisfied to the greatest extent possible, or with as little divergence from the requested values as possible.

Another policy engine for spectrum access is Bresap, a system that was recently implemented at Virginia Tech [1, 5]. Unlike the other systems discussed above, Bresap is limited to policies that can be adequately represented as finite Boolean functions, which it encodes as binary decision diagrams (BDDs). Therefore, unlike Coral and our system, Bresap is not suitable as an expressive language for specifying policies. For instance, it has no facilities for introducing new policy concepts and/or rules. Indeed, Bresap is not a language. It is a system that reads certain types of policies expressed in XML and converts them to BDDs. This is done by assigning a distinct Boolean variable to each atomic expression that appears in the policy. For instance, a policy such as "allow transmission in the frequency range a ... b if the power does not exceed m" would result in the introduction of two distinct Boolean variables x1 and x2, with x1 corresponding to the atom a ≤ f ≤ b (where f stands for the transmission's frequency), and with x2 corresponding to the atomic expression p ≤ m (where p stands for the transmission's power). The accepting condition could then be represented by the boolean function x1 ∧ x2. Graph algorithms are used to merge different policy BDDs. With a single BDD in place representing the result of merging all active policies, Bresap can then accept an incoming transmission request and evaluate it. A transmission request must be an attribute-value list of the form a1 = v1, ..., an = vn, where ai is a transmission parameter (such as mode or frequency) and vi is the corresponding desired value. Notably, Bresap is capable of handling underspecified (incomplete) requests. It is also capable of attaching costs to transmission parameters (where a given cost indicates the penalty paid for changing the value of that parameter), and then modifying the parameter values of a transmission request in a way that minimizes the overall cost.

A drawback of Bresap is that the underlying policy representation framework, that of BDDs (or propositional logic, more generally), lacks semantics for the numeric constraints that pervade spectrum access policies. To put it simply, BDDs do not know arithmetic, and thus do not have the wherewithal to "understand" that < is transitive, that 1 + x is greater than x, that 10 and 14 − 4 are the same object, and so on. Whereas an SMT solver immediately realizes that f < 10 conjoined with f ≥ 30 is contradictory, because it uses a dedicated reasoning procedure for arithmetic, in the approach of Bresap these two atoms would result in the introduction of two Boolean variables, x1 and x2, the former representing f < 10 and the latter representing f ≥ 30. Thus, the conjunction of x1 and x2 is represented by the innocuous-looking Boolean function x1 ∧ x2, a perfectly consistent condition. To represent the fact that in the present context this is actually inconsistent, one has to append to it ad hoc clauses such as C = (¬x1 ∨ ¬x2) ∧ (x1 ∨ x2). This extra information, C, conjoined with x1 ∧ x2, would then allow us to conclude that the condition is unsatisfiable. That is, in fact, what Bresap does: It "semantically analyzes" the various atomic expressions in the policy in order to generate additional constraints, so-called "auxiliary rules" [1, section 3.5], that prevent the engine from taking "illogical" paths in the BDD. This approach has some disadvantages.(3) First, it is an essentially ad hoc encoding of arithmetic facts, the generation of which should not be part of the policy engine's trusted computing base. Second, the additional encoded information, even if it is polynomial in size, can significantly blow up the resulting BDD. Other efficiency issues in Bresap include the conversion algorithm that produces the BDD from the atomic Boolean expressions of the policy. This algorithm has exponential time complexity in the number of expression variables. As a result, Bresap is not currently able to handle policies with more than 20 atomic Boolean expressions [1, p. 83].

In conclusion, we believe that the combination of a programmable and expressive formal framework such as Athena with an efficient SMT solver is a highly suitable implementation vehicle for spectrum access policy engines. It combines rich expressivity with efficient performance, a consideration that is likely to be crucial for cognitive radios. To the best of our knowledge, SMT solvers have not been used before for reasoning about policies, although we believe that they are ideal for this task. Indeed, we believe that there is not much that is special about spectrum access, and that the same approach we have introduced here could be used to represent and reason about policies in other domains.

(3) Note that this criticism is not peculiar to Bresap. It applies to all policy engines that represent semantically rich policies (requiring at least linear arithmetic) as Boolean functions.

7. REFERENCES

[1] A. A. Deshpande. Policy Reasoning for Spectrum Agile Radios. Master's thesis, Electrical and Computer Engineering Department, Virginia Tech, 2010.
[2] A. Cimatti, A. Franzén, A. Griggio, R. Sebastiani, and C. Stenico. Satisfiability Modulo the Theory of Costs: Foundations and Applications. In TACAS 2010, pages 99–113, 2010.
[3] K. Arkoudas. Athena. http://www.pac.csail.mit.edu/athena.
[4] B. A. Fette. Cognitive Radio Technology. Academic Press, 2nd edition, 2009.
[5] B. Bahrak, A. A. Deshpande, M. Whitaker, and J. M. Park. BRESAP: A Policy Reasoner for Processing Spectrum Access Policies Represented by Binary Decision Diagrams. In New Frontiers in Dynamic Spectrum, 2010 IEEE Symposium on, pages 1–12, April 2010.
[6] B. Dutertre and L. de Moura. The Yices SMT Solver. Tool paper, available online from yices.csl.sri.com/tool-paper.pdf.
[7] C. Barrett and C. Tinelli. CVC3. In W. Damm and H. Hermanns, editors, Proceedings of the 19th International Conference on Computer Aided Verification (CAV '07), volume 4590 of Lecture Notes in Computer Science, pages 298–302. Springer-Verlag, Berlin, Germany, 2007.
[8] BBN Technologies. XG Working Group, XG Policy Language Framework, Request for Comments, version 1.0. www.ir.bbn.com/projects/xmac/pollang.html, 2004.
[9] DARPA. News Release for Next Generation (XG) Communications Program Request for Comments. www.darpa.mil/news/2004/xg_jun_04.pdf.
[10] G. Denker, D. Elenius, R. Senanayake, M.-O. Stehr, and D. Wilkins. A Policy Engine for Spectrum Sharing. In New Frontiers in Dynamic Spectrum Access Networks (DySPAN 2007), 2nd IEEE International Symposium on, pages 55–65, April 2007.
[11] D. Elenius, G. Denker, M.-O. Stehr, R. Senanayake, C. Talcott, and D. Wilkins. CoRaL: Policy Language and Reasoning Techniques for Spectrum Policies. In Policies for Distributed Systems and Networks, IEEE International Workshop on, pages 261–265, 2007.
[12] F. Perich. Policy-Based Network Management for NeXt Generation Spectrum Access Control. In New Frontiers in Dynamic Spectrum Access Networks (DySPAN 2007), 2nd IEEE International Symposium on, pages 496–506, April 2007.
[13] FCC Spectrum Policy Task Force. Report of the Spectrum Efficiency Working Group. Available from www.fcc.gov/sptf/files/SEWGFinalReport_1.pdf.
[14] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty. NeXt Generation / Dynamic Spectrum Access / Cognitive Radio Wireless Networks: A Survey. Computer Networks Journal (Elsevier), 50:2127–2159, September 2006.
[15] L. de Moura, B. Dutertre, and N. Shankar. A Tutorial on Satisfiability Modulo Theories. In Computer Aided Verification, volume 4590 of LNCS, pages 20–36. Springer, 2007.
[16] M. Clavel, F. Durán, S. Eker, P. Lincoln, N. Martí-Oliet, J. Meseguer, and C. Talcott. The Maude 2.0 System. In R. Nieuwenhuis, editor, Rewriting Techniques and Applications (RTA 2003), number 2706 in LNCS, pages 76–87. Springer, 2003.
[17] Q. Zhao. A Survey of Dynamic Spectrum Access. IEEE Signal Processing Magazine, 24(3):79–89, 2007.
[18] R. Chadha and L. Kant. Policy-Driven Mobile Ad hoc Network Management. Wiley-IEEE Press, 2007.
[19] World Wide Web Consortium. OWL 2 Web Ontology Language Document Overview. Available from www.w3.org/TR/owl2-overview/, 2009.
DIFTS 2011 12 Data Structure Choices for On-the-Fly Model Checking of Real-Time Systems Peter Fontana Rance Cleaveland Department of Computer Science Department of Computer Science University of Maryland, College Park University of Maryland, College Park Email: pfontana@cs.umd.edu Email: rance@cs.umd.edu Abstract—This paper studies the performance of sparse- checking based on proof search: a formula correspond- matrix-based data structures to represent clock zones (con- ing to the assertion that a timed automaton satisfies a vex sets of clock values) in an on-the-fly predicate equation mu-calculus property can be checked in a goal-driven system model checker for timed automata. We analyze the fashion to determine its truth. Zhang and Cleaveland [6] impact of replacing the dense difference bound matrix demonstrated the efficiency of this approach vis à vis (DBM) with both the linked-list CRDZone and array-list CRDArray data structure. From analysis on the paired- other real-time model-checking approaches. example-by-example differences in time performance, we In this paper we consider the special model check- infer the DBM is either competitive with or slightly faster ing case of timed automata and timed modal equation than the CRDZone, and both perform faster than the systems representing safety properties (also studied in CRDArray. Using similar analysis on space performance, [6]), for which there are still many opportunities for we infer the CRDZone takes the least space, and the DBM performance improvements. One component of such a takes less space than the CRDArray. model checker that has a noticeable influence on perfor- mance is the data structure for the sets of clock values. I. I NTRODUCTION When analyzing safety properties, each desired set of Automatic verification of real-time systems is under- clock values forms a convex set of clock values, or clock taken using notations for verifiable programs and check- zone (see Definition 3). 
The conventional way to store able specifications (see [1]–[6]). One common program a clock zone is as a difference bound matrix (DBM) notation is timed automata [7]. There are specification (see Definition 4) [15], which stores the constraints as notations such as timed computation tree logic (TCTL) a matrix. This approach is used by UPPAAL [16] and [1], [8] and timed extensions of a modal mu-calculus, described in [17]. To potentially save space and time, including one in [3] and another given in [5]. Specifica- instead of representing the set of constraints as a matrix, tions in a timed modal mu-calculus can be written as lists one can represent the set as an ordered linked path of of equations, known as timed modal equation systems constraints where any clock difference not on the path [5], [6], [9]. For information on the untimed modal-mu has the implicit constraint < ∞. If we generalize this calculi, see [10]–[12], and see [10], [11] for information to allow for a union of zones to be represented by a on modal equation systems. directed graph of constraints (representing a tree of paths One approach to model checking timed automata as opposed to a single path), we get a clock restriction with timed modal mu-calculus specifications is to use diagram (CRD) [18]. If we compress the nodes to have predicate equation systems (PES), which were invented them represent upper and lower bound constraints as well independently by Groote and Willemse (as parameter- as explicitly encoding both valid and invalid paths, we ized boolean equation systems) [13] and by Zhang and get a clock difference diagram (CDD) [2]. These two Cleaveland [6], [9]. Predicate equation systems provide structures are extensions of binary decision diagrams a general framework for program models including para- (BDDs) (see [19] for information). metric timed automata [6] and Presburger systems [14]. 
To improve performance, we take the above idea of They also admit a natural on-the-fly approach to model a linked implementation and incorporate the sparseness of the implementation of CRDs while simplifying (or Research supported by NSF Grant CCF-0820072. shrinking) the structure to only support a single clock DIFTS 2011 13 zone (CRDs and CDDs in general can encode unions of • I : L −→ Φ(CX) gives a clock constraint for each clock zones). This simplified structure is a sparse sorted location l. I(l) is called the invariant of l. linked-list implementation of a DBM, the CRDZone (see • E ⊆ L × Σ × Φ(CX) × 2CX × L is the set of edges. Definition 5). We also implement an array-list version In an edge e = (l, a, φ, Y, l0 ) from l to l0 with action of the CRDZone, the CRDArray (see Definition 6). A a, φ ∈ Φ(CX) is the guard of e, and Y represents CRDZone may be seen as a sparse sorted linked-list the set of clocks to reset to 0. implementation of a DBM, and the CRDArray a sparse array-list implementation of the CRDZone. We examine Some sources [6], [23] and our PES checker allow the time and space performance of all three clock zone clock assignments (x1 := x2 ) in addition to clock resets implementations: the matrix DBM, linked-list CRDZone on edges, other sources [17] allow constraints on clock and array-list CRDArray. differences and other sources [1] allow states to be The contributions of this paper are: labelled with atomic propositions that each state satisfies. Timed automata use clock valuations ν ∈ V (V = • We run experiments comparing time and space CX −→ R≥0 is the set of all clock valuations), which performance of a model checker (on safety (reach- at any moment stores a non-negative real value for each ability) properties) with the DBM, CRDZone and clock x ∈ CX . The semantics of a timed automaton are CRDArray data structure implementations. 
described as an infinite-state machine, where each state • We formalize and extend the analysis style per- is a location-valuation pair (l, ν). Transitions represent formed in the model checking experiments of [2], either time advances or edge executions (performing an [6], [9], [18], [20], [21] by utilizing paired data action). For a formal definition of the semantics of a (each implementation checked the same examples) timed automaton, see [7]. and applying descriptive statistics on the paired example-by-example differences on time and space consumption. See Section VI for details on the Real-Time Model Checking Example 1 (Example of a timed automaton). Consider the timed automaton in Figure 1, which models a train statistics and Section VI-B for the analysis. in the generalized railroad crossing (GRC) protocol. After analyzing the experimental results, for time per- exit, x1 > 1 formance we infer the DBM is either competitive with or slightly faster than the CRDZone and both perform faster 0: far 1: near 2: in than the CRDArray for the examples in this experiment. approach, x1 := 0 x1 < 4 in, x1 = 4, x1 < 15 In terms of space, we infer the CRDZone takes up the x1 := 0 least space, and the DBM and takes less space than the TCTL Fig. (Invalid): 1. Timed automatonAF TA<∞ [near ∨in] 1 , a model of a train in the generalized CRDArray for the examples in this experiment. railroad crossing (GRC) protocol. TCTL (Valid): AG<∞[near ! AF!TP+TDU[far]] II. P ROGRAM M ODEL AND S PECIFICATIONS There are three locations—0: far (initial location), A. Timed Automata 1: near and 2: in, with one clock x1 . There are the actions 2 A timed automaton encodes the behavior of a real-time approach, in and exit in Σ. Here, location 1 has the system [7], [22]. invariant x1 ≤ 4 while 0 has no invariant. The edge (1: near, in, x1 = 4, {x1 },2: in) has the guard x1 = 4 Definition 1 (Clock constraint φ ∈ Φ(CX)). Given a set and resets x1 to 0. 
of clocks CX, a clock constraint φ is constructed with the following grammar, where xi is a clock and c ∈ Z:

    φ ::= xi < c | xi ≤ c | xi > c | xi ≥ c | φ ∧ φ

Φ(CX) is the set of all possible clock constraints.

Definition 2 (Timed automaton). A timed automaton TA = (L, L0, Σ, CX, I, E) is a tuple where:
• L is a finite set of locations with the initial set of locations L0 ⊆ L.
• Σ is the set of actions and CX is the set of clocks.

B. Modal Equation Systems (MES)

We use a modal equation system (MES) to represent real-time temporal properties that timed automata can possess. A MES is an ordered list of equations with variables on the left hand side and basic timed temporal logical formulae on the right. Each equation involves a variable X, a basic formula φ and a greatest fixpoint (ν) or a least fixpoint (µ), and the equation is labeled with the fixpoint (X =ν φ or X =µ φ). For a formal definition of MES syntax and semantics, see [6], [9].

Example 2 (Continuation of Example 1). Again consider the timed automaton in Figure 1. The MES

    X1 =ν far ∧ ∀([−](X1))    (1)

represents the safety property “the train is always in state 0: far”, read as “the variable X1 is the greatest fixpoint of being in state 0: far and for all time advances (∀), for all next actions ([−]), X1 is true.” This is an invalid specification for the timed automaton because the execution

While converting to a canonical form takes a considerable amount of time, it is needed to simplify and standardize the algorithms for the zone operations including time successor (succ(z)) computations and subset checks. For time successor, having the zone in canonical form allows the time elapse operation to simply set all single-clock upper bound constraints to < ∞.

B. Clock Zone Data Structures: Difference Bound Matrix (DBM), CRDZone and CRDArray

One way to represent a clock zone is a difference bound matrix (DBM), described in Definition 4.
See [15], [17] for a more thorough description.

    (0: far, [x1 = 0]) −2.5→ (0: far, [x1 = 2.5]) −approach→ (1: near, [x1 = 0]) −→ · · ·    (2)

reaches location 1: near and thus violates the property.

III. DATA STRUCTURES FOR CLOCK VALUE SETS

A. Clock Zones

This definition of a clock zone is taken from [7], [19].

Definition 3 (Clock zone). A clock zone is a convex combination of single-clock inequalities. Each clock zone can be constructed using the following grammar, where xi and xj are arbitrary clocks and c ∈ Z:

    Z ::= xi < c | xi ≤ c | xi > c | xi ≥ c | xi − xj < c | xi − xj ≤ c | Z ∧ Z    (3)

Clock zones extend clock constraints with inequalities of clock differences. These inequalities are used for model checking even though clock difference inequalities are not used in timed automata. Moreover, in general, the representation of a clock zone is not unique.

Example 3. Let z = 1 ≤ x1 < 3 ∧ 0 ≤ x2 ≤ 5.

Definition 4 (Difference bound matrix (DBM)). Let n − 1 be the number of clocks. A DBM is an n × n matrix where entry (i, j) is the upper bound of the clock constraint xi − xj, represented as xi − xj ≤ uij or xi − xj < uij. The 0th index is reserved for a dummy clock x0, which is always 0, allowing bounds on single clocks to be represented by the clock differences xi − x0 and x0 − xj. See Figure 2 for a picture of the DBM structure and Example 4 for a concrete example.

Definition 5 (CRDZone). A CRDZone is a sparse sorted linked-list representation of a clock zone. Each constraint is encoded like a constraint in a DBM, as an upper bound constraint on xi − xj, labeled as (i, j), with x0 always being 0. The CRDZone has these properties:
1) Nodes are in lexicographical order of clock constraint: (i1, j1) ≺ (i2, j2) iff i1 < i2 or (i1 = i2 and j1 < j2).
2) The (0, 0) node is always given to ensure a universal head node with an initial value of x0 − x0 ≤ 0.
3) If a CRDZone node (i, j) is missing then the zone has an implicit constraint: (i, i) : xi − xi ≤ 0 and
(i, j), i ≠ j : xi − xj < ∞.

See Figure 3 for a picture of the CRDZone structure and Example 4 for a concrete example.

There is the implicitly encoded constraint x2 − x1 ≤ 4. To see this, consider the longer path of constraints (x0 is a dummy clock that always has value 0):

      x2 − x0 ≤ 5     (x2 ≤ 5)
    + x0 − x1 ≤ −1    (1 ≤ x1)
    = x2 − x1 ≤ 4

This lexicographical ordering is the same ordering used in CDDs and CRDs [2], [18]. While the ordering is conjectured to influence performance, this is the only ordering we implemented. Likewise, having different implicit constraints, such as making xi − x0 ≤ 0 (for all i) implicit, is conjectured to influence performance.

To provide a standardized, or canonical, form for clock zone representations, we use shortest path closure [17]. This form makes every implicit constraint explicit. This can be implemented in O(n³) time using the Floyd-Warshall all-pairs shortest path algorithm, described in [24], [25]. Other standard forms exist [18], [20].

Fig. 2. DBM: a matrix with constraint xi − xj ≤ uij in entry (i, j).

the number of nodes looked at during a full traversal. This can speed up traversal-based algorithms such as intersect and subset check. However, algorithms like clock reset, emptiness check and canonical form use O(1) access of middle nodes in DBMs (the CRDZone and CRDArray do not have O(1) access for all nodes), resulting in a performance slowdown for those CRDZone and CRDArray methods. For space, the CRDZone and CRDArray can store fewer nodes but must store the explicit indices, resulting in more space per node.

Definition 6 (CRDArray). The CRDArray is an array list implementation of the CRDZone. Thus, instead of using linked nodes, we use an array to store the nodes with the 0th element being the head. We statically allocate the array to hold the maximum number of elements and store a back-pointer to the back of the array list.
See Figure 4 for a picture of the CRDArray structure and Example 4 for a concrete example.

Using a dynamic allocation instead of our static allocation for the CRDArray array list is conjectured to save space at the expense of time.

Example 4 (Clock zone in various representations). Consider the clock zone from Example 3, which is z = 1 ≤ x1 < 3 ∧ 0 ≤ x2 ≤ 5 ∧ x2 − x1 ≤ 4.

DBM representation of z:

    x0 − x0 ≤ 0    x0 − x1 ≤ −1   x0 − x2 ≤ 0
    x1 − x0 < 3    x1 − x1 ≤ 0    x1 − x2 < ∞
    x2 − x0 ≤ 5    x2 − x1 ≤ 4    x2 − x2 ≤ 0

CRDZone representation of z:

    x0 − x0 ≤ 0 −→ x0 − x1 ≤ −1 −→ x0 − x2 ≤ 0 −→ x1 − x0 < 3 −→ x2 − x0 ≤ 5 −→ x2 − x1 ≤ 4

CRDArray representation of z:

    [x0 − x0 ≤ 0 | x0 − x1 ≤ −1 | x0 − x2 ≤ 0 | x1 − x0 < 3 | x2 − x0 ≤ 5 | x2 − x1 ≤ 4]

IV. ON-THE-FLY MODEL CHECKING: CONVERTING TO A PES AND PROOF SEARCH

Our model checker takes in a predicate equation system (PES) (taken from [9], [13]), which is a general framework representing logical expressions that involve fixpoints and first order quantifiers. We take a timed automaton and a MES and convert it to a PES. Currently the PES model checker can only check safety properties, which involve only greatest fixpoints in both the PES and the input MES. For more information on a PES, including its syntax, semantics and how to convert a timed automaton and a MES to a PES, see [6], [9].

The model checker takes the conclusion sequent (the sequent we wish to prove) and applies proof rules in a recursive goal-driven tree-like fashion on the premise sequents, trying to prove each premise sequent until it reaches a proof rule with no premise (called a leaf) or a circularity (a previously seen premise sequent). When checking a proof, we will often encounter circularity. In general, when the circularity reached is a greatest fixpoint, we can stop and declare the proof branch valid. For the formal conditions for circularity and the proof rules, see [6], [9].

Remark 1 (On DBM vs. CRDZone and CRDArray
methods). Due to the sparse implementation and removal of implicit nodes, the CRDZone and CRDArray can improve time by reducing the number of nodes, and thus

V. EXPERIMENTS: VARIOUS DATA STRUCTURE IMPLEMENTATIONS

We compare the DBM implementation to the CRDZone and CRDArray implementations. Each implementation uses shortest path closure to compute canonical form. The only difference in the DBM, CRDZone and CRDArray versions is the data structure implementation.

    −→ · · · −→ xi − xj ≤ uij −→ · · ·

Fig. 3. CRDZone: A linked list with nodes in lexicographical order of constraint xi − xj ≤ uij.

    [ | | · · · | xi − xj ≤ uij | · · · ]

Fig. 4. CRDArray: An array list with nodes in lexicographical order of constraint xi − xj ≤ uij.

Remark 2 (On our analysis approach). We ask: what does it mean for an implementation to perform better than another? We consider better to be measured in the number (or percentage) of examples that one system outperforms another in. The larger aim is, for any implementation, if we were to know all the examples that it would run (including and beyond the experiment examples), we would like one implementation to perform (strictly) better for at least 51% of this hypothetical set. This influences our analysis.

Given our meaning of better in Remark 2, we consider the median, 25% and 75% percentile values as insights into typical examples and use the histograms to get a bigger picture of the sample distribution of the performance differences for the experiment, and weigh these more heavily than the mean and standard deviation values.

TABLE I
DESCRIPTIVE STATISTICS FOR PAIRED DBM − (MINUS) CRDZONE EXAMPLES, FOR TIME (S) AND SPACE (MB).

    Statistic            DBM − CRDZone (Time)   DBM − CRDZone (Space)
    Mean                 -67.55                 34.96
    Standard Deviation   428.35                 212.65
    25% Percentile       -1.24                  0.00
    Median               0.00                   1.85
    75% Percentile       0.06                   25.70
The mean and the standard deviation provide us with an alternative picture of the overall performance and give hints of either an unusual sample distribution (since in a symmetric distribution the mean equals the median) or the presence of potential outliers.

The benchmark choice was modeled off of [6], with the addition of a model of the generalized railroad crossing (GRC) protocol [26]. We also used all the protocols in [6], which are the Carrier Sense, Multiple Access with Collision Detection (CSMA/CD), the Fiber Distributed Data Interface (FDDI), Fischer's Mutual Exclusion (FISCHER), the Leader Election protocol (LEADER and LBOUND) and the PATHO Operating System (PATHOS) protocol, where each of these protocols is described in [6]. There are 53 benchmarks that ran on each implementation.

Experiments were run on a Linux machine with a 3.4 GHz Intel Pentium 4 Dual Processor (each a single core) with 4 GB RAM. Time and space measurements (maximum space used) were made using the memtime (http://www.update.uu.se/~johanb/memtime/) tool (using time elapsed and Max VSize). The data tables are in the Appendix.

TABLE II
DESCRIPTIVE STATISTICS FOR PAIRED DBM − (MINUS) CRDARRAY EXAMPLES, FOR TIME (S) AND SPACE (MB).

    Statistic            DBM − CRDArray (Time)   DBM − CRDArray (Space)
    Mean                 -112.95                 -47.75
    Standard Deviation   655.65                  235.63
    25% Percentile       -3.16                   -20.54
    Median               -0.29                   -2.81
    75% Percentile       0.00                    -0.01

TABLE III
DESCRIPTIVE STATISTICS FOR PAIRED CRDZONE − (MINUS) CRDARRAY EXAMPLES, FOR TIME (S) AND SPACE (MB).

    Statistic            CRDZone − CRDArray (Time)   CRDZone − CRDArray (Space)
    Mean                 -45.40                      -82.71
    Standard Deviation   229.06                      160.91
    25% Percentile       -2.02                       -52.67
    Median               -0.21                       -19.35
    75% Percentile       -0.03                       -1.63

VI. STATISTICS, ANALYSIS AND DISCUSSION
A. Histograms and Descriptive Statistics

Running the different data structure implementations with the same examples yields paired data. Hence, we can take the two implementations and pair them example-by-example on their time and space differences to analyze their performance. When we pair the DBM − CRDZone samples, we take the DBM measurement and subtract the CRDZone measurement for the same example to get a DBM − CRDZone paired data point. For instance, the MUX-5-a paired point is -0.92s, 1.94MB, since the DBM point is 1.22s, 14.67MB, and the CRDZone point is 2.14s, 12.73MB. Pairings are likewise done to obtain the paired samples for DBM − CRDArray and CRDZone − CRDArray. For more information, see a Statistics text such as [27].

Tables I, II and III contain descriptive statistics on the paired difference in example-by-example performance of the DBM, CRDZone and CRDArray. Figures 5, 6 and 7 have histograms that plot the overall time and space differences between the DBM, CRDZone and CRDArray implementations. Bin colors are changed to help more easily find the -0.001 to 0.001 bins (equal performance, since our measurement precision is 0.01 units), and the -0.25 to -0.001 and 0.001 to 0.25 bins (slight differences).

We do not use 95% confidence intervals, paired two-sample hypothesis (z) tests or ANOVA (Analysis of Variance) because the independence assumption of the samples (the example benchmarks) does not hold. Furthermore, we do not use a Wilcoxon signed-rank test for the median because the symmetry assumption of the distribution is not believed to hold, and thus we cannot analyze the hypothetical benchmark distribution referred to in Remark 2. We do use paired sampling since we have its only requirement—perfect correlation of the samples. More information is in [27].

Fig. 5. Histograms comparing the DBM − (minus) CRDZone time (s) (top) and space (MB) (bottom) differences.

Fig. 6. Histograms comparing the DBM − (minus) CRDArray time (s) (top) and space (MB) (bottom) differences.

Fig. 7. Histograms comparing the CRDZone − (minus) CRDArray time (s) (top) and space (MB) (bottom) differences.

B. Analysis of Results

1) DBM vs. CRDZone: The CRDZone performs slower for 45% of the tested examples (at least as slow for 74%) with a median difference of 0.00s slower, while the CRDZone has a mean difference of 67.55s slower. Thus, we infer the CRDZone is either slightly slower or competitive to the DBM for this benchmark, but due to insufficient evidence (45% of the examples is not enough) do not infer that the DBM performs strictly faster than the CRDZone.

The CRDZone takes less space for 57% of the tested examples (at most as much space for 57%) with a median amount of 1.85MB less space and a mean amount of 34.96MB less space. The CRDZone takes at least 0.25MB less space for 28 such examples and more than 0.25MB space for only 11 examples. Thus (even though 57% is not a large majority), we infer the CRDZone takes less space overall for this benchmark.

2) DBM vs. CRDArray: The CRDArray performs slower for 64% of the tested examples (at least as slow for 89%) with a median difference of 0.29s slower and a mean difference of 112.95s slower. Thus we infer the CRDArray performs slower overall for this benchmark.

The CRDArray takes more space for 77% (at least as much space for 77%) of the examples with a median amount of 2.81MB more space and a mean amount of 47.75MB more. Thus we infer the CRDArray takes more space overall for this benchmark.

3) CRDZone vs. CRDArray: The CRDArray performs slower for 77% of the tested examples (at least as slow for 100%) with a median difference of 0.21s slower and a mean difference of 45.40s slower. Thus we infer the CRDArray is slower overall for this benchmark.

The CRDArray takes more space for 100% of the examples with a median amount of 19.35MB more space and a mean amount of 82.71MB more. Thus we infer the CRDArray takes more space overall for this benchmark.

C. Discussion of Results

The CRDZone and CRDArray method that converts zones to canonical form was implemented using array-based algorithms, where it temporarily converts the clock zone to and from a matrix (the algorithm is similar to a DBM algorithm for those methods). It is possible that
performance can be improved by trying an algorithm that does not require copying to and from a matrix. All time vs. space tradeoffs were taken to save time.

Furthermore, we ran a CRDZone execution twice (first execution reported) and compared the distribution of their differences to get an idea of noise and/or measurement error. The differences in the histograms are larger than the differences of the noisy implementation, and thus we suspect the differences in performances are due to more than just uncertain measurement/noisy data.

[21], but this experiment compares data structures for non-convex sets of clock valuations (unions of clock zones), and the comparison is across different tools with different model checking approaches, and not the same tool with different data structures.

VII. CONCLUSIONS AND FUTURE WORK

Here are our conclusions from the experiment:
1) Time: (DBM ≤t CRDZone)

Listing 2
 3
 4 bitvector a = new_bitvector(width);
 5 bitvector b = new_bitvector(width);
 6 bitvector c = new_bitvector(width);
 7
 8 assertion(ctx, nequal(a, bvuint(1,width)));
 9 assertion(ctx, nequal(b, bvuint(1,width)));
10
11 assertion(ctx, equal( zero_extend(width, c),
       bvmul( zero_extend(width, a), zero_extend(width, b)) ));
12
13 for (unsigned i=0; i < 10000; ++i) {
14   unsigned r = random_number( 2, 2^width - 1 );
15   assumption(ctx, equal(c, bvuint(r, 2*width)));
16
17   if( solve( ctx ) ) {
18     unsigned a_value = read_value( ctx, a );
19     unsigned b_value = read_value( ctx, b );
20
21     printf("factorized %d into %d * %d\n", r, a_value, b_value);
22   } else {
23     printf("%d is prime.", r);
24   }
25 }

Fig. 2. metaSMT layer architecture: a frontend layer, a middle-end layer (DirectSolver, BitBlast, GraphSolver, SAT Clause, SAT Aiger, Groups) and a backend layer (SWORD, Z3, MiniSAT, CUDD, Boolector, PicoSAT, AIGER) connected through the solver API.

The frontend layer provides primitives of the input languages, defined in the SMT-LIB format (e.g., Core Boolean logic, QF_BV).
However, the multiplication is done in double width to avoid overflows. These constraints are identical for each iteration of the loop starting in line 13, therefore they are declared outside the loop as an assertion, which is permanent. Inside the loop, in lines 14-15, c is set equal to a random number from 2 to 2^width − 1 using an assumption, which is only valid for the next satisfiability check of the solver, i.e., for one loop iteration.

After setting up the SMT instance, the satisfiability check is performed in line 17. If the instance is satisfiable, the values of a and b are determined using read_value. Both operands are printed out in line 21. Otherwise the instance is unsatisfiable; the else branch is executed, which outputs that c is prime.

The middle-end layer provides translations and intermediate representations. Optionally, expressions can directly be passed to the backend layer where the solvers are integrated via their native API. Various configurations of middle-ends with backends are possible. The frontends allow combining each translation middle-end with any compatible backend. However, not every backend supports every logic. Therefore, some middle-ends supply emulation or translations to other logics, e.g., a bit-vector expression can be translated into a set of Boolean expressions.

The frontends are independent from the underlying two layers and have no semantics attached. To evaluate frontend expressions, a context is used that defines their meaning. The context is the combination of at least one middle-end and one backend, where the middle-end defines how the input language is mapped to function calls of the backend.

IV. ARCHITECTURE

In the following sections the architecture of metaSMT is described. At first the basic layers are introduced. Then each layer is described in detail.
The terms frontend for the input languages, middle-end for the intermediate layer and backend for the solvers are taken from compiler design to denote the metaSMT layers.

A. metaSMT Layers

metaSMT consists of three basic layers depicted in Figure 2.

B. Frontends

The frontends define the input languages for metaSMT. This includes Core Boolean logic and SMT QF_BV as well as a version of Array logic over bit-vectors. Each frontend defines its own set of available functions as well as public datatypes.

The Core Boolean logic defines the public datatype predicate which describes propositional variables. Furthermore, Boolean constants are available, i.e., true and false. This logic also defines primitive Boolean functions, e.g., Or, And. The frontend creates a static syntax tree for the expression described in the code. This syntax tree is passed to the middle-end.

4) BitBlast: This emulation of a QF_BV bit-vector backend uses only Core Boolean logic operations to allow the transparent use of SAT or BDD solvers with bit-vector expressions. The translation is performed in a standard way: Given only the Core Boolean logic, each bit-vector is transformed into a vector of Boolean variables. The bitwise operations can be applied easily, e.g., an exclusive-or over two bit-vectors is a bitwise exclusive-or for each pair of Boolean variables. The bit-vector predicates (equal, less-than, etc.) are mapped to a sequence of Boolean predicates, e.g., a conjunction of exclusive-nors for equal. Arithmetic operations are reduced to an equivalent Boolean expression.

C. Middle-ends

The core of metaSMT are basic optimizations and translations from the frontend to the backend. While the frontends provide languages and the backends provide solver integrations, the middle-ends allow the user to customize metaSMT, i.e., on how the input language is mapped to the backend. Even in the middle-end itself, several modules can be combined.
1) DirectSolver: To enable a low-overhead translation from a frontend to a backend the DirectSolver is provided. All the elements of the input expression are directly evaluated in the backend. Variables are guaranteed to be constructed only once and are stored in a lookup table. For example, a multiplication operation in QF_BV logic directly corresponds to a multiplication operation in the SMT solver Boolector. The direct middle-end is very lightweight and allows the compiler to inline all function calls. For a modern compiler the resulting executable should perform equally well to a hand-written application using the same backend.

D. Backends

The respective solvers and other constraint solving techniques are integrated as backends. For each reasoning engine a dedicated backend is created that maps from the internal metaSMT API to the API of the engine. Backends do not have an explicit interface to inherit from. They implement the required methods for the languages they support using C++ template mechanisms to integrate them into a context. This allows the compiler to optimize the code and, in the case of DirectSolver, produces code that is close to a hand-coded implementation using the same API.

This section gives an overview of the backends integrated into metaSMT. They are grouped by the input language they support. The compatibility of the solvers is also summarized in Table I.

1) Core Boolean logic backends: Several core logic backends as well as higher level backends are available. Core logic is directly supported by backends that accept all common Boolean operations.

2) GraphSolver: Instead of adding the frontend expressions directly to the solver, they are first inserted into a directed graph. The graph models the explicit syntax tree of the expression as a Directed Acyclic Graph (DAG). Formally a node in the graph is a tuple (Operation, Static Arguments) where the SMT command and its static arguments are captured (e.g. extract and the range to extract).
The edges point from an operation to the SMT expressions used in this command. A label on the edges stores the position of the subexpression in the command. Each time a new expression is evaluated it is first searched in a lookup table; a new node is created only when the node is not found. When the instance is checked for satisfiability, the graph is traversed and evaluated in the backend.

The graph-based translation provides a way to automatically detect common subexpressions and efficiently handle them to create smaller SMT instances which potentially increases performance of the reasoning process. This is especially useful if the user wants to automate this process, but either does not want to manually optimize the SMT instance or does not know the instance in advance because it is created dynamically inside the program.

For example, the Binary Decision Diagram (BDD) package CUDD [20] supports all Boolean operations and is integrated in metaSMT. Furthermore, with some minor transformations, De Morgan And-Inverter Graphs (AIGs) are also able to handle Boolean operations. Those AIGs are internally represented by the AIGER package [21]. SAT solvers can receive Boolean logic expressions either via the SAT Clause adapter that creates one or more clauses per logic operation or via the SAT Aiger adapter, that builds an AIG for the expression using the AIGER backend. Afterwards, the AIG is translated into a set of clauses. This infrastructure allows the usage of any SAT solver supporting CNF as input language either by an API or externally through files. PicoSAT [22] as well as MiniSAT [23] are directly supported as Core logic backends via their APIs. Other solvers are supported by generating CNF files and calling the executable of the SAT solvers.

3) Groups: This middle-end provides an interface and implementation of constraint groups for solvers that do not have native support for groups. A group is a set
of constraints that belong together. The user can create groups, add expressions to them and delete them at any time. The solver will then disable all expressions in the group. Groups are emulated using guard variables and temporary assumptions, e.g., the expression x ∧ y in group 1 is transformed to g1 → (x ∧ y) using the guard variable g1 and an implication. Depending on the solver, deleting a group can either lead to the removal of the constraints or to the constraint just being disabled permanently.

Furthermore, all SMT QF_BV backends natively support Core logic as a subset of the language.

2) SMT QF_BV backends: Native SMT bit-vector solvers like Boolector [24], SWORD [25] and Z3 [26] are directly connected through their API for QF_BV support. Furthermore, the BitBlast middle-end provides an emulation for QF_BV using only basic logic operations. This emulation permits using QF_BV expressions in solvers that do not support them natively but support Core Boolean logic, e.g., CUDD or SAT solvers.

TABLE I
BACKEND COMPATIBILITY

    BACKEND         CORE      QF_BV     ARRAY (BV)  SAT
    AIGER [21]      yes       emulated  no          no
    Boolector [24]  yes       yes       yes         no
    CUDD [20]       yes       emulated  no          no
    MiniSAT [23]    emulated  emulated  no          yes
    PicoSAT [22]    emulated  emulated  no          yes
    SWORD [25]      yes       yes       no          no
    Z3 [26]         yes       yes       yes         no
    BitBlast        yes       yes       no          no
    SAT Aiger       yes       emulated  no          no
    SAT Clause      yes       emulated  no          no

Fig. 3. Syntax tree for equal(c, bvmul(a, b)): an equal node over qf_bv_var& {c,width} and a bvmul node over qf_bv_var& {a,width} and qf_bv_var& {b,width}.

Fig. 4. Data flow in different contexts, e.g., DirectSolver→Boolector, GraphSolver→Boolector, DirectSolver→BitBlast→SAT Clause→MiniSAT and GraphSolver→BitBlast→SAT Aiger→MiniSAT.

Listing 3. metaSMT command grammar
command    ::= assert_cmd | assume_cmd | eval_cmd | solve_cmd | result_cmd
assert_cmd ::= 'assertion(' context ','

3) SMT QF_ABV backends: In addition to Core Boolean logic and bit-vector logic the Boolector and Z3 backends also support arrays in the form of QF_ABV logic.
Therefore metaSMT supports declaring and working with arrays over bit-vectors.

V. IMPLEMENTATION

This section describes how the architecture is implemented in metaSMT and how metaSMT is integrated in C++ programs.

A. Syntax and Semantics

For the evaluation of metaSMT expressions a context is used which defines syntax and semantics. The context concept and the different kinds of contexts are described in this section.

The syntax component is provided by Boost.Proto. An expression like equal(c, bvmul(a, b)) is created directly from the custom Boost.Proto functions equal and bvmul as well as the variables a, b and c. From the expression the syntax tree in Figure 3 is created. The nodes are labeled with the C++ type, and strings inside the curly braces denote the content of the respective nodes. For metaSMT the tree is used as the static type of the expression. The expression and the syntax tree are data, i.e., they neither have semantics attached nor trigger any actions.

The semantics of an expression is introduced by the metaSMT context, which defines how the syntax tree is evaluated and transformed for a specific solver. The evaluation of Boost.Proto-based expressions is performed in the metaSMT translation middle-end (e.g., GraphSolver or DirectSolver) so that the backends do not need to handle Boost.Proto expressions directly. This reduces the overhead of implementing new backends.

Figure 4 gives some example contexts and visualizes the data flow inside. This illustrates how different contexts can change the evaluation of a constraint. The first context defines a solver using Boolector without intermediate representation (DirectSolver). The context directly supports Core Boolean logic and QF BV. In contrast, in the last example, MiniSAT is used. QF BV as well as Core Boolean logic are emulated for this clause-based backend. Furthermore this context uses a graph and an AIG as intermediate representations. The GraphSolver-based and AIGER-based contexts first create an internal representation and pass the complete expression directly before solving. When using approaches without intermediate representation, the requests are forwarded to the next layer until they reach the backend. Only explicit transformations are applied before passing the expression (e.g., BitBlast, SAT Clause).

B. Usage and API

The example from Listing 2 contains most of the core commands of metaSMT. These are summarized in Listing 3.

Listing 3. Grammar of the core commands

    assert_cmd ::= 'assertion(' context ',' expression ');'
    assume_cmd ::= 'assumption(' context ',' expression ');'
    eval_cmd   ::= 'evaluate(' context ',' expression ');'
    solve_cmd  ::= 'solve(' context ');'
    result_cmd ::= 'read_value(' context ',' variable ');'
    variable   ::= boolean_variable | bitvector_variable
    expression ::= …

The first three functions accept frontend expressions; however, they have different effects. The functions assertion and assumption create the constraint instance, where the first adds a constraint permanently to the (incremental) solver context while the latter adds the constraint for the next call of the solver only. In both cases the expression needs to have a Boolean result. The third function, evaluate, does not change the instance but only returns a context-specific representation of the input expression.

To query a context for satisfiability, the solve function is provided. The result is a Boolean value directly representing SAT (true) or UNSAT (false). After a call to solve the assumptions are discarded, while the assertions are still valid for the subsequent calls.

When solve(ctx) returns a SAT result, i.e., the instance is satisfiable, a model is generated. The model can be retrieved with the read_value function. The function takes a context and a variable and returns the assignment of this variable in the given context. The result of read_value is automatically convertible to many C++ datatypes, including strings, bit-vectors (vector of bool, tribool, bitset) or integers.

In addition to these core commands, custom middle-ends may provide additional extensions. The Group middle-end for example provides functions to add groups, change the active group and delete groups. These functions cannot be used in any other context.
C. Expression Creation

Typically it is necessary to create the metaSMT expression at run time, e.g., in a loop. As metaSMT syntax trees are statically typed, an extension of the syntax tree is not possible. To work around this limitation, metaSMT provides two options. The first option is to create a partial expression and constrain it to equal a temporary variable that is later reused to create the complete expression. This allows strict grammar checking but introduces a temporary variable and a constraint, see Listing 4.

Listing 4. Programmatic constraint construction using temporary variables

    bitvector x = new_bitvector(8);
    for( ... ) {
      bitvector tmp = x;
      x = new_bitvector(8);
      assertion(ctx, equal( x, bvmul(tmp, ...)));
    }
    ...
    solve(ctx)

The second option is the use of the evaluate(Ctx, Expr) function and the context's result_type. The function takes a context and a frontend expression and returns the context-specific representation of the expression. The result of the evaluation is of the backend-specific type Ctx::result_type. This expression can be stored and later be used in other expressions. Note however that the return value is solver specific and therefore not portable or reusable in other contexts, not even contexts of the same type.

A powerful exception to this rule is the result_type of a GraphSolver-based context, where the result is a node in the internal graph. When a GraphSolver is constructed using the copy constructor, a shared graph is internally used by the contexts. The newly created solver also copies all assertions and assumptions, so that both solvers have the same internal state. In this setup the results of evaluate can be shared among the solvers. Each backend will only evaluate the parts of the graph that are required as parts of assertions or assumptions. The application of evaluate is demonstrated in Listing 5. This can be used for example when building multiple instances from the same base. At a specific point the context can be copied and from there both contexts can diverge into separate instances.

Listing 5. Using a shared graph for different contexts

    GraphSolver_Context sword;
    GraphSolver_Context btor(sword);

    GraphSolver_Context::result_type x =
        evaluate(sword, bvuint(0, 8));

    for( ... ) {
      x = evaluate(sword, equal( x, bvmul(x, ...)));
    }
    assertion(sword, x);
    assertion(btor, x);
    solve(sword) == solve(btor);

VI. EMPIRICAL EVALUATION

This section presents two different applications of metaSMT. Furthermore, a comparison of metaSMT with the native API of an SMT solver is presented. The experiments have been performed on a system with an AMD Opteron 2222 SE processor, 32 GB RAM and a Linux operating system. In the following, the run times are always given in CPU seconds.

A. Exact Synthesis

This section presents examples from exact synthesis of reversible circuits [28], [29]. A reversible circuit computes a bijective function, i.e., there is a unique input and output mapping. Those circuits purely consist of reversible gates; here, the universal Toffoli gate and basic quantum gates are considered. The synthesis problem searches for a reversible circuit consisting of reversible gates which computes the function specified by a truth table. There are several exact approaches to synthesize those circuits. We considered the approach from [28], which translates the problem into a decision problem and asks for a circuit realization for a given number of gates. The size of the problem instances grows exponentially with the number of input variables, because the entire truth table has to be taken into account. This usually results in hard problem instances even for small truth tables.

The underlying problem formulation has been encoded in QF BV, and an incremental algorithm searches for a valid circuit implementation. Using metaSMT, sixteen different configurations have been evaluated. The configurations consist of four internal API backend solvers, i.e., Boolector, Z3, MiniSAT, and PicoSAT. Additionally, metaSMT is used to generate CNF files to run the external solvers PicoSAT, MiniSAT, PrecoSAT, and Plingeling [30]. All eight backends are used with the DirectSolver middle-end as well as the GraphSolver middle-end.

For the synthesis, 11 specifications were used which are publicly available from [31]. A timeout of 3,600 seconds was applied. The results are presented in Figure 5. On the x-axis the respective benchmark is shown, whereas the y-axis denotes the run time in seconds for each configuration in logarithmic scale. Externally called solvers are marked with ∗.

Fig. 5. Run times of reversible circuit synthesis for hard instances using a metaSMT-based version of RevKit [27]

From Figure 5 it is clear that no single solver dominates the benchmarks. For example, for the benchmarks from decode24-v0_incomple_5 to graycode6_complete_19, Boolector performs better than any other solver. But for the benchmarks from miller_complete_5, Boolector is outperformed by the SAT solvers. MiniSAT as well as PicoSAT are evaluated as internal and external versions. The accumulated run times of the solvers MiniSAT and PicoSAT over all benchmarks are 18,790 seconds for the internal versions and 8,989 seconds for the external versions. Surprisingly, the externally called solvers are much better here than the internally called solvers, though the internal solvers may benefit from learnt information. Overall, the externally called PrecoSAT solver needs 326.24 seconds over all benchmarks and Boolector needs 3,285.05 seconds.

To summarize, all the presented results can be achieved very easily by using metaSMT. Only one parameter needs to be switched to run a different solver.

B. Mutation Testing

Given a program, mutation creates several faulty variants of the program, which are combined into a single program called a meta-mutant. Using metaSMT the meta-mutant is encoded into a set of constraints. Each satisfying assignment yields a test case that discovers at least one fault. The experiments of [32] were executed on a set of metaSMT contexts. The results are shown in Table II. We compare the Boolector backend with the MiniSAT backend, using the DirectSolver as well as the GraphSolver middle-ends.

TABLE II. RESULTS FOR MUTATION TESTING USING DIFFERENT CONTEXTS (TIME IN SECONDS)

    Name     Direct      Graph       Direct     Graph
             Boolector   Boolector   MiniSAT    MiniSAT
    isl        0.05        0.22        0.34       0.47
    min        0.34        0.48        0.82       0.65
    mid        4.73        5.07        7.87       4.36
    fmin3      5.02        5.85        4.45       2.55
    fmin5     21.08       25.96       14.92       9.56
    fmin10   310.52      260.38      997.65     550.94
    tri      207.69      193.99     1596.99     652.64

The results vary significantly with the difficulty of the instance. For easy instances the directly connected Boolector contexts and the graph-based MiniSAT perform much better. However, for difficult instances with a run time over 1 minute, the graph-based Boolector context is fastest, while MiniSAT-based contexts require significantly more time. For these harder instances the GraphSolver middle-end outperforms the direct variant of the same backend. This effect is most likely due to the removal of redundancies in the graph. For MiniSAT this amounts to run time reductions of around 50%. With metaSMT as abstraction layer it is easy to evaluate the effects of different contexts or optimizations. When changes are only in the abstraction layer, no application code needs to be changed and only little effort is required.

C. Comparison with the Direct API

For the factorization algorithm from Listing 2, a hand-coded implementation using the Boolector C API is compared to a metaSMT DirectSolver-based implementation with the Boolector backend. The resulting application is available as part of the metaSMT package.

The experiment has the following setup: the algorithm from Listing 2 was changed to work on sequences instead of generating random numbers. A sequence of 10,000 random 64 bit numbers is generated and the algorithm is applied to it 10 times. The same sequence is used for both the hand-coded and the metaSMT-based implementation of the algorithm. The complete experiment was repeated 5 times with Boolector being executed first and 5 times with metaSMT being executed first. Altogether each solver was forced to solve 1,000,000 constraints of 64 bit numbers factorized into two 32 bit numbers.

The results showed no significant difference caused by the metaSMT abstraction layer: 1,736 s for plain Boolector compared to 1,729 s for metaSMT with Boolector, i.e., 1.7 seconds for 10,000 factorizations.

D. Other Applications

In addition to the aforementioned projects, metaSMT is also used in a constrained random stimuli generation library. In the library, elements of the SystemC Verification Library [33] and techniques from [34] are combined.

VII. CONCLUSIONS

metaSMT is a library that abstracts details of reasoning engines. Based on metaSMT, very little programming effort is required to integrate formal methods into a user's application. Once this has been done, a wide range of solvers as well as optimization techniques can be selected. Various research projects already integrate metaSMT.

Future work on metaSMT includes the development of the following features: new frontend logics will complete the support for SMT logics (e.g., uninterpreted functions, integer arithmetic), new middle-ends will increase solving performance (e.g., portfolio or multi-threaded contexts), and new backends will provide access to additional SMT solvers.

VIII. ACKNOWLEDGMENTS

We would like to thank Hoang M. Le, Heinz Riener and Fereshta Yazdani for the helpful discussions, their proposals for improvements, testing, and contributions to metaSMT.

REFERENCES

[1] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu, "Symbolic model checking without BDDs," in Tools and Algorithms for the Construction and Analysis of Systems, ser. LNCS, vol. 1579. Springer, 1999, pp. 193–207.
[2] K. McMillan, "Interpolation and SAT-based model checking," in Computer Aided Verification, ser. LNCS, vol. 2725. Springer, 2003, pp. 1–13.
[3] E. Arbel, O. Rokhlenko, and K. Yorav, "SAT-based synthesis of clock gating functions using 3-valued abstraction," in Formal Methods in Computer-Aided Design, 2009, pp. 198–204.
[4] F. Haedicke, B. Alizadeh, G. Fey, M. Fujita, and R. Drechsler, "Polynomial datapath optimization using constraint solving and formal modelling," in Int'l Conference on Computer-Aided Design (ICCAD), 2010, pp. 756–761.
[5] R. Drechsler, S. Eggersglüß, G. Fey, A. Glowatz, F. Hapke, J. Schloeffel, and D. Tille, "On acceleration of SAT-based ATPG for industrial designs," IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 27, no. 7, pp. 1329–1333, 2008.
[6] M. K. Ganai and A. Gupta, "Accelerating high-level bounded model checking," in Int'l Conference on Computer-Aided Design (ICCAD), 2006, pp. 794–801.
[7] P. Bjesse, "A practical approach to word level model checking of industrial netlists," in Computer Aided Verification, 2008, pp. 446–458.
[8] A. Armando, J. Mantovani, and L. Platania, "Bounded model checking of software using SMT solvers instead of SAT solvers," Int. J. Softw. Tools Technol. Transf., vol. 11, pp. 69–83, 2009.
[9] "SMT-COMP 2009," http://www.smtcomp.org/2009, 2009.
[10] "SMT-COMP 2010," http://www.smtcomp.org/2010, 2010.
[11] S. Ranise and C. Tinelli, "The Satisfiability Modulo Theories Library (SMT-LIB)," http://www.smtlib.org, 2006.
[12] O. Shtrichman, "Pruning techniques for the SAT-based bounded model checking problem," in CHARME, ser. LNCS, vol. 2144, 2001, pp. 58–70.
[13] S. Cook, "The complexity of theorem proving procedures," in 3rd ACM Symposium on Theory of Computing, 1971, pp. 151–158.
[14] C. Barrett, R. Sebastiani, S. A. Seshia, and C. Tinelli, Satisfiability Modulo Theories, ser. Frontiers in Artificial Intelligence and Applications. IOS Press, February 2009, vol. 185, ch. 26, pp. 825–885.
[15] D. R. Cok, "jSMTLIB: Tutorial, validation and adapter tools for SMT-LIBv2," in NASA Formal Methods, ser. LNCS, vol. 6617, 2011, pp. 480–486.
[16] "Boost C++ libraries," http://www.boost.org/.
[17] E. Niebler, "Proto: A compiler construction toolkit for DSELs," in Symposium on Library-Centric Software Design (LCSD '07). ACM, 2007, pp. 42–51.
[18] J. de Guzman and D. Nuffer, "The Spirit library: Inline parsing in C++," C/C++ Users Journal, vol. 21, no. 9, 2003.
[19] T. L. Veldhuizen, "Arrays in Blitz++," in Second Int'l Symposium on Computing in Object-Oriented Parallel Environments, 1998, pp. 223–230.
[20] F. Somenzi, CUDD: CU Decision Diagram Package Release 2.4.1. University of Colorado at Boulder, 2009.
[21] "AIGER," http://fmv.jku.at/aiger/.
[22] A. Biere, "PicoSAT essentials," JSAT, vol. 4, no. 2–4, pp. 75–97, 2008.
[23] N. Eén and N. Sörensson, "An extensible SAT-solver," in SAT, ser. LNCS, vol. 2919. Springer, 2003, pp. 502–518.
[24] R. Brummayer and A. Biere, "Boolector: An efficient SMT solver for bit-vectors and arrays," in Tools and Algorithms for the Construction and Analysis of Systems, 2009, pp. 174–177.
[25] R. Wille, G. Fey, D. Große, S. Eggersglüß, and R. Drechsler, "SWORD: A SAT like prover using word level information," in VLSI of System-on-Chip, 2007, pp. 88–93.
[26] L. M. de Moura and N. Bjørner, "Z3: An efficient SMT solver," in TACAS, ser. LNCS, vol. 4963. Springer, 2008, pp. 337–340.
[27] M. Soeken, S. Frehse, R. Wille, and R. Drechsler, "RevKit: A toolkit for reversible circuit design," in Workshop on Reversible Computation, 2010, pp. 69–72.
[28] D. Große, R. Wille, G. Dueck, and R. Drechsler, "Exact multiple-control Toffoli network synthesis with SAT techniques," IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 28, no. 5, pp. 703–715, May 2009.
[29] R. Wille, D. Große, M. Soeken, and R. Drechsler, "Using higher levels of abstraction for solving optimization problems by Boolean satisfiability," in IEEE Computer Society Annual Symposium on VLSI (ISVLSI '08), 2008, pp. 411–416.
[30] A. Biere, "Lingeling, Plingeling, PicoSAT and PrecoSAT at SAT Race 2010," Tech. Rep., 2010. Available: http://fmv.jku.at/papers/Biere-FMV-TR-10-1.pdf
[31] R. Wille, D. Große, L. Teuber, G. W. Dueck, and R. Drechsler, "RevLib: An online resource for reversible functions and reversible circuits," in Int'l Symp. on Multi-Valued Logic, 2008, pp. 220–225. RevLib is available at http://www.revlib.org.
[32] H. Riener, R. Bloem, and G. Fey, "Test case generation from mutants using model checking techniques," in Int'l Conference on Software Testing, Verification and Validation Workshops, 2011, pp. 388–397.
[33] SystemC Verification Standard Specification Version 1.0e, SystemC Verification Working Group.
[34] R. Wille, D. Große, F. Haedicke, and R. Drechsler, "SMT-based stimuli generation in the SystemC verification library," in Advances in Design Methods from Modeling Languages for Embedded Systems and SoC's, ser. Lecture Notes in Electrical Engineering, D. Borrione, Ed. Springer Netherlands, 2010, vol. 63, pp. 227–244.

A Study of Sweeping Algorithms in the Context of Model Checking

Zyad Hassan, Yan Zhang, and Fabio Somenzi
Dept. of Electrical, Computer, and Energy Engineering, University of Colorado at Boulder, Boulder, CO 80309

Abstract—Combinational simplification techniques have proved their usefulness in both industrial and academic model checkers. Several combinational simplification algorithms have been proposed in the past that vary in efficiency and effectiveness. In this paper, we report our experience with three algorithms that fall in the combinational equivalence checking (sweeping) category. We propose an improvement to one of these algorithms. We have conducted an empirical study to identify the strengths and weaknesses of each of the algorithms and how they can be synergistically combined, as well as to understand how they interact with ic3 [1].

I. INTRODUCTION

Combinational simplification eliminates redundancies and increases sharing of logic in a design. It has been successfully employed in logic synthesis, equivalence checking, and model checking.

In the model checking context, combinational simplification often dramatically improves the performance of the proof engines. This has made it a primary component in both industrial [2] and academic [3], [4] model checkers. Several combinational simplification algorithms have been proposed in the past, such as DAG-aware circuit compression [5], [6] and sweeping methods [7]–[9]. Sweeping methods merge functionally equivalent nodes. They include BDD sweeping [7], SAT sweeping [8], [10], and cut sweeping [9].

When designing a model checker, the strengths and weaknesses of each of the sweeping methods should be taken into account. To the best of our knowledge, no studies have been carried out so far to evaluate and compare the different sweeping methods, with the exception of a limited study reported in [9].

The effect of combinational simplification on several model checking engines has been studied in the past. In [8], SAT sweeping has been shown to positively affect bounded model checking (BMC) [11]. In [12], it is shown that combinational redundancy removal techniques benefit interpolation considerably. The recently introduced ic3 [1] incrementally discovers invariants that hold relative to stepwise reachability information. Designing a model checking flow that involves ic3 requires understanding how combinational simplification algorithms affect it.

This paper makes the following contributions:
• We carry out a comparative study of the different sweeping methods.
• We propose a BDD-based cut sweeping method that is more effective than the original cut sweeping.
• We propose a combined sweeping approach in which more than one sweeping method is applied. We show that the combined approach can achieve more simplification than any of the methods can achieve individually.
• We perform an empirical study of the effect of sweeping on ic3.

The rest of the paper is organized as follows. Section II contains preliminaries. In Section III, we introduce the BDD-based cut sweeping method. In Section IV, we explain the rationale behind the combined sweeping approach. In Section V we present the experimental results, and in Section VI we conclude.

II. PRELIMINARIES

A. AND-Inverter Graph

The input and output of our sweeping algorithms are AND-inverter graphs (AIGs). An AIG is a directed acyclic graph that has four types of nodes: source nodes, sink nodes, internal nodes, and the constant TRUE node. A primary input (PI) is a source node in an AIG. A primary output (PO) is a sink node and has exactly one predecessor. The internal nodes represent 2-input AND gates. A node v is a fanin of v′ if there is an edge (v, v′); it is a fanout of v′ if there is an edge (v′, v). Left(v) and Right(v) refer to the left and right predecessors of v. Fanin(v) and Fanout(v) denote the fanins and fanouts of node v. An edge in an AIG may have the INVERTED attribute to model an inverter. The Boolean function of a PI is a unique Boolean variable. For an edge, it is the function of its source node if the edge does not have the INVERTED attribute, and the complement otherwise. For an internal node, it is the conjunction of the functions of its incoming edges. The Boolean function of a PO is that of its incoming edge.

A path from node a to b is a sequence of nodes ⟨v_0, v_1, v_2, …, v_n⟩ such that v_0 = a, v_n = b, and v_i ∈ Fanin(v_{i+1}), 0 ≤ i < n. The height of a node v, h(v), is the length of the longest path from a PI to v. The fanin (fanout) cone of node v is the set of nodes that have a path to (from) v.
Two nodes are functionally equivalent (complementary) if they represent the same (complementary) Boolean function(s). Functionally equivalent (complementary) nodes can be merged, transferring the fanouts of the redundant nodes to their representative. To simplify our presentation, in what follows we deliberately ignore detection of complementary functions.

B. BDD Sweeping

BDD sweeping identifies two nodes as equivalent if they have the same BDD representation. The original algorithm of Kuehlmann and Krohm [7] works in two stages: equivalence checking and false-negative detection. In the first stage, it builds BDDs for each node and merges nodes having the same BDD. The algorithm introduces an auxiliary BDD variable (cut point) for each node that has been merged, which potentially leads to false negatives. In the second stage, it takes the BDDs of the output functions and, for each of them, replaces the auxiliary variables with their driving functions. The algorithm is complete in the sense that it can find all the equivalences in the circuit given a sufficiently large limit on the BDD sizes. However, a large limit often hurts efficiency. In this paper, we intend to use BDD sweeping in conjunction with SAT sweeping, which is complete and avoids inherent BDD inefficiencies [8]. For that, we employ a version of BDD sweeping that is incomplete yet faster than the original.

The algorithm we adopt iterates the following steps on each node v of the AIG in some topological order. It builds a BDD for v and checks if there exists another node that has the same BDD. If so, it merges these two nodes and continues. Otherwise, if the BDD size exceeds a given threshold, the algorithm introduces an auxiliary BDD variable for v to be used in place of the original BDD when dealing with the fanouts of v. An important point is that the original BDD is kept for equivalence checking, but is not used to produce new BDDs. The algorithm is complete if the threshold is so large that no auxiliary variable is introduced. In practice, however, this can be prohibitive.

C. SAT Sweeping

The advancements in SAT solver technology over the past two decades have proliferated SAT-based reasoning methods. SAT sweeping is one such method, proposed by Kuehlmann [8] for combinational redundancy identification. SAT sweeping queries the SAT solver to prove or refute the equivalence of two nodes in the circuit.

The basic SAT sweeping algorithm works as follows. First, the circuit is simulated with random inputs, and candidate equivalence classes are formed, where nodes having the same simulation values are placed together. Next, for each two nodes belonging to the same class, the SAT solver is queried for their equivalence. If the SAT solver reports they are equivalent, one of them is merged into the other. Otherwise, the counterexample provided is simulated to break this equivalence class, and possibly rule out other candidate equivalences as well. This process is repeated until a resource limit is reached, or until all classes become singletons, indicating the absence of further equivalences.

In our implementation of SAT sweeping, several heuristics were applied. We mention each of them briefly.

1) Simulating Distance-1 Vectors: This heuristic was proposed in [13]. Instead of just simulating the counterexample to equivalence derived by the SAT solver, all distance-1 vectors, which have a single bit flipped, are simulated as well. Only the bits corresponding to the inputs that are in the fanin cone of the two nodes being checked for equivalence are flipped. We have found this heuristic to be very effective in practice. In [13], this process is repeated for vectors that were successful in breaking up equivalence classes until convergence. In our implementation, we only simulate the distance-1 vectors for the original counterexample: for the benchmark suite we experimented with, recurring on successful vectors is too expensive for the number of refinements it achieves.

2) Clustering: Simulating distance-1 vectors often results in considerable refinement of the equivalence classes. This is desirable, since an equivalence class is often broken up more cheaply by simulation than by the SAT solver. Moreover, we have observed that with distance-1 vector simulation, it becomes very likely that nodes remaining in an equivalence class are indeed equivalent. Therefore, rather than checking the equivalence of two nodes at a time, we check the equivalence of all nodes in an equivalence class using a single SAT query. If they are all indeed equivalent, we find that using a single SAT query rather than n − 1 queries, where n is the number of nodes in the class.

3) Height-Based Merging: When two nodes are proved equivalent, we merge the node with the larger height into the one with the smaller height, instead of merging based on a topological order as in [13]. The intuition is that a node having a larger height often has a larger fanin cone, which suggests that merging it would lead to a larger reduction. Nodes coming later in a topological order do not necessarily have a larger height than nodes coming earlier.

D. Cut Sweeping

Cut sweeping [9] is a fast yet incomplete approach to combinational equivalence checking. It iteratively computes cuts for each AIG node and compares the functions associated with the cuts.

A cut is defined with respect to an AIG node, called the root. A cut C(v) of root v is a set of nodes, called leaves, such that any path from a PI to v intersects C(v). A cut-set Φ(v) consists of several cuts of v. For cut-sets Φ(v1) and Φ(v2), the merge operator ⊗ is defined as

    Φ(v1) ⊗ Φ(v2) = { C1 ∪ C2 | C1 ∈ Φ(v1), C2 ∈ Φ(v2) } .    (1)

Assume k ≥ 1. A k-feasible cut is a cut that contains at most k leaves. A k-feasible cut-set is a cut-set that contains only k-feasible cuts. The k-merge operator, ⊗k, creates only k-feasible cuts. Cut enumeration recursively computes all k-feasible cuts for an AIG. It computes the k-feasible cut-set for a node v as follows:

    Φ(v) = {{v}}                                      if v ∈ PI
    Φ(v) = {{v}} ∪ (Φ(Left(v)) ⊗k Φ(Right(v)))        if v ∈ AND ,    (2)

where PI and AND refer to the set of PIs and 2-input AND gates, respectively. Note that cuts are not built for POs because they are never merged.

The function of a cut is the Boolean function of the root in terms of the leaves. It can be represented in different ways, for instance, using bit vectors or BDDs. Two cuts are equivalent if their cut functions are equal. Hence, two nodes are functionally equivalent if their cut-sets contain equivalent cuts.

Cut sweeping is parametrized by k and N, the maximum cut size and the maximum cut-set size, respectively. For each node v in some topological order of the AIG, the algorithm builds a k-feasible cut-set Φ(v). Each cut in Φ(v) is associated with a truth table. Next, it searches for a node equivalent to v by looking for a cut equivalent to some cut in Φ(v). If it succeeds, the two nodes are merged. Otherwise, a heuristic is applied to prune Φ(v) to at most N cuts. After pruning, the algorithm stores Φ(v) as the cut-set of v and builds a cross-reference between each of its cuts and v.

The heuristic for pruning, which we call the quality heuristic, computes a value q for each cut:

    q(C) = Σ_{n ∈ C} 1 / |Fanout(n)| .    (3)

The cuts with the smallest values of q are kept. The intuition of the quality heuristic is two-fold. First, it tries to localize the equivalence and thus favors smaller cuts. Second, it normalizes cuts by attempting to express them with the same set of variables. The chosen variables are those that have large fanouts, i.e., that are shared by many other nodes.

A good truth-table implementation is critical to the performance of cut sweeping. In [9], truth tables are implemented as bit vectors. An advantage of bit vectors is the constant-time cost of Boolean operations. On the other hand, bit interleaving is required to extend the bit vectors to the same length so that the corresponding bits represent the same minterm¹.

¹A good reference on bit interleaving can be found at http://graphics.stanford.edu/∼seander/bithacks.html.
different ways, for instance, using bit vectors or BDDs.¹ Two cuts are equivalent if their cut functions are equal. Hence, two nodes are functionally equivalent if their cut-sets contain equivalent cuts.

Cut sweeping is parametrized by k and N, the maximum cut size and the maximum cut-set size, respectively. For each node v in some topological order of the AIG, the algorithm builds a k-feasible cut-set Φ(v). Each cut

¹ A good reference on bit interleaving can be found at http://graphics.stanford.edu/~seander/bithacks.html.

DIFTS 2011 32

The second difference comes from the pruning heuristic. We define the height h of a cut C as the average height of its nodes:

    h(C) = (1/|C|) Σ_{v∈C} h(v).    (4)

A smaller h indicates that the leaves in the cut are closer to the PIs. The height heuristic keeps at most N cuts, choosing the ones with the smallest values of h.

Fig. 1. Two implementations of a 4-input AND gate

A motivating example for the new heuristic is in Figure 1, which shows two different 4-input AND gates. Nodes a, b, c, and d are PIs. Nodes p, q, r, and s are internal nodes. Nodes f and g can only be merged if their cut-sets both contain {a, b, c, d}. However, if the internal nodes have many more fanouts than the PIs, the quality heuristic may select cuts containing the internal nodes instead, causing the merge to be missed.

As mentioned before, the quality heuristic tries to normalize the cuts on certain "attractors." This reduces the possibility that equivalent functions are represented differently. However, it might also lead to the loss of the opportunity to find equivalences that cannot be expressed by those "attractors," as in Figure 1.

On the other hand, the height heuristic tries to push the cut boundary as far as possible. A supporting argument is that, if a node is employed in equivalent cuts, then replacing it with its predecessors preserves equivalence. Furthermore, new merges that are otherwise undiscoverable (consider other equivalences that require a and b in the above example) may be found. The height heuristic does not attract cuts to certain nodes, which may result in different cuts for equivalent nodes. As shown in the experiments, the effectiveness of the height heuristic decreases as the height of nodes increases.

The two heuristics have their own strengths and weaknesses. A natural question is whether it is possible to combine them to benefit from their individual strengths. We can choose a few cuts with each heuristic. This may lead to more merges, but it may also worsen efficiency if it significantly increases the number of cuts. To prevent such an increase, a combined heuristic records only height cuts for the lower nodes, while it keeps both types of cuts for the others.

There is some connection between cut sweeping with each of the two heuristics and BDD sweeping. With the height heuristic, cut sweeping tries to build cuts as large as possible, as BDD sweeping does. However, BDD sweeping can store cuts that exceed the threshold, while cut sweeping only keeps those below the threshold. The quality heuristic tries to attract cuts to certain nodes, which is similar to the placement of auxiliary variables in BDD sweeping. Nevertheless, the number of "attractors" in the quality heuristic tends to be much larger than in BDD sweeping.

IV. COMBINING SWEEPING METHODS

The idea of combining several simplification algorithms is not new. Many existing model checkers iterate several simplification algorithms before the problem is passed to the model checking engines. However, we are unaware of any studies that have been carried out to identify the best way they could be combined. In this section we attempt to give general guidelines to which a combined approach should adhere. We support our claims with empirical evidence collected in the experiments reported in Section V.

The problem we address is as follows: given a time budget for combinational simplification, how should it be allotted to the different algorithms? The sweeping algorithms discussed in the previous sections vary in their completeness and speed: cut sweeping is the most incomplete method yet the fastest of the three, SAT sweeping is complete yet the slowest, and BDD sweeping lies in between, both in terms of completeness and speed.

Possible solutions include allocating the whole time budget to a single algorithm, or dividing it among two or more algorithms. The fact that some methods are better at approaching certain problems than others suggests that more than one method should be applied. If two or more methods are to be applied in sequence, intuition suggests that the lower-effort methods should precede the higher-effort ones. The advantages of doing so are two-fold. First, although the higher-effort methods are likely to discover the merges that a lower-effort method discovers, in general it will take them more time to do so. Second, preceding higher-effort methods by lower-effort methods is beneficial in having them present a smaller problem that is easier to handle.

Finally, the percentage of the total time that should be allotted to each of the methods to yield the maximum possible reduction is studied in Section V.

V. RESULTS

The experiments are divided into three parts. The first part compares different variations of the cut sweeping algorithm. The second part shows the results of the combined sweeping methods, and the third part is concerned with the effect of sweeping on ic3.²
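The height-based pruning of Equation (4) can be sketched in a few lines. This is a hypothetical illustration in Python; the function and variable names are ours, not the paper's implementation.

```python
# A hypothetical sketch (ours) of height-based cut-set pruning:
# keep the N cuts whose leaves sit closest to the primary inputs.

def cut_height(cut, node_height):
    """Average height of a cut's nodes: h(C) = sum of h(v) for v in C, over |C|."""
    return sum(node_height[v] for v in cut) / len(cut)

def prune_cut_set(cuts, node_height, n_max):
    """Keep at most n_max cuts, preferring the smallest average height."""
    return sorted(cuts, key=lambda c: cut_height(c, node_height))[:n_max]

# PIs have height 0; internal nodes sit higher.
heights = {"a": 0, "b": 0, "p": 1, "q": 1}
cuts = [frozenset({"p", "q"}), frozenset({"a", "b"}), frozenset({"a", "q"})]
best = prune_cut_set(cuts, heights, 2)   # the {a, b} cut ranks first
```

The sort key makes the preference for PI-near cuts explicit: a cut made entirely of PIs has height 0 and is always kept first.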
We use the benchmark set from HWMCC'10 [14], a set of 818 benchmarks in total. The experiments are run on a machine with a 2.8GHz Quad-Core CPU and 9GB of memory. We use CUDD [15] as our BDD package for all the BDD-related algorithms.

A. BDD-based Cut Sweeping

Variations of cut sweeping are applied to the HWMCC'10 benchmark set. The differences between variations are two-fold. First, either the number of variables, k, or the number of BDD nodes, s, is used to drop over-sized cuts. Second, we experiment with several heuristics for pruning cut-sets: the quality heuristic, the height heuristic, and two combined heuristics. The naive combined heuristic ("combined-1") chooses one cut based on the height heuristic and the others based on the quality heuristic. The other heuristic ("combined-2") sets a threshold on the node height (350 in our experiments). For nodes below the threshold, it keeps only a height cut. For higher nodes, it produces cut-sets consisting of a height cut and two quality cuts. We denote a method by a k or an s followed by the heuristic name. All the variations use BDDs to represent the cut functions.

The results are shown in Table I; they are aggregated over the 818 benchmarks. Based on experiments, both the threshold of BDD sweeping and s in BDD-based cut sweeping are set to 250. The total number of AIG nodes before sweeping is 7.22M. "Final" is the size of the AIGs after sweeping. "Generated" and "Kept" are the numbers of cuts generated and kept by the corresponding methods. For an individual benchmark, its "height" is the average height of all merged cuts. The "Height" column is computed by taking the average of the "height" of all the benchmarks. A smaller value indicates that more merges are found by cuts that are close to the PIs. Note that since we use BDDs, the results in terms of efficiency of bit-vector-based methods may not be as good as in [9]. Therefore, when dealing with them, we just compare the effectiveness.

TABLE I
RESULTS OF CUT SWEEPING, BDD SWEEPING AND SAT SWEEPING ON THE HWMCC'10 SET. BY DEFAULT, k = 8 AND s = 250.

Method              Final (x10^6)  Merges (x10^5)  Time (s)  Generated (x10^7)  Kept (x10^7)  Height
k-quality N = 5         6.82           2.62         123.26        5.83             1.92        10.20
k-combined-1 N = 5      6.75           3.14         129.64        5.84             1.90         8.57
s-quality N = 5         6.71           3.31         536.75        7.63             2.32        12.11
s-height N = 1          6.55           4.07          58.99        1.07             0.54         3.19
s-height N = 2          6.51           4.20         181.52        2.18             0.99         2.94
s-combined-1 N = 2      6.48           4.42         181.21        2.29             1.02        12.86
s-combined-2            6.52           4.28          74.64        1.10             0.54        12.72
BDD Sweeping            6.34           5.61         112.74         -                -            -
SAT Sweeping            6.10           6.37        2149.4          -                -            -

Results indicate that the resulting AIGs are consistently smaller with s than with k. There are a few interesting observations. First, the ratios Generated/Merges and Kept/Merges are improved significantly with s. This means that with s, each cut has a larger chance of resulting in a merge.

Second, while "k-quality" and "k-combined-1" have very close sweeping times, the latter achieves 19.8% more merges. Furthermore, the decrease in the "Height" column reveals that the height cuts indeed lead to merges. Although "s-quality" is more effective than the two above methods, it is less efficient due to the larger cut sizes.

For the methods with s (excluding "s-quality"), we observe that "s-height N = 1" is the fastest and produces a good number of merges. Increasing the number of height cuts to two triples the run time without gaining many more merges. Comparing it with "s-combined-1", an improvement on the merges is shown by the latter. This indicates that maintaining one height cut and one quality cut works better than two height cuts. For "s-combined-2", the number of merges is between the two above methods, but with lower run time. Furthermore, the numbers of generated and kept cuts are even comparable to "s-height N = 1". That is, even though we keep three cuts for those nodes with height larger than 350, on average we compute only a few percent more cuts than in the case of one cut per node.

The "Height" values of the three methods confirm the assumption made in Section III: most merges produced by the height heuristic come from cuts close to the PIs. When the two heuristics are combined, a significant increase in the "Height" value is observed. In Figure 2, we show the number of merges found by "s-height N = 1" and "s-combined-2" on nodes within different height ranges. The plot is normalized to "k-quality" and has a bin size of 50; i.e., a point at (2, 1) indicates that the method finds the same number of merges as "k-quality" for nodes with height from 100 to 149. Obviously, the height heuristic works better on smaller-height nodes, while the quality heuristic catches more on larger-height ones. The combined heuristic takes advantage of both and produces an even better profile on nodes with larger heights. Note that although the height heuristic works worse on the nodes with larger heights, it can still get more merges. This may be due to the fact that in this benchmark set, a large percentage of equivalences are located at lower heights.

In our setup, cut sweeping is intended for use as a fast method. Thus we consider "s-height N = 1" and "s-combined-2" to be the best variants. Compared to BDD sweeping, those two variants are faster because they create fewer BDD nodes, but they are less effective since BDD sweeping may keep BDDs that exceed the threshold.

² Detailed results for the first and second parts can be found at http://eces.colorado.edu/~hassanz/sweeping-study
Fig. 2. Number of merges on nodes within different height ranges

B. Combined Sweeping Methods

In this section, we show experimental evidence that supports the guidelines for combining sweeping methods presented in Section IV. In particular, we try several possibilities of allotting the budget to the different sweeping algorithms with the purpose of identifying empirically whether they should be combined, and if so, in which way. In what follows we use the "s-height N = 1" variant of cut sweeping, since it is the fastest, and we simply refer to it as cut sweeping.

In our combined approach, SAT sweeping is always run last, since it is the only complete method of the three, and should thus be given whatever time is left to find equivalences not discovered by the other methods. Also, the time not used by any of the preceding methods is passed to SAT sweeping. For instance, if cut sweeping is given a time budget of 4 seconds and only uses 3 of them, SAT sweeping gets to run for one extra second.

We compare the reduction, measured in terms of the number of AIG nodes removed, and the total time spent in sweeping. Data are aggregated over the 818 benchmarks. The base case for our comparisons is the pure SAT sweeping case, in which SAT sweeping gets the whole budget. The time budget used in our study is 10 seconds.

We consider the following policies: (a) allocating the budget to two methods, (b) allocating it to three methods, and (c) allocating the whole budget to SAT sweeping. For (a) and (b), we consider all the different permutations of assigning integer time values to each method, such that they sum up to 10 seconds. Note that if a sweeping algorithm times out, what it has achieved thus far is used in what follows. In all cases, a set of light-weight sequential simplification algorithms is applied before sweeping. This set of algorithms includes cone-of-influence reduction, stuck-at-constant latch detection, and equivalent latch detection. The total number of AIG nodes for all 818 benchmarks measured after the sequential simplification step is 6.1M.

Results are presented in Table II. The first column lists the methods, where the number before each sweeping method indicates the number of seconds given to it. The second column shows the number of AIG nodes removed. The third column shows the total time spent in sweeping. The methods are listed in order of decreasing reduction. The last row is for pure SAT sweeping. We only show the best three setups in terms of reduction for each of the possible orders of the method sequences.

TABLE II
EFFECT OF BUDGET ALLOCATION ON REDUCTION.

Method                 Reduction   Total Sweeping Time (s)
4 Cut, 5 BDD, 1 SAT     801,932        518
2 Cut, 5 BDD, 3 SAT     801,137        516
6 Cut, 3 BDD, 1 SAT     801,119        522
4 BDD, 1 Cut, 5 SAT     794,052        517
8 BDD, 1 Cut, 1 SAT     793,921        515
7 BDD, 2 Cut, 1 SAT     793,814        519
8 BDD, 2 SAT            793,226        500
7 BDD, 3 SAT            793,068        503
5 BDD, 5 SAT            792,797        508
1 Cut, 9 SAT            772,563        512
6 Cut, 4 SAT            771,070        513
3 Cut, 7 SAT            769,483        511
10 SAT                  736,594        619

Several observations can be made. First, when it comes to running two methods in sequence, BDD sweeping combined with SAT sweeping outperforms cut sweeping combined with SAT sweeping. The method that achieves maximum reduction (8 seconds of BDD sweeping followed by 2 seconds of SAT sweeping) removes 56K more nodes than pure SAT sweeping (7.7% more reduction). Second, more reduction is achievable by running three methods in sequence. As suggested in Section IV, ordering the methods by increasing effort (or equivalently by increasing degree of completeness) achieves more reduction than otherwise. Here, the best method (4 seconds, 5 seconds, and 1 second for cut, BDD and SAT sweeping, respectively) has an edge of 65K nodes over pure SAT sweeping (about 8.9% more reduction). Third, in terms of sweeping time, it is clear that a large drop occurs (> 100 seconds) when two or three methods are combined versus pure SAT sweeping, which is due to the often smaller time needed by BDD and cut sweeping to discover equivalences than SAT sweeping. Given an overall model checking budget, smaller sweeping time allows more time for the model checking engine, which is desirable.

The question of whether such a difference has a considerable effect on the performance of the model checking engine is answered in the next section.

C. Effect on ic3

The recently developed model checking algorithm ic3 [1] has been regarded as the best standalone model checking algorithm developed up till now [16].

We compare the number of solves and the aggregate runtime among all benchmarks.

It is important to note that the ic3 algorithm has a random flavor. In particular, the order in which generalization (dropping literals) is attempted is randomized. Also, since the algorithm is SAT-based, randomization occurs in the SAT solver decisions. To have reliable experimental results, each experiment is repeated with 10 different seeds, and the results are averaged over the different seeds.

Results are shown in Table III. The first column shows the seed index, the second and third columns show the number of solves without and with sweeping, and the fourth and fifth columns show the aggregate runtime without and with sweeping.

TABLE III
EFFECT OF SWEEPING ON ic3'S PERFORMANCE.

Seed     Solves          Solves            Runtime (s)     Runtime (s)
Index    (No Sweeping)   (With Sweeping)   (No Sweeping)   (With Sweeping)
1        693             698               96,297          91,762
2        689             699               95,629          90,341
3        691             699               95,050          92,714
4        696             697               93,691          91,141
5        693             698               95,007          89,656
6        690             695               96,270          91,559
7        693             699               94,784          92,056
8        690             701               94,351          90,837
9        693             693               95,491          92,847
10       690             693               95,124          93,048
Average  691.8           697.2             95,169          91,596

The results confirm a positive effect of sweeping on the performance of ic3. On average, five more solves are achieved with sweeping, and the aggregate runtime drops by 3.8%.
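The budget-passing policy of Section V-B (unused seconds roll over to the final, complete phase) can be sketched compactly. This is a minimal illustration of ours; the names are not the tool's code.

```python
# A minimal sketch (ours) of the time-budget policy: phases run in order
# of increasing effort, and any unused seconds roll over to the final,
# complete phase (SAT sweeping).

def allocate(phases, used):
    """phases: list of (name, allotted seconds); used: seconds each phase
    actually consumed. Returns the seconds granted to each phase."""
    granted, leftover = {}, 0
    last = phases[-1][0]
    for name, allotted in phases:
        grant = allotted + (leftover if name == last else 0)
        granted[name] = grant
        leftover += max(0, grant - used.get(name, grant))
    return granted

# The paper's example: cut sweeping is given 4 s but uses only 3,
# so SAT sweeping runs for one extra second.
grants = allocate([("cut", 4), ("bdd", 5), ("sat", 1)], {"cut": 3, "bdd": 5})
```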
As the interaction of combinational simplification methods with different model checking algorithms has been studied in the past, we here aim to study how they interact with ic3. In particular, we would like to empirically find out whether ic3 benefits from preprocessing the netlist with a simplification algorithm, and if it does, how sensitive it is to the magnitude of the reductions achieved through simplification.

In the first experiment, we compare two runs of ic3: one that is preceded by SAT sweeping, and one that is not. The experimental setup is as follows. A total timeout of 10 minutes is used. The budget for SAT sweeping is 10 seconds. The light-weight sequential simplification algorithms referred to in Section V-B are applied once in the no-sweeping case, and twice (before and after sweeping) in the sweeping case.

The enhancement in the performance of ic3 in the presence of sweeping can be attributed to two factors. First, the reduction in the number of gates caused by sweeping can result in a reduction in the SAT solver time. Second, simplification often results in the dropping of latches (e.g., if it merges the next-state functions of two latches). For example, for the benchmark set used in our experiments, sweeping reduces the aggregate number of latches from 279,161 to 269,091 (a 3.6% decrease). This reduces the amount of work done by ic3 in the generalization of counterexamples to induction.

We now repeat the previous experiment, but this time we compare the number of solves and the aggregate runtime between pure SAT sweeping and the empirically optimum combined sweeping scheme of Section V-B. The purpose of this experiment is to understand how sensitive ic3 is to the magnitude of reductions.

Results are shown in Table IV, where the second and third columns compare the number of solves for pure SAT sweeping and the optimum sweeping scheme, and the fourth and fifth columns compare the total runtime.

TABLE IV
OPTIMUM SWEEPING VERSUS PURE SAT SWEEPING.

Seed     Solves               Solves                    Runtime (s)          Runtime (s)
Index    (Pure SAT Sweeping)  (Optimum Sweeping Scheme) (Pure SAT Sweeping)  (Optimum Sweeping Scheme)
1        698                  696                       91,762               91,567
2        699                  696                       90,341               91,113
3        699                  697                       92,714               92,373
4        697                  702                       91,141               90,478
5        698                  697                       89,656               90,327
6        695                  699                       91,559               90,606
7        699                  697                       92,056               91,498
8        701                  699                       90,837               92,228
9        693                  696                       92,847               91,252
10       693                  697                       93,048               92,663
Average  697.2                697.6                     91,596               91,411

As the results indicate, ic3 does not benefit much from the better reduction achieved by the combined sweeping scheme. The lack of performance enhancement can be attributed to the small improvement in reduction that the combined sweeping approach achieves over pure SAT sweeping. In particular, while pure SAT sweeping removes 737K nodes out of the total 6.1M nodes in the 818 benchmarks (12.1% reduction), the combined approach removes 802K nodes (13.2% reduction); a mere 1.1% improvement. This suggests that ic3 has a small sensitivity to the magnitude of the reduction.

VI. CONCLUSION

In this paper, we presented an empirical study of the different sweeping methods proposed in the past. We have shown that a combined sweeping approach outperforms any of the sweeping methods alone. We have also proposed a BDD-based cut sweeping method that is more effective than the original cut sweeping. Finally, we have studied the effect of sweeping on the new model checking algorithm, ic3, and investigated

REFERENCES

[1] A. R. Bradley, "SAT-based model checking without unrolling," in Proc. Int. Conf. on Verification, Model Checking, and Abstract Interpretation, 2011, pp. 70-87.
[2] H. Mony, J. Baumgartner, V. Paruthi, R. Kanzelman, and A. Kuehlmann, "Scalable automated verification via expert-system guided transformations," in Formal Methods in Computer-Aided Design, 2004, pp. 159-173.
[3] R. Brayton and A. Mishchenko, "ABC: An academic industrial-strength verification tool," in Computer Aided Verification, 2010, pp. 24-40.
[4] G. Cabodi, S. Nocco, and S. Quer, "Benchmarking a model checker for algorithmic improvements and tuning for performance," in Proc. Hardware Verification Workshop, 2010.
[5] P. Bjesse and A. Boralv, "DAG-aware circuit compression for formal verification," in Proc. Int. Conf. on Computer-Aided Design, 2004, pp. 42-49.
[6] A. Mishchenko, S. Chatterjee, and R. Brayton, "DAG-aware AIG rewriting: a fresh look at combinational logic synthesis," in Proc. Design Automation Conference, 2006, pp. 532-535.
[7] A. Kuehlmann and F. Krohm, "Equivalence checking using cuts and heaps," in Proc. Design Automation Conference, 1997, pp. 263-268.
[8] A. Kuehlmann, "Dynamic transition relation simplification for bounded property checking," in Proc. Int. Conf. on Computer-Aided Design, 2004, pp. 50-57.
[9] N. Eén, "Cut sweeping," Cadence Design Systems, Tech. Rep., 2007.
[10] A. Mishchenko, S. Chatterjee, and R. Brayton, "FRAIGs: A unifying representation for logic synthesis and verification," EECS Dept., UC Berkeley, ERL, Tech. Rep., Mar. 2005.
[11] A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu, "Symbolic model checking without BDDs," in TACAS, 1999, pp. 193-207.
[12] G. Cabodi, M. Murciano, S. Nocco, and S. Quer, "Stepping forward with interpolants in unbounded model checking," in Proc. Int. Conf. on Computer-Aided Design, 2006, pp. 772-778.
[13] A. Mishchenko, S. Chatterjee, R. Brayton, and N. Een, "Improvements to combinational equivalence checking," in Proc. Int. Conf. on Computer-Aided Design, 2006, pp. 836-843.
[14] A. Biere and K. Claessen, "Hardware model checking competition," in Hardware Verification Workshop, 2010.
[15] F. Somenzi, CUDD: CU Decision Diagram Package, University of Colorado at Boulder, ftp://vlsi.colorado.edu/pub/.
[16] R. Brayton, N. Een, and A.
Mishchenko, "Continued relevance of bit-level verification research," in Proc. Usable Verification, Nov. 2010, pp. 15-16.

the causes of the better performance it experiences with sweeping. The goal of this analysis is to help designers of model checkers make decisions regarding the incorporation of sweeping methods and to provide a deeper understanding of how sweeping methods interact with ic3.

ACKNOWLEDGEMENTS

We thank Aaron Bradley, who motivated this work and contributed to many discussions. We also thank the reviewers for their insightful comments regarding cut sweeping's pruning heuristics, which prompted us to try the "combined-2" heuristic.

Enhancing ABC for LTL Stabilization Verification of SystemVerilog/VHDL Models

Jiang Long, Sayak Ray, Baruch Sterin, Alan Mishchenko, Robert Brayton
Berkeley Verification and Synthesis Research Center (BVSRC)
Department of EECS, University of California, Berkeley
{jlong, sayak, sterin, alanmi, brayton}@eecs.berkeley.edu

Abstract—We describe a tool which combines a commercial front-end with a version of the model checker ABC enhanced to handle a subset of LTL properties. Our tool, VeriABC, provides a solution at the RTL level and produces models for synthesis and formal verification purposes. We use Verific (a commercial software) as the generic parser platform for SystemVerilog and VHDL designs. VeriABC traverses the Verific netlist database structure and produces a formal model in the AIGER format. LTL can be specified using SVA 2009 constructs that are processed by Verific.

(stabilization properties, to be precise). In this paper, we detail its liveness capabilities.

After reading in a design, VeriABC bit-blasts it into a bit-level netlist and converts the SVA stabilization properties into an intermediate LTL representation. Then the LTL properties are folded into the bit-level netlist in an appropriate way (using an extended liveness-to-safety conversion, explained later in Section IV). The resulting bit-level netlist represents
a formal model of the design, represented as an and-inverter graph (AIG). And-inverter graphs are concise representations of finite state machines. The AIGER [2] format is a prevalent format for AIG representation, supported by various academic formal tools and used in the annual hardware model checking competitions. Since our liveness verification is based on liveness-to-safety conversion, eventually the safety verification backend of ABC [1] is called, which works on the bit-level netlist produced by the VeriABC front-end. We have used this methodology to verify liveness in the context of compositional verification of deadlock freedom of micro-architectural communication fabrics. Preliminary experimental results are encouraging.

VeriABC traverses the resulting SVA parse trees and produces equivalent LTL formulae using the F, G, Until and X operators. The model checker in ABC has been enhanced to handle LTL stabilization properties, an important subset of LTL. The paper presents VeriABC's implementation strategy, software architecture, tool flow, environment setup for formal verification, use model, the specification of properties in SVA, and their translation into LTL. Finally, the properties are translated into safety properties that can be verified by the ABC model checker.

I. INTRODUCTION

We present an integrated tool flow for liveness model checking using industry hardware description languages (HDLs) and SystemVerilog Assertions: (i) VeriABC, a front-end to read in hardware models expressed in HDLs, and (ii) the capability of model checking a subset of liveness properties. VeriABC is able to read in hardware models expressed in SystemVerilog or VHDL. SystemVerilog and VHDL are the most popular HDLs used in industry today for digital designs. VeriABC generates a formal model in the AIGER [2] format and relies on a commercial front-end, Verific, to build a generic parser platform for HDLs. This allows down-stream tool flows in synthesis and verification. A version of ABC was enhanced from a safety-only verification engine to allow both safety and liveness verification. Our current version supports a particular subset of liveness properties called stabilization properties or generalized fairness constraints (defined in Section IV).

In a typical use model, a user will develop a hardware design in SystemVerilog or VHDL, and specify its correctness requirements in the property specification language SystemVerilog Assertions (SVA). SVA has been adopted into the IEEE SystemVerilog standard and is supported by major commercial tools in simulation, synthesis and verification. The SVA language in the SystemVerilog 2009 standard contains liveness constructs that allow full specification of liveness properties as those defined by LTL formulas. In our framework, a user can specify both safety properties and liveness properties.

A. Related work

Although parsing and elaborating RTL languages is a standard practice for commercial EDA products, it is a daunting task for academics due to language complexity and continuous language evolution over the years. vl2mv [12] was an academic tool that attempted to treat a significant subset of the Verilog language for synthesis and verification purposes, but it was not maintained and its language support was not complete. Tools like ABC [1] and VIS [9] parse subsets of the Verilog language that are too strict and not applicable in a broad setting. Freely accessible tools like Icarus [3] contain a Verilog front-end but are not up-to-date with newer SystemVerilog features.

Our choice of a commercial and stable front-end, Verific, allows academics to get around the language barrier and access real-world designs.

Liveness-to-safety conversion was first proposed in [7], [24]. These works demonstrated that the verification problem for any ω-regular property can be converted into the verification problem of an equisatisfiable safety problem. The algorithm has been deployed successfully in industrial setups and used to verify liveness properties of microprocessor designs [6].
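The loop-based intuition behind such conversions can be shown with a toy explicit-state check. This is our own illustration of the idea, not the construction of [24] or the tool's code: a stabilization property FG p fails exactly when some reachable state violating p lies on a cycle.

```python
# Toy illustration (ours) of why "FG p" reduces to a loop search:
# FG p fails iff a reachable state with !p can reach itself again.
from collections import deque

def reach(src, step):
    """All states reachable from src (src included); step: state -> successors."""
    seen, work = {src}, deque([src])
    while work:
        u = work.popleft()
        for v in step(u):
            if v not in seen:
                seen.add(v)
                work.append(v)
    return seen

def on_cycle(s, step):
    """True iff s can reach itself through at least one transition."""
    return any(s in reach(t, step) for t in step(s))

def fg_holds(init, step, p):
    """FG p holds iff no reachable state violating p lies on a cycle."""
    return all(p(s) or not on_cycle(s, step) for s in reach(init, step))

# 0 -> 1 -> 2 -> 2, with p true only in the sink: p stabilizes.
succ = {0: [1], 1: [2], 2: [2]}
stable = fg_holds(0, lambda s: succ[s], lambda s: s == 2)
```

A liveness-to-safety construction performs the same loop detection symbolically, by nondeterministically saving a state and raising a safety flag when it is revisited.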
Our liveness-to-safety conversion algorithm for stabilization is essentially an extension of the algorithm proposed in [24] and is broader than the one discussed in [6].

B. Contribution

As illustrated in Figure 1, we combine a commercial front-end, Verific, with a version of our model checker, ABC, enhanced to handle a subset of LTL properties. Our tool processes the Verific output, conducts various modeling procedures on the design, and compiles SVA into LTL formulas; then the enhanced ABC processes the LTL formulas for liveness model checking. We detail the software architecture, tool flow, formal model construction, SVA compilation, and downstream LTL model checking.

Fig. 1. Complete Tool Flow

C. Organization of the Paper

We first discuss the capabilities of the Verific parser platform. In Section III we describe the architecture and formal modeling of VeriABC and the translation of SVA into LTL. In Section IV the stabilization properties are described in further detail. Section V describes the liveness-to-safety conversion for stabilization properties. Experimental results are presented in Section VI.

II. BACKGROUND: VERIFIC PARSER PLATFORM

Verific Design Automation [4] builds SystemVerilog and VHDL parser platforms which enable its customers to develop advanced EDA products quickly and at low cost. Verific's parser platforms are distributed as C++ source code or libraries and build on all 32- and 64-bit Unix, Linux, and Windows operating systems. Applications vary from formal verification to synthesis, simulation, emulation, virtual prototyping, in-circuit debug, and design-for-test. We chose Verific as our front-end for its commercial success and stability in supporting the latest language constructs in SystemVerilog.

Figure 2 shows the architecture of the Verific parser front-end. The main commands we use in the Verific library are analyze and elaborate. Analyze creates parse trees and performs type-inferencing to resolve the meaning of identifiers. The Parser/Analyzer supports the entire SystemVerilog IEEE 1800, VHDL IEEE 1076-1993, and Verilog IEEE 1364-1995/2001 languages, without any restrictions. The resulting parse tree comes with an extensive API.

Fig. 2. Verific Parser Flow

Elaborate supports both static elaboration and RTL elaboration. Static elaboration elaborates the entire language; specifically, it binds instances to modules, resolves library references, propagates defparams, unrolls generate statements, and checks all hierarchical names and function/task calls. The result after static elaboration is an elaborated parse tree, appropriate for simulation-like applications. RTL elaboration is limited to the synthesizable subset of the language. In addition to the static elaboration tasks for this subset, it generates sequential networks through flipflop and latch detection, and Boolean extraction. The result after RTL elaboration is a netlist database, appropriate for applications such as logic synthesis and formal verification. This netlist database is the starting point of VeriABC, and we utilize the Verific-provided C++ APIs to access the database.

A. Verific Netlist Database Structure

In this section, we use Verilog terminology to present the netlist database structures. The netlist database is rather intuitive and adheres to the module definitions. As shown in Table I, there is a one-to-one correspondence between the C++ API class definitions and Verilog constructs.
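The object model just described can be rendered as a hypothetical sketch in Python (ours, not Verific's actual C++ API) to make the correspondence concrete:

```python
# Hypothetical rendering (ours) of the described netlist object model:
# Netlist/Instance/Port/Net/PortRef forming a directed hyper-graph.
from dataclasses import dataclass, field

@dataclass
class Net:                 # a named wire: an edge of the hyper-graph
    name: str

@dataclass
class Port:                # a node; direction is "in", "out" or "inout"
    name: str
    direction: str

@dataclass
class PortRef:             # connectivity between one Port and one Net
    port: Port
    net: Net

@dataclass
class Netlist:             # a module definition
    name: str
    ports: list = field(default_factory=list)
    nets: list = field(default_factory=list)
    instances: list = field(default_factory=list)

@dataclass
class Instance:            # a module instantiation: thin copy plus parent pointer
    name: str
    netlist: Netlist
    parent: Netlist

top = Netlist("top")
w = Net("w")
o = Port("o", "out")
top.nets.append(w)
top.ports.append(o)
ref = PortRef(o, w)
```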
TABLE I
VERIFIC NETLIST OBJECTS

Verific Database C++ API Class    Verilog RTL Objects
Netlist                           Module definition
Instance                          Module instantiation
Port                              Module port declarations
Net                               wire/reg/assign
PortRef                           Port to Net connectivity

A Netlist corresponds to a module definition in Verilog, while an Instance object corresponds to a module instantiation after the module's parameters have been characterized. An Instance is a thin copy of the Netlist plus a pointer to its parent netlist. A Netlist contains a set of Ports, Nets and Instances for its internal logic structure. A Port corresponds to the Verilog port definitions, which can be input, output or inout. A Net is a named component, intuitively a wire. A PortRef contains the connectivity between a Port and a Net. The direction of the PortRef can be in, out, or inout depending on the type of Port it contains. Using these C++ objects, the Verific netlist database defines a directed hyper-graph and encapsulates the following types of information:

Design Hierarchy: The design hierarchy is captured as an instance tree by the parent pointers in the Instances, with a top-level netlist as the root.

Unique Hierarchical Name: Following the design hierarchy through the instance tree, each internal object is assigned a unique hierarchical name.

Connectivity: A directed hyper-graph is defined through Port, Net and PortRef: Port being the node, Net being the edge, and PortRef containing the connectivity and direction information between pairs of a Port and a Net. As an edge in the hyper-graph, a Net can be referenced in multiple PortRef objects.

Logic Definition: At the leaves of the design hierarchy, Netlists of primitive operator types such as and, or, adder, flipflop, latch, etc., define the basic logic operators.

Recursively, the functional behavior of the design is captured through the directed hierarchical hyper-graph with the basic logic operators at its leaf level.

Fig. 3. VeriABC Architecture

As shown in Figure 3, we employ a two-phase approach. First we construct an intermediate netlist graph, a DAG with extra annotated node types representing the logic structure and connectivity of the flattened design. In addition to the simple node types of and-inverter graphs, the extra node types carry annotations for language constructs such as tri-states, flipflops, latches, etc. For example, flipflops contain set/reset pins and driving d-pins; latches contain additional gated-clock definitions. By language definition, a design can specify any signal for the clock and reset signals. Design behavior is defined by event-driven semantics. Further analysis needs to be done to determine whether there is an AIG representation that can capture the original design semantics. The intermediate netlist is a DAG on which various traversal algorithms can be conducted for the later analysis. For this step, VeriABC only traverses the design hierarchy and hyper-graph in the Verific netlist database to gather information and construct the intermediate DAG representation, without conducting any design style checking or transformation.

B. Formal Modeling

The end result of VeriABC is an AIG model that is consistent with the original RTL design semantics. An AIG is recognized as a finite state machine model. Compared to the event-driven semantics of the HDL language, the execution semantics of an AIG is synchronous, with an implicit universal clock that ticks at every step of the execution. A register loads in its driver value at the beginning of each step.

III. VERIABC

VeriABC traverses the above netlist database and transforms it into a monolithic AIG which can be treated as a directed acyclic graph (DAG). The AIG contains primary input, primary output, register, and and-inverter nodes.
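As a concrete illustration of such an AIG with name book-keeping, here is a tiny sketch of ours (not the tool's code), assuming AIGER-style literals where variable v has literal 2*v and 2*v+1 denotes its complement:

```python
# A tiny AIG sketch (ours, for illustration only), using AIGER-style
# literals: variable v has literal 2*v, and 2*v+1 is its complement.
# Structural hashing maps each (input, input) pair to a single AND node.
class AIG:
    def __init__(self):
        self.num_vars = 0
        self.ands = {}      # var -> (lit_a, lit_b)
        self.strash = {}    # (lit_a, lit_b) -> var
        self.by_name = {}   # hierarchical name -> literal

    def _new_var(self):
        self.num_vars += 1
        return self.num_vars

    def add_input(self, name):
        lit = 2 * self._new_var()
        self.by_name[name] = lit
        return lit

    def add_and(self, a, b):
        key = (min(a, b), max(a, b))
        if key not in self.strash:          # structural hashing
            self.strash[key] = self._new_var()
            self.ands[self.strash[key]] = key
        return 2 * self.strash[key]

g = AIG()
a = g.add_input("top.a")
b = g.add_input("top.b")
x = g.add_and(a, b)
assert g.add_and(b, a) == x   # hashed to the same node
```

The by_name table plays the role of the hash tables described above, mapping a hierarchical name to its AIG node.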
The named P ort and N et in the Verific netlist has a mapped node semantics inferred from the Verific netlist structure is more in the AIG graph. Additional book-keeping information such complex, such as its flip-flop can have arbitrary reset logic and as hash tables are created that map the hierarchical name to clock network. The task in formal modeling is to transform the the corresponding AIG node. The down-stream model checker above design components into AIG registers with additional ABC then reads in this AIG to conduct formal analysis. glue logic so that it maintains the consistency. In its simplest form, the following check determines if the design can be readily represented by an AIG. A. Architecture • All registers are clocked by the same primary input signal A hyper-graph is rather hard to traverse and conduct anal- which does not fanout to other nodes. ysis/transformation at the same time. • Reset/Set signals are primary inputs DIFTS 2011 40 • No combinational loops Figure 4 are the templates for matching the basic LTL opera- • No multiple drivers per node tors used in the set of stabilization properties. For stabilization More complex design modeling and transformations can properties, in our current implementation, we restrict the p and be achieved by identifying certain patterns by traversing the q to be Boolean expressions which seems to be sufficient in intermediate netlist graph. Our current implementation sup- practice. The SVA verification directive assume is used to ports the above form and produces a design style summary specify a fairness constraint, while assert is used to specify for debugging purposes. Though capacity and performance the target LTL properties. depends on the type of individual transformation and anal- ysis, for the above ones, the transformation and design style property Until(p,q); checking is very fast, as it conducts only a few traversals of p s_until q; the intermediate netlist graph. 
After design style checking, a endproperty final traversal of the intermediate netlist graph generates an AIGER file representing the formal model. property GF(p); always (s_eventually p); endproperty C. Commands Implemented VeriABC implementation utilizes the Tcl command inter- property FG(p); face shipped with the Verific library distribution. The following s_eventually (always p); is the list of commands implemented to manage the environ- endproperty ment setup for formal verification. veriabc analyze property X(p); This command constructs the intermediate netlist s_nexttime (p); graph in Figure 3 and conducts design style checking. endproperty veriabc set reset Although a reset signal can be automatically de- tected in certain situations, this command provides Fig. 4. SVA Liveness Template the user with the option to specify the length of the reset sequence and phase of the reset condition. Verific also processes SVA constructs into the netlist A user can also specify the initial value of the database structure, essentially a corresponding parse tree is registers through a VCD waveform or textual file. integrated into the netlist. We conduct traversals of the parse In the generated formal model, the initial value of tree, identify specific liveness constructs and map them into the registers will be valued at the end of the reset the corresponding LTL formula. At the end of the procedure, sequence and the reset condition will be disabled along with the AIGER file generated, a separate file containing after reset. the LTL formulas are generated indicating target liveness veriabc set clock assertions and fairness constraints. The support signals referred A user can specify the clock periods and relation- to in the LTL formulas are named output signals in the ships in the situation when multi-clocks are in the AIGER file. Although we currently only support stabilization design. 
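The design-style check and combinational-loop detection described above can be sketched in a few lines. This is an illustrative Python sketch over a toy netlist encoded as plain dictionaries; the function name and data layout are our own invention, not VeriABC's actual API.

```python
# Toy sketch of the Section III design-style check: the flattened netlist is a
# graph of typed nodes, and we verify the conditions that make it directly
# representable as an AIG. All names and data structures are illustrative.

def check_design_style(nodes, fanins, clocks, resets, fanouts):
    """nodes: {name: type}; fanins: {name: [driver names]};
    clocks/resets: {register name: driving signal}; fanouts: {signal: [sinks]}."""
    errors = []
    # 1. All registers clocked by one primary input that drives nothing else.
    clk = {clocks[r] for r in clocks}
    if len(clk) > 1:
        errors.append("multiple clock signals")
    elif clk:
        c = clk.pop()
        if nodes.get(c) != "primary_input" or fanouts.get(c):
            errors.append("clock is not a dedicated primary input")
    # 2. Reset/set signals must be primary inputs.
    for r, rst in resets.items():
        if nodes.get(rst) != "primary_input":
            errors.append("reset of %s is not a primary input" % r)
    # 3. No combinational loops: DFS over non-register fanin edges.
    state = {}
    def dfs(n):
        if state.get(n) == 1:
            return True            # back edge: combinational cycle found
        if state.get(n) == 2 or nodes.get(n) == "register":
            return False           # registers break combinational paths
        state[n] = 1
        looped = any(dfs(d) for d in fanins.get(n, []))
        state[n] = 2
        return looped
    if any(dfs(n) for n in nodes if nodes[n] != "register"):
        errors.append("combinational loop")
    # (A multiple-driver check would follow the same traversal pattern.)
    return errors
```

Each condition needs only one traversal of the graph, which matches the paper's remark that the design-style checking phase is very fast.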
IV. LIVENESS MODEL CHECKING IN VERIABC

In the FSM modeling formalism, the most intuitive notion of stabilization states that the system will always reach a particular state and stay there forever, no matter which state the system started from or which path it took. Relaxing this notion a bit, stabilization means that the system will eventually reach and stay within a given subset of states. Stabilization may also denote conditions on the input and output signals of a system when it attains a stable state. Applications of stabilization properties have been demonstrated in [14] and [18], to name a few. We review some basic temporal logic terminology and formally define stabilization properties using LTL below.

A. LTL, model checking and stabilization properties

Familiarity is assumed with LTL, basic model checking algorithms, and related terminology like Kripke structures and Büchi automata; for further details, see [13]. In our current context we use the LTL properties GFp and FGp, and thus overview their semantics here. Let π be a path in some Kripke structure K. π |=K Gp means that property p holds on every state along π; π |=K Fp means that p holds eventually on some state along π; π |=K GFp means that p holds along π infinitely often; and π |=K FGp means that p eventually holds on π forever. Since the temporal operators F and G are dual (i.e. Fp ≡ ¬G¬p), the operators FG and GF are also dual (i.e. FGp ≡ ¬GF¬p).

Definition 1 (GF-atom): Any LTL formula of the form GFp or FGp, where p is an atomic proposition or a Boolean formula involving atomic propositions only, is called a GF-atom.

Stabilization properties are defined as the family of LTL formulas that are Boolean combinations of GF-atoms. Formally:

Definition 2 (Stabilization Property): The set of stabilization properties is the syntactic subset of LTL defined as follows:
• any GF-atom is a stabilization property;
• if φ is a stabilization property, then so is ¬φ;
• if φ and ψ are stabilization properties, then so are φ ∧ ψ and φ ∨ ψ.

Example 1: FGp, GFp ⇒ GFq, FGp ∧ FGq ⇒ FGr, and FGp ⇒ FGq ∨ (FGr ∧ GFs) are stabilization properties, where p, q, r, and s are atomic propositions or Boolean formulas involving atomic propositions only, and a ⇒ b is the usual shorthand for ¬a ∨ b. However, G(r ⇒ Fg) is an LTL liveness property but not a stabilization property.

Needless to say, these are all liveness properties. But not all of them specify so-called system stabilization directly. Properties like FGp and FGp ∧ FGq ⇒ FGr (or the generalization FGp1 ∧ ... ∧ FGpk ⇒ FGq) are perhaps the most elementary stabilization properties. FGp means that the system will eventually reach a state from which p always holds, i.e. the system will eventually 'stabilize' at p. FGp ∧ FGq ⇒ FGr means that if the system stabilizes at p and also at q (perhaps at some other time), then it will eventually stabilize at r. Hence the semantics of these properties are close to the intuitive notion of stabilization. [14] demonstrates the use and significance of stabilization properties in the context of biological system analysis. However, our definition of stabilization captures a broader family of specifications. It includes FGp ⇒ FGq ∨ (FGr ∧ GFs), which may look contrived, but, for example, [18] uses many such complicated stabilization properties for compositional deadlock analysis of micro-architectural communication fabrics. On the other hand, our definition also includes many properties not intended to specify so-called stabilization behavior, for example GFp or GFp ⇒ GFq.

The main motivation behind considering this broader subset of LTL is that we offer a short-cut L2S conversion, avoiding Büchi automaton construction, in a uniform way (due to the duality between the FG and GF operators). The most significant application of this class that we have encountered is "stabilization verification", and hence the name coined for the family. (This name was inspired by [14].) The L2S conversion proposed here may thus be applied to prove properties beyond the context of stabilization verification (e.g. GFp ⇒ GFq).

The class of LTL properties defined as stabilization properties in this paper is a very important class of temporal properties, extensively studied in the literature. It is related to so-called fairness specifications. The operators GF and FG are often called infinitary operators [19], and the symbols F∞ and G∞, respectively, are used instead [15]. The class itself (i.e. Boolean combinations of GF-atoms) has been called general fairness constraints [16], [19]. As shown in [16], various notions of fairness, such as impartiality [21], weak fairness [20] (also called justice [21]), strong fairness [20] (also called compassion [21]), generalized fairness [17], state fairness [22] (also known as fair choice from states [23]), limited looping fairness [5], and fair reachability of a predicate [23], can be expressed by stabilization properties. These properties are used to exclude "unfair" counterexamples in liveness verification in both the linear time and the (fair) branching time paradigms. For liveness verification, we usually have a liveness property (the actual proof obligation) along with a set of fairness constraints. The liveness property may not be a stabilization property; in that case we may need to construct the product of the system and the Büchi automaton of the (negation of the) liveness property before performing the L2S conversion. Interestingly, for many interesting applications, as in [14] and [18], the liveness verification obligations fall entirely within the family of stabilization properties. For these applications, the simple L2S scheme proposed in this paper works. Note that some liveness properties like G(request ⇒ Fgrant) are not stabilization properties, but also have a direct L2S conversion [24]. It is, therefore, an interesting question under what more general conditions a direct L2S conversion exists.
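Definitions 1 and 2 can be made concrete with a small sketch. The tuple encoding of formulas and the helper names below are hypothetical; the loop-only evaluation rule (on a lasso-shaped path, GFp holds iff p holds somewhere in the loop, and FGp iff p holds everywhere in the loop) follows from the infinitary semantics above.

```python
# Illustrative encoding of Definitions 1-2. A formula is a nested tuple:
# ("GF", p) and ("FG", p) are GF-atoms over an atomic predicate p, while
# ("not", f), ("and", f, g), ("or", f, g) build Boolean combinations.

def is_stabilization(f):
    """Syntactic check: Boolean combination of GF-atoms (Definition 2)."""
    op = f[0]
    if op in ("GF", "FG"):
        return True                      # GF-atom (Definition 1)
    if op == "not":
        return is_stabilization(f[1])
    if op in ("and", "or"):
        return is_stabilization(f[1]) and is_stabilization(f[2])
    return False                         # e.g. a bare G(...) is rejected

def holds_on_lasso(f, stem, loop):
    """Evaluate on a lasso-shaped path stem.loop^omega: only the loop matters
    for the infinitary operators, so the stem is ignored here."""
    op = f[0]
    if op == "GF":
        return any(f[1](s) for s in loop)   # p infinitely often
    if op == "FG":
        return all(f[1](s) for s in loop)   # eventually p forever
    if op == "not":
        return not holds_on_lasso(f[1], stem, loop)
    if op == "and":
        return holds_on_lasso(f[1], stem, loop) and holds_on_lasso(f[2], stem, loop)
    return holds_on_lasso(f[1], stem, loop) or holds_on_lasso(f[2], stem, loop)
```

As in Example 1, an implication such as FGa ⇒ FGb is written with the shorthand ¬a ∨ b, i.e. ("or", ("not", ("FG", a)), ("FG", b)).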
V. L2S CONVERSION FOR STABILIZATION PROPERTIES

It is important to understand that any counterexample to a liveness property (which must be an infinite trace) can be seen as a "lasso": a configuration with a finite handle and a finite loop. A liveness counterexample is therefore a lasso which does not satisfy the property on the loop but satisfies all imposed fairness constraints on the loop.

In general, a liveness problem is converted to a safety problem by adding loop-detection logic and property-detection logic on top of the product of the FSM of the original system and the Büchi automaton of the property to be verified. The loop-detection logic consists of a set of shadow registers, comparator logic, and an 'oracle'. The oracle saves the system state in the shadow registers at a non-deterministically chosen time. In all subsequent time frames, the current state of the system is compared to the state in the shadow registers. Whenever these two states match, the system has completed a loop. The non-deterministic nature of the oracle allows all such loops to be explored. The property verification logic checks, as a safety obligation, whether any of the liveness conditions is violated on such a loop while all fairness conditions hold throughout the loop. For a more detailed exposition, see [24].

As mentioned, for some simple properties the L2S conversion can be achieved while avoiding explicit Büchi automata construction. This is done by adding more functionality to the property detection logic. As presented in [24], these properties are Fp, GFp, FGp, pUq, G(r ⇒ Fq), and F(p ∧ Xq) (Table 1 of [24]). Figure 5 reviews this approach, depicting an L2S-converted circuit for verifying the LTL property Fp.

In the next paragraph we describe how this construction verifies Fp; in Section V-A we explain how to extend the ideas of Figure 5 to stabilization properties. Instead of presenting the liveness-to-safety conversion through Kripke structure-based representations (i.e. through explicit state machine representations), we present the idea in terms of an actual circuit construction (i.e. through a symbolic representation of the state space). Also, although we do not discuss it further, the same mechanism handles fairness constraints, which are always stabilization properties; they just entail adding additional logic to the circuit for the monitor. For Kripke structure-based descriptions of the liveness-to-safety conversion, see [24].

In Figure 5, save represents an additional primary input added to the circuit. It plays the role of the 'oracle'. When save is asserted for the first time, the current state of the circuit is saved in the set of shadow registers, and the register saved is set. saved thus remembers that input save has been asserted, and allows any further activity on save to be ignored. For subsequent time frames, saved enables the equality detection between the current state of the circuit and the state in the shadow registers. Clearly, signal looped is asserted iff the system has completed a loop. Signal live remembers whether the signal p has ever been asserted. The safety property that the circuit verifies is, therefore, looped ⇒ live. (In general this would be looped ∧ fair ⇒ live.) The block marked with "⇑" represents this logical implication; the direction of the arrow distinguishes the antecedent signal from the consequent signal of the implication.

Fig. 5. Liveness-to-safety transformation for Fp (circuit under verification with shadow registers, save/saved logic, and the looped/live comparison)

A. L2S for stabilization properties

In [24], the authors show how to do the L2S conversion for GFp and FGp, which are GF-atoms. We demonstrate how to extend this to any Boolean combination of GF-atoms using an example, omitting a formal proof of correctness. Consider a simple stabilization property φ of the form FGa ⇒ FGb + FGc. An L2S-converted circuit for this property is shown in Figure 6. (For simplicity, we do not show any fairness constraints in the example.) Note that signal live in Figure 5 monitors whether signal p has ever been asserted from the very initial time frame, whereas for verifying GFp we need to monitor whether signal p has been asserted between the time when saved is set and the time when looped becomes true. Using this fact, the duality between the FG and GF operators, and the Boolean structure Xa ⇒ Xb + Xc of the given formula, we can derive the circuit of Figure 6. The logic that captures the Boolean structure of φ is marked with a dotted triangle in Figure 6. Hence, for an arbitrary stabilization property, we need to create monitors for the individual GF-atoms and a crown of combinational logic on top of these monitors that captures the Boolean structure of the property. We can formulate the following theorem.

Fig. 6. L2S conversion for the stabilization property FGa ⇒ FGb + FGc

Theorem 1: For any stabilization property, the given procedure finds a counterexample if one exists.

(Proof sketch) Any stabilization property can be transformed into another stabilization property with GF operators only. Let f be the Boolean structure of the negation of the given stabilization property. The procedure described above creates a monitor that searches for a lasso loop on which f is violated. Since the procedure implicitly enumerates all possible cycles in the state space, it will detect a violating cycle if one exists.

VI. EXPERIMENTAL RESULTS

We implemented our L2S scheme for general stabilization properties in ABC and experimented with several designs of communication fabrics from industry. Our objective was to verify all stabilization properties defined for every structural primitive of the xMAS framework [18]. The properties, though local to each component, are verified in the context of the whole design in order to avoid explicit environment modeling. BLIF models of the communication fabrics were generated by the xMAS compiler [11] from high-level C++ models. The L2S monitor logic was then created by ABC on these BLIF models. The xMAS compiler also generates SMV models from the C++ models, so that the LTL encoding of the stabilization properties can be verified directly on the SMV models using the NuSMV model checker.

We found that the ABC-based L2S implementation has much better scalability than NuSMV. NuSMV can solve only toy designs, while on the large designs of interest it fails to produce a result. Our tool, on the other hand, works well even on large designs. For most cases, it produces a result almost immediately. For a few cases, initial trials could not produce a proof, but with the latest version of ABC, using simplification, abstraction, speculative reduction, and property directed reachability (PDR) analysis [8], the proofs were completed. This observation supports the premise that the use of highly developed safety techniques can pay off for liveness verification.

Experimental results are shown below. Among all the local properties that the xMAS compiler generated, we provide results for the most challenging one. Call this property ψ; it is defined for a FIFO buffer and has the following LTL form:

  ψ := FG(¬a) ⇒ FG(¬b) ∨ FG(c)

where a, b, and c are appropriate design signals (i.e. interface signals of a FIFO buffer). Tables II, III, and IV compare the performance of ABC with NuSMV on small examples. These examples are instances of communication fabrics or sub-modules thereof, and are explained in full detail in [10]. simple credit and simple vc (Tables II and III, respectively) are designs corresponding to Figures 4 and 5 of [10], and simple ms (Table IV) is a much simpler version of the design shown in Figure 6 of [10]. Note from the tables how the performance of NuSMV degrades even for small designs. For large designs, NuSMV could not finish for any single instance of ψ.

TABLE II
SIMPLE CREDIT
  Prop #   ABC (sec)   NuSMV (sec)
  0        0.25        0.115
  1        0.05        0.14
  2        0.02        0.09

TABLE III
SIMPLE VC
  Prop #   ABC (sec)   NuSMV (sec)
  0        0.09        33.23
  1        0.07        31.8
  2        0.06        39.57
  3        0.03        16.46
  4        0.5         41.37
  5        0.03        16.89

TABLE IV
SIMPLE MS
  Prop #   ABC (sec)   NuSMV (sec)
  0        0.03        431.5
  1        0.12        379.59
  2        0.8         471.36
  3        0.8         385.67

Since ψ is defined for a FIFO buffer and the xMAS compiler created one instance of ψ for each FIFO buffer, the number of ψ instances is the same as the number of FIFO buffers.
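The shadow-register construction of Figures 5 and 6 can be mimicked in explicit state: guessing the save point corresponds to choosing a loop head, and the shadow-register equality firing corresponds to closing the loop. The sketch below enumerates simple lassos of a toy Kripke structure and reports one whose loop violates the property. It is a brute-force illustration of the cycle enumeration in Theorem 1's proof sketch, not ABC's symbolic implementation; all names are our own.

```python
# Explicit-state analogue of the L2S construction: the nondeterministic
# 'save' input becomes a guessed loop head, and the shadow-register match
# becomes the test t == head that closes the loop.

def reachable(init, succ):
    """All states reachable from init under successor function succ."""
    seen, todo = {init}, [init]
    while todo:
        s = todo.pop()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def find_violating_loop(init, succ, loop_ok):
    """Search for a reachable cycle (the lasso's loop) on which loop_ok,
    the property evaluated over one loop iteration, is False."""
    for head in reachable(init, succ):           # guessed save point
        stack = [(head, [head])]
        while stack:
            s, path = stack.pop()
            for t in succ(s):
                if t == head:                    # shadow-register match
                    if not loop_ok(path):
                        return path              # counterexample loop
                elif t not in path:
                    stack.append((t, path + [t]))
    return None
```

For a property such as FG(stable), a violating lasso is one whose loop contains a non-stable state, so loop_ok is simply `lambda loop: all(stable(s) for s in loop)`.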
For example, the designs corresponding to Tables II, III, and IV above have 3, 6, and 4 FIFO buffers, respectively. We also experimented on two large communication fabrics of practical interest [10], [18]. One has 20 buffers and the other has 24. For the first design, 19 out of 20 instances, and for the second, 23 out of 24, were proved by ABC with a light-weight interpolation engine within a worst-case time of 5.83 seconds (most were proved in less than a second). Light-weight interpolation could not prove one instance from each design. These were proved using advanced techniques from ABC's arsenal of safety verification algorithms. For example, ABC took a total of 217.2 seconds to prove one of these harder properties. In this time span, ABC first did some preliminary simplification, then tried interpolation, BMC, simulation and PDR in parallel with a time budget of 20 seconds. When this attempt failed, it moved on to further simplification, reducing the design using localization abstraction and speculation. It ran the interpolation, BMC, simulation, BDD-based reachability and PDR engines in parallel both after abstraction and after speculation, using elevated time budgets of 100 and 49 seconds respectively. The iteration after abstraction could not prove the property, but the iteration after speculation proved it with the PDR engine, which produced the final proof in 7 seconds.

VII. CONCLUSION & FUTURE WORK

We have developed a tool, VeriABC, which allows us to access real industrial designs written in SystemVerilog or VHDL and to process them into the AIGER format. The result can be used for synthesis and verification with a tool like ABC. We described how the RTL processing is done using the commercial front-end, Verific. SVA assertions are also processed by Verific, and VeriABC creates a separate file of equivalent LTL formulas. We showed an application of this to property checking, where ABC was enhanced to convert a subset of LTL into a circuit structure, thus effectively enabling liveness checking in ABC.

The use of a stable, supported and complete language processing tool like Verific gives academics access to real industrial designs without the hassle and daunting task of building their own equivalent tool. Liveness property checking is of growing interest in industry, and our enhanced ABC, with a front end that automatically converts to a circuit structure for liveness checking, can use the advanced safety property methods of ABC.

In the future, the development of VeriABC will allow us to extract higher-level constructs from SystemVerilog and VHDL by accessing Verific's parse trees. These constructs can be passed on, using an extended AIGER format, to an enhanced ABC, which will use this information in synthesis and verification.

VIII. ACKNOWLEDGMENT

This work has been supported in part by SRC contract 2057.001 and our industrial sponsors: Abound Logic, Actel, Altera, Atrenta, IBM, Intel, Jasper, Magma, Oasys, Real Intent, Synopsys, Tabula, and Verific.

REFERENCES

[1] ABC: a system for sequential synthesis and verification. Berkeley Verification and Synthesis Research Center, http://www.bvsrc.org.
[2] AIGER, http://fmv.jku.at/aiger/.
[3] Icarus Verilog, http://iverilog.icarus.com.
[4] Verific Design Automation, http://www.verific.com.
[5] K. Abrahamson. Decidability and expressiveness of logics of processes. PhD thesis, University of Washington, 1980.
[6] J. Baumgartner and H. Mony. Scalable liveness checking via property-preserving transformations. In DATE, pages 1680–1685, 2009.
[7] A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. In FMICS'02: Formal Methods for Industrial Critical Systems, volume 66(2) of ENTCS. Elsevier, 2002.
[8] A. Bradley. SAT-based model checking without unrolling. 2011.
[9] R. Brayton, G. D. Hachtel, A. Sangiovanni-Vincentelli, F. Somenzi, A. Aziz, S.-T. Cheng, and S. Edwards. VIS: a system for verification and synthesis. Pages 428–432. Springer-Verlag, 1996.
[10] S. Chatterjee and M. Kishinevsky. Automatic generation of inductive invariants from high-level microarchitectural models of communication fabrics. In CAV, pages 321–338, 2010.
[11] S. Chatterjee, M. Kishinevsky, and U. Ogras. Modeling communication micro-architectures (with one hand tied behind your back). Intel Technical Report, 2009.
[12] S.-T. Cheng and R. K. Brayton. Compiling Verilog into automata, 1994.
[13] E. Clarke, O. Grumberg, and D. Peled. Model Checking. 2000.
[14] B. Cook, J. Fisher, E. Krepska, and N. Piterman. Proving stabilization of biological systems. 2011.
[15] E. A. Emerson. Temporal and modal logic. In Handbook of Theoretical Computer Science (vol. B), pages 995–1072. MIT Press, Cambridge, MA, USA, 1990.
[16] E. A. Emerson and C.-L. Lei. Modalities for model checking: branching time logic strikes back. Sci. Comput. Program., 8:275–306, June 1987.
[17] N. Francez and D. Kozen. Generalized fair termination. In POPL '84, pages 46–53, New York, NY, USA, 1984. ACM.
[18] A. Gotmanov, S. Chatterjee, and M. Kishinevsky. Verifying deadlock-freedom of communication fabrics. 2011.
[19] R. Hojati, R. K. Brayton, and R. P. Kurshan. BDD-based debugging of designs using language containment and fair CTL. In CAV '93, pages 41–58, London, UK, 1993. Springer-Verlag.
[20] L. Lamport. "Sometime" is sometimes "not never": on the temporal logic of programs. In POPL '80, pages 174–185, New York, NY, USA, 1980. ACM.
[21] D. J. Lehmann, A. Pnueli, and J. Stavi. Impartiality, justice and fairness: the ethics of concurrent termination. In ICALP, pages 264–277, London, UK, 1981. Springer-Verlag.
[22] A. Pnueli. On the extremely fair treatment of probabilistic algorithms. In STOC '83, pages 278–290, New York, NY, USA, 1983. ACM.
[23] J. P. Queille and J. Sifakis. Fairness and related properties in transition systems: a temporal logic to deal with fairness. Acta Informatica, pages 195–220, 1983.
[24] V. Schuppan and A. Biere. Efficient reduction of finite state model checking to reachability analysis. Int. J. Softw. Tools Technol. Transf., 5(2):185–204, 2004.

On Incremental Satisfiability and Bounded Model Checking

Siert Wieringa*
Aalto University, Finland
siert.wieringa@aalto.fi

* Financially supported by the Academy of Finland project 139402

Abstract

Bounded Model Checking (BMC) is a symbolic model checking technique in which the existence of a counterexample of a bounded length is represented by the satisfiability of a propositional logic formula. Although solving a single instance of the satisfiability problem (SAT) is sufficient to decide on the existence of a counterexample for any arbitrary bound, typically one starts from bound zero and solves the sequence of formulas for all consecutive bounds until a satisfiable formula is found. This is especially efficient in the presence of incremental SAT-solvers, which solve sequences of incrementally encoded formulas. In this article we analyze empirical results that demonstrate the difference in run time behavior between incremental and non-incremental SAT-solvers. We show a relation between the observed run time behavior and the way in which the activity of variables inside the solver propagates across bounds. This observation has not been previously presented and is particularly useful for designing solving strategies for parallelized model checkers.

1 Introduction

Model checking is a formal verification technique revolving around proving temporal properties of systems modelled as finite state machines. A property holds for the model if it holds in all possible execution paths. If the property does not hold, this can be witnessed by a counterexample, which is a valid execution path for the model in which the property does not hold.
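The bound-by-bound search described in the abstract can be illustrated with a small sketch. Here, brute-force breadth-first path enumeration over an explicit-state model stands in for the satisfiability query at each bound; a real BMC engine would instead hand an unrolled propositional formula for each bound to a SAT-solver. The names are illustrative only.

```python
# Sketch of the sequential BMC search strategy: ask, for k = 0, 1, 2, ...,
# whether a counterexample of k steps exists, and stop at the first
# "satisfiable" bound. This mirrors the minimal-length-counterexample
# guarantee of solving consecutive bounds in order.

def bmc(init, succ, bad, max_bound):
    """Return the smallest k <= max_bound such that some path of k steps
    from init reaches a bad state, or None if no bound is "satisfiable"."""
    frontier = {init}                             # states reachable in k steps
    for k in range(max_bound + 1):
        if any(bad(s) for s in frontier):         # "SAT" at bound k
            return k
        frontier = {t for s in frontier for t in succ(s)}  # unroll one step
    return None                                   # all bounds "UNSAT"
```

An incremental solver would keep its learnt clauses between these queries instead of starting from scratch at every bound, which is the effect studied in the article that follows.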
Because the any arbitrary bound typically one starts from model has a finite number of states any infinite bound zero and solves the sequence of formu- execution of the system includes a loop, and can las for all consecutive bounds until a satisfiable thus be represented by a finite sequence of exe- formula is found. This is especially efficient in cution steps. Bounded Model Checking (BMC) the presence of incremental SAT-solvers, which [1] is a symbolic model checking technique in solve sequences of incrementally encoded formu- which the existence of a counterexample con- las. In this article we analyze empirical results sisting of a bounded number of execution steps that demonstrate the difference in run time be- is represented by the satisfiability of a proposi- havior between incremental and non-incremental tional logic formula. It thus allows the use of SAT-solvers. We show a relation between the ob- decision procedures for the propositional satisfi- served run time behavior and the way in which ability problem (SAT) for model checking. De- the activity of variables inside the solver propa- spite the theoretical hardness of SAT [7] such de- gates across bounds. This observation has not cision procedures, called SAT-solvers [9, 14, 17], been previously presented and is particularly have become extremely efficient. BMC is pop- useful for designing solving strategies for paral- ular as a technique for refuting properties, and lelized model checkers. although BMC based techniques can be used for proving properties we do not consider such tech- niques here. ∗ Financially supported by the Academy of Finland A typical BMC encoding will have semantics project 139402 such that if there exists a counterexample of DIFTS 2011 46 length k then there also exists a counterexam- ier to solve than the largest unsatisfiable ones. ple of any length greater than k. 
Thus in prin- A more opportunistic search strategy may re- ciple solving a single propositional logic formula duce the total time required to find a satisfiable is sufficient to decide on the existence of a coun- formula by skipping over hard instances. Such terexample for any arbitrary finite bound. How- strategies are natural for environments in which ever, one typically starts to solve the formula multiple computing nodes are available in paral- corresponding to bound zero and then solves se- lel, where one may define some nodes to use a quentially each consecutive bound until a coun- more opportunistic strategy than others. terexample is found. We will refer to this as the The observation on the empirical hardness of standard sequential search strategy. This strat- the smallest satisfiable formulas compared to the egy has the nice property that it always finds a largest unsatisfiable ones can also be made for counterexample of minimal length. As with ev- BMC. We attempted to implement opportunis- ery bound the representing formula grows larger tic strategies in our parallelized BMC framework it also avoids solving unnecessarily large formu- Tarmo [18]. This however turned out to be less las. Importantly, the performance of this strat- efficient then we would have expected, with per- egy benefits greatly from the availability of incre- formance degrading for many benchmarks. In mental SAT-solvers. Incremental SAT-solvers this article we evaluate the performance of the can solve sequences of formulas that share large incremental solver and compare it against that parts in common efficiently in a single solver pro- of solving each bound separately. The purpose cess, allowing reuse of information between for- of this study is not to illustrate that incremen- mulas. tal solvers are more effective for BMC than non- incremental ones, as that is well known, but to understand when and how opportunistic strate- 2 Motivation gies can be applied. 
This is done by compar- ing against non-incremental solver run times be- Automated SAT based planning is a problem cause solvers applying opportunistic strategies closely related to BMC. It deals with the same benefit less from the incremental interface of the sequences of parameterized formulas, except solver, as the problem is no longer introduced that the satisfiability of a formula now corre- one bound at a time. sponds to the existence of a plan of a bounded length. In [15] evaluation strategies for plan- ning were suggested that are more opportunis- 3 Preliminaries tic than the standard sequential search strategy. They suggest to spend some amount of the to- The majority of modern SAT-solvers are based tal solving effort at attempting to solve formu- on the Davis Putnam Logemann Loveland las for bounds ahead of the currently smallest (DPLL) procedure [8]. The DPLL-procedure is unsolved one. It was inspired by the empirical a backtracking search procedure for SAT that observation that if a plan exists then amongst builds a partial assignment by iteratively decid- the smallest satisfiable formulas in the sequence ing on a branching variable to be assigned a there are typically formulas that are much eas- value in the partial assignment. When the par- DIFTS 2011 47 tition of the search space defined by the partial out the need for integrating them into one appli- assignment is without solutions the algorithm cation. The input sequences used were the same backtracks. In addition to this modern SAT- as the benchmarks used for experimental results solvers typically employ conflict clause learning presented in [18]. These sequences were gener- [17]. Such solvers record a new conflict clause ated with the current state-of-the-art encoding whenever they are forced to backtrack. 
Such a solver then backtracks non-chronologically to a decision point at which the conflict clause was still satisfied.

The performance of the DPLL procedure depends heavily on its branching variable decisions. A commonly used decision heuristic for clause learning SAT-solvers is the Variable State Independent Decaying Sum (VSIDS) heuristic, first presented in the solver Chaff [14]. The idea of the heuristic is to favor variables that are included in recently derived conflict clauses. For each variable an initially zero value called the activity is maintained. Whenever a conflict clause is learnt, the activity of all variables that occur in the clause is increased. Periodically the activity of all variables is divided by a constant.

All results presented in this article were obtained using the SAT-solver MiniSAT 2.2.0 [9] (available from http://www.minisat.se). The solver core was not modified, but a number of small modifications (available from http://users.ics.tkk.fi/swiering) were made in auxiliary routines such as the file parser, in order to read incremental SAT sequences from disk. When employing BMC the SAT-solver will typically not read the formula sequence from disk, but will rather be integrated into a BMC engine that generates formulas for new bounds on the fly. We use sequences stored on disk as this is convenient for testing the performance of the SAT-solver independently. As our sequences are streamable, one can also use them as an interface between a BMC engine and a SAT-solver without the need for integrating them into one application. The input sequences used were the same as the benchmarks used for the experimental results presented in [18]. These sequences were generated with the current state-of-the-art encoding for model checking of linear time temporal logic properties with past (PLTL) [2], as implemented in the model checker NuSMV 2.4.3 [6].

4 Run time

In our experiments we have studied the behavior of SAT-solvers on problem sequences from BMC regarding both the run time and the variable activity. A selection of the results is presented in the figures in this article. For each benchmark there are two subfigures, a run time graph labeled (a) and a variable activity graph labeled (b). In this section we will focus only on the run time graphs, which have bounds on the x-axis and time on the y-axis. For each bound a vertical bar displays the time it took to solve the formula corresponding to that single bound using the SAT-solver in non-incremental fashion. If the solver found the formula unsatisfiable, a solid red bar is drawn; if it found the formula satisfiable, only the outline of the bar is drawn, in green.

The incremental solver solves using the standard sequential strategy, and whenever it completes a bound it reports the run time up to that point. The points in the graphs marked with crosses (x) and connected by the thick line represent these run times.

The thin dotted line connecting the plusses (+) represents the cumulative run time of the non-incremental solver, i.e. for each bound k the value displayed is the sum of all run times of the non-incremental solver tests from bound 0 to k. This demonstrates the time required for the standard sequential strategy using a non-incremental solver. This line is intentionally not influencing the range of the y-axis, as it typically grows so large that it would make the other results hard to see.

[Figure 1: Benchmark irst.dme6 from [3]; (a) run time (s) per bound, (b) hyperactive variables per bound]

[Figure 2: Benchmark bc57sensors.p2neg from [3]; (a) run time (s) per bound, (b) hyperactive variables per bound]

[Figure 3: Benchmark eijk.S1238.S from [3]; (a) run time (s) per bound, (b) hyperactive variables per bound]
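As an illustration of the quantities plotted in the run time graphs: the thin dotted line is a prefix sum of the per-bound non-incremental run times, and comparing it with the incremental solver's reported completion times shows from which bound onwards the incremental run is ahead. The helper names below are hypothetical; this is a sketch of the bookkeeping, not code from the study.

```python
def cumulative(times):
    """Prefix sums: total time for the standard sequential strategy when
    each bound is solved by a fresh non-incremental solver (the dotted line)."""
    total, out = 0.0, []
    for t in times:
        total += t
        out.append(total)
    return out

def incremental_wins_at(non_inc_times, inc_times):
    """Bounds k at which the incremental solver has completed bounds 0..k
    (the thick line) in less total time than the summed non-incremental runs."""
    return [k for k, (c, i) in enumerate(zip(cumulative(non_inc_times), inc_times))
            if i < c]
```

For a benchmark like eijk.S1238.S the crossover happens early and the incremental line stays below the cumulative one for all remaining bounds.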
From Fig. 1(a) it can be seen that the shortest counterexample for benchmark irst.dme6 is of length 53. The run times of the non-incremental SAT-solver clearly show the behavior that inspired the opportunistic strategies of [15], i.e. the run time of the non-incremental solver for small satisfiable formulas is much smaller than that for the largest unsatisfiable ones. It may be observed from Fig. 2(a) that for benchmark bc57sensors.p2neg the run times for the two smallest satisfiable formulas, corresponding to bounds 104 and 105, are relatively large. An easy satisfiable formula can however be found a little further ahead at bound 106, thus the use of an opportunistic strategy could possibly be beneficial.

An incremental solver can be started from any arbitrary bound, and it is possible to proceed by increasing the bound by more than one every time a formula is solved. Using bound increments larger than one is one of the simple strategies we have tried in our experiments. This strategy should still be considered opportunistic because of the "missing information" it causes for the solver, leading it further away from the efficiency of incremental solving and further towards non-incremental behavior.

Note that all results presented in this article demonstrate that the use of the incremental solver is crucial when performing the standard sequential strategy. Fig. 3(a) presents the run time behavior for the benchmark eijk.S1238.S, which is the encoding of model checking a property that holds on all execution paths of the model. This implies that the formulas are unsatisfiable for all bounds. Here, the crucial role that incremental SAT solving often plays in solving BMC is even clearer. Whereas a non-incremental solver would take about 100 seconds to find that bound 150 alone is unsatisfiable, the incremental solver finds this result for all bounds from 0 to 150 sequentially in half that time. This is typical behavior for many benchmarks corresponding to model checking a property that holds. It seems that in these cases the solver learns that the property holds for all short execution paths in a way that is easy to update when the bound on the length of the execution paths is extended. The solver can be thought of as having tuned itself towards verifying that the property holds in the exact same way over and over.

Another way to look at the result presented in Fig. 3(a) is that by using the standard sequential strategy we are aiding the solver in proving the unsatisfiability of the formula corresponding to a counterexample of length 150, the largest bound tested here. By forcing it through the sequence of formulas we force a direction on the search that is natural to our problem description, and apparently this is helpful for the SAT-solver. For benchmarks with this kind of run time behavior there is clearly no hope for any opportunistic strategies.

Given the small margin of error available for opportunistic approaches on satisfiable benchmarks, and no chance of any performance improvement for many unsatisfiable benchmarks, we need to be careful when applying these approaches. They are however amongst the most natural ways of diversifying search strategies amongst nodes in an environment with parallel computation resources.
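One minimal way to realize the "bound increments larger than one" strategy, and to diversify it across parallel nodes as suggested above, is to give each node its own explicit bound schedule. The sketch below is our own illustration, not Tarmo's actual scheduling code; all names are invented.

```python
def bound_schedule(start=0, increment=1, max_bound=150):
    """Yield the bounds one solver node attempts, in order.
    increment == 1 is the standard sequential strategy; > 1 is opportunistic."""
    k = start
    while k <= max_bound:
        yield k
        k += increment

def portfolio_schedules(num_nodes, max_bound=150):
    """Node 0 runs the standard sequential strategy; later nodes start
    further ahead and skip ever larger gaps between bounds."""
    return [list(bound_schedule(start=n, increment=n + 1, max_bound=max_bound))
            for n in range(num_nodes)]
```

With such schedules an opportunistic node may reach a large bound long before the sequential node does, at the price of the "missing information" discussed above.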
5 Parallel SAT solving

There are two common architectures for parallel SAT-solvers [11]. The first is the classic divide-and-conquer approach, in which the formula is split into multiple disjoint subformulas, each of which is then solved on a different computing node [4, 19]. The second approach is the so-called portfolio approach [10]. The basic idea is that every computing node runs a SAT-solver that attempts to solve the same formula. As modern SAT-solvers make some decisions randomly, their run time varies greatly between runs. This makes the portfolio approach surprisingly efficient, as it is able to decide the satisfiability of the formula as soon as the fastest solver finishes. Further diversification may be achieved by using different parameters on different computation nodes. Obviously, opportunistic search strategies provide means for diversification when we are considering solving incrementally encoded SAT formulas in a parallel environment.

The current implementation of our parallelized BMC framework Tarmo can be seen as a parallelized incremental SAT-solver using the portfolio approach. Each computing node runs an incremental SAT-solver in the conventional sequential fashion. The novelty of Tarmo is that it allows sharing of conflict clauses between SAT-solvers even if they are working on different bounds. The solvers operate otherwise independently, i.e. if one solver solves a formula this does not stop the other solvers from attempting to solve that same formula. This choice was made after observing that interrupting a solver to make it "catch up" with another breaks its ability to benefit from incremental SAT to the full extent. As we made this observation in an environment where clause sharing takes place, it seems that the interrupted solver is missing more information than just conflict clauses. This was one of the reasons to look at the way the activity of variables propagates across bounds in the incremental SAT-solver runs.

6 Variable activity

To obtain data on the activity of variables, the SAT-solver was modified to print the activity of all variables after each bound it completed. For each bound we are interested in which variables are the most active, and especially in whether this activity remains high across several bounds. We consider a variable hyperactive if its activity is within the highest 2% of variables with non-zero activity.

The graphs labeled (b) in this article visualize the hyperactive variables. All variables that are hyperactive for at least one bound are represented by an integer value on the y-axis of the graph. The variables are sorted on the y-axis by their index, such that if we define y(v) as the integer on the y-axis corresponding to the variable with index v, then for any v' > v we have y(v') > y(v).

Just like in the run time graphs, the values on the x-axis of the graph represent bounds. If a variable was hyperactive starting from bound k up to but not including bound k' > k, then a horizontal line was drawn in the graph from bound k to k' at the y position corresponding to that variable. In other words, for all variables v and all bound intervals [k, k') on which v is hyperactive, a line was drawn from (k, y(v)) to (k', y(v)).

One may observe that generally variables with larger indices become active later. This is because in the solver each newly introduced variable is given a larger index than all existing variables, and for each bound a set of new variables is created. For each bound a subset of the new variables becomes hyperactive quickly, as the solver runs into conflicts on the newly added clauses.

The hyperactive variables in the satisfiable benchmarks irst.dme6 and bc57sensors.p2neg are displayed in Fig. 1(b) and Fig. 2(b). Note that although for each bound some of the new variables become hyperactive, all these variables tend to remain hyperactive throughout the whole process. This means that whenever a bound is added the solver still runs into new conflicts regarding variables that represent the state at smaller timepoints. We say that the activity of variables is bound global.

For the benchmark eijk.S1238.S the activity graph looks very different. For each bound the solver creates conflict clauses including the new variables, thus creating a new set of hyperactive variables, but there are only very few variables for which hyperactivity is maintained. This is in line with the observations on the run time behavior of this benchmark, which also indicate that hardly any work has to be performed to find unsatisfiability. We say that the activity of variables is bound local.

We have generated graphs like the ones presented in this paper for a large set of benchmarks (available from http://users.ics.tkk.fi/swiering). We observe that on benchmarks with a bound global variable activity, the run time of the non-incremental SAT-solver for the largest bound solved is smaller than the time spent by the incremental solver to get to the same bound and solve it. For benchmarks with a bound local variable activity this is never the case, and thus an opportunistic heuristic will not improve performance.

Although we expect all hard satisfiable benchmarks to have a bound global variable activity, it is not the case that all unsatisfiable benchmarks have a bound local variable activity. The benchmark abp4.ptimoneg, represented in Fig. 4, is an example of an unsatisfiable benchmark with a bound global variable activity. Apparently the correctness of the property is not implied within a short number of execution steps here, and the incremental solver needs to evaluate large portions of the search space for every bound. Note also that for this benchmark an opportunistic approach may help to find unsatisfiable formulas at larger bounds faster.

[Figure 4: Benchmark abp4.ptimoneg from [2]; (a) run time (s) per bound, (b) hyperactive variables per bound]
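The hyperactivity bookkeeping of this section — per bound, take the variables whose activity is in the top 2% of non-zero activities, then collect the maximal intervals [k, k') on which a variable stays hyperactive, as drawn in the (b) graphs — can be computed as follows. The function names and the dict-based activity snapshots are our own illustration of the paper's definition.

```python
def hyperactive(activity, fraction=0.02):
    """Variables whose activity is within the top `fraction` of the
    variables with non-zero activity; `activity` maps variable -> float."""
    nonzero = [v for v, a in activity.items() if a > 0]
    if not nonzero:
        return set()
    n = max(1, int(len(nonzero) * fraction))
    return set(sorted(nonzero, key=lambda v: activity[v], reverse=True)[:n])

def hyperactivity_intervals(per_bound_sets):
    """Given the set of hyperactive variables for each bound 0..n-1,
    return the maximal intervals as (k, k', variable) triples."""
    intervals = []
    open_at = {}                          # variable -> bound where its current run started
    for k, vars_k in enumerate(per_bound_sets):
        for v in list(open_at):
            if v not in vars_k:           # run ends: emit [start, k)
                intervals.append((open_at.pop(v), k, v))
        for v in vars_k:
            open_at.setdefault(v, k)
    n = len(per_bound_sets)
    intervals.extend((start, n, v) for v, start in open_at.items())
    return sorted(intervals)
```

Bound global activity then shows up as intervals that stretch to the last bound, while bound local activity yields many short intervals that end one or two bounds after they start.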
7 Conclusions

In this article we have shown a relation between the run time of the standard sequential strategy for bounded model checking and the activity of decision variables in solvers employing this strategy. We can use this observation in a SAT-solver to predict during the search whether a more opportunistic strategy could be beneficial for the search. This is especially useful for parallel solvers in which different threads may be executing different strategies.

It is also easy to envision how these techniques could be useful for model checkers that use a combination of truly different model checking techniques, such as PdTrav [5]. One could easily engineer a system which would do BMC for some amount of time, after which the variable activity could play a role in the decision on how to continue. If the variable activity appears to be bound local then the property is likely to hold for the model, and thus we may want to switch to a complete model checking technique based on, for example, k-induction [16], Craig interpolation [13] or BDDs [12] to prove this.

Another observation we made is that for deciding the satisfiability of the last formula in an incrementally encoded sequence of formulas it can sometimes be faster to solve all formulas in the sequence. This raises the question whether other applications of SAT solving that currently rely on solving a single SAT formula could benefit from the use of incremental problem encodings. Such encodings allow enforcing a search direction on the SAT-solver that is natural to the application and therefore possibly beneficial to the solver. Tolerance to bad choices for such an incremental encoding could be achieved by doing this in a parallel environment with some opportunistic nodes.

References

[1] Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In Rance Cleaveland, editor, TACAS, volume 1579 of Lecture Notes in Computer Science, pages 193-207. Springer, 1999.
[2] Armin Biere, Keijo Heljanko, Tommi A. Junttila, Timo Latvala, and Viktor Schuppan. Linear encodings of bounded LTL model checking. Logical Methods in Computer Science, 2(5), 2006.
[3] Armin Biere and Toni Jussila. Hardware model checking competition 2007 (HWMCC07). Organized as a satellite event to CAV 2007, Berlin, Germany, July 3-7, 2007.
[4] Max Böhm and Ewald Speckenmeyer. A fast parallel SAT-solver - efficient workload balancing. Ann. Math. Artif. Intell., 17(3-4):381-400, 1996.
[5] Gianpiero Cabodi, Paolo Camurati, Luz Garcia, Marco Murciano, Sergio Nocco, and Stefano Quer. Trading-off SAT search and variable quantifications for effective unbounded model checking. In Alessandro Cimatti and Robert B. Jones, editors, FMCAD, pages 1-8. IEEE, 2008.
[6] Alessandro Cimatti, Edmund M. Clarke, Enrico Giunchiglia, Fausto Giunchiglia, Marco Pistore, Marco Roveri, Roberto Sebastiani, and Armando Tacchella. NuSMV 2: An opensource tool for symbolic model checking. In Ed Brinksma and Kim Guldstrand Larsen, editors, CAV, volume 2404 of Lecture Notes in Computer Science, pages 359-364. Springer, 2002.
[7] Stephen A. Cook. The complexity of theorem-proving procedures. In STOC, pages 151-158. ACM, 1971.
[8] Martin Davis, George Logemann, and Donald Loveland. A machine program for theorem-proving. Commun. ACM, 5:394-397, July 1962.
[9] Niklas Eén and Niklas Sörensson. An extensible SAT-solver. In Enrico Giunchiglia and Armando Tacchella, editors, SAT, volume 2919 of Lecture Notes in Computer Science, pages 502-518. Springer, 2003.
[10] Youssef Hamadi, Saïd Jabbour, and Lakhdar Sais. ManySAT: A parallel SAT solver. JSAT, 6(4):245-262, 2009.
[11] Antti Eero Johannes Hyvärinen, Tommi A. Junttila, and Ilkka Niemelä. Partitioning search spaces of a randomized search. In Roberto Serra and Rita Cucchiara, editors, AI*IA, volume 5883 of Lecture Notes in Computer Science, pages 243-252. Springer, 2009.
[12] Kenneth L. McMillan. Symbolic Model Checking. Kluwer, 1993.
[13] Kenneth L. McMillan. Applications of Craig interpolants in model checking. In Nicolas Halbwachs and Lenore D. Zuck, editors, TACAS, volume 3440 of Lecture Notes in Computer Science, pages 1-12. Springer, 2005.
[14] Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an efficient SAT solver. In DAC, pages 530-535. ACM, 2001.
[15] Jussi Rintanen, Keijo Heljanko, and Ilkka Niemelä. Planning as satisfiability: parallel plans and algorithms for plan search. Artif. Intell., 170(12-13):1031-1080, 2006.
[16] Mary Sheeran, Satnam Singh, and Gunnar Stålmarck. Checking safety properties using induction and a SAT-solver. In Warren A. Hunt Jr. and Steven D. Johnson, editors, FMCAD, volume 1954 of Lecture Notes in Computer Science, pages 108-125. Springer, 2000.
[17] João P. Marques Silva and Karem A. Sakallah. GRASP: A search algorithm for propositional satisfiability. IEEE Trans. Computers, 48(5):506-521, 1999.
[18] Siert Wieringa, Matti Niemenmaa, and Keijo Heljanko. Tarmo: A framework for parallelized bounded model checking. In Lubos Brim and Jaco van de Pol, editors, PDMC, volume 14 of EPTCS, pages 62-76, 2009.
[19] Hantao Zhang, Maria Paola Bonacina, and Jieh Hsiang. PSATO: A distributed propositional prover and its application to quasigroup problems. J. Symb. Comput., 21(4):543-560, 1996.

DIFTS'11 Keyword Index

BMC 46
combinational simplification 30
Data structure 13
Domain Specific Embedded Language (DSEL) 22
Dynamic Spectrum Access 3
empirical results 46
Engine Independence 22
front-end engineering 38
liveness to safety 38
model checking 13, 30
On-the-fly 13
Policies 3
Real-time 13
Reasoning 3
SAT 46
Satisfiability Modulo Theories (SMT) 22
SMT solving 3
solving strategies 46
stabilization 38
sweeping 30
Timed automata 13

DIFTS'11 Author Index

Arkoudas, Konstantine 3
Brayton, Robert K. 38
Chadha, Ritu 3
Chiang, Jason 3
Cleaveland, Rance 13
Drechsler, Rolf 22
Fey, Goerschwin 22
Fontana, Peter 13
Frehse, Stefan 22
Grosse, Daniel 22
Haedicke, Finn 22
Hassan, Zyad 30
Kuehlmann, Andreas 1
Long, Jiang 38
Mishchenko, Alan 38
Morrison, Chris 2
Ray, Sayak 38
Somenzi, Fabio 30
Sterin, Baruch 38
Wieringa, Siert 46
Zhang, Yan 30