<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Trading-off incrementality and dynamic restart of multiple solvers in IC3</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>G. Cabodi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Mishchenko</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Palena</string-name>
        </contrib>
      </contrib-group>
      <abstract>
<p>This paper addresses the problem of SAT solver performance in IC3, one of the major recent breakthroughs in Model Checking algorithms. Unlike other Bounded and Unbounded Model Checking algorithms, IC3 is characterized by numerous SAT solver queries on small sets of problem clauses. Besides algorithmic issues, the above scenario poses serious performance challenges for SAT solver configuration and tuning. As is well known in other application fields, finding a good compromise between learning and overhead is key to performance. We address solver cleanup and restart heuristics, as well as clause database minimality, based on on-demand clause loading: transition relation clauses are loaded into the solver based on structural dependency and phase analysis. We also compare different solutions for multiple specialized solvers, and we provide an experimental evaluation on benchmarks from the HWMCC suite. Though not finding a clear winner, the work outlines several potential improvements for a portfolio-based verification tool with multiple engines and tunings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>IC3 [1] is a SAT-based invariant verification algorithm
for bit-level Unbounded Model Checking (UMC). Since its
introduction, IC3 has immediately generated strong interest,
and is now considered one of the major recent breakthroughs
in Model Checking. IC3 has proved impressively effective at
solving industrial verification problems. Our experience with
the algorithm shows that IC3 is the single invariant verification
algorithm capable of solving the largest number of instances
among the benchmarks of recent editions of the Hardware
Model Checking Competition (HWMCC).</p>
      <p>IC3 heavily relies on SAT solvers to drive several parts of
the verification algorithm: a typical run of IC3 is characterized
by a huge number of SAT queries. As stated by Bradley in [2],
the queries posed by IC3 to SAT solvers differ significantly
in character from those posed by other SAT-based invariant
verification algorithms (such as Bounded Model Checking [3],
k-induction [4] [5] or interpolation [6]). Most notably, SAT
queries posed by IC3 don’t involve the unrolling of the
transition relation for more than one step and are thus characterized
by small-sized formulas.</p>
      <p>IC3 can be thought of as composed of two different layers:
at the top level, the algorithm itself drives the verification
process by constantly refining a set of over-approximations
to forward reachable states with new inductive clauses; at the
bottom level, a SAT solving framework is exploited by the
top-level algorithm to respond to queries about the system.
As shown in [7], these two layers can be separated by means
of a clean interface.</p>
      <p>Performance of IC3 turns out to be both highly sensitive
to the various internal behaviours of SAT solvers, and strictly
dependent on the way the top-level algorithm is integrated with
the underlying SAT solving framework.</p>
      <p>The peculiar characteristics exposed by the SAT queries of
IC3 can thus be exploited to improve the overall performance
of the algorithm in two different manners:
1) Tuning the internal behaviours of the particular SAT
solver employed to better fit IC3 needs.
2) Defining better strategies to manage the SAT solving
work required by IC3.</p>
      <p>In this paper we address this second issue, proposing and
comparing different implementation strategies for handling
SAT queries in IC3. The aim of this paper is to identify the
most efficient way to manage SAT solving in IC3. To achieve
this goal we experimentally compare a number of different
implementation strategies over a selected set of benchmarks
from the recent HWMCC.</p>
      <p>The experimental work has been done by two different
research groups, on two different state-of-the-art verification
tools, ABC [8] and PdTRAV [9], that share similar
architectural and algorithmic choices in their implementation of IC3.</p>
      <p>The focus of this paper is neither on the IC3 algorithm
itself nor on the internal details of the SAT solving procedures
employed, but rather on the implementation details of the
integration between IC3 and the underlying SAT solving
framework.</p>
    </sec>
    <sec id="sec-2">
      <title>B. Contributions</title>
      <p>The main contributions of this paper are:</p>
      <p>• A characterization of SAT queries posed by IC3.
• Novel approaches to solver allocation, loading and clean
up in IC3.
• An experimental evaluation of performance using two
verification tools.</p>
      <sec id="sec-2-1">
        <title>C. Outline</title>
        <p>First in Section II we introduce the notation used and give
some background on IC3. Then, in Section III we present a
systematic characterization of the SAT solving work required
by IC3. Section IV introduces the problem of handling SAT
queries posed by IC3 efficiently. Both commonly used and
novel approaches to the allocation, loading and cleaning up of
SAT solvers in IC3 are discussed in Sections V, VI and VII
respectively. Experimental data comparing these approaches
are presented in Section VIII. Finally, in Section IX we draw
some conclusions and give summarizing remarks.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>II. BACKGROUND</title>
      <sec id="sec-3-1">
        <title>A. Notation</title>
        <p>Definition 1. A transition system is the triple S = ⟨M, I, T⟩
where M is a set of boolean variables called the state variables of
the system, I(M ) is a boolean predicate over M representing
the set of initial states of the system and T (M, M ′) is a
predicate over M, M ′ that represents the transition relation
of the system.</p>
        <p>Definition 2. A state of the system is represented by a
complete assignment s to the state variables M . A set of
states of the system is represented by a boolean predicate
over M . Given a boolean predicate F over M , a complete
assignment s such that s satisfies F (i.e. s |= F ) represents
a state contained in F and is called an F -state. Primed
state variables M ′ are used to represent future states and,
accordingly, a boolean predicate over M ′ represents a set of
future states.</p>
        <p>Definition 3. A boolean predicate F is said to be stronger than
another boolean predicate G if F → G, i.e. every F -state is
also a G-state.</p>
        <p>Definition 4. A literal is a boolean variable or the negation of
a boolean variable. A disjunction of literals is called a clause
while a conjunction of literals is called a cube. A formula
is said to be in Conjunctive Normal Form (CNF) if it is a
conjunction of clauses.</p>
        <p>Definition 5. Given a transition system S = ⟨M, I, T⟩, if an
assignment s, t′ satisfies the transition relation T (i.e. if s, t′ |=
T ) then s is said to be a predecessor of t and t is said to be
a successor of s. A sequence of states s0, s1, . . . , sn is said
to be a path in S if every pair of adjacent states si, si+1,
0 ≤ i &lt; n, satisfies the transition relation (i.e. if si, s′i+1 |= T ).</p>
        <p>Definition 6. Given a transition system S = ⟨M, I, T⟩, a state
s is said to be reachable in S if there exists a path s0, s1, . . . , s
such that s0 is an initial state (i.e. s0 |= I). We denote by
Rn(S) the set of states that are reachable in S in at most n
steps. We denote by R(S) the overall set of states that are
reachable in S. Note that R(S) = ∪i≥0 Ri(S).</p>
        <p>Definition 7. Given a transition system S = ⟨M, I, T⟩ and a
boolean predicate P over M (called a safety property), the
invariant verification problem is the problem of determining if
P holds for every reachable state in S: ∀s ∈ R(S) : s |= P .
An algorithm used to solve the invariant verification problem
is called an invariant verification algorithm.</p>
        <p>Definition 8. Given a transition system S = ⟨M, I, T⟩, a
boolean predicate F over M is called an inductive invariant
for S if the following two conditions hold:
• Base case: I → F
• Inductive case: F ∧ T → F ′
A boolean predicate F over M is called an inductive invariant
for S relative to another boolean predicate G if the following
two conditions hold:
• Base case: I → F
• Relative inductive case: G ∧ F ∧ T → F ′</p>
        <p>Lemma 1. Given a transition system S = ⟨M, I, T⟩, an
inductive invariant F for S is an over-approximation to the
set of reachable states R(S).</p>
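The two conditions of Definition 8 can likewise be checked by enumeration on small systems. A sketch, again over an illustrative 2-bit example (a counter that saturates at value 2; all names are placeholders):

```python
from itertools import product

# Brute-force check of Definition 8: F is an inductive invariant
# iff I → F (base case) and F ∧ T → F′ (inductive case).
def is_inductive_invariant(states, I, T, F):
    base = all(F(s) for s in states if I(s))
    step = all(F(t) for s in states if F(s)
               for t in states if T(s, t))
    return base and step

# Toy system: from value v the counter moves to min(v + 1, 2).
def value(s): return (s[0] << 1) | s[1]
def I(s): return s == (0, 0)
def T(s, t): return value(t) == min(value(s) + 1, 2)

states = [tuple(b) for b in product((0, 1), repeat=2)]
print(is_inductive_invariant(states, I, T, lambda s: value(s) <= 2))  # True
print(is_inductive_invariant(states, I, T, lambda s: value(s) <= 1))  # False
```

The second predicate fails the inductive case (a state with value 1 steps to value 2), which is exactly why arbitrary over-approximations need strengthening before they become invariants.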
        <p>Definition 9. Given a transition system S = ⟨M, I, T⟩ and a
boolean predicate P over M , an inductive strengthening of P
for S is an inductive invariant F for S such that F is stronger
than P .</p>
        <p>Lemma 2. Given a transition system S = ⟨M, I, T⟩ and a
boolean predicate P over M , if an inductive strengthening of
P can be found, then the property P holds for every reachable
state of S. The invariant verification problem can thus be seen
as the problem of finding an inductive strengthening of P for S.</p>
        <p>B. IC3</p>
        <p>
          Given a transition system S = ⟨M, I, T⟩ and a safety
property P over M, IC3 aims to find an inductive strengthening of
P for S. For this purpose, it maintains a sequence of formulas
Fk = F0, F1, . . . Fk such that, for every 0 ≤ i &lt; k, Fi is an
over-approximation of the set of states reachable in at most
i steps in S. Each of these over-approximations is called a
time frame and is represented by a set of clauses, denoted by
clauses(Fi). The sequence of time frames Fk is called trace
and is maintained by IC3 in such a way that the following
conditions hold throughout the algorithm:
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) F0 = I
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) Fi → Fi+1, for all 0 ≤ i &lt; k
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) Fi ∧ T → Fi′+1, for all 0 ≤ i &lt; k
(
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) Fi → P , for all 0 ≤ i &lt; k
        </p>
        <p>
          Condition (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) states that the first time frame of the trace is
special and is simply assigned to the set of initial states of
S. The remaining conditions claim that for every time frame
Fi but the last one: (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) every Fi-state is also a Fi+1-state, (
          <xref ref-type="bibr" rid="ref3">3</xref>
          )
every successor of an Fi-state is an Fi+1-state and (
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) every
Fi-state is safe. Condition (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) is maintained syntactically,
enforcing the condition (2’) clauses(Fi+1) ⊆ clauses(Fi).
Lemma 3. Let S = ⟨M, I, T⟩ be a transition system,
Fk = F0, F1, . . . Fk a sequence of boolean formulas over
M and let conditions (
          <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
          ) hold for Fk. Then each Fi,
with 0 ≤ i &lt; k, is an over-approximation to the set of states
reachable within i steps in S.
        </p>
        <p>
          Lemma 4. Let S = ⟨M, I, T⟩ be a transition system, P a
safety property over M, Fk = F0, F1, . . . Fk a sequence of
boolean formulas over M and let conditions (
          <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1-4</xref>
          ) hold for
Fk. Then P is satisfied up to k − 1 steps in S (i.e. there
is no counter-example to P of length less than or equal
to k − 1).
        </p>
        <p>The main procedure of IC3 is described in Algorithm 1 and
is composed of two nested iterations. Major iterations (lines
3-16) try to prove that P is satisfied up to k steps in S, for
increasing values of k. To prove so, in minor iterations (lines
4-9), IC3 refines the trace Fk computed so far, by adding new
relative inductive clauses to some of its time frames. The
algorithm iterates until either (i) an inductive strengthening
of the property is produced (line 14), or (ii) a counter-example
to the property is found (line 7).</p>
        <p>Input: S = hM, I, T i ; P (M)
Output: SUCCESS or FAIL(σ), with σ counter-example
1: k ← 0
2: Fk ← I
3: repeat
4: while ∃t : t |= Fk ∧ ¬P do
5: s ← Extend(t)
6: if BlockCube(s, Q, Fk) = FAIL(σ) then
7: return FAIL(σ)
8: end if
9: end while
10: Fk+1 ← ∅
11: k ← k + 1
12: Fk ← Propagate(Fk)
13: if Fi = Fi+1 for some 0 ≤ i &lt; k then
14: return SUCCESS
15: end if
16: until forever</p>
        <p>Algorithm 1. IC3(S, P )</p>
        <p>
          At major iteration k, the algorithm has computed a trace
Fk such that conditions (
          <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1-4</xref>
          ) hold. From Lemma 4, it follows
that P is satisfied up to k − 1 steps in S. IC3 then tries to
prove that P is satisfied up to k steps as well. This is done by
enumerating Fk-states that violate P and then trying to block
them in Fk.
        </p>
        <p>Definition 10. Blocking a state (or, more generally, a cube)
s in a time frame Fk means proving s unreachable within k
steps in S, and consequently refining Fk to exclude it.</p>
        <p>To enumerate each state of Fk that violates P (line 4), the
algorithm poses SAT queries to the underlying SAT solving
framework in the following form:</p>
        <p>SAT ?(Fk ∧ ¬P )
(Q1)
If Q1 is SAT, a bad state t in Fk can be extracted from the
satisfying assignment. That state must be blocked in Fk. To
increase performance of the algorithm, as suggested in [7],
the bad state t generated this way is first (possibly) extended
to a bad cube s. This is done by means of the Extend(t)
procedure (line 5), not reported here, that employs ternary
simulation to remove some literals from t. The resulting cube
s includes t and violates the property P ; it is thus a bad
cube. The algorithm then tries to block the whole bad cube
s rather than just t. It is shown in [7] that extending bad states
into bad cubes before blocking them dramatically improves
IC3 performance.</p>
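On tiny examples the effect of Extend(t) can be imitated by a brute-force check standing in for ternary simulation: a literal is dropped whenever every state of the enlarged cube still violates P. A sketch (the cube representation and names are illustrative, not the implementation of [7]):

```python
from itertools import product

# A cube is a partial assignment {var_index: value}; a state is a
# tuple of 0/1 values over nvars variables.
def states_of(cube, nvars):
    for bits in product((0, 1), repeat=nvars):
        if all(bits[v] == val for v, val in cube.items()):
            yield bits

# Extend the bad state t to a bad cube: tentatively drop each literal
# and keep the drop only if all states in the enlarged cube violate P.
def extend(t, P, nvars):
    cube = dict(enumerate(t))
    for v in list(cube):
        reduced = {u: b for u, b in cube.items() if u != v}
        if all(not P(s) for s in states_of(reduced, nvars)):
            cube = reduced          # variable v is irrelevant: drop it
    return cube

# P fails exactly when both x0 and x1 are set; t = (1, 1, 0) is bad.
P = lambda s: not (s[0] and s[1])
print(extend((1, 1, 0), P, 3))      # {0: 1, 1: 1}: the x2 literal is dropped
```

Ternary simulation reaches the same conclusion by propagating X values through the circuit instead of enumerating states, which is what makes it practical on real designs.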
        <p>Once a bad cube s is found, it is blocked in Fk calling
the BlockCube(s, Q, Fk) procedure (line 6). This procedure
is described in Algorithm 2.</p>
        <p>Input: s: bad cube in Fk; Q: priority queue; Fk: trace
Output: SUCCESS or FAIL(σ), with σ counter-example
1: add a proof-obligation (s, k) to the queue Q
2: while Q is not empty do
3: extract (s, j) with minimal j from Q
4: if j &gt; k or t 6|= Fj then continue;
5: if j = 0 then return FAIL(σ)
6: if ∃t, v′ : t, v′ |= Fj−1 ∧ T ∧ ¬s ∧ s′ then
7: p ← Extend(t)
8: add (p, j − 1) and (s, j) to Q
9: else
10: c ← Generalize(j, s, Fk)
11: Fi ← Fi ∪ {c} for 0 &lt; i ≤ j
12: add (s, j + 1) to Q
13: end if
14: end while
15: return SUCCESS</p>
        <sec id="sec-3-1-1">
          <title>Algorithm 2. BlockCube(s, Q, Fk)</title>
          <p>
            Otherwise, if Q1 is UNSAT, every bad state of Fk has been
blocked so far, conditions (
            <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1-4</xref>
            ) hold for k + 1 and IC3 can
safely move to the next major iteration, trying to prove that
P is satisfied up to k + 1 steps. Before moving to the next
iteration, a new empty time frame Fk+1 is created (line 10).
Initially, clauses(Fk+1) = ∅ and such a time frame represents
the entire state space, i.e. Fk+1 = Space(S). Note that
Space(S) is a valid over-approximation to the set of states
reachable within k + 1 steps in S. Then a phase called
propagation takes place (line 12). The procedure Propagate(Fk)
(Algorithm 4), which is discussed later, handles that phase.
During propagation, IC3 tries to refine every time frame Fi,
with 0 &lt; i ≤ k, by checking if some clauses of one time
frame can be pushed forward to the following time frame.
Possibly, propagation refines the outermost time frame Fk so that
Fk ⊂ Space(S). Propagation can lead to two adjacent time
frames becoming equivalent. If that happens, the algorithm
has found an inductive strengthening of P for S (equal to those
time frames), so the property P holds for every reachable
state of S and IC3 returns SUCCESS (line 14).
          </p>
          <p>
            The procedure BlockCube(s, Q, Fk) (Algorithm 2) is
responsible for refining the trace Fk in order to block a bad cube
s found in Fk. To preserve condition (
            <xref ref-type="bibr" rid="ref3">3</xref>
            ), prior to blocking a
cube in a certain time frame, IC3 has to recursively block its
predecessor states in the preceding time frames. To keep track
of the states (or cubes) that must be blocked in certain time
frames, IC3 uses the formalism of proof-obligations.
Definition 11. Given a cube s and a time frame Fj , a
proof-obligation is a pair (s, j) formalizing the fact that s must
be blocked in Fj .
          </p>
          <p>Given a proof obligation (s, j), the cube s can either
represent a set of bad states or a set of states that can all
reach a bad state in one or more transitions. The number j
indicates the position in the trace where s must be proved
unreachable, or else the property fails.</p>
          <p>Definition 12. A proof obligation (s, j) is said to be
discharged when s becomes blocked in Fj .</p>
          <p>IC3 maintains a priority queue Q of proof-obligations.
During the blocking of a cube, proof-obligations (s, j) are
extracted from Q and discharged for increasing values of
j, ensuring that every predecessor of a bad cube s will be
blocked in Fj (j &lt; k) before s will be blocked in Fk. In
the BlockCube(s, Q, Fk) procedure, first a proof-obligation
encoding the fact that s must be blocked in Fk is added to
Q (line 1). Then proof-obligations are iteratively extracted
from the queue and discharged (lines 2-14).</p>
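The queue discipline described above, extracting obligations with minimal frame index first, can be sketched with a binary heap (an illustrative stand-alone sketch; real implementations differ in details):

```python
import heapq

# Sketch of the proof-obligation queue: obligations (s, j) come out in
# order of increasing frame index j, so predecessors are blocked before
# the cubes that depend on them (cubes are plain strings here).
class ObligationQueue:
    def __init__(self):
        self._heap = []
        self._tie = 0                     # keeps heap entries comparable

    def push(self, cube, frame):
        heapq.heappush(self._heap, (frame, self._tie, cube))
        self._tie += 1

    def pop(self):                        # obligation with minimal frame
        frame, _, cube = heapq.heappop(self._heap)
        return (cube, frame)

    def __bool__(self):
        return bool(self._heap)

q = ObligationQueue()
q.push("s", 3)                            # bad cube found at frame 3
q.push("p", 2)                            # its predecessor, one frame back
q.push("r", 2)
order = []
while q:
    order.append(q.pop())
print(order)                              # frame-2 obligations come first
```

The insertion-order tie-breaker keeps heap entries comparable without requiring an ordering on cubes themselves.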
          <p>Prior to discharge the proof-obligation (s, j) extracted, IC3
checks if that proof-obligation still needs to be discharged.
It is in fact possible for an enqueued proof-obligation to
become discharged as a result of the discharging of some
previously extracted proof-obligations. To perform this check,
the following SAT query is posed (line 4):</p>
          <p>SAT ?(Fj ∧ s)
(Q2)</p>
          <p>If the result of Q2 is SAT, then the cube s is still in Fj and
(s, j) still needs to be discharged. Otherwise, s has already
been blocked in Fj and the procedure can move on to the
next iteration.</p>
          <p>If the proof-obligation (s, j) still needs to be discharged,
then IC3 checks if the time frame identified by j is the initial
time frame (line 5). If so, the states represented by s are initial
states that can reach a violation of the property P . A
counterexample σ to P can be constructed going up the chain of
proof-obligations that led to (s, 0). In that case, the procedure
terminates with failure returning the counter-example found.</p>
          <p>To discharge a proof-obligation (s, j), i.e. to block a cube
s in Fj , IC3 tries to derive a clause c such that c ⊆ ¬s and
c is inductive relative to Fj−1. This is done by posing the
following SAT query (line 6):</p>
          <p>SAT ?(Fj−1 ∧ ¬s ∧ T ∧ s′)
(Q3)</p>
          <p>
            If Q3 is UNSAT (lines 10-12), then the clause ¬s is
inductive relative to Fj−1 and can be used to refine Fj , ruling
out s. To pursue a stronger refinement of Fj , the inductive
clause found undergoes a process called inductive
generalization (line 10) prior to being added to Fi. Inductive
generalization is carried out by the procedure Generalize(j, s, Fk)
(Algorithm 3), which tries to minimize the number of literals
in clause c = ¬s while maintaining its inductiveness relative
to Fj−1, in order to preserve condition (
            <xref ref-type="bibr" rid="ref2">2</xref>
            ). The resulting
clause is added not only to Fj , but also to every time frame
Fi, 0 &lt; i ≤ j (line 11). Doing so discharges the
proof-obligation (s, j). In fact, this refinement rules out s from every
Fi with 0 &lt; i ≤ j. Since the sets Fi with i &gt; j are larger
than Fj , s may still be present in one of them and (s, j + 1)
may become a new proof-obligation. To address this issue,
Algorithm 2 adds (s, j + 1) to the priority queue (line 12).
          </p>
          <p>
            Otherwise, if Q3 is SAT (lines 7-8), a predecessor t of s
in Fj−1 can be extracted from the satisfying assignment. To
preserve condition (
            <xref ref-type="bibr" rid="ref3">3</xref>
            ), before blocking a cube s in a time
frame Fj , every predecessor of s must be blocked in Fj−1. So,
the predecessor t is extended with ternary simulation (line 7)
into the cube p, and then both proof-obligations (p, j − 1) and
(s, j) are added to the queue (line 8).
          </p>
          <p>Input: j: time frame index; s: cube such that ¬s is
inductive relative to Fj−1; Fk: trace
Output: c : a sub-clause of ¬s
1: c ← ¬s
2: for all literals l in c do
3: try ← the clause obtained by deleting l from c
4: if ∄t, v′ : t, v′ |= Fj−1 ∧ T ∧ try ∧ ¬try′ then
5: if ∄t : t |= I ∧ ¬try then
6: c ← try
7: end if
8: end if
9: end for
10: return c</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Algorithm 3. Generalize(j, s, Fk)</title>
          <p>The Generalize(j, s, Fk) procedure (Algorithm 3) tries to
remove literals from ¬s while keeping it inductive relative
to Fj−1. To do so, a clause c initialized with ¬s (line 1) is used
to represent the current minimal inductive clause. For every
literal of c, the clause try obtained by dropping that literal
from c (line 3), is checked for inductiveness relative to Fj−1
by means of the following SAT query (line 4):</p>
          <p>SAT ?(Fj−1 ∧ try ∧ T ∧ ¬try′)
(Q4)</p>
          <p>If Q4 is UNSAT, the iductive case for the reduced formula
still holds. Since dropping literals from a relatively inductive
clause can break both the inductive case and the base case, the
latter must be explicilty checked too for the reduced clause
try (line 5). This is done by posing the following SAT query:
SAT ?(I ∧ ¬try)
(Q5)</p>
          <p>If both the inductive case and the base case hold for the
reduced clause try, the currently minimal inductive clause c
is updated with try (line 6).</p>
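On a toy system, queries Q4 and Q5 can be answered by enumeration, which allows the literal-dropping loop of Algorithm 3 to be sketched end to end. The example system (every state has only a self-loop) is chosen purely so that generalization visibly shrinks the clause; all names are illustrative:

```python
from itertools import product

# A clause is a set of literals (var_index, polarity); a state
# satisfies the clause iff at least one literal agrees with it.
def sat_clause(clause, s):
    return any(s[v] == pol for v, pol in clause)

def generalize(clause, states, I, T, F_prev):
    c = set(clause)
    for lit in list(c):
        try_c = c - {lit}
        # Q4: is there s, t with F_prev(s) ∧ try(s) ∧ T(s, t) ∧ ¬try(t)?
        q4 = any(F_prev(s) and sat_clause(try_c, s) and T(s, t)
                 and not sat_clause(try_c, t)
                 for s in states for t in states)
        # Q5: is there an initial state violating try?
        q5 = any(I(s) and not sat_clause(try_c, s) for s in states)
        if not q4 and not q5:       # both UNSAT: keep the shorter clause
            c = try_c
    return c

# System with self-loops only: every clause is inductive, so c = ¬s
# shrinks until only Q5 stops it (the empty clause fails in I).
def I(s): return s == (0, 0)
def T(s, t): return s == t
states = [tuple(b) for b in product((0, 1), repeat=2)]
c = generalize({(0, 0), (1, 0)}, states, I, T, lambda s: True)
print(len(c))                       # 1: a single-literal clause remains
```

In the real algorithm the two checks are SAT calls Q4 and Q5 rather than enumerations, but the accept/reject logic per dropped literal is the same.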
          <p>The Propagate(Fk) procedure (Algorithm 4) handles the
propagation phase. For every clause c of each time frame Fj ,
with 0 ≤ j &lt; k − 1, the procedure checks if c can be pushed</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Algorithm 4. Propagate(Fk)</title>
          <p>forward to Fj+1 (line 3). To do so, it poses the following SAT
query:</p>
          <p>SAT ?(Fj ∧ T ∧ ¬c′)
(Q6)</p>
          <p>
            If the result of Q6 is UNSAT, then it is safe, with respect to
condition (
            <xref ref-type="bibr" rid="ref3">3</xref>
            ), to push clause c forward to Fj+1. Otherwise, c
cannot be pushed and the procedure iterates to the next clause.
          </p>
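The propagation check can be sketched in the same brute-force style: clause c moves from Fj to Fj+1 exactly when no Fj-state has a successor violating c (the toy system and all names are illustrative):

```python
from itertools import product

# A clause is a frozenset of literals (var_index, polarity).
def sat_clause(clause, s):
    return any(s[v] == pol for v, pol in clause)

# frames[j] is the clause set of F_j; F_j holds in s iff s satisfies
# every clause of frames[j].
def propagate(frames, states, T):
    holds = lambda j, s: all(sat_clause(c, s) for c in frames[j])
    for j in range(len(frames) - 1):
        for c in list(frames[j]):
            q6 = any(holds(j, s) and T(s, t) and not sat_clause(c, t)
                     for s in states for t in states)
            if not q6:                    # Q6 UNSAT: push c forward
                frames[j + 1].add(c)
    return frames

# 2-bit counter saturating at value 2: the clause "value ≠ 3" holds
# after every transition, so it propagates to the next frame.
def value(s): return (s[0] << 1) | s[1]
def T(s, t): return value(t) == min(value(s) + 1, 2)
states = [tuple(b) for b in product((0, 1), repeat=2)]
c = frozenset({(0, 0), (1, 0)})           # ¬b1 ∨ ¬b0, i.e. value ≠ 3
frames = [{c}, set()]
print(propagate(frames, states, T)[1] == {c})   # True: c was pushed
```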
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>C. Related works</title>
        <p>In [2], Aaron Bradley outlined the opportunity for SAT and
SMT researchers to directly address the problem of improving
IC3’s performance by exploiting the peculiar character of the
SAT queries it poses. A description of the IC3 algorithm,
specifically addressing implementation issues, is given in [7].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>III. SAT SOLVING IN IC3</title>
      <p>SAT queries posed by IC3 have some specific
characteristics:
• Small-sized formulas: they employ no more than a single
instance of the transition relation;
• Large number of calls: reasoning during the verification
process is highly localized and takes place at the
granularity of relatively small cubes;
• Separated contexts: each SAT query is relative to a single
time frame;
• Related calls: subsequent calls may expose a certain
correlation (for instance, inductive generalization calls
take place on progressively reduced formulas).</p>
      <p>We performed an analysis of the implementation of IC3
within the academic model checking suite PdTRAV, closely
following the description of IC3 given in [7] (there called
PDR: Property Directed Reachability). The experimental
analysis led us to identify six different types of SAT queries that
the algorithm poses during its execution. These queries are
the ones already outlined in Section II-B. The type of these
queries is reported in Table I.</p>
      <p>For each of the queries identified, we have measured the
average number of calls and the average solving time. These
statistics are reported in Table II. The results were collected by
running PdTRAV’s implementation of IC3 on the complete set
of 310 single property benchmarks of the HWMCC’12, using
time and memory limits of 900 s and 2 GB, respectively.</p>
      <p>Such statistics can be summarized as follows:</p>
      <p>TABLE I. Types of SAT queries posed by IC3:
Target Intersection: SAT ?(Fk ∧ ¬P ) (Q1)
Blocked Cube: SAT ?(Fi ∧ s) (Q2)
Relative Induction: SAT ?(Fi ∧ ¬s ∧ T ∧ s′) (Q3)
Inductive Generalization: SAT ?(Fi ∧ c ∧ T ∧ ¬c′) (Q4)
Base of Induction: SAT ?(I ∧ ¬c) (Q5)
Clause Propagation: SAT ?(Fi ∧ T ∧ ¬c′) (Q6)</p>
      <p>
• SAT calls involved in inductive generalization are by
far the most frequent ones. These are in fact the calls
that appears at the finest granularity. In the worst case
scenario, one call is posed for every literal of every
inductive clause found.
• Inductive generalization and propagation checks are the
most expensive queries in terms of average SAT solving
time required.
• Target intersection calls are very infrequent and don’t take
much time to be solved.
• Blocked cube and relative induction checks are close in
the number of calls and solving time.</p>
      <p>TABLE II. Average number of calls per query type:
Q1: 483 calls
Q2: 27891 calls
Q3: 31172 calls
Q4: 142327 calls
Q5: 147248 calls
Q6: 61114 calls</p>
      <p>Subsequent SAT calls in IC3 are often applied to highly
different formulas. In the general case, two subsequent calls
can in fact be posed in the context of different time frames,
thus involving different sets of clauses. In addition, one of
them may require the use of the transition relation, while
the other may not, and each query can involve the use of
temporary clauses/cubes that are needed only to respond to that
particular query (e.g. the candidate inductive clause used to
check relative inductiveness during cube blocking or inductive
generalization).</p>
      <p>In the general case, whenever a new query is posed by
IC3 to the underlying SAT solving framework, the formula
currently loaded in the solver must be modified to accommodate
the new query. For this reason, IC3 requires the use of
SAT solvers that expose an incremental SAT interface. An
incremental SAT interface for IC3 must support the following
features:
• Pushing and popping clauses to or from the formula.
• Specifying literal assumptions.</p>
      <p>Many state-of-the-art SAT solvers, like MiniSAT [10],
feature an incremental interface capable of pushing new clauses
into the formula and allowing literal assumptions. Removing
clauses from the current formula is a more difficult task since
one has to remove not only the single clause specified,
but also every learned clause that has been derived from
it. Although solvers such as zchaff [11] directly support
clause removal, the majority of the state-of-the-art SAT solvers
feature an interface like the one of MiniSAT, in which clause
removal can only be simulated. This is done through the use of
literal assumptions and the introduction of auxiliary variables
known as activation variables, as described in [12]. In such
solvers, clauses aren’t actually removed from the formula
but only made redundant for the purpose of determining the
satisfiability of the rest of the formula. Since such clauses
are not removed from the formula, they still participate in
the Boolean Constraint Propagation (BCP) and, thus, degrade
the overall SAT solving performance. In order to minimize
this degradation, each solver employed by IC3 must be
periodically cleaned up, i.e. emptied and re-loaded with only
relevant clauses. In this work we assume the use of a SAT
solver exposing an incremental interface similar to the one of
MiniSAT.</p>
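Simulated clause removal through activation variables, as described in [12], can be illustrated on top of a toy enumeration-based solver that supports assumptions (the solver and all names are placeholders, not MiniSAT's actual API):

```python
from itertools import product

# Tiny "SAT solver": literals are non-zero ints (+v / -v, DIMACS
# style); solve() enumerates all assignments over nvars variables and
# reports whether one satisfies every clause and every assumption.
def solve(clauses, assumptions, nvars):
    for bits in product((False, True), repeat=nvars):
        val = lambda lit: bits[abs(lit) - 1] ^ (lit < 0)
        if all(val(a) for a in assumptions) and \
           all(any(val(l) for l in c) for c in clauses):
            return True
    return False

nvars = 3                     # variables x1, x2 and activation variable a
a = 3
clauses = [[1, 2]]            # permanent clause: x1 ∨ x2
clauses.append([-1, a])       # removable clause ¬x1, guarded by a

# Assuming ¬a activates the guarded clause: ¬x1 is enforced.
print(solve(clauses, [-a, 1], nvars))   # False: assumption x1 contradicts it
# "Removing" the clause = assuming a: the clause is satisfied and inert,
# though it still sits in the clause database, as noted above.
print(solve(clauses, [a, 1], nvars))    # True
```

The deactivated clause still occupies the database and would still take part in BCP in a real solver, which is precisely the degradation that motivates periodic cleanup.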
      <p>Once an efficient incremental SAT solving tool has been
chosen, any implementation of IC3 must face the problem
of deciding how to integrate the top-level algorithm with the
underlying SAT solving layer. This problem can be divided
into the following three sub-problems:
• SAT solvers allocation: decide how many different SAT
solvers to employ and how to distribute workload among
them.
• SAT solvers loading: decide which clauses must be loaded
in each solver to make them capable of responding
correctly to the SAT queries posed by IC3.
• SAT solvers clean up: decide when and how often the
algorithm must clean up each solver, in order to avoid
performance degradation.</p>
    </sec>
    <sec id="sec-5">
      <title>V. SAT SOLVERS ALLOCATION</title>
      <p>We assume in this work the use of multiple SAT solvers,
one for each different time frame. Using multiple solvers, we
observed that performance is highly related to:
• Solver cleanup frequency: cleaning up the solver means
removal of incrementally loaded problem clauses and
learned clauses
• Clause loading strategy: on-demand loading of transition
relation clauses based on topological dependency</p>
      <sec id="sec-5-1">
        <title>A. Specialized solvers</title>
        <p>From the statistical results reported in Table II it is easy
to see that inductive generalization and clause propagation
queries are by far the most expensive ones in terms of average
SAT solving time. Inductive generalization queries, in addition
to being expensive, are also the most frequent type of query
posed.</p>
        <p>The expense of inductive generalization calls can
be attributed to the fact that during the inductive
generalization of a clause, at every iteration a slightly changed
formula is queried for a satisfying assignment in increasingly
larger subspaces. Given two subsequent queries in inductive
generalization, in fact, their formulas
differ only in one literal of the present-state clause try
and one literal of the next-state cube ¬try′. As the subspace
becomes larger, solving times for those queries increase.
The average expensiveness of clause propagation calls can
be intuitively motivated by noticing that they are posed once
for every clause of every time frame. The innermost
time frames are the ones with the largest number of clauses,
and thus require the largest number of propagation calls.
Unfortunately, given the high number of clauses in those time
frames, the CNF formulas upon which such calls act are highly
constrained and usually harder to solve. So during propagation
there are, in general, more hard queries than simple queries,
making the average SAT solving time for those queries high.</p>
<p>In an attempt to reduce the burden of each time frame’s SAT
solver, we experimented with the use of specialized solvers for
handling such queries. For every time frame, a second solver is
instantiated and used to answer a particular type of query (Q4
or Q6). Table III summarizes the results of this experiment.</p>
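<p>The allocation scheme above can be sketched as follows. This is a minimal illustration, not the actual code of PdTRAV or ABC: the solver factory, query tags, and class names are all assumptions made for the example.</p>
<preformat>
```python
# Sketch of per-frame solver allocation with an optional specialized
# solver for inductive generalization (Q4) queries. The Solver factory
# and the query-type tags are illustrative, not the tools' actual APIs.
class FrameSolvers:
    def __init__(self, make_solver, specialize_q4=True):
        self.main = make_solver()                       # handles all other queries
        self.q4 = make_solver() if specialize_q4 else None

    def solver_for(self, query_type):
        # Route generalization queries to the dedicated solver so the
        # main solver's clause database stays smaller.
        if query_type == "Q4" and self.q4 is not None:
            return self.q4
        return self.main

# One FrameSolvers instance per time frame.
frames = [FrameSolvers(lambda: object()) for _ in range(3)]
assert frames[0].solver_for("Q4") is frames[0].q4
assert frames[0].solver_for("Q1") is frames[0].main
```
</preformat>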
      </sec>
    </sec>
    <sec id="sec-6">
      <title>VI. SOLVER LOADING</title>
      <p>To minimize the size of the formula to be loaded into each
solver, a common approach is to load, for every SAT call that
queries the dynamic behavior of the system, only the necessary
part of the transition relation.</p>
<p>It is easy to observe that every SAT call that uses the
transition relation involves a constraint on the next state
variables of the system in the form of a cube c′. Such queries
ask whether there is a state of the system, satisfying some
constraints on the present state, that can reach in one transition
the states represented by c′. Since c′ is a cube, the desired state
must have a successor p such that p agrees with c′ on the value
of every variable of c′. It is easy to see that the only present
state variables relevant in determining whether such a state
exists are those in the structural support of the next state
variables of c′. It follows that the only portions of the transition
relation required to answer such queries are the logic cones of
the next state variables of c′.</p>
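<p>The cone extraction described above amounts to a backward reachability sweep over the circuit graph. Below is a minimal sketch under an assumed dictionary-based AIG representation; the node names, the fanin map, and the latch map are all hypothetical and only serve to illustrate the idea.</p>
<preformat>
```python
# Hypothetical AIG: 'fanins' maps each node to its fanin list;
# 'next_state_driver' maps next-state variables to their driving nodes.
def cone_of_influence(fanins, roots):
    """Collect all nodes reachable backwards from the given roots."""
    cone, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in cone:
            continue
        cone.add(node)
        stack.extend(fanins.get(node, ()))
    return cone

fanins = {"n1": ["x1", "x2"], "n2": ["n1", "x3"], "n3": ["x4"]}
next_state_driver = {"y1": "n2", "y2": "n3"}   # latch inputs
cube_vars = ["y1"]                              # variables appearing in c'
roots = [next_state_driver[v] for v in cube_vars]
# Only these cones need to be translated to CNF and loaded;
# n3's cone (for y2) is never loaded for this query.
print(sorted(cone_of_influence(fanins, roots)))  # ['n1', 'n2', 'x1', 'x2', 'x3']
```
</preformat>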
<p>This loading strategy, known as lazy loading of the transition
relation, is commonly employed in various implementations
of IC3, such as those of PdTRAV and ABC. Using it, we
observed on average a 50% reduction in the size of the CNF
formula for the transition relation.</p>
      <p>We noticed that, for these queries, the portions of the
transition relation loaded can be further minimized employing
a CNF encoding technique, called Plaisted-Greenbaum CNF
encoding [13] (henceforth simply called PG encoding). The
AIG representation of the transition relation together with
the assumptions specified by the next state cube c′ can be
viewed as a constrained boolean circuit [14], i.e. a boolean
circuit in which some of the outputs are constrained to assume
certain values. The Plaisted-Greenbaum encoding is a special
encoding that can be applied when translating a constrained
boolean circuit into a CNF formula.</p>
      <p>Below we give an informal description of the PG encoding.
For a complete description refer to [13] or [14].</p>
      <p>Given an AIG representation of a circuit, a CNF encoding
first subdivides that circuit into a set of functional blocks, i.e.
gates or groups of connected gates representing certain boolean
functions, and introduces a new CNF variable a for each of
these blocks. For each functional block representing a function
f on the input variables x1, x2, . . . , xn, a set of clauses is
derived by translating into CNF the formula:
a ↔ f (x1, x2, . . . , xn)</p>
      <p>
        (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
The final CNF formula is obtained by the conjunction of these
sets of clauses. Different CNF encodings differ in the way the
gates are grouped together to form functional blocks and in
the way these blocks are translated into clauses. The idea of
PG encoding is to start from a base CNF encoding, and then
use both output assumptions and topological information of
the circuit to get rid of some of the derived clauses, while
still producing an equi-satisfiable CNF formula. Based on the
output constraints and the number of negations that can be
found in every path from a gate to an output node, it can be
shown that, for some gates of the circuit, an equi-satisfiable
encoding can be produced by only translating one of the two
sides of the bi-implication (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ). The CNF formula produced
by PG encoding will be a subset of the one produced by the
traditional encoding.
      </p>
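<p>For a single AND gate, keeping only one side of the bi-implication concretely halves (or better) the clause count. The sketch below contrasts the full Tseitin-style translation with the polarity-based PG reduction; the function names and the integer-literal clause format are illustrative conventions, not a real encoder's API.</p>
<preformat>
```python
def tseitin_and(a, x, y):
    # Full bi-implication a <-> (x AND y): three clauses.
    return [[-a, x], [-a, y], [a, -x, -y]]

def pg_and(a, x, y, polarity):
    # Plaisted-Greenbaum: keep only the side of the bi-implication
    # needed for the polarity under which the gate output is observed
    # (determined by output constraints and negation counts on paths).
    if polarity > 0:      # only a -> (x AND y) is needed
        return [[-a, x], [-a, y]]
    elif polarity < 0:    # only (x AND y) -> a is needed
        return [[a, -x, -y]]
    else:                 # both polarities reachable: full translation
        return tseitin_and(a, x, y)

print(len(tseitin_and(1, 2, 3)))   # 3 clauses
print(len(pg_and(1, 2, 3, +1)))    # 2 clauses: a strict subset
```
</preformat>
<p>Note that the PG clauses are always a subset of the Tseitin clauses, matching the observation above that the PG formula is a subset of the traditional encoding.</p>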
<p>PG encoding proves to be effective in reducing the size
of loaded formulas, but it is not certain whether it is more
efficient for SAT solving, since it may have worse propagation
behaviour [15].</p>
      <p>We observed a 10-20% reduction in the size of the CNF
formula for the transition relation.</p>
    </sec>
    <sec id="sec-7">
      <title>VII. SOLVERS CLEAN UP</title>
<p>A natural question arises regarding how frequently and
under what conditions SAT solver cleanups should be scheduled.
Cleaning up a SAT solver, in fact, introduces a certain
overhead. This is because:
• Relevant clauses for the current query must be reloaded.
• Relevant inferences previously derived must be derived
again from those clauses.</p>
      <p>A heuristic cleanup strategy is needed in order to achieve a
trade-off between the overhead introduced and the slowdown
in BCP avoided. The purpose of that heuristic is to determine
when the number of irrelevant clauses (w.r.t. the current query)
loaded in a solver becomes large enough to justify a cleanup.
To do so, a heuristic measure representing an estimate of the
number of currently loaded irrelevant clauses is compared to a
certain threshold. If that measure exceeds the threshold, then
the solver is cleaned up.</p>
<p>Cleanup heuristics currently used in state-of-the-art tools,
like ABC and PdTRAV, rely on loose estimates of the size
of irrelevant portions of the formula loaded into each solver.
These heuristics clean up each solver as soon as the computed
estimate meets some, often static, threshold. Our experiments
with IC3 show that the frequency of the cleanups plays a
crucial role in determining the overall verification performance.
We explored the use of new cleanup heuristics based on more
precise measures of the number of irrelevant clauses loaded,
able to exploit correlation among different SAT calls to
dynamically adjust the frequency of cleanups. For SAT calls
in IC3, notice that there are two sources of irrelevant clauses
loaded in a solver:
1) Deactivated clauses loaded for previous inductive checks
(Q3 and Q4 queries).
2) Portions of logic cones loaded for previous calls
querying the dynamic behaviour of the system.</p>
      <p>Cleanup heuristics commonly used, such as the ones used
in baseline versions of ABC and PdTRAV, typically take into
account only the number of deactivated clauses in the solver to
compute an estimate of the number of irrelevant clauses. We
investigated the use of a new heuristic measure taking into
account the second source of irrelevant clauses, i.e. irrelevant
portions of previously loaded cones.</p>
<p>Every time a new query requires loading a new portion of
the transition relation, the following quantities are computed
to measure the number of irrelevant transition relation clauses
(assuming that c′ is the cube constraining the
next state variables for that query):
• A: the number of transition relation clauses already
loaded into the solver (before loading the logic cones
required by the current query);
• S(c′): the size (number of clauses) of the logic cones
required for solving the current query;
• L(c′): the number of clauses in the required logic cones to
be loaded into the solver (equal to the size of the required
logic cone minus the number of clauses that such cone
shares with previously loaded cones);</p>
      <p>
        A measure of the number of irrelevant transition relation’s
clauses loaded for c′, denoted by u(c′), can be computed as
follows:
u(c′) = A − (S(c′) − L(c′))
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
      </p>
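<p>Equation (2) can be computed directly from clause sets, since S(c′) − L(c′) is exactly the number of required clauses already present in the solver. A minimal sketch, with illustrative names and set-based bookkeeping standing in for the tools' actual data structures:</p>
<preformat>
```python
# 'loaded' is the set of TR clauses already in the solver; 'required'
# the clauses of the cones needed by the current query.
def irrelevance_measure(loaded, required):
    A = len(loaded)               # TR clauses already loaded
    S = len(required)             # size of the required cones
    L = len(required - loaded)    # clauses still to be loaded
    return A - (S - L)            # u(c') = A - (S - L)

loaded = {1, 2, 3, 4, 5, 6}
required = {5, 6, 7, 8}
# Clauses 1-4 are loaded but not required: u(c') = 6 - (4 - 2) = 4.
print(irrelevance_measure(loaded, required))  # 4
```
</preformat>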
<p>Such a heuristic measure, divided by the number of clauses
loaded into the solver, indicates the percentage of clauses
loaded in the solver that are irrelevant w.r.t. the current query.
In Section VIII we investigate the use of new cleanup strategies
based on this measure. In order to take into account correlation
between subsequent SAT calls, we consider such a measure
averaged over a time window of the last SAT calls.</p>
    </sec>
    <sec id="sec-8">
<title>VIII. EXPERIMENTAL RESULTS</title>
      <p>We have compared different cleanup and clause loading heuristics in both PdTRAV and ABC. In this section, we briefly summarize the obtained results.</p>
      <sec id="sec-8-1">
        <title>A. PG encoding</title>
<p>A first set of experiments was done on the full set of
310 HWMCC’12 benchmarks [16], with a lower timeout of
900 seconds, in order to evaluate the use of PG encoding.
Results were mixed. A run in PdTRAV, with the 900
second timeout, showed a reduction in the number of solved
instances from 79 to 63 (3 of which previously unsolved). The
percentage reduction of loaded transition relation clauses was
21.32%.</p>
<p>A similar run on ABC showed a more stable
comparison, from 80 to 79 solved instances (3 of which previously
unsolved). A first conclusion is thus that PG encoding was
not able to win over the standard encoding, suggesting it can
indeed suffer from worse propagation behaviour. Nonetheless, it
was very interesting to observe that the overall number of
solved problems grew from 79 to 82 in PdTRAV and from 80
to 85 in ABC.</p>
        <p>Different results between the two tools in this
experimentation could be partially due to different light-weight
preprocessing done by default within them. We started a more
detailed comparison, in order to better understand the partially
controversial results.</p>
      </sec>
      <sec id="sec-8-2">
        <title>B. Experiments with PdTRAV</title>
<p>We then focused on a subset of 70 selected circuits, for
which preprocessing was done off-line, and the tools were
run on the same problem instances. In the following tables,
the P0 rows always represent the default setting of PdTRAV.
We report the number of solved instances (out of 70) and average
execution time (time limit 900 seconds). The column labeled
New, when present, shows the number of instances not solved
by P0.</p>
<p>Table III shows two different implementations with
specialized solvers (i.e., two solvers per time frame): in the first one
(P1) the second solver handles generalization calls, while in
the second one (P2) it handles propagation calls. Overall,
we see a small improvement by P1, with two more instances
solved w.r.t. P0, including 4 instances not solved by P0.</p>
<p>The next two tables show an evaluation of different solver
cleanup heuristics. Let a be the number of activation variables
in the solver, |vars| the total number of variables in the solver,
|clauses| the total number of clauses in the solver, u(c′) the
heuristic measure discussed in Section VII, and W (x, n) a
sliding window containing the values of x for the last n SAT
calls. The heuristics compared are the following:
• H1: a &gt; 300;
• H2: a &gt; |vars|/2;
• H3: avg W (u(c′)/|clauses|, 1000) &gt; 0.5;
• H4: H2 || H3.</p>
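<p>The four predicates above can be sketched as follows. This is an illustrative reconstruction from their textual definitions (the class, method names, and bookkeeping are assumptions, not the tools' code); the thresholds 300, 1/2, 1000, and 0.5 are the ones stated in the text.</p>
<preformat>
```python
from collections import deque

class CleanupHeuristics:
    def __init__(self, window=1000):
        # Sliding window W(u(c')/|clauses|, 1000) for heuristic H3.
        self.window = deque(maxlen=window)

    def record(self, u, num_clauses):
        # Called once per SAT call with the measure from equation (2).
        self.window.append(u / num_clauses)

    def h1(self, a):
        return a > 300                        # static threshold

    def h2(self, a, num_vars):
        return a > num_vars / 2               # half the variables are activation literals

    def h3(self):
        return (len(self.window) > 0 and
                sum(self.window) / len(self.window) > 0.5)

    def h4(self, a, num_vars):
        return self.h2(a, num_vars) or self.h3()

h = CleanupHeuristics()
h.record(u=600, num_clauses=1000)             # 60% irrelevant on this call
print(h.h1(200), h.h2(200, 500), h.h3())      # False False True
```
</preformat>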
        <p>Heuristic H1 is the cleanup heuristic used in the baseline
versions of both PdTRAV and ABC. The static threshold of
300 activation literals was determined experimentally.
Heuristic H2 cleans up each solver as soon as half of its variables
are used for activation literals. It can be seen as a first simple
attempt to adopt a dynamic cleanup threshold. Heuristic H3 is
the first heuristic proposed to take into account the second
source of irrelevant clauses described in Section VII, i.e.
irrelevant portions of previously loaded cones. In H3 a solver
is cleaned up as soon as the percentage of irrelevant transition
relation’s clauses loaded, averaged on a window of the last
1000 calls, reaches 50%. Heuristic H4 takes into account both
sources of irrelevant clauses, combining H2 and H3.</p>
<p>Table IV shows a comparison among H1 (row P0), H2,
H3, and H4, in rows P3, P4, and P5, respectively. Neither PG
encoding nor specialized solvers were used. Heuristic H1,
which employs a static threshold, was able to solve the largest
number of instances. Among dynamic threshold heuristics,
both H2 and H3 take into account a single source of irrelevant
loaded clauses: the deactivated clauses in H2
and the unused portions of the transition relation in H3. The data
clearly indicate that H3 has worse performance. This
suggests that deactivated clauses play a bigger role in degrading
SAT solver performance than irrelevant transition relation
clauses do; taking into account only the latter source of
irrelevant clauses is not sufficient. It can be noticed that
heuristic H4, which takes into account both sources, outperforms
both H2 and H3. This shows that considering irrelevant
clauses arising from previously loaded cones, in addition to
deactivated clauses, can be beneficial. In addition, Table IV
shows that the dynamic heuristics were able to solve some instances
that could not be solved by the static heuristic H1, and vice versa. In
terms of overall number of solved instances, the static heuristic
H1 outperforms our best dynamic heuristic H4. This may be
because the threshold parameter of H1 results from extensive
experimentation, while a narrower experimentation was
performed to determine the parameters of H4 (window size
and percentage threshold).</p>
<p>We then focused on H4, and benchmarked it in different
setups. Results are shown in Table V.</p>
<p>Here PG encoding was used in configurations P6 and P8,
a single solver per time frame in P6, and an additional specialized
solver for generalization in P7 and P8. We see that the
specialized solver configuration appears to perform worse
with PG encoding. Also, adding a specialized solver for
generalization to the dynamic heuristic H4 does not seem to
be as effective as it is when using the static heuristic H1.
This may be because irrelevant clauses arising from
generalization calls are not taken into account to schedule the
cleanup of the main solver, which, in turn, is cleaned up less
frequently.</p>
      </sec>
      <sec id="sec-8-3">
        <title>C. Experiments with ABC</title>
<p>The 70 selected circuits were also benchmarked with ABC,
with the same pre-processing used in PdTRAV. Table VI reports in
row A0 the default setting of ABC. Row A1 shows the variant
with PG encoding, and row A2 shows a run without dynamic TR
clause loading. Row A3, finally, shows a different period for
solver cleanup (1000 variables instead of 300).</p>
<p>Overall, results show little variance among the different settings,
which could suggest a lower sensitivity of ABC to different
tunings. Nonetheless, a further experiment with ABC on
the full set of 310 benchmarks (with a 300 second time limit)
showed a 14% improvement in the number of solved problems
(from 71 to 81), which indicates a potential improvement for
a portfolio-based tool able to concurrently exploit multiple
settings.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>IX. CONCLUSIONS</title>
<p>The paper shows a detailed analysis and characterization of
the SAT queries posed by IC3. We also discuss new ideas for
solver allocation, loading, and restarting. The experimental
evaluation, done on two different state-of-the-art academic tools,
shows mixed results, as no breakthrough or clear winner
emerges from the new ideas.</p>
<p>PG encoding proved to be less effective than expected.
This is probably because the benefits it introduces in terms of
loaded formula size are outweighed by its supposedly
worse propagation behaviour.</p>
      <p>The use of specialized solvers seems to be effective when a
static cleanup heuristic is used, less effective when combined
with PG encoding or a dynamic heuristic.</p>
<p>Our experiments showed that, when a dynamic cleanup
heuristic is used, IC3’s performance can be improved by taking
into account both deactivated clauses and irrelevant portions of
previously loaded cones. Even though a parameter configuration
for H4 able to outperform the currently used, well-tuned
static heuristic H1 has not been found yet, we believe that
more extensive experimentation could lead to better results.</p>
      <p>Nonetheless, the results are more interesting if we consider
them from the standpoint of a portfolio-based tool, since the
overall coverage (by the union of all setups) is definitely
higher.</p>
<p>We therefore believe that more effort in implementation,
experimentation, and detailed analysis of case studies needs to be
done. We also deem that this work contributes to the discussion
of new developments in the research related to IC3.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Bradley</surname>
          </string-name>
          , “
          <article-title>SAT-based model checking without unrolling</article-title>
          ,” in
<source>VMCAI</source>
          , Austin, Texas, Jan.
          <year>2011</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Bradley</surname>
          </string-name>
          , “
<article-title>Understanding IC3</article-title>
          ,” in SAT, ser
          . Lecture Notes in Computer Science,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimatti</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          , Eds., vol.
          <volume>7317</volume>
          . Springer,
          <year>2012</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Biere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fujita</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , “
          <article-title>Symbolic Model Checking using SAT procedures instead of BDDs,”</article-title>
          <source>in Proc. 36th Design Automation Conf</source>
          . New Orleans, Louisiana: IEEE Computer Society, Jun.
          <year>1999</year>
          , pp.
          <fpage>317</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sheeran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
<string-name>
            <given-names>G.</given-names>
            <surname>Stålmarck</surname>
          </string-name>
          , “
          <article-title>Checking Safety Properties Using Induction and a SAT Solver</article-title>
          ,” in Proc. Formal Methods in Computer-Aided Design, ser
          . LNCS,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Hunt</surname>
          </string-name>
          and S. D. Johnson, Eds., vol.
<volume>1954</volume>
          . Austin, Texas, USA: Springer, Nov.
          <year>2000</year>
          , pp.
          <fpage>108</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bjesse</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Claessen</surname>
          </string-name>
          , “
          <article-title>SAT-Based Verification without State Space Traversal,” in Proc. Formal Methods in Computer-Aided Design, ser</article-title>
          .
          <source>LNCS</source>
          , vol.
<volume>1954</volume>
          . Austin, TX, USA: Springer,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
<string-name>
            <given-names>K. L.</given-names>
            <surname>McMillan</surname>
          </string-name>
          , “
          <article-title>Interpolation and SAT-Based Model Checking,” in Proc. Computer Aided Verification, ser</article-title>
          .
          <source>LNCS</source>
          , vol.
          <volume>2725</volume>
          .
Boulder
          , CO, USA: Springer,
          <year>2003</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
<string-name>
            <given-names>N.</given-names>
            <surname>Eén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mishchenko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Brayton</surname>
          </string-name>
          , “
          <article-title>Efficient implementation of property directed reachability</article-title>
          ,” in FMCAD,
          <year>2011</year>
          , pp.
          <fpage>125</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mishchenko</surname>
          </string-name>
          , “
          <article-title>ABC: A System for Sequential Synthesis</article-title>
          and Verification, http://www.eecs.berkeley.edu/∼alanmi/abc/,”
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cabodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nocco</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Quer</surname>
          </string-name>
          , “
          <article-title>Benchmarking a model checker for algorithmic improvements and tuning for performance,” Formal Methods in System Design</article-title>
          , vol.
          <volume>39</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>227</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
<string-name>
            <given-names>N.</given-names>
            <surname>Eén</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Sörensson</surname>
          </string-name>
          , “
          <article-title>The MiniSat SAT Solver</article-title>
          , http://minisat.se,” Apr.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Moskewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Madigan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Malik</surname>
          </string-name>
          , “Chaff:
          <article-title>Engineering an Efficient SAT Solver,”</article-title>
          <source>in Proc. 38th Design Automation Conf. Las Vegas</source>
          , Nevada: IEEE Computer Society, Jun.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
<string-name>
            <given-names>N.</given-names>
            <surname>Eén</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Sörensson</surname>
          </string-name>
          , “
          <article-title>Temporal induction by incremental SAT solving</article-title>
          ,”
          <source>Electr. Notes Theor. Comput. Sci.</source>
          , vol.
          <volume>89</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>543</fpage>
          -
          <lpage>560</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Plaisted</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Greenbaum</surname>
          </string-name>
          , “
          <article-title>A structure-preserving clause form translation,”</article-title>
<source>J. Symb. Comput.</source>
          , vol.
          <volume>2</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>293</fpage>
          -
          <lpage>304</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
<string-name>
            <given-names>M.</given-names>
            <surname>Järvisalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Biere</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Heule</surname>
          </string-name>
          , “
          <article-title>Simulating circuit-level simplifications on CNF,”</article-title>
          <string-name>
            <given-names>J.</given-names>
            <surname>Autom</surname>
          </string-name>
          .
          <source>Reasoning</source>
          , vol.
          <volume>49</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>583</fpage>
          -
          <lpage>619</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
<string-name>
            <given-names>N.</given-names>
            <surname>Eén</surname>
          </string-name>
          , “
          <article-title>Practical SAT: a tutorial on applied satisfiability solving</article-title>
          ,” slides of invited talk at
          FMCAD,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Biere</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Jussila</surname>
          </string-name>
          , “
          <article-title>The Model Checking Competition Web Page</article-title>
          , http://fmv.jku.at/hwmcc.”
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>