=Paper=
{{Paper
|id=None
|storemode=property
|title=Trading-off Incrementality and Dynamic Restart of Multiple Solvers in IC3
|pdfUrl=https://ceur-ws.org/Vol-1130/paper_5.pdf
|volume=Vol-1130
|dblpUrl=https://dblp.org/rec/conf/fmcad/PalenaCM13
}}
==Trading-off Incrementality and Dynamic Restart of Multiple Solvers in IC3==
G. Cabodi (*), A. Mishchenko (**), M. Palena (*)
(*) Dip. di Automatica ed Informatica
Politecnico di Torino - Torino, Italy
(**) Dept. of EECS, University of California, Berkeley, CA, USA
Abstract—This paper addresses the problem of SAT solver performance in IC3, one of the major recent breakthroughs in Model Checking algorithms. Unlike other Bounded and Unbounded Model Checking algorithms, IC3 is characterized by numerous SAT solver queries on small sets of problem clauses. Besides algorithmic issues, the above scenario poses serious performance challenges for SAT solver configuration and tuning. As is well known in other application fields, finding a good compromise between learning and overhead is key to performance. We address solver cleanup and restart heuristics, as well as clause database minimality, based on on-demand clause loading: transition relation clauses are loaded into the solver based on structural dependency and phase analysis. We also compare different solutions for multiple specialized solvers, and we provide an experimental evaluation on benchmarks from the HWMCC suite. Though not finding a clear winner, the work outlines several potential improvements for a portfolio-based verification tool with multiple engines and tunings.

(This work was supported in part by SRC Contracts No. 2012-TJ-2328 and No. 2265.001.)

I. INTRODUCTION

IC3 [1] is a SAT-based invariant verification algorithm for bit-level Unbounded Model Checking (UMC). Since its introduction, IC3 has immediately generated strong interest, and is now considered one of the major recent breakthroughs in Model Checking. IC3 has proved to be impressively effective at solving industrial verification problems. Our experience with the algorithm shows that IC3 is the single invariant verification algorithm capable of solving the largest number of instances among the benchmarks of the last editions of the Hardware Model Checking Competition (HWMCC).

A. Motivations

IC3 heavily relies on SAT solvers to drive several parts of the verification algorithm: a typical run of IC3 is characterized by a huge number of SAT queries. As stated by Bradley in [2], the queries posed by IC3 to SAT solvers differ significantly in character from those posed by other SAT-based invariant verification algorithms (such as Bounded Model Checking [3], k-induction [4][5] or interpolation [6]). Most notably, SAT queries posed by IC3 don't involve the unrolling of the transition relation for more than one step and are thus characterized by small-sized formulas.

IC3 can be thought of as composed of two different layers: at the top level, the algorithm itself drives the verification process by constantly refining a set of over-approximations of the forward reachable states with new inductive clauses; at the bottom level, a SAT solving framework is exploited by the top-level algorithm to respond to queries about the system. As shown in [7], these two layers can be separated by means of a clean interface.

Performance of IC3 turns out to be both highly sensitive to the various internal behaviours of SAT solvers, and strictly dependent on the way the top-level algorithm is integrated with the underlying SAT solving framework.

The peculiar characteristics exposed by the SAT queries of IC3 can thus be exploited to improve the overall performance of the algorithm in two different manners:
1) Tuning the internal behaviours of the particular SAT solver employed to better fit IC3 needs.
2) Defining better strategies to manage the SAT solving work required by IC3.

In this paper we address the second issue, proposing and comparing different implementation strategies for handling SAT queries in IC3. The aim of this paper is to identify the most efficient way to manage SAT solving in IC3. To achieve this goal we experimentally compare a number of different implementation strategies over a selected set of benchmarks from the recent HWMCC.

The experimental work has been done by two different research groups, on two different state-of-the-art verification tools, ABC [8] and PdTRAV [9], that share similar architectural and algorithmic choices in their implementation of IC3.

The focus of this paper is neither on the IC3 algorithm itself nor on the internal details of the SAT solving procedures employed, but rather on the implementation details of the integration between IC3 and the underlying SAT solving framework.

B. Contributions

The main contributions of this paper are:
• A characterization of the SAT queries posed by IC3.
• Novel approaches to solver allocation, loading and clean-up in IC3.
• An experimental evaluation of performance using two verification tools.
C. Outline

First, in Section II we introduce the notation used and give some background on IC3. Then, in Section III we present a systematic characterization of the SAT solving work required by IC3. Section IV introduces the problem of handling the SAT queries posed by IC3 efficiently. Both commonly used and novel approaches to the allocation, loading and cleaning up of SAT solvers in IC3 are discussed in Sections V, VI and VII respectively. Experimental data comparing these approaches are presented in Section VIII. Finally, in Section IX we draw some conclusions and give summarizing remarks.

II. BACKGROUND

A. Notation

Definition 1. A transition system is the triple S = ⟨M, I, T⟩ where M is a set of boolean variables called state variables of the system, I(M) is a boolean predicate over M representing the set of initial states of the system and T(M, M′) is a predicate over M, M′ that represents the transition relation of the system.

Definition 2. A state of the system is represented by a complete assignment s to the state variables M. A set of states of the system is represented by a boolean predicate over M. Given a boolean predicate F over M, a complete assignment s such that s satisfies F (i.e. s |= F) represents a state contained in F and is called an F-state. Primed state variables M′ are used to represent future states and, accordingly, a boolean predicate over M′ represents a set of future states.

Definition 3. A boolean predicate F is said to be stronger than another boolean predicate G if F → G, i.e. every F-state is also a G-state.

Definition 4. A literal is a boolean variable or the negation of a boolean variable. A disjunction of literals is called a clause, while a conjunction of literals is called a cube. A formula is said to be in Conjunctive Normal Form (CNF) if it is a conjunction of clauses.

Definition 5. Given a transition system S = ⟨M, I, T⟩, if an assignment s, t′ satisfies the transition relation T (i.e. if s, t′ |= T) then s is said to be a predecessor of t and t is said to be a successor of s. A sequence of states s0, s1, . . . , sn is said to be a path in S if every couple of adjacent states si, si+1, with 0 ≤ i < n, satisfies the transition relation (i.e. if si, s′i+1 |= T).

Definition 6. Given a transition system S = ⟨M, I, T⟩, a state s is said to be reachable in S if there exists a path s0, s1, . . . , s such that s0 is an initial state (i.e. s0 |= I). We denote with Rn(S) the set of states that are reachable in S in at most n steps. We denote with R(S) the overall set of states that are reachable in S. Note that R(S) = ∪i≥0 Ri(S).

Definition 7. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M (called a safety property), the invariant verification problem is the problem of determining if P holds for every reachable state in S: ∀s ∈ R(S) : s |= P. An algorithm used to solve the invariant verification problem is called an invariant verification algorithm.

Definition 8. Given a transition system S = ⟨M, I, T⟩, a boolean predicate F over M is called an inductive invariant for S if the following two conditions hold:
• Base case: I → F
• Inductive case: F ∧ T → F′
A boolean predicate F over M is called an inductive invariant for S relative to another boolean predicate G if the following two conditions hold:
• Base case: I → F
• Relative inductive case: G ∧ F ∧ T → F′

Lemma 1. Given a transition system S = ⟨M, I, T⟩, an inductive invariant F for S is an over-approximation of the set of reachable states R(S).

Definition 9. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M, an inductive strengthening of P for S is an inductive invariant F for S such that F is stronger than P.

Lemma 2. Given a transition system S = ⟨M, I, T⟩ and a boolean predicate P over M, if an inductive strengthening of P can be found, then the property P holds for every reachable state of S. The invariant verification problem can thus be seen as the problem of finding an inductive strengthening of P for S.

B. IC3

Given a transition system S = ⟨M, I, T⟩ and a safety property P over M, IC3 aims to find an inductive strengthening of P for S. For this purpose, it maintains a sequence of formulas Fk = F0, F1, . . . , Fk such that, for every 0 ≤ i < k, Fi is an over-approximation of the set of states reachable in at most i steps in S. Each of these over-approximations is called a time frame and is represented by a set of clauses, denoted by clauses(Fi). The sequence of time frames Fk is called a trace and is maintained by IC3 in such a way that the following conditions hold throughout the algorithm:
(1) F0 = I
(2) Fi → Fi+1, for all 0 ≤ i < k
(3) Fi ∧ T → F′i+1, for all 0 ≤ i < k
(4) Fi → P, for all 0 ≤ i < k

Condition (1) states that the first time frame of the trace is special and is simply assigned the set of initial states of S. The remaining conditions claim that, for every time frame Fi but the last one: (2) every Fi-state is also an Fi+1-state, (3) every successor of an Fi-state is an Fi+1-state and (4) every Fi-state is safe. Condition (2) is maintained syntactically, by enforcing the condition (2′) clauses(Fi+1) ⊆ clauses(Fi).
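To make this bookkeeping concrete, the following minimal sketch (our own illustration, not code from PdTRAV or ABC) stores each time frame as a Python set of clauses and preserves condition (2′) by adding a newly learned clause to every frame F1 . . . Fj, as IC3 does during cube blocking:

 # Illustrative sketch of IC3 trace bookkeeping (hypothetical names, not tool code).
 # A clause is a frozenset of integer literals; clauses(Fi) is a Python set.
 class Trace:
     def __init__(self):
         self.frames = {}                # frames[i] = clauses(Fi), for i >= 1; F0 = I kept apart
         self.k = 0
     def push_frame(self):
         self.k += 1
         self.frames[self.k] = set()     # a new outermost frame starts empty (whole state space)
     def add_clause(self, clause, j):
         # Adding the clause to every Fi with 0 < i <= j keeps condition (2')
         # clauses(Fi+1) ⊆ clauses(Fi), and hence condition (2), purely syntactically.
         for i in range(1, j + 1):
             self.frames[i].add(clause)
     def strengthening_found(self):
         # Two adjacent frames with equal clause sets witness an inductive strengthening of P.
         return any(self.frames[i] == self.frames[i + 1] for i in range(1, self.k))

With this representation, checking whether two adjacent frames have become equal after propagation is a plain set comparison.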
Lemma 3. Let S = ⟨M, I, T⟩ be a transition system, Fk = F0, F1, . . . , Fk a sequence of boolean formulas over M and let conditions (1-3) hold for Fk. Then each Fi, with 0 ≤ i < k, is an over-approximation of the set of states reachable within i steps in S.

Lemma 4. Let S = ⟨M, I, T⟩ be a transition system, P a safety property over M, Fk = F0, F1, . . . , Fk a sequence of boolean formulas over M and let conditions (1-4) hold for Fk. Then P is satisfied up to k − 1 steps in S (i.e. there doesn't exist any counter-example to P of length less than or equal to k − 1).

The main procedure of IC3 is described in Algorithm 1 and is composed of two nested iterations. Major iterations (lines 3-16) try to prove that P is satisfied up to k steps in S, for increasing values of k. To prove so, in minor iterations (lines 4-9), IC3 refines the trace Fk computed so far, by adding new relative inductive clauses to some of its time frames. The algorithm iterates until either (i) an inductive strengthening of the property is produced (line 14), or (ii) a counter-example to the property is found (line 7).

Input: S = ⟨M, I, T⟩; P(M)
Output: SUCCESS or FAIL(σ), with σ counter-example
 1: k ← 0
 2: Fk ← I
 3: repeat
 4:   while ∃t : t |= Fk ∧ ¬P do
 5:     s ← Extend(t)
 6:     if BlockCube(s, Q, Fk) = FAIL(σ) then
 7:       return FAIL(σ)
 8:     end if
 9:   end while
10:   Fk+1 ← ∅
11:   k ← k + 1
12:   Fk ← Propagate(Fk)
13:   if Fi = Fi+1 for some 0 ≤ i < k then
14:     return SUCCESS
15:   end if
16: until forever
Algorithm 1. IC3(S, P)

At major iteration k, the algorithm has computed a trace Fk such that conditions (1-4) hold. From Lemma 4, it follows that P is satisfied up to k − 1 steps in S. IC3 then tries to prove that P is satisfied up to k steps as well. This is done by enumerating Fk-states that violate P and then trying to block them in Fk.

Definition 10. Blocking a state (or, more generally, a cube) s in a time frame Fk means proving s unreachable within k steps in S, and consequently refining Fk to exclude it.

To enumerate each state of Fk that violates P (line 4), the algorithm poses SAT queries to the underlying SAT solving framework in the following form:

SAT?(Fk ∧ ¬P)   (Q1)

If Q1 is SAT, a bad state t in Fk can be extracted from the satisfying assignment. That state must be blocked in Fk. To increase the performance of the algorithm, as suggested in [7], the bad state t generated this way is first (possibly) extended to a bad cube s. This is done by means of the Extend(t) procedure (line 5), not reported here, which employs ternary simulation to remove some literals from t. The resulting cube s includes t and violates the property P; it is thus a bad cube. The algorithm then tries to block the whole bad cube s rather than t. It is shown in [7] that extending bad states into bad cubes before blocking them dramatically improves IC3 performance.

Once a bad cube s is found, it is blocked in Fk by calling the BlockCube(s, Q, Fk) procedure (line 6). This procedure is described in Algorithm 2.

Input: s: bad cube in Fk; Q: priority queue; Fk: trace
Output: SUCCESS or FAIL(σ), with σ counter-example
 1: add a proof-obligation (s, k) to the queue Q
 2: while Q is not empty do
 3:   extract (s, j) with minimal j from Q
 4:   if j > k or s ⊭ Fj then continue
 5:   if j = 0 then return FAIL(σ)
 6:   if ∃t, v′ : t, v′ |= Fj−1 ∧ T ∧ ¬s ∧ s′ then
 7:     p ← Extend(t)
 8:     add (p, j − 1) and (s, j) to Q
 9:   else
10:     c ← Generalize(j, s, Fk)
11:     Fi ← Fi ∪ c for 0 < i ≤ j
12:     add (s, j + 1) to Q
13:   end if
14: end while
15: return SUCCESS
Algorithm 2. BlockCube(s, Q, Fk)

Otherwise, if Q1 is UNSAT, every bad state of Fk has been blocked so far, conditions (1-4) hold for k + 1 and IC3 can safely move to the next major iteration, trying to prove that P is satisfied up to k + 1 steps. Before moving to the next iteration, a new empty time frame Fk+1 is created (line 10). Initially, clauses(Fk+1) = ∅ and such a time frame represents the entire state space, i.e. Fk+1 = Space(S). Note that Space(S) is a valid over-approximation of the set of states reachable within k + 1 steps in S. Then a phase called propagation takes place (line 12). The procedure Propagate(Fk) (Algorithm 4), which is discussed later, handles that phase. During propagation, IC3 tries to refine every time frame Fi, with 0 < i ≤ k, by checking if some clauses of one time frame can be pushed forward to the following time frame. Possibly, propagation refines the outermost time frame Fk so that Fk ⊂ Space(S). Propagation can lead to two adjacent time frames becoming equivalent. If that happens, the algorithm has found an inductive strengthening of P for S (equal to those time frames), so the property P holds for every reachable state of S and IC3 returns success (line 13).
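Before turning to the blocking procedure itself, the sketch below restates the major iteration of Algorithm 1 in Python; it assumes a hypothetical oracle sat_query(k, cube) playing the role of Q1, helper procedures extend, block_cube and propagate with the roles described above, and a trace object as in the earlier sketch (all names are ours, not the tools'):

 # Hedged Python rendering of the major iteration of Algorithm 1 (illustrative only).
 def ic3(sat_query, extend, block_cube, propagate, trace, not_p):
     """sat_query(k, cube) answers Q1 = SAT?(Fk ∧ ¬P), returning a bad state or None;
     not_p is the cube ¬P; trace stores F1..Fk as in the previous sketch."""
     k = 0
     while True:
         # Minor iterations: enumerate and block the Fk-states violating P.
         t = sat_query(k, not_p)
         while t is not None:
             s = extend(t)                    # ternary-simulation based cube enlargement
             if not block_cube(s, k, trace):  # Algorithm 2; may yield a counter-example
                 return "FAIL"
             t = sat_query(k, not_p)
         # Q1 is UNSAT: open a new empty frame, then push clauses forward (Algorithm 4).
         trace.push_frame()
         k += 1
         propagate(trace)
         if trace.strengthening_found():      # two adjacent frames became equal
             return "SUCCESS"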
The procedure BlockCube(s, Q, Fk) (Algorithm 2) is responsible for refining the trace Fk in order to block a bad cube s found in Fk. To preserve condition (3), prior to blocking a cube in a certain time frame, IC3 has to recursively block its predecessor states in the preceding time frames. To keep track of the states (or cubes) that must be blocked in certain time frames, IC3 uses the formalism of proof-obligations.

Definition 11. Given a cube s and a time frame Fj, a proof-obligation is a couple (s, j) formalizing the fact that s must be blocked in Fj.

Given a proof-obligation (s, j), the cube s can either represent a set of bad states or a set of states that can all reach a bad state in one or more transitions. The number j indicates the position in the trace where s must be proved unreachable, or else the property fails.

Definition 12. A proof-obligation (s, j) is said to be discharged when s becomes blocked in Fj.

IC3 maintains a priority queue Q of proof-obligations. During the blocking of a cube, proof-obligations (s, j) are extracted from Q and discharged for increasing values of j, ensuring that every predecessor of a bad cube s will be blocked in Fj (j < k) before s is blocked in Fk. In the BlockCube(s, Q, Fk) procedure, first a proof-obligation encoding the fact that s must be blocked in Fk is added to Q (line 1). Then proof-obligations are iteratively extracted from the queue and discharged (lines 2-14).

Prior to discharging the extracted proof-obligation (s, j), IC3 checks whether that proof-obligation still needs to be discharged. It is in fact possible for an enqueued proof-obligation to become discharged as a result of the discharging of some previously extracted proof-obligations. To perform this check, the following SAT query is posed (line 4):

SAT?(Fj ∧ s)   (Q2)

If the result of Q2 is SAT, then the cube s is still in Fj and (s, j) still needs to be discharged. Otherwise, s has already been blocked in Fj and the procedure can move on to the next iteration.

If the proof-obligation (s, j) still needs to be discharged, then IC3 checks if the time frame identified by j is the initial time frame (line 5). If so, the states represented by s are initial states that can reach a violation of the property P. A counter-example σ to P can be constructed by going up the chain of proof-obligations that led to (s, 0). In that case, the procedure terminates with failure, returning the counter-example found.

To discharge a proof-obligation (s, j), i.e. to block a cube s in Fj, IC3 tries to derive a clause c such that c ⊆ ¬s and c is inductive relative to Fj−1. This is done by posing the following SAT query (line 6):

SAT?(Fj−1 ∧ ¬s ∧ T ∧ s′)   (Q3)

If Q3 is UNSAT (lines 10-12), then the clause ¬s is inductive relative to Fj−1 and can be used to refine Fj, ruling out s. To pursue a stronger refinement of Fj, the inductive clause found undergoes a process called inductive generalization (line 10) prior to being added to Fj. Inductive generalization is carried out by the procedure Generalize(j, s, Fk) (Algorithm 3), which tries to minimize the number of literals in the clause c = ¬s while maintaining its inductiveness relative to Fj−1, in order to preserve condition (2). The resulting clause is added not only to Fj, but also to every time frame Fi, 0 < i < j (line 11). Doing so discharges the proof-obligation (s, j). In fact, this refinement rules out s from every Fi with 0 < i ≤ j. Since the sets Fi with i > j are larger than Fj, s may still be present in one of them and (s, j + 1) may become a new proof-obligation. To address this issue, Algorithm 2 adds (s, j + 1) to the priority queue (line 12).

Otherwise, if Q3 is SAT (lines 7-8), a predecessor t of s in Fj−1 can be extracted from the satisfying assignment. To preserve condition (3), before blocking a cube s in a time frame Fj, every predecessor of s must be blocked in Fj−1. So, the predecessor t is extended with ternary simulation (line 7) into the cube p, and then both proof-obligations (p, j − 1) and (s, j) are added to the queue (line 8).

Input: j: time frame index; s: cube such that ¬s is inductive relative to Fj−1; Fk: trace
Output: c: a sub-clause of ¬s
 1: c ← ¬s
 2: for all literals l in c do
 3:   try ← the clause obtained by deleting l from c
 4:   if ∄t, v′ : t, v′ |= Fj−1 ∧ T ∧ try ∧ ¬try′ then
 5:     if ∄t : t |= I ∧ ¬try then
 6:       c ← try
 7:     end if
 8:   end if
 9: end for
10: return c
Algorithm 3. Generalize(j, s, Fk)

The Generalize(j, s, Fk) procedure (Algorithm 3) tries to remove literals from ¬s while keeping it inductive relative to Fj−1. To do so, a clause c initialized with ¬s (line 1) is used to represent the current minimal inductive clause. For every literal of c, the clause try obtained by dropping that literal from c (line 3) is checked for inductiveness relative to Fj−1 by means of the following SAT query (line 4):

SAT?(Fj−1 ∧ try ∧ T ∧ ¬try′)   (Q4)

If Q4 is UNSAT, the inductive case for the reduced clause still holds. Since dropping literals from a relatively inductive clause can break both the inductive case and the base case, the latter must be explicitly checked too for the reduced clause try (line 5). This is done by posing the following SAT query:

SAT?(I ∧ ¬try)   (Q5)

If both the inductive case and the base case hold for the reduced clause try, the currently minimal inductive clause c is updated with try (line 6).
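A compact functional rendering of this literal-dropping loop is sketched below; it assumes two helper oracles, one answering Q4 and one answering Q5, and is only meant as an illustration (the function names are ours, not the tools'):

 # Illustrative Python sketch of inductive generalization (Algorithm 3).
 # A clause is a frozenset of integer literals.
 def generalize(j, not_s, q4_is_unsat, q5_is_unsat):
     """not_s is the clause ¬s, already inductive relative to Fj-1.
     q4_is_unsat(j, c) is True iff SAT?(Fj-1 ∧ c ∧ T ∧ ¬c') is UNSAT (inductive case);
     q5_is_unsat(c) is True iff SAT?(I ∧ ¬c) is UNSAT (base case)."""
     c = frozenset(not_s)
     for lit in list(not_s):
         if lit not in c:
             continue                 # literal already dropped in an earlier iteration
         trial = c - {lit}            # drop one literal: the candidate clause gets stronger
         if not trial:
             continue
         # Keep the reduced clause only if both the relative inductive case (Q4)
         # and the base case (Q5) still hold, as explained above.
         if q4_is_unsat(j, trial) and q5_is_unsat(trial):
             c = trial
     return c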
The Propagate(Fk) procedure (Algorithm 4) handles the propagation phase. For every clause c of each time frame Fj, with 0 ≤ j < k − 1, the procedure checks if c can be pushed forward to Fj+1 (line 3). To do so, it poses the following SAT query:

SAT?(Fj ∧ T ∧ ¬c′)   (Q6)

If the result of Q6 is UNSAT, then it is safe, with respect to condition (3), to push clause c forward to Fj+1. Otherwise, c can't be pushed and the procedure iterates to the next clause.

Input: Fk: trace
Output: Fk: updated trace
 1: for j = 0 to k − 1 do
 2:   for all c ∈ Fj do
 3:     if ∄t, v′ : t, v′ |= Fj ∧ T ∧ ¬c′ then
 4:       Fj+1 ← Fj+1 ∪ {c}
 5:     end if
 6:   end for
 7: end for
 8: return Fk
Algorithm 4. Propagate(Fk)

C. Related works

In [2], Aaron Bradley outlined the opportunity for SAT and SMT researchers to directly address the problem of improving IC3's performance by exploiting the peculiar character of the SAT queries it poses. A description of the IC3 algorithm, specifically addressing implementation issues, is given in [7].

III. SAT SOLVING IN IC3

SAT queries posed by IC3 have some specific characteristics:
• Small-sized formulas: they employ no more than a single instance of the transition relation;
• Large number of calls: reasoning during the verification process is highly localized and takes place at relatively small cube granularity;
• Separated contexts: each SAT query is relative to a single time frame;
• Related calls: subsequent calls may expose a certain correlation (for instance, inductive generalization calls take place on progressively reduced formulas).

We performed an analysis of the implementation of IC3 within the academic model checking suite PdTRAV, closely following the description of IC3 given in [7] (there called PDR: Property Directed Reachability). The experimental analysis led us to identify six different types of SAT queries that the algorithm poses during its execution. These queries are the ones already outlined in Section II-B. The type of these queries is reported in Table I.

Table I: SAT Queries Breakdown in IC3.
  Name  Query Type                Query
  Q1    Target Intersection       SAT?(Fk ∧ ¬P)
  Q2    Blocked Cube              SAT?(Fi ∧ s)
  Q3    Relative Induction        SAT?(Fi ∧ ¬s ∧ T ∧ s′)
  Q4    Inductive Generalization  SAT?(Fi ∧ c ∧ T ∧ ¬c′)
  Q5    Base of Induction         SAT?(I ∧ ¬c)
  Q6    Clause Propagation        SAT?(Fi ∧ T ∧ ¬c′)

For each of the queries identified, we have measured the average number of calls and the average solving time. These statistics are reported in Table II. The results were collected by running PdTRAV's implementation of IC3 on the complete set of 310 single-property benchmarks of HWMCC'12, using time and memory limits of 900 s and 2 GB, respectively.

Table II: SAT query statistics in IC3: number of calls, percentage, and average time spent in different SAT queries during an IC3 run.
  Query  Calls [Number]  Calls [%]  Avg Time [ms]
  Q1        483             0.1        81
  Q2      27891             6.8       219
  Q3      31172             7.6       334
  Q4     142327            34.7       575
  Q5     147248            35.9       112
  Q6      61114            14.9       681

Such statistics can be summarized as follows:
• SAT calls involved in inductive generalization are by far the most frequent ones. These are in fact the calls that appear at the finest granularity. In the worst-case scenario, one call is posed for every literal of every inductive clause found.
• Inductive generalization and propagation checks are the most expensive queries in terms of average SAT solving time required.
• Target intersection calls are very infrequent and don't take much time to be solved.
• Blocked cube and relative induction checks are close in the number of calls and solving time.

IV. HANDLING SAT SOLVING IN IC3

Subsequent SAT calls in IC3 are often applied to highly different formulas. In the general case, two subsequent calls can in fact be posed in the context of different time frames, thus involving different sets of clauses. In addition, one of them may require the use of the transition relation, while the other may not, and each query can involve the use of temporary clauses/cubes that are needed only to respond to that particular query (e.g. the candidate inductive clause used to check relative inductiveness during cube blocking or inductive generalization).
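To make the handling of such temporary clauses concrete, the sketch below shows one common way to pose the relative induction query Q3 without permanently committing ¬s to a per-frame solver: the candidate clause is guarded by a fresh activation literal and the next-state cube s′ is passed as assumptions (this is the activation-variable technique discussed in the following paragraphs and in [12]). The python-sat MiniSat binding is used purely as an example backend of our choosing; this is not PdTRAV or ABC code.

 # Posing Q3 = SAT?(Fj-1 ∧ T ∧ ¬s ∧ s') with a temporary, deactivatable clause.
 # Example backend: the python-sat bindings (pip install python-sat); our choice only.
 from pysat.solvers import Minisat22
 def relative_induction_query(solver, not_s_clause, s_prime_cube, fresh_var):
     """solver: a Minisat22 instance already loaded with clauses(Fj-1) and the needed
     portion of T; not_s_clause: literals of ¬s (present-state variables);
     s_prime_cube: literals of s' (next-state variables); fresh_var: an unused
     variable id, used as activation literal for the temporary clause."""
     act = fresh_var
     # Guarded clause (¬act ∨ ¬s): it is inert unless act is assumed for this call.
     solver.add_clause([-act] + list(not_s_clause))
     sat = solver.solve(assumptions=[act] + list(s_prime_cube))
     predecessor = solver.get_model() if sat else None
     # Later calls simply do not assume act: the guarded clause stays satisfiable via
     # ¬act, but it keeps burdening BCP until the next clean-up (see Section VII).
     return sat, predecessor

The same guarded-clause scheme applies to the candidate clause try in the generalization queries Q4.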
In the general case, whenever a new query is posed by IC3 to the underlying SAT solving framework, the formula currently loaded in the solver must be modified to accommodate the new query. For this reason, IC3 requires the use of SAT solvers that expose an incremental SAT interface. An incremental SAT interface for IC3 must support the following features:
• Pushing and popping clauses to or from the formula.
• Specifying literal assumptions.

Many state-of-the-art SAT solvers, like MiniSAT [10], feature an incremental interface capable of pushing new clauses into the formula and allowing literal assumptions. Removing clauses from the current formula is a more difficult task, since one has to remove not only the single clause specified, but also every learned clause that has been derived from it. Although solvers such as zchaff [11] directly support clause removal, the majority of state-of-the-art SAT solvers feature an interface like the one of MiniSAT, in which clause removal can only be simulated. This is done through the use of literal assumptions and the introduction of auxiliary variables known as activation variables, as described in [12]. In such solvers, clauses aren't actually removed from the formula but only made redundant for the purpose of determining the satisfiability of the rest of the formula. Since such clauses are not removed from the formula, they still participate in Boolean Constraint Propagation (BCP) and thus degrade the overall SAT solving performance. In order to minimize this degradation, each solver employed by IC3 must be periodically cleaned up, i.e. emptied and re-loaded with only the relevant clauses. In this work we assume the use of a SAT solver exposing an incremental interface similar to the one of MiniSAT.

Once an efficient incremental SAT solving tool has been chosen, any implementation of IC3 must face the problem of deciding how to integrate the top-level algorithm with the underlying SAT solving layer. Such a problem can be divided into the following three sub-problems:
• SAT solver allocation: decide how many different SAT solvers to employ and how to distribute the workload among them.
• SAT solver loading: decide which clauses must be loaded in each solver to make them capable of responding correctly to the SAT queries posed by IC3.
• SAT solver clean-up: decide when and how often the algorithm must clean up each solver, in order to avoid performance degradation.

V. SAT SOLVERS ALLOCATION

We assume in this work the use of multiple SAT solvers, one for each different time frame. Using multiple solvers, we observed that performance is highly related to:
• Solver cleanup frequency: cleaning up the solver means removal of incrementally loaded problem clauses and learned clauses.
• Clause loading strategy: on-demand loading of transition relation clauses based on topological dependency.

A. Specialized solvers

From the statistical results reported in Table II it's easy to see that inductive generalization and clause propagation queries are by far the most expensive ones in terms of average SAT solving time. Inductive generalization queries, in addition to being expensive, are also the most frequent type of query posed.

The reason why inductive generalization calls are so expensive can be due to the fact that, during the inductive generalization of a clause, at every iteration a slightly changing formula is queried for a satisfying assignment in increasingly larger subspaces. Given two subsequent queries in inductive generalization, in fact, it can be noticed that their formulas can differ only for one literal of the present-state clause try and one literal of the next-state cube ¬try′. As the subspace becomes larger, solving times for those queries increase.

The average expensiveness of clause propagation calls can be intuitively motivated by noticing that they are posed once for every clause of every time frame. The innermost time frames are the ones with the largest number of clauses, and thus require the largest number of propagation calls. Unfortunately, given the high number of clauses in those time frames, the CNF formulas upon which such calls act are highly constrained and usually harder to solve. So during propagation there are, in general, more hard queries than simple queries, making the average SAT solving time for those queries high.

In an attempt to reduce the burden of each time frame's SAT solver, we have experimented with the use of specialized solvers for handling such queries. For every time frame, a second solver is instantiated and used to respond to a particular type of query (Q4 or Q6). Table III summarizes the results of this experiment.

VI. SOLVER LOADING

To minimize the size of the formula to be loaded into each solver, a common approach is to load, for every SAT call that queries the dynamic behavior of the system, only the necessary part of the transition relation.

It is easy to observe that every SAT call that uses the transition relation involves a constraint on the next-state variables of the system in the form of a cube c′. Such queries ask if there is a state of the system satisfying some constraints on the present state, which can reach in one transition the states represented by c′. Since c′ is a cube, the desired state must have a successor p such that p corresponds to c′ for the value of every variable of c′. It's easy to see that the only present-state variables that are relevant in determining if such a state exists are those in the structural support of the next-state variables of c′. It follows that the only portions of the transition relation required to answer such queries are the logic cones of the next-state variables of c′.

Such a loading strategy, known as lazy loading of the transition relation, is commonly employed in various implementations of IC3, such as the ones of PdTRAV and ABC. We observed on average a 50% reduction in the size of the CNF formula for the transition relation using lazy loading of the transition relation.
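The sketch below illustrates the idea of lazy loading: starting from the next-state variables constrained by c′, it walks the structural support of an AIG-like netlist and loads only the clauses of cones that are not already in the solver. The data structures and names are hypothetical, introduced only for illustration.

 # Illustrative sketch of lazy transition-relation loading (hypothetical data structures).
 def load_cones_for_query(next_state_nodes, fanins, clauses_of, solver, loaded_nodes):
     """next_state_nodes: netlist nodes of the next-state variables constrained by c'.
     fanins[n]: structural fan-in nodes of node n; clauses_of[n]: CNF clauses encoding n.
     loaded_nodes: set of nodes already encoded in the solver (persists across calls)."""
     stack = list(next_state_nodes)
     newly_loaded = 0
     while stack:
         n = stack.pop()
         if n in loaded_nodes:
             continue                      # this part of the cone came with a previous call
         loaded_nodes.add(n)
         for cls in clauses_of.get(n, ()):
             solver.add_clause(cls)        # only the logic cones of c' reach the solver
             newly_loaded += 1
         stack.extend(fanins.get(n, ()))
     return newly_loaded                   # corresponds to the quantity L(c') of Section VII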
We noticed that, for these queries, the portions of the transition relation loaded can be further minimized by employing a CNF encoding technique called Plaisted-Greenbaum CNF encoding [13] (henceforth simply called PG encoding). The AIG representation of the transition relation, together with the assumptions specified by the next-state cube c′, can be viewed as a constrained boolean circuit [14], i.e. a boolean circuit in which some of the outputs are constrained to assume certain values. The Plaisted-Greenbaum encoding is a special encoding that can be applied in the translation of a constrained boolean circuit into a CNF formula.

Below we give an informal description of the PG encoding. For a complete description refer to [13] or [14].

Given an AIG representation of a circuit, a CNF encoding first subdivides that circuit into a set of functional blocks, i.e. gates or groups of connected gates representing certain boolean functions, and introduces a new CNF variable a for each of these blocks. For each functional block representing a function f on the input variables x1, x2, . . . , xn, a set of clauses is derived by translating into CNF the formula:

a ↔ f(x1, x2, . . . , xn)   (1)

The final CNF formula is obtained by the conjunction of these sets of clauses. Different CNF encodings differ in the way the gates are grouped together to form functional blocks and in the way these blocks are translated into clauses. The idea of PG encoding is to start from a base CNF encoding, and then use both output assumptions and topological information of the circuit to get rid of some of the derived clauses, while still producing an equi-satisfiable CNF formula. Based on the output constraints and the number of negations that can be found on every path from a gate to an output node, it can be shown that, for some gates of the circuit, an equi-satisfiable encoding can be produced by translating only one of the two sides of the bi-implication (1). The CNF formula produced by PG encoding will be a subset of the one produced by the traditional encoding.

PG encoding proves to be effective in reducing the size of loaded formulas, but it is not certain whether it is more efficient for SAT solving, since it may have worse propagation behaviour [15]. We observed a 10-20% reduction in the size of the CNF formula for the transition relation.

VII. SOLVERS CLEAN UP

A natural question arises regarding how frequently and under what conditions SAT solver cleanups should be scheduled. Cleaning up a SAT solver, in fact, introduces a certain overhead. This is because:
• Relevant clauses for the current query must be reloaded.
• Relevant inferences previously derived must be derived again from those clauses.

A heuristic cleanup strategy is needed in order to achieve a trade-off between the overhead introduced and the slowdown in BCP avoided. The purpose of that heuristic is to determine when the number of irrelevant clauses (w.r.t. the current query) loaded in a solver becomes large enough to justify a cleanup. To do so, a heuristic measure representing an estimate of the number of currently loaded irrelevant clauses is compared to a certain threshold. If that measure exceeds the threshold, then the solver is cleaned up.

Clean-up heuristics currently used in state-of-the-art tools, like ABC and PdTRAV, rely on loose estimates of the size of irrelevant portions of the formula loaded into each solver. These heuristics clean up each solver as soon as the computed estimate meets some, often static, threshold.

Our experiments with IC3 show that the frequency of the cleanups plays a crucial role in determining the overall verification performance. We explored the use of new cleanup heuristics based on more precise measures of the number of irrelevant clauses loaded and able to exploit correlation among different SAT calls to dynamically adjust the frequency of cleanups.

For SAT calls in IC3, notice that there are two sources of irrelevant clauses loaded in a solver:
1) Deactivated clauses loaded for previous inductive checks (Q3 and Q4 queries).
2) Portions of logic cones loaded for previous calls querying the dynamic behaviour of the system.

Cleanup heuristics commonly used, such as the ones used in the baseline versions of ABC and PdTRAV, typically take into account only the number of deactivated clauses in the solver to compute an estimate of the number of irrelevant clauses. We investigated the use of a new heuristic measure taking into account the second source of irrelevant clauses, i.e. irrelevant portions of previously loaded cones.

Every time a new query requires loading a new portion of the transition relation, to compute a measure of the number of irrelevant transition relation clauses, the following quantities are computed (assuming that c′ is the cube constraining the next-state variables for that query):
• A: the number of transition relation clauses already loaded into the solver (before loading the logic cones required by the current query);
• S(c′): the size (number of clauses) of the logic cones required for solving the current query;
• L(c′): the number of clauses in the required logic cones still to be loaded into the solver (equal to the size of the required logic cones minus the number of clauses that such cones share with previously loaded cones).

A measure of the number of irrelevant transition relation clauses loaded for c′, denoted by u(c′), can be computed as follows:

u(c′) = A − (S(c′) − L(c′))   (2)

Such a heuristic measure, divided by the number of clauses loaded into the solver, indicates the percentage of irrelevant clauses loaded in the solver w.r.t. the current query. In Section VIII we investigate the use of new cleanup strategies based on this measure. In order to take into account correlation between subsequent SAT calls, we consider such a measure averaged over a time window of the last SAT calls.
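As a concrete reading of equation (2), the following sketch computes u(c′) for each call and averages the resulting irrelevance ratio over a sliding window of recent calls, triggering a clean-up when the averaged ratio crosses a threshold. It is an illustrative sketch with parameters left generic (the window length and threshold actually used are discussed with heuristic H3 in Section VIII), not the implementation inside PdTRAV or ABC.

 # Sketch of the windowed clean-up measure of equation (2); illustrative only.
 from collections import deque
 class CleanupMonitor:
     def __init__(self, window=1000, threshold=0.5):
         self.ratios = deque(maxlen=window)   # irrelevance ratios of the last `window` calls
         self.threshold = threshold
     def should_cleanup(self, A, S, L, total_clauses):
         """A: TR clauses already loaded before this call; S: size of the cones needed by
         the current query; L: clauses of those cones still to be loaded;
         total_clauses: all clauses currently loaded in the solver."""
         u = A - (S - L)                      # equation (2): irrelevant TR clauses w.r.t. c'
         self.ratios.append(u / max(total_clauses, 1))
         avg = sum(self.ratios) / len(self.ratios)
         return avg > self.threshold          # clean up when the averaged ratio is too high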
VIII. EXPERIMENTAL RESULTS

We have compared different cleanup and clause loading heuristics in both PdTRAV and ABC. In this section, we briefly summarize the obtained results.

A. PG encoding

A first set of experiments was done on the full set of 310 HWMCC'12 benchmarks [16], with a lower timeout of 900 seconds, in order to evaluate the use of PG encoding. Results were controversial. A run in PdTRAV, with a 900 seconds timeout, showed a reduction in the number of solved instances from 79 to 63 (3 of which previously unsolved). The percentage reduction of loaded transition relation clauses was 21.32%.

A similar run on ABC showed a more stable comparison, from 80 to 79 solved instances (3 of which previously unsolved). So a first conclusion is that PG encoding was not able to win over the standard encoding, suggesting it can indeed suffer from worse propagation behaviour. Nonetheless, it was very interesting to observe that the overall number of solved problems grew from 79 to 82 in PdTRAV and from 80 to 85 in ABC.

Different results between the two tools in this experimentation could be partially due to different light-weight preprocessing done by default within them. We started a more detailed comparison, in order to better understand the partially controversial results.

B. Experiments with PdTRAV

We then focused on a subset of 70 selected circuits, for which preprocessing was done off-line, and the tools were run on the same problem instances. In the following tables, the P0 rows always represent the default setting of PdTRAV. We report the number of solved instances (out of 70) and the average execution time (time limit 900 seconds). The column labeled New, when present, shows the number of instances not solved by P0.

Table III: Tests on specialized solvers.
  Configuration   Solved [#]  New [#]  Avg Time [s]
  P0 (baseline)   64                   137.00
  P1 (Q4 spec.)   66          4        144.18
  P2 (Q6 spec.)   60          3        134.25

Table III shows two different implementations with specialized solvers (so two solvers per time frame): in the first one (P1) the second solver handles generalization calls, while in the second one (P2) it handles propagation calls. Overall, we see a little improvement by P1, with two more instances solved w.r.t. P0, including 4 instances not solved by P0.

The next two tables show an evaluation of different solver cleanup heuristics. Let a be the number of activation variables in the solver, |vars| the total number of variables in the solver, |clauses| the total number of clauses in the solver, u(c′) the heuristic measure discussed in Section VII and W(x, n) a sliding window containing the values of x for the last n SAT calls. The heuristics compared are the following:
• H1: a > 300;
• H2: a > ½ |vars|;
• H3: avg(W(u(c′)/|clauses|, 1000)) > 0.5;
• H4: H2 || H3.

Heuristic H1 is the cleanup heuristic used in the baseline versions of both PdTRAV and ABC. The static threshold of 300 activation literals was determined experimentally. Heuristic H2 cleans up each solver as soon as half of its variables are used as activation literals. It can be seen as a first simple attempt to adopt a dynamic cleanup threshold. Heuristic H3 is the first heuristic proposed to take into account the second source of irrelevant clauses described in Section VII, i.e. irrelevant portions of previously loaded cones. In H3 a solver is cleaned up as soon as the percentage of irrelevant transition relation clauses loaded, averaged on a window of the last 1000 calls, reaches 50%. Heuristic H4 takes into account both sources of irrelevant clauses, combining H2 and H3.

Table IV: Tests on clean-up heuristics.
  Configuration   Solved [#]  New [#]  Avg Time [s]
  P0 (H1)         64                   137.00
  P3 (H2)         60          1        122.19
  P4 (H3)         44                   116.28
  P5 (H4)         62          3        125.94

Table IV shows a comparison among H1 (row P0), H2, H3, and H4, in rows P3, P4, and P5, respectively. Neither PG encoding nor specialized solvers were used. Heuristic H1, which employs a static threshold, was able to solve the largest number of instances. Among the dynamic threshold heuristics, both H2 and H3 take into account a single source of irrelevant loaded clauses, respectively the deactivated clauses in H2 and the unused portions of the transition relation in H3. Data clearly indicates that H3 has worse performance. This suggests that deactivated clauses play a bigger role in degrading SAT solvers' performance than irrelevant transition relation clauses do. Taking into account only the latter source of irrelevant clauses is not sufficient. It can be noticed that heuristic H4, which takes into account both sources, outperforms both H2 and H3. This proves that considering irrelevant clauses arising from previously loaded cones in addition to deactivated clauses can be beneficial. In addition, Table IV shows that dynamic heuristics were able to solve some instances that can't be solved by the static heuristic H1 and vice versa. In terms of the overall number of solved instances, the static heuristic H1 outperforms our best dynamic heuristic H4. This can be due to the fact that the threshold parameter of H1 results from extensive experimentation, while to determine the parameters of H4 (window size and percentage thresholds) a narrower experimentation has been performed.

We then focused on H4, and benchmarked it in different setups. Results are shown in Table V. Here PG encoding was used in configurations P6 and P8, a single solver per time frame in P6, and an additional specialized solver for generalization in P7 and P8. We see that the specialized solver configuration appears to perform worse with PG encoding. Also, adding a specialized solver for generalization to the dynamic heuristic H4 doesn't seem to be as effective as it is when using the static heuristic H1.
Table V: Tests on mixed strategies for cleanup heuristic H4.
  Configuration     Solved [#]  New [#]  Avg Time [s]
  P0 (baseline)     64                   137.00
  P6 (PG)           59          3        208.85
  P7 (Q4 spec)      58          1        111.59
  P8 (PG+Q4 spec)   50          1        178.56

This can be due to the fact that irrelevant clauses arising from generalization calls are not taken into account to schedule the clean-up of the main solver, which, in turn, is cleaned up less frequently.

C. Experiments with ABC

The 70 selected circuits were also benchmarked with ABC, with the same pre-processing used in PdTRAV: Table VI reports in row A0 the default setting of ABC. Row A1 shows the variant with PG encoding, row A2 shows a run without dynamic TR clause loading. Row A3 finally shows a different period for solver cleanup (1000 variables instead of 300).

Table VI: Tests on ABC with different strategies.
  Configuration  Solved [#]  New [#]  Avg Time [s]
  A0             64                   138.66
  A1             63          1        152.18
  A2             63          2        158.75
  A3             64                   138.45

Overall, results show little variance among the different settings, which could suggest a lesser sensitivity of ABC to different tunings. Nonetheless, a further experimentation with ABC on the full set of 310 benchmarks (with a 300 seconds time limit) showed a 14% improvement in the number of solved problems (from 71 to 81), which indicates a potential improvement for a portfolio-based tool, able to concurrently exploit multiple settings.

IX. CONCLUSIONS

The paper shows a detailed analysis and characterization of the SAT queries posed by IC3. We also discuss new ideas for solver allocation/loading/restarting. The experimental evaluation done on two different state-of-the-art academic tools shows lights and shadows, as no breakthrough or clear winner emerges from the new ideas.

PG encoding showed itself to be less effective than expected. This is probably because the benefits introduced in terms of loaded formula size are overwhelmed by the supposedly worse propagation behaviour of that formula.

The use of specialized solvers seems to be effective when a static cleanup heuristic is used, less effective when combined with PG encoding or a dynamic heuristic.

Our experiments showed that, when a dynamic cleanup heuristic is used, IC3's performance can be improved by taking into account both deactivated clauses and irrelevant portions of previously loaded cones. Even if a parameter configuration for H4 that is able to outperform the currently used, well-rounded static heuristic H1 hasn't been found yet, we believe that a more extensive experimentation could lead to better results. Nonetheless, the results are more interesting if we consider them from the standpoint of a portfolio-based tool, since the overall coverage (by the union of all setups) is definitely higher.

So we believe that more effort in implementation, experimentation, and detailed analysis of case studies needs to be done. We also deem that this work contributes to the discussion of new developments in the research related to IC3.

REFERENCES

[1] A. R. Bradley, "SAT-based model checking without unrolling," in VMCAI, Austin, Texas, Jan. 2011, pp. 70–87.
[2] A. R. Bradley, "Understanding IC3," in SAT, ser. Lecture Notes in Computer Science, A. Cimatti and R. Sebastiani, Eds., vol. 7317. Springer, 2012, pp. 1–14.
[3] A. Biere, A. Cimatti, E. M. Clarke, M. Fujita, and Y. Zhu, "Symbolic Model Checking using SAT procedures instead of BDDs," in Proc. 36th Design Automation Conf. New Orleans, Louisiana: IEEE Computer Society, Jun. 1999, pp. 317–320.
[4] M. Sheeran, S. Singh, and G. Stålmarck, "Checking Safety Properties Using Induction and a SAT Solver," in Proc. Formal Methods in Computer-Aided Design, ser. LNCS, W. A. Hunt and S. D. Johnson, Eds., vol. 1954. Austin, Texas, USA: Springer, Nov. 2000, pp. 108–125.
[5] P. Bjesse and K. Claessen, "SAT-Based Verification without State Space Traversal," in Proc. Formal Methods in Computer-Aided Design, ser. LNCS, vol. 1954. Austin, TX, USA: Springer, 2000.
[6] K. L. McMillan, "Interpolation and SAT-Based Model Checking," in Proc. Computer Aided Verification, ser. LNCS, vol. 2725. Boulder, CO, USA: Springer, 2003, pp. 1–13.
[7] N. Eén, A. Mishchenko, and R. K. Brayton, "Efficient implementation of property directed reachability," in FMCAD, 2011, pp. 125–134.
[8] A. Mishchenko, "ABC: A System for Sequential Synthesis and Verification, http://www.eecs.berkeley.edu/~alanmi/abc/," 2005.
[9] G. Cabodi, S. Nocco, and S. Quer, "Benchmarking a model checker for algorithmic improvements and tuning for performance," Formal Methods in System Design, vol. 39, no. 2, pp. 205–227, 2011.
[10] N. Eén and N. Sörensson, "The MiniSat SAT Solver, http://minisat.se," Apr. 2009.
[11] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, "Chaff: Engineering an Efficient SAT Solver," in Proc. 38th Design Automation Conf. Las Vegas, Nevada: IEEE Computer Society, Jun. 2001.
[12] N. Eén and N. Sörensson, "Temporal induction by incremental SAT solving," Electr. Notes Theor. Comput. Sci., vol. 89, no. 4, pp. 543–560, 2003.
[13] D. A. Plaisted and S. Greenbaum, "A structure-preserving clause form translation," J. Symb. Comput., vol. 2, no. 3, pp. 293–304, 1986.
[14] M. Järvisalo, A. Biere, and M. Heule, "Simulating circuit-level simplifications on CNF," J. Autom. Reasoning, vol. 49, no. 4, pp. 583–619, 2012.
[15] N. Eén, "Practical SAT: a tutorial on applied satisfiability solving," slides of invited talk at FMCAD, 2007.
[16] A. Biere and T. Jussila, "The Model Checking Competition Web Page, http://fmv.jku.at/hwmcc."