Abstraction-Based Livelock/Deadlock Checking for Hardware Verification In-Ho Moon and Kevin Harer Synopsys Inc. {mooni, kevinh}@synopsys.com ABSTRACT has any incoming edges to the SCC in the state transi- Livelock/deadlock is a well known and important problem tion graph representing a hardware design. However, even in both hardware and software systems. In hardware verifi- though the method in [27] is an improved method from its cation, a livelock is a situation where the state of a design previous work [13] in symbolic approaches, it is still infea- changes within only a smaller subset of the states reachable sible to apply the method to the industrial designs, simply from the initial states of the design. Deadlock is a special due to the capacity problem of symbolic methods. In gen- case in which there is only one state in a livelock. However, eral, any BDD-based method can handle only up to several livelock/deadlock checking has never been actively used in hundred latches without any abstraction or approximation hardware verification in practice, mainly due to the com- techniques, whereas there can be millions of latches in in- plexity of the computation which involves finding strongly dustrial designs. connected components. In this paper, we first present an improved BDD-based This paper presents a practical abstraction-based live- algorithm finding TSCCs from[27] in the following aspects. lock/deadlock checking algorithm for hardware verification. First, initial state is taken into account in finding TSCCs. The proposed livelock/deadlock checking works on FSMs Especially, the improved algorithm handles multiple initial rather than the whole design. For each FSM, we make an states efficiently. Secondly, unreachable TSCCs are distin- abstract machine of manageable size from the cone of influ- guished from reachable TSCCs which are more interesting to ence of the FSM. Once a livelock is found on an abstract designers. Thirdly, we provide more intuitive state classifi- machine, the livelock is justified on the concrete machine cation as main, transient, and livelock groups as opposed to with trace concretization. Experimental results shows that transient and recurrence classes in [27]. In our classification, the proposed abstraction-based livelock checking finds real a set of transient states is further classified into main and livelock errors in industrial designs. transient groups. Recurrence class in [27] is mapped into livelock group in our classification. Main group is an SCC 1. INTRODUCTION that contains the initial state. Transient group is a set of Livelock/deadlock is a well known and important problem states that belong to neither main nor livelock group. There in both hardware and software systems. In hardware verifi- is one or zero main group in a design per one initial state. cation, a livelock is a situation where the state of a design This paper also presents a practical approach for checking changes within only a subset of the states reachable from livelock using abstraction techniques. The proposed live- the initial states of the design. In a state transition graph, a lock checking works on FSMs(Finite State Machines)2 rather livelock is a set of states from which there is no path going than the whole design. For each FSM, we make an abstract to any other states that are reachable from the initial state. machine (by localization reduction [15]) of manageable size Since deadlock is a special case in which there is only one by the improved BDD method from the COI(Cone of Influ- state in a livelock, deadlock checking can be done by livelock ence) of the FSM. Once a livelock is found on an abstract checking. Thus, livelock implies both livelock and deadlock machine, the livelock is justified on the concrete machine in this paper. However, livelock checking1 has never been with trace concretization using SAT (Satisfiability[9]) and actively used in hardware verification in practice, mainly due simulation. When there is no livelock on the abstract ma- to the complexity of the computation which involves find- chine, there is no guarantee for no livelock on the concrete ing SCCs (Strongly Connected Components). Thus, livelock machine. However, the bigger the abstract size is, the more checking has been on hardware designer’s wish list to verify confidence we have that no livelock exists on the concrete their designs. machine. The key benefit of this abstraction-based livelock There have been many approaches on finding SCCs [13, checking is that it enables finding real livelock groups that 27, 28, 3, 20, 12]. Among these work, Xie and Beeral pro- cannot be found by tackling whole design directly. posed a symbolic method finding terminal SCCs (in short, Once an FSM is given, its COI is first computed. Then, TSCCs) using BDDs (Binary Decision Diagrams [4]) in [27]. an abstract machine is computed by finding N influential In a state transition graph, a TSCC is an SCC that does latches from the COI. Influential latches are the latches that not have any outgoing edges to any state outside the SCC. are likely related with the FSM. N is either pre-defined or a Thus, a TSCC becomes a livelock group when the TSCC user-defined number of latches in the abstract machine, or 1 2 Livelock checking is different from liveness checking and FSMs are either automatically extracted [26] or any sets of the difference will be explained in Section 2.3. sequential elements that are user-specified. gradually increased. In general, N is up to a few hundred algorithms. Ravi et al. [20] provided a taxonomy of those latches. Influential latches are computed mainly by approx- symbolic algorithms. One is SCC-hull algorithms (without imate state decomposition [6]. However, in many cases, the enumerating SCCs) [11, 14, 24], and the other is SCC enu- size of COIs is too big for even approximate state decom- meration algorithms [28, 3, 12, 20]. The details are in [20]. position. Thus, a structural abstraction is applied by using connectivity and sequential depth before approximate state 2.2 Finding TSCCs decomposition. This structural abstraction reduces the COI Even though TSCCs are a subset of SCCs in the states of a to a manageable size by approximate state decomposition. design, the algorithms for finding TSCCs can be significantly There is another important hardware property called tog- optimized since not all SCCs are of interest. gle deadlock. A state variable has a toggle deadlock if the This section recapitulates the work on finding TSCCs by state variable initially toggles and the state variable becomes Xie and Beeral [27]. This algorithm classifies all states into a constant after a certain number of transitions. However, either recurrence or transient class. Recurrence class is a set notice that this is not a constant variable since it initially of TSCCs and the rest belongs to transient class. Let S be toggles. Toggle deadlock may or may not happen depending the set of states. With i, j ∈ S, i → j denotes that there is on input stimuli in simulation. Therefore, toggle deadlock is at least one path from i to j. Definition 1 defines forward also an important property to check with formal approaches. set and backward set of a state. Experimental results shows that the proposed abstraction- Definition 1. The forward set of state i ∈ S, denoted by based approach finds real livelock and toggle deadlock errors F (i), is the set of states that have a path from i. That is, from industrial designs. F (i) = {j ∈ S | i → j}. Similarly, the backward set of state The contributions of this paper are in the three aspects. i, denoted by B(i), is the set of states that have a path to i. • Improved algorithm for livelock checking That is, B(i) = {j ∈ S | j → i}. The proposed algorithm improved the existing algo- Lemma 1. Let i, j ∈ S. If j ∈ F (i), then F (j) ⊆ F (i). rithm [27] in many aspects, such as providing new Similarly, if j ∈ B(i), then B(j) ⊆ B(i). state classification with initial state, handling multi- ple initial states, refining the search space efficiently Theorem 1. A state i ∈ S is recurrent if and only if with care states, early termination, and trimming out F (i) ⊆ B(i). In other words, i is transient if and only if transient states. F (i) * B(i). • Abstraction-based livelock checking Theorem 2. If state i ∈ S is transient, then states in This paper presents theories and an implementation B(i) are all transient. If state i is recurrent, on the other on abstraction-based livelock checking to handle large hand, states in F (i) are all recurrent. In the latter case, set designs in practice. F (i) is a recurrence class, and set B(i)\F (i) (if not empty) • Toggle deadlock checking contains only transient states. To the best of our knowledge, this paper presents the Lemma 1, Theorem 1 and 2 are from [27]. Lemma 1 shows first method to solve toggle deadlock problem. a subset relation between two forward sets as well as two The remainder of this paper is organized as follows. Sec- backward sets when j is in either F (i) or B(i). Theorem 1 tion 2 briefly recapitulates finding SCCs and TSCCs, and and 2 show how a state is determined whether the state describes related work. Section 3 describes our improved al- belongs to either recurrence or transient class. Based on gorithm for finding TSCCs. Section 4 explains how livelock Theorem 1 and 2, all TSCCs can be found by performing is checked on FSM using abstraction. Section 5 describes forward and backward reachability iteratively. The detailed how to check toggle deadlocks. Experimental results are algorithm can be found in [27] and our improved algorithm is presented and discussed in Section 6. We conclude with described in Section 3.2 with the comparisons to the original Section 7. algorithm. 2. PRELIMINARIES 2.3 Related work 2.1 Finding SCCs There are two types of properties in model checking; safety Given a graph, G = (V, E) where G is an infinite transition and liveness properties [16]. A safety property represents system of the Kripke structure [8], V is a finite set of states ’something bad never happens’, whereas a liveness property and E ⊆ V × V is the set of edges, a strongly connected represents ’something good eventually happens’. Liveness component (SCC) is a maximal set of state U ⊆ V such checking with a liveness property can be performed by find- that for every pair (u, v) ∈ U , u and v are reachable from ing SCCs [20]. Liveness checking can also be performed by each other, that is, u is reachable from v and v is reachable safety checking with proper transformations [1]. from u [27, 28]. Livelock checking is different from liveness checking in the Finding SCCs has a variety of applications in formal ver- sense that liveness checking requires a liveness property to ification such as Buchi emptiness [11], LTL model check- work on a design, whereas livelock checking does not require ing [25], CTL model checking with fairness constraints [10], any property and works on a design directly. Liveness checking [16, 1], and so on. There have been many publications on finding all SCCs [24, The traditional approach to find SCCs is to use Tarjan’s 28, 3, 12]. Even though all TSCCs can be found by any of method [23]. Since this method manipulates the states of the these approaches on finding all SCCs, it is not necessary to graph explicitly, even though it runs in linear time in the size find all SCCs for finding all TSCCs since we are interested of the graph, the size of the graph grows exponentially as in finding only all TSCCs for livelock checking. the number of state variables grows. Hachtel et al. [13] proposed a symbolic approach to find To overcome this state explosion problem in explicit al- all recurrence classes concurrently identifying all TSCCs by gorithms, there have been many publications on symbolic computing transitive closure [17] on the transition graph with the Markov chain. Due to the complexity of transi- states, the reachable states are a through i inside the dot- tive closure, this approach takes significantly more time and ted rectangle. The unreachable states are j through o out- memory than a reachability-based approach does. side the dotted rectangle. There are five SCCs that are Qadeer et al. [19] proposed an algorithm to find single {a, b, c}, {e, f, g}, {h, i}, {j, k, l}, and{m, n, o}. Since a is the TSCC in the context of safe replacement in sequential equiv- initial state, {a, b, c} becomes the main group. {h, i} and alence checking [21, 22]. In this approach, multiple TSCCs {m, n, o} are TSCCs and only {h, i} is a livelock group (reach- are not considered. able STSCC) since it is reachable from a. {m, n, o} is called Xie and Beeral proposed a reachability-based algorithm an unreachable TSCC. The rest states, {d, e, f, g, j, k, l}, be- to find all TSCCs iteratively [27]. This is also a symbolic long to the transient group in which the states are contained approach that outperforms the method in [13]. However, in neither the main group nor the TSCCs. this approach does not consider initial states. None of the above previous work on finding TSCCs was 3.2 Finding Livelock used in real designs in practice, due to the design sizes. Our We first define transition relation in Definition 3 to explain abstraction-based approach is the first in publication to han- our algorithms to check livelock. dle large designs in practice. Definition 3. Let x = {x1 , . . . , xn }, y = {y1 , . . . , yn }, Case et al. [5] proposed a method finding transient sig- and w = {w1 , . . . , wp } be sets of variables ranging over B = nals using ternary simulation. A transient signal is a toggle {0, 1}. A (finite state) machine is a pair of boolean functions deadlock on over-approximate reachable states. The toggle ⟨Q(x, w, y), I(x)⟩, where Q : B 2n+p → B is 1 if and only if deadlock checking in this paper finds transients signals in there is a transition from the state encoded by x to the state exact reachable states. encoded by y under the input encoded by w. I : B n → B is 1 if the state encoded by x is an initial state. Q(x, w, y) is 3. IMPROVED LIVELOCK CHECKING called transition relation. The sets x, y, and w are called the 3.1 State Classification present state, next state, and input variables, respectively. The state classification in [27] consists of one transient The procedure ComputeF orwardSet in Figure 2 is a mod- class and one or more recurrence classes. However, in hard- ified version of the procedure f orward set in [27] in order ware verification, initial states are given to verify the hard- to compute forward set of a given state s only within the ware behavior only in reachable state space. One problem of given care states careSet in the procedure and to perform the state classification in [27] is that there is no distinction early termination when stop is not ZERO (empty BDD). ⇓ between reachable TSCCs and unreachable TSCCs from the represents a restrict operator [7] that is used to minimize initial states. Also, the reachable TSCCs may vary depend- the transition relation with respect to careSet in Line 2. ing on initial states. The minimized transition relation is denoted by Q̃. In Line We propose a new state classification that is shown in Fig- 7, y ← x represents that y variables are replaced by x vari- ure 1, assuming that there is one single initial state. Han- ables by BDD substitution. Early termination is another dling multiple initial states is explained in Section 3.3. big difference from f orward set in [27] and is used in Fig- Definition 2. STSCC is a sink TSCC that has incoming ure 3. This is to bail out computing forward set as soon as edges from any states outside the TSCC. any newly reached state intersects with the states in stop We first define sink TSCC (in short, STSCC) in Defini- as in Line 11. BddIteConstant is a BDD ITE(if-then-else) tion 2. The new state classification consists of main group, operation without creating a new BDD node. O is an array transient group, livelock groups (reachable STSCCs) and of states to store newly reached states at each iteration and unreachable TSCCs for a given initial state. The transient O is called onion rings. These onion rings are used later in class in [27] is further classified into main group or tran- Section 3.3. ComputeF orwardSet returns the forward set sient group. Main group is an SCC containing the initial F (s) and the onion rings O. state and there exists either one or no main group. The ComputeForwardSet(Q, careSet, s, stop) { recurrence classes in [27] are further classified into livelock 1 F (s) = ZERO; groups (reachable STSCCs) and unreachable TSCCs. When 2 Q̃(x, w, y) = Q(x, w, y) ⇓ careSet; there is no livelock, there exists only one SCC which is the 3 f rontier(x) = s; 4 Put s in O; main group. 5 while (f rontier(x) ̸= ZERO) { 6 image(y) = ∃x, w. Q̃(x, w, y) ∧ f rontier(x); Unreachable TSCC 7 image(x) = image(y)|y←x ; 8 F (s) = F (s) ∨ image(x); 9 f rontier = image(x) ∧ ¬F (s); k l m n 10 Put f rontier in O; 11 if (BddIteConstant(f rontier, stop, ZERO) != ZERO) j o 12 break; Reachable States 13 } 14 return (F (s), O); c g i } Figure 2: Computing forward set. a b d e f h Livelock Group ComputeBackwardSet is a dual procedure to Compute Main Group Transient Group (Reachable STSCC) F orwardSet, except not using stop and not computing the onion rings O. Figure 3 is a procedure for finding TSCCs from the given Figure 1: State classification. set of states S. The procedure F indT SCCs is a modified In Figure 1, there are states a through o and a is the version of the procedure State classif ication in [27]. The initial state that is marked with thick circle. Among all modified procedure utilizes care states careSet, assuming S is not necessarily all state space. T is a set of transient states Line 11. TR represents the set of transient states that are in S, and R is an array of TSCCs in S. P ickOneState in reachable from s. In Line 12, R and TR are computed by Line 5 picks a random state from careSet as a seed state to calling F indT SCCs with careSet. If s ∈ / M (means that find a TSCC. In Line 7, early termination is used in comput- the main group is empty), s is added to TR in Line 13-14. ing the forward set F (s), by setting stop in Figure 2 as the TU represents the set of transient states that are unreachable negation of B(s). This is because while we compute F (s) from s and TU is computed in Line 15. T is computed by within B(s) for the state s, once any state outside B(s) union of TR and TU in Line 16. is reachable from s, all states in B(s) are transient. An- other big difference is trimming transient states in Line 12 3.3 Multiple Initial States and 16. T rimT ransient(Q, careSet, T, dir) trims out the It is possible for a design to have multiple initial states transient states from the current care states by the given when some of the state variables do not have concrete initial direction (dir) that is either PREFIX, SUFFIX, or BOTH. values. In the presence of multiple initial states, finding PREFIX(SUFFIX) means to trim out the lasso prefix(suffix) livelock groups has to be devised correctly to avoid false states. This is the same technique used in finding SCCs [20]. positives and redundant computations. Finally, F indT SCCs returns R (a set of TSCCs) and T (a Figure 5 shows an example with multiple initial states. set of transient states). In this example, there are six states, S = {a, b, c, d, e, f }. FindTSCCs(Q, S) { There are two SCCs, {a, b, c} and {d, e, f }. We can see 1 R = { }; that {d, e, f } is a TSCC. a and d are initial states, I = 2 T = ZERO; {a, d}, as shown with thick circles. Suppose that we com- 3 careSet = S; 4 while (careSet ̸= ZERO) { pute livelock by calling F indLivelock(Q, S, I). Then, we 5 s = PickOneState(careSet); get F (I) = B(I) = M = {a, b, c, d, e, f } and R = {} which 6 B(s) = ComputeBackwardSet(Q, careSet, s); is not correct since there is a reachable TSCC. Now, let 7 F (s) = ComputeForwardSet(Q, careSet, s, ¬B(s)); 8 if (F (s) ⊆ B(s)) { us try to call F indLivelock for each single initial state. 9 R = R ∪ F (s); First for the initial state a, we get F (a) = {a, b, c, d, e, f } 10 T = T ∨ (B(s) ∧ ¬F (s)); and B(a) = {a, b, c}. This gives us Ma = {a, b, c} and 11 careSet = careSet ∧ ¬B(s); 12 TrimTransient(Q, careSet, T , PREFIX); Ra = {d, e, f }. There is a livelock group Ra for the initial 13 } else { state a. Now, for the initial state d, F (d) = {d, e, f } and 14 T = T ∨ (s ∨ B(s)); B(d) = {a, b, c, d, e, f }. This gives us Md = {d, e, f } and 15 careSet = careSet ∧ ¬(s ∨ B(s)); 16 TrimTransient(Q, careSet, T , BOTH); Rd = {} and TU = {a, b, c}. There is no livelock group for 17 } the initial state d. Therefore, we can see that livelock check- 18 } ing has to be applied for each single initial state separately 19 return (R, T ); } in the presence of multiple initial states. Figure 3: Finding TSCCs. FindLivelock(Q, S, s) { c f 1 (F (s), O) = ComputeForwardSet(Q, S, s, ZERO); 2 B(s) = ComputeBackwardSet(Q, S, s); 3 reached = F (s) ∨ s; a b d e 4 if (F (s) ⊆ B(s)) { 5 M = F (s); 6 R = { }; Figure 5: Multiple initial states. 7 T = ZERO; 8 } else { 9 M = F (s) ∧ B(s); Theorem 3. When there are two initial states (i0 and 10 careSet = F (s) ∧ ¬(M ∨ s); i1 ), if i1 is included in the reached states from i0 , the livelock 11 TrimTransient(Q, careSet, T , PREFIX); groups from i1 are a subset of the livelock groups from i0 . 12 (R, TR ) = FindTSCCs(Q, careSet); 13 if (s ∈ / M) Proof. Since i1 is included in the reached states from 14 TR = TR ∨ s; i0 , i1 is in either main, transient, or livelock groups from 15 TU = B(s) ∧ ¬M ; 16 T = TR ∨ TU ; i0 . When i1 is in the main group, the same livelock groups 17 } from i1 are obtained. When i1 is in the transient group, all 18 return (M, R, T, reached, O); or a subset of the livelock groups i1 is obtained. When i1 } is in one of the livelock groups, the livelock group including Figure 4: Finding livelock. i1 becomes the main group from i1 , and no livelock group Figure 4 shows the procedure to perform our new state exists from i1 since the other livelock groups from i0 become classification. As explained in Section 3.1, we find main unreachable TSCCs from i1 . From the above three cases, no group (M ), transient group (T ), and livelock groups (R) new livelock group is obtained from i1 compared to the ones from the given initial state (s) within the given care states from i0 . Therefore, the livelock groups from i1 are a subset (S). F indLivelock starts computing forward set F (s) and of the livelock groups from i0 . backward set B(s) in Line 1 and 2. In Line 3, reached is the Theorem 3 says that when there is large number of initial reached states from s in S. If F (s) ⊆ B(s) in Line 4, there states, we can skip livelock checking for any initial states is no livelock in S. In this case, F (s) becomes the main that are already in the forward sets of other initial states. group and both R and T are set to empty in Line 5-7. If In Figure 5, livelock checking for the initial state d can be F (s) * B(s) in Line 8, there must exist at least one livelock skipped because of d ∈ F (a), assuming that a is used first. group. In this case, M is computed by intersecting F (s) and However, there is an order dependency on which initial state B(s) in Line 9. careSet is set to a subset of F (s) in Line is used first. If d is used first, we still need to run live- 10. The lasso prefix states in careSet are trimmed out in lock checking with a. In practice, the number of calls to F indLivelock is greatly reduced because of Theorem 3 in propose a framework for abstraction-based livelock checking the presence of multiple initial states. on an abstracted COI of the FSM. Once we find a livelock on Figure 6 is the top-level procedure that checks livelock the abstract machine, we justify whether the livelock exists with multiple initial states. CheckLivelock takes transition on the concrete machine. Notice that a livelock on the ab- relation(Q), a set of states(S), a set of initial states(I), and stract machine can be mapped into more than one livelock a concrete machine(C) as procedure inputs. The use of C on the concrete machine. is explained in Section 4. CheckLivelock first finds live- Figure 7 shows how an abstract machine is obtained from lock groups in the reachable states in Line 1-17 and then it the COI of an FSM. Suppose an FSM that has two state finds TSCCs in the unreachable states in Line 18-23. The variables f and g. Then, we compute the COI of the FSM. while loop (Line 6-17) performs livelock checking for a cur- Suppose that there are state variables {a, b, c, d, e} in the rent initial state s until all initial states are covered with COI of the FSM. The size of the abstract machine is pre- iteration index k. For this, remaining is initially set to defined and let us suppose that the size is N . Then, a set I in Line 3 and updated by eliminating the newly reached of influential latches from the COI is computed from the states reachedk from remaining in Line 13. reached is the FSM variables. The minimum abstract machine is the FSM reached states from all initial states. reached is initially set itself and the maximum abstract machine is the concrete to ZERO in Line 1 and updated by adding reachedk that is machine. In this example, N =4 and we get the abstract the reached states from s in Line 12. Then, the next initial machine {f, g, d, e}. state is chosen from remaining in Line 15. TU is the union of unreachable transient states from each initial state. TU is Abstract Machine initially set to ZERO in Line 2 and updated by adding the COI of FSM FSM unreachable states of Tk in Line 14. For the current initial a d f state s, F indLivelock is called in Line 7. |Rk | represents b e g the number of livelock groups in Rk in Line 8. For each c Rkj , a trace tracejk is generated in Line 9 and the livelock Concrete Machine is reported with the trace in Line 10. Generating trace is explained in Section 4.2 and reporting livelock is explained Figure 7: Abstract machine. in Section 4.3. Theorem 4. If any state in a livelock group on an ab- CheckLivelock(Q, S, I, C) { stract machine is reachable from the initial state on the con- 1 reached = ZERO; 2 TU = ZERO; crete machine, the livelock exists on the concrete machine. 3 remaining = I; Proof. Since the abstraction is an over-approximation, 4 k = 0; the set of all transitions on the abstract machine is a superset 5 s = PickOneState(I); 6 while (s ̸= ZERO) { of the set of all transitions on the concrete machine. Since 7 (Mk , Rk , Tk , reachedk , Ok ) = FindLivelock(Q, S, s); there is no path from any state in the livelock group to any 8 for (j = 0; j < |Rk |; j++) { state in the main group on the abstract machine, there is 9 tracejk = GenerateTrace(C, Rk j , s, Ok ); still no path from any projected states of the livelock group 10 ReportLivelock(s, Mk , Rk , Tk , tracejk ); j 11 } on the concrete machine to any projected states of the main 12 reached = reached ∨ reachedk ; group on the concrete machine. Now, suppose that the live- 13 remaining = remaining ∧ ¬reachedk ; lock does not exist on the concrete machine. In order for the 14 TU = TU ∨ (Tk ∧ ¬reachedk ); 15 s = PickOneState(remaining); livelock not to exist on the concrete machine, the only con- 16 k++; dition is that there is no path from the projected main group 17 } to the projected livelock group on the concrete machine. In 18 careSet = ¬(reached ∨ TU ); other words, the projected livelock group has to be unreach- 19 if (careSet ̸= ZERO) { 20 Rk = FindTSCCs(Q, careSet); able from the initial state. However, this contradicts the 21 for (j = 0; j < |Rk |; j++) assumption that any state of the livelock group is reachable j 22 ReportUnreachLivelock(Rk ); from the initial state on the concrete machine. Therefore, 23 } } the livelock group still exists on the concrete machine. Figure 6: Checking livelock. Thanks to Theorem 4, this abstraction-based livelock finds Once all reachable livelock groups are found, we next find a livelock on small abstract machine using BDD-based sym- unreachable TSCCs. We set the care states careSet to the bolic method, then justifies the existence of the livelock on negation of all visited states so far in Line 18, then call the concrete machine by trace concretization in Section 4.2, F indT SCCs with careSet in Line 20. If there is any un- by using SAT techniques that can handle large designs. The reachable TSCC, the TSCC is reported in Line 22. abstraction-based livelock checking is an incomplete method in the sense that it does not provide the proof of no livelock 4. LIVELOCK CHECKING ON FSM unless the checking is performed on a concrete machine. No To check whether a livelock exists in a design or not, the livelock on an abstract machine does not guarantee no live- checking should be done on the whole design. However, this lock on the concrete machine. However, the abstraction- is infeasible due to the size of the design in practice. Thus, based livelock checking enables finding real livelock errors we propose a practical method for checking livelock on FSMs on industrial large designs. on the design. Even when we check livelock on an FSM, the entire COI 4.1 Causality Checking logic of the FSM must be considered in order to get an ex- Let V be the set of state variables in an abstract machine act result on livelock. However, this is still computationally for livelock checking. Suppose that R(V ) is the reached very expensive or not feasible, in most real designs. Thus, we states in the abstract machine and L(V ) is a livelock group containing a TSCC. Also, suppose that v is a state variable using the onion rings Ok in Figure 6. The second step is to in V . We are interested in whether v contributes to the generate an abstract trace. Starting from the target state, livelock as in Definition 4. This is called variable causality. an abstract trace can be computed by applying BDD-based Definition 4. When a livelock exists in the abstract ma- pre-image computation iteratively until the initial state is chine, a variable v in V contributes to the livelock if the live- reached. The third step is to generate a concrete trace by lock disappears by eliminating v from the abstract machine. making a BMC (Bounded Model Checking [2]) problem from In other words, there is no livelock in another abstract ma- the abstract trace, in order to see whether the livelock group chine that is composed of the variables, V \v. is reachable on the concrete machine. An efficient approach for concretization was proposed in [18] Equation 1 shows a condition for existence of livelock. L(V ) ⊂ R(V ) (1) 4.3 Reporting Livelock Now, let R̃ be the quantified reached states and L̃ be the Once a concrete trace is generated for a livelock group, quantified livelock states with respect to a state variable v, the livelock is real on the concrete machine. We report the as shown in Equation 2 and 3. livelock group with the state classification mentioned in Sec- tion 3.1. A livelock group is reported with its initial state, R̃(V \v) = ∃v . R(V ) (2) the main group, transient group, and the unreachable states L̃(V \v) = ∃v . L(V ) (3) in terms of the number of states and the percentage in each Then, it is determined by Equation 4 to check whether group on the abstract machine. the variable v contributes to the livelock. Theorem 5 says By looking at the transient and livelock groups, we can that if Equation 4 holds, v contributes to the livelock. see what fraction of the state space is in problematic zone. A good design is expected to have only one main group per L̃(V \v) ⊂ R̃(V \v) (4) one initial state without any transient and livelock groups, Theorem 5. When a livelock group is found on an ab- unless the design has an intended reset sequence to a normal stract machine (L(V ) ⊂ R(V )), if L̃(V \v) ⊂ R̃(V \v) holds mode. for a variable v, the variable v contributes to the livelock. Proof. Let M1 be the machine consisting of V and sup- 5. TOGGLE DEADLOCK CHECKING pose that a livelock group exists in M1 . Let M2 be the ma- There is another important design property, called toggle chine consisting of (V \v) by eliminating v from M1 . Also, deadlock that is related to livelock. A livelock may occur for let T1 (T2 ) be the set of transitions in M1 (M2 ), respectively. multiple state variables of a design, whereas a toggle dead- Since M2 is an over-approximated machine from M1 , M2 has lock may occur on a single state variable. A state variable more transitions than M1 (T1 ⊂ T2 ). Let Td be the differ- has a toggle deadlock if the variable initially toggles, but ence between T1 and T2 . If there is any transition (in Td ) the variable gets stuck at a constant value after a certain that makes a path from any state in the livelock to any state number of cycles. in the main group in M2 , the livelock group merges into the Figure 8 shows an example of toggle deadlock. There are main group and both groups become a single SCC, yielding two state variables {a, b} and four states {s0 , s1 , s2 , s3 } as in L̃(V \v) = R̃(V \v). Thus, M2 becomes a machine without the example. Provided that s0 is the initial state, the main the livelock. This means that v is a necessary variable to group is {s0 , s1 } and the livelock group is {s2 , s3 }. Once the have the livelock in M1 . Therefore, if L̃(V \v) = R̃(V \v), v state transition reaches to s2 that is a state in the livelock contributes to the livelock. group, the value of b gets stuck at 1, whereas a still toggles. This causality checking can also be applied to a set of Thus, we say that b has a toggle deadlock. variables, especially with FSM variables, in order to report a=0, b=0 a=0, b=1 whether the livelocks are related with the FSM. Let F be the s0 s3 set of variables in an FSM and C be the set of variables in the COI of the FSM. Suppose that R(F, C) is the reached states s1 s2 in the abstract machine and L(F, C) is a livelock group con- a=1, b=0 a=1, b=1 taining a TSCC. The quantified reached states and the quan- tified livelock states are computed in Equation 5 and 6 with Figure 8: Toggle deadlock. respect to the FSM variables, respectively. R̃(C) = ∃F . R(F, C) (5) Theorem 6. If there is no STSCC in a design, there is no toggle deadlock on any variable. L̃(C) = ∃F . L(F, C) (6) Proof. To be a toggle deadlock, a variable is supposed to Then, Equation 7 shows the causality checking with the toggle at a cycle and to hold the value forever from the cycle. FSM variables to check whether the FSM variables con- No STSCC implies that there is only main group in the tribute to the livelock. design. If a variable appears as constant in the main group, L̃(C) = R̃(C) (7) the variable is a constant. However, the main group does not have any prefix behavior. This means it is not possible for 4.2 Trace concretization the variable to get toggled before the main group. Therefore, Once a livelock group is found on an abstract machine, we no STSCC implies no toggle deadlock. need to justify whether the livelock group is reachable on the concrete machine. This can be done by the following three Theorem 6 shows that toggle deadlock occurs in the pres- steps. The first step is to pick a target state in the livelock ence of a livelock. It is also possible that there is no toggle group. The target state is chosen randomly from the livelock deadlock on a design that has a livelock. Thus, toggle dead- group, but is one of the closest states to the initial states by lock on a state variable can be computed by two steps. First, Statistics Results Design L I F T COI N Llk Dlk New1 New2 TraceGen Time Mem Ops Time Mem Ops Time Len D1 1163 1330 7 - 632 30 0 - 16:20 107.1 44 15:31 107.9 41 - - 60 0 - 41:11 136.7 66 41:31 137.7 66 - - 90 0 - 2:30:33 224.1 81 2:24:50 224.3 81 - - 120 - - time-out (> 24h) time-out (> 24h) - - D2 385 352 25 - 68 30 0 - 0:01 21.0 12 0:01 21.1 12 - - 60 0 - 0:40 38.7 28 0:40 38.7 28 - - 68 12032 - 4:47:52 95.7 386439 0:53:58 96.5 70329 - - D3-F1 32541 912 2 - 28941 30 1 - 0:49 140.4 54 0:49 140.5 55 6:37 66 D3-F2 32541 912 4 - 28930 4 1 - 0:09 129.1 9 0:09 129.1 12 1:54 14 - 4 28930 30 - 1 1:31 132.8 168 1:31 132.8 172 2:11 14 Table 1: Experimental results. we find STSCCs on an abstract machine from the state vari- image/pre-image computations(Ops). N ew1 is the proposed able. The abstract machine is made in the same way as in method without the trimming technique, whereas N ew2 is livelock checking on FSM. Secondly, we evaluate the value the proposed method with the trimming technique. The of the state variable in the livelock if the livelock exists. times are in the form of hh:mm:ss and the memory con- Figure 9 shows the procedure that checks toggle deadlock sumptions are in M-byte. The final two columns(T raceGen) on a given state variable t. CheckT oggleDeadlock takes show the results on trace generation on concrete machine for transition relation (Q), a set of states (S), a set of initial the livelock or toggle deadlock found by N ew2, and T ime states (I), a concrete machine (C), and the state variable shows the time spent for trace generation and Len shows (t) as procedure inputs. CheckT oggleDeadlock is similar the trace length. to CheckLivelock in Figure 6. For each reachable livelock We have chosen 3 industrial designs (D1, D2, and D3). group Rkj in Line 6, T estT oggleDeadlock checks whether the For each design, we have run livelock or toggle deadlock value of t toggles or not in the livelock group and returns dlk checking on several sizes of abstract machines with the mul- and c in Line 7. dlk represents whether the state variable is tiples of 30 latches. We have set the maximum run time to in toggle deadlock or not, and c is the constant value (0 or 24 CPU hours. 1) in the case of toggle deadlock. In D1, there is one FSM automatically extracted. The CheckToggleDeadlock(Q, S, I, C, t) { FSM consists of 7 latches and contains 632 latches in its COI. 1 remaining = I; We can see that the run time is exponentially increased, 2 k = 0; 3 s = PickOneState(I); depending on the size of the abstract machine. On this 4 while (s ̸= ZERO) { design, the livelock checking became infeasible when N =120. 5 (Rk , reachedk , Ok ) = FindLivelock(Q, S, s); In D2, there is also one FSM that was user-specified. The 6 for (j = 0; j < |Rk |; j++) { 7 (dlk, c) = TestToggleDeadlock(Rk j , t); FSM consists of 25 latches and contains only 68 latches in 8 if (dlk) { its COI. This design has a livelock group. However, the j j 9 tracek = GenerateTrace(C, Rk , s, Ok ); livelock was not detected when N=30 and N=60. The live- j 10 ReportToggleDeadlock(s, Rk , tracejk , c); lock was detected only when all the latches in the COI were 11 } included in the abstract machine. In other words, the ab- 12 } 13 remaining = remaining ∧ ¬reachedk ; stract machine is the concrete machine at the FSM point of 14 s = PickOneState(remaining); view. Since the livelock was found on the concrete machine, 15 k++; trace concretization is not required since the abstract trace 16 } } in Section 4.2 is already a concrete trace. Figure 9: Checking toggle deadlock. D2 is the only design showing a significant performance difference between N ew1 and N ew2 in the table. This is be- 6. EXPERIMENTAL RESULTS cause this design has many transient states as well as many We have implemented the proposed livelock checking and livelock groups. In this case, the trimming technique signif- toggle deadlock checking algorithms. Table 1 shows our ex- icantly reduced the number of image/pre-image operations perimental results on livelock and toggle deadlock checking, from 386K to 70K (5.5X reduction) that gave big speed-up generated on a 1.4 GHz Intel processor machine with 4 GB from 5 hours to 1 hour (5X speed-up). This shows that the memory running Red Hat Linux. trimming technique helps the performance when there are The first column lists the design names. The next five many transient states. When there is no transient states, columns present the statistics on the designs, in terms of the the trimming technique becomes a pure overhead as shown number of latches (L), the number of inputs (I), the number in D3. However, the overhead is almost negligible from the of latches in FSM (F ), the number of toggle signals to check experiment. (T ), and the number of latches in the COI of either FSM In D3, there are two FSMs (F 1 and F 2). F 1 is composed and a toggle signal (COI). The next three columns show the of 2 latches and a livelock was found with N =30 within 49 results on livelock and toggle deadlock checking. The col- seconds. The livelock was justified by trace concretization umn with N shows how many latches were in the abstract that took 397 seconds, and the trace length was 66. F 2 is machine. The column with Llk shows how many livelock composed of 4 latches and a livelock was found with N =4 groups are found and the column with Dlk shows how many (the FSM itself) in 9 seconds. The livelock was also justi- toggle deadlock are found. The next six columns compare fied by trace concretization that took 114 seconds, and the the performance between two methods (N ew1 and N ew2), trace length was 14. We have also tried the toggle dead- in terms of time(T ime), memory(M em), and the number of lock checking on F 2 separately from the livelock checking. [5] M. Case, H. Mony, J. Baumgartner, and R. Kanzelman. A toggle deadlock was found in 90 seconds and the concrete Enhanced verification by temporal decomposition. In Formal Methods in Computer Aided Design, pages 37–54, 2009. trace was generated in 131 seconds. D3 shows the value of [6] H. Cho, G. D. Hachtel, E. Macii, M. Poncino, and F. Somenzi. abstraction-based livelock and toggle deadlock checking. Automatic state space decomposition for approximate fsm Table 2 shows a comparison on finding all SCCs with four traversal based on circuit analysis. IEEE Transactions on algorithms (XB [28], Lockstep [3], Skeleton [12], IXB [20]) Computer-Aided Design, 15(12):1451–1464, Dec. 1996. [7] O. Coudert and J. C. Madre. A unified framework for the on the design D2 from Table 1. In this design, the number of formal verification of sequential circuits. In Proceedings of the recurrent states is 2.07e8 and the number of transient states International Conference on Computer-Aided Design, pages is 1.2e6 that is only 0.6% of all states. However, it turned out 126–129, Nov. 1990. [8] O. G. E. M. Clarke and D. Peled. Model Checking. The MIT that how to handle these transient states efficiently is the key Press, 1999. factor in the performance. One main difference between XB [9] N. Een and N. Sorensson. MiniSat. and IXB is that IXB trims out those transient states as much http://www.cs.chalmers.se/Cs/Research/FormalMethods/ as possible. This trimming technique makes the IXB method MiniSat. [10] E. A. Emerson and C. Lei. Modalities for model checking: outperform on this design: faster in time (more than 15X) Branching time logic strikes back. Science of Computer and fewer number of image operations (more than 10X) than Programming, 8:275–306, 1987. the other methods. This explains why N ew2 outperformed [11] E. A. Emerson and C.-L. Lei. Efficient model checking in on D2 in Table 1. Table 2 also shows why livelock checking is fragments of the propositional mu-calculus. In Proceedings of the First Annual Symposium of Logic in Computer Science, done by finding TSCCs instead of SCCs. Finding all livelock pages 267–278, June 1986. groups took 54 minutes, whereas finding all SCCs took 100 [12] R. Gentilini, C. Piazza, and A. Policriti. Computing strongly minutes (2X) even with IXB. connected components in a linear number of symbolic steps. In SODA ’03: Proceedings of the fourteenth annual ACM-SIAM Method Time Memory Ops SCCs States symposium on Discrete algorithms, pages 573–582, 2003. XB 84:07:56 98.2 1013333 [13] G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Markovian Lockstep 45:55:53 237.3 2590724 19458 2.08e8 analysis of large finite state machines. IEEE Transactions on Skeleton 26:01:54 266.5 2609008 Computer-Aided Design, 15(12):1479–1493, Dec. 1996. IXB 1:39:47 92.5 102990 [14] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton. Efficient ω-regular language containment. In Computer Aided Table 2: Finding all SCCs in D2. Verification, pages 371–382, Montréal, Canada, June 1992. [15] R. P. Kurshan. Computer-Aided Verification of Coordinating 7. CONCLUSIONS Processes. Princeton University Press, Princeton, NJ, 1994. We have presented a framework for abstraction-based live- [16] L. Lamport. Proving the correctness of multiprocess programs. lock and toggle deadlock checking, in order to handle large IEEE Transactions on Software Engineering, SE-3(2):125–143, Mar. 1977. designs in practice. Since exact livelock and toggle deadlock [17] Y. Matsunaga, P. C. McGeer, and R. K. Brayton. On checking is infeasible on real designs directly, our approach computing the transitive closure of a state transition relation. is to check livelock and toggle deadlock on abstract machine In Proceedings of the Design Automation Conference, pages of either an FSM or a toggle signal. Once we find a livelock 260–265, June 1993. [18] K. Nanshi and F. Somenzi. Constraints in one-to-many or toggle deadlock, we justify the livelock or toggle deadlock concretization for abstraction refinement. In Proceedings of the on the concrete machine by concretizing the abstract trace Design Automation Conference, pages 569–574, 2009. on the concrete machine. [19] S. Qadeer, R. K. Brayton, V. Singhal, and C. Pixley. Latch Even though the proposed approach does not prove the redundancy removal without global reset. In Proceedings of the International Conference on Computer Design, pages non-existence of livelock or toggle deadlock on a design un- 432–439, 1996. less the design is small enough to handle, this approach finds [20] K. Ravi, R. Bloem, and F. Somenzi. A comparative study of livelocks or toggle deadlocks on the design if there exists. symbolic algorithms for the computation of fair cycles. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods To the best of our knowledge, it is the first approach to in Computer Aided Design, pages 143–160. Springer-Verlag, use the abstraction-based livelock checking and also the first Nov. 2000. LNCS 1954. approach for checking toggle deadlock. The experimental re- [21] V. Singhal. Design replacements for sequential circuits. Ph.D. sults showed that the abstraction-based approach finds live- dissertation, University of California at Berkeley, 1996. [22] V. Singhal, C. Pixley, A. Aziz, and R. K. Brayton. Theory of lock errors on the real designs. safe replacements for sequential circuits. IEEE Transactions As future work, we are interested in improving the con- on Computer-Aided Design, 20(2):249–265, Feb. 2001. cretization, finding more accurate influential latches, and [23] R. Tarjan. Depth first search and linear graph algorithms. optimizing the computations with multiple FSMs or toggle SIAM Journal of Computing, 1:146–160, 1972. [24] H. J. Touati, R. K. Brayton, and R. P. Kurshan. Testing signals by considering the overlaps in their COIs. language containment for ω-automata using BDD’s. Information and Computation, 118(1):101–109, Apr. 1995. 8. REFERENCES [25] M. Y. Vardi and P. Wolper. An automata-theoretic approach to [1] A. Biere, C. Artho, and V. Schuppan. Liveness checking as automatic program verification. In Proceedings of the First safety checking. In International Workshop in Formal Symposium on Logic in Computer Science, pages 322–331, Methods for Industrial Critical Systems, pages 160–177, 2002. Cambridge, UK, June 1986. [2] A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model [26] T.-H. Wang and T. Edsall. Practical FSM analysis for verilog. checking without BDDs. In Fifth International Conference on In IVC-VIUF ’98: Proceedings of the International Verilog Tools and Algorithms for Construction and Analysis of HDL Conference and VHDL International Users Forum, Systems (TACAS’99), pages 193–207, Amsterdam, The pages 52–58, 1998. Netherlands, Mar. 1999. LNCS 1579. [27] A. Xie and P. A. Beeral. Efficient state classification of [3] R. Bloem, H. Gabow, and F. Somenzi. An algorithm for finite-state markov chains. IEEE Transactions on strongly connected component analysis in n log n symbolic Computer-Aided Design, 17(12):1334–1339, Dec. 1998. steps. In Formal Methods in Computer Aided Design, pages [28] A. Xie and P. A. Beeral. Implicit enumeration of strongly 37–54, 2000. connected components and an application to formal [4] R. Bryant. Graph-based algorithms for boolean function verification. IEEE Transactions on Computer-Aided Design, manipulation. IEEE Transactions on Computers, 19(10):1225–1230, Oct. 2000. C-35(8):677–691, Aug. 1986.