Rank-Based Simulation on Acyclic Graphs Raffaella Gentilini1 , Carla Piazza2 , and Alberto Policriti2 1 Dip. di Matematica e Informatica, Università di Perugia, Via Vanvitelli 1, Perugia (IT) 2 Dip. di Matematica e Informatica, Università di Udine, Via Le Scienze 206, Udine (IT). {raffaella.gentilini}@dmi.unipg.it, {piazza|policriti}@dimi.uniud.it Abstract. The simulation preorder is widely used both as a behavioral relation in concurrent systems, and as an abstraction tool to reduce the state space in model checking, were memory requirement is clearly a critical issue. Therefore, in this context a simulation algorithm should address both time and space efficiency. In this paper, we rely on the notion of rank to design an efficient simulation algo- rithm. It turns out that such algorithm outperforms—both in terms of time and in terms of space—the best simulation algorithms in the literature, on the class of acyclic graphs. 1 Introduction The simulation preorder [12] is a behavioral refinement relation on labeled graphs, widely used as a formal tool supporting the design and the automated reasoning on complex systems. In particular, simulations plays a role in two tasks that are often cru- cial to guarantee the success of a formal method for system design or computer aided verification: the system refinement and the system abstraction [10]. In this context, the behavior of a system or a set of programs implementing a collection of cooperating units is naturally modeled as a (labeled) graph, whose nodes describe the possible states and arrows represent actions. Given a specification of a system as a labeled graph G1 , the simulation preorder provides a formal tool for checking wether G1 is correctly im- plemented (or refined) by the concrete system G2 . Moreover, the induced equivalence can be used as an abstraction tool to cope with the intricacies buried in the modeling activity and to control the sheer size of the obtained structures. In particular, space re- quirements underly the notorious state-explosion problem in model checking [5], a fully automatic (and quite efficient in time) formal method for verifying finite-state systems1 with respect to temporal logics specifications. Abstraction methods for model checking are required to be preservative with respect to the logic language used for specifying the properties of the system. An abstraction method is said to be weakly preservative for a temporal logic L if whenever a property p of L is true in the abstract structure, p holds also in the concrete model. An abstraction method is said to be strongly preser- vative for a temporal logic L if both true and false L-properties are preserved from the abstract structure to the concrete model. Grumberg et al. [13] proved that the simula- tion preorder is weakly preservative for ACTL∗ and ACTL, the universal fragments of 1 The labeled graphs used to model the system under verification are called Kripke structures in the context of Model Checking the branching temporal logics CTL and CTL∗ [4], as well as for the universal frag- ment of the µ-calculus. In [11], it was shown that the simulation equivalence strongly preserves both the universal and the existential fragment of the µ-calculus. As a conse- quence, it strongly preserves its sublogics ACTL∗ , ECTL∗ , ECTL and ACTL, widely used for model checking. The latter preservation results combined with the existence of a number of polynomial algorithms for computing (the maximal) simulation on a labeled graph [2, 3, 6, 10], explains the appealing of simulation-based abstraction meth- ods in model checking, also w.r.t. other popular behavioral refinement relations such as language equivalence and bisimulation [16]. In fact, language equivalence provides strong preservation of linear temporal properties and large reductions, however its com- plexity is exponential, whereas the complexity of bisimulation and simulation is poly- nomial [14, 7, 8]. On the other hand, bisimulation has the advantage (w.r.t. simulation and language equivalence) of preserving more expressive logics. However this is also a disadvantage, since the abstract structure is required to be so close to the original model that the reductions allowed are far less powerful. State of the Art Among the algorithms for computing the simulation preorder, the most well known ones are by Henzinger, Henzinger and Kopke [10], Bloom and Paige [1], Bustan and Grumberg [2, 3], Tan and Cleaveland [6], Gentilini, Piazza, and Poli- criti [9, 20], and Ranzato and Tapparo [17, 18]. Given a (labelled) graph G with |V | nodes and |E| edges, let |V≡S | be the size of the maximum simulation (equivalence) on G. The algorithm by Ranzato and Tapparo [17] runs in O(|E||V≡S |) time and O(|E||V ||V≡S |) space. It is the best up-to-date simulation procedure as far as time complexity is concerned. On the other hand, the algorithm in [9] (that originally had a minor flow, subsequently corrected in [20]) has the best up-to-date space complexity— O(|V≡S |2 +|V | log(|V≡S ))—and runs in O(|E||V≡S |2 ) time. In [18], Ranzato and Tap- paro proposed a new simulation algorithm featuring an improvement w.r.t. the space- complexity of their previous procedure, while slightly worsening the time-performance (of a cubic factor w.r.t. |V≡S |). Our Contribution We propose a simulation algorithm that has optimal performances w.r.t. both time and space on acyclic graphs, outperforming [17, 18, 9]. Namely, our algorithm uses O(|E||V≡S |) time and O(|V≡S |2 + |V | log(|V≡S |)) bits to compute a simulation preorder on a given acyclic graph. The time/space improvement w.r.t. [17, 18, 9] relies on computing the maximum simulation proceeding by rank. 2 Preliminaries In this section we introduce the basic notations we use in the rest of the paper. Definition 1. Let V be a set and Q ⊆ V × V a binary relation over V : – Q is said to be a preorder over V if and only if Q is reflexive and transitive; – Q is said to be a partial order over V if and only if Q is reflexive, antisymmetric, and transitive; – Q is said to be acyclic if and only if its transitive closure is antisymmetric. We will use Q+ to refer to the transitive closure of Q and Q∗ to refer to the reflexive and transitive closure of Q. Notice that if a relation is acyclic, then it is antisymmetric, while the converse does not hold (unless it is transitive). Definition 2. A triple G = hV, E, Σi is said to be a labelled graph if and only if G− = hV, Ei is a finite graph and Σ is a partition over V . We say that two nodes v1 , v2 ∈ V have the same label if they belong to the same class of Σ. An equivalent way to define labelled graphs is to use a labelling function ℓ : V → L, where L is a finite set of labels (inducing of a partition ΣL of V ). Given a node v ∈ V we will use [v]Σ (or [v], if Σ is clear from the context) to denote the class of Σ to which v belongs. Example 1. A Kripke Structure is a labelled graph and, vice-versa, each connected la- belled graph can be seen as a Kripke Structure in which two worlds satisfy the same set of atomic propositions if and only if their labels are equal. Definition 3. Let G = hV, E, Σi be a labelled graph. A relation ≤⊆ V × V is said to be a simulation over G if and only if: 1. v ≤ u → [v]Σ = [u]Σ ; 2. (v ≤ u ∧ vEv1 ) → ∃u1 (uEu1 ∧ v1 ≤ u1 ). In this case we also say that u simulates v. We say that u and v are sim-equivalent (u ≡s v) if there exist two simulations ≤1 and ≤2 , such that n ≤1 m and m ≤2 n. Notice that a simulation can be neither reflexive nor transitive (e.g. the empty relation is always a simulation), however the reader can easily verify that given an arbitrary simulation its reflexive and transitive closure is always a simulation. A simulation ≤s over G is said to be maximal if for all the simulations ≤ over G it holds ≤⊆≤s . Given a labelled graph G = hN, E, Σi there always exists a unique maximal simulation ≤s over G. Moreover ≤s is a preorder [12]. Example 2. Consider the labelled graph G = hV, E, Σi depicted in Figure 1, where V = {x, y, z}, E = {(x, y), (x, z), (y, z)}, and Σ = {α = {x, y}, β = {z}}. The maximum simulation preorder on G is given by I ∪ {(y, x)}, where I denotes the identity relation over V . α x y z β Fig. 1. A labelled graph. Given a labelled graph G, the simulation problem consists in determining the max- imum simulation preorder on G, and can be elegantly encoded in terms of a coars- est partition pair problem [9]. Such a formulation is the engine of the space efficient procedure in [9, 20] and relies on the fundamental notions of partition pair (PP), PP refinement and PP stability, recalled below. Definition 4 (Partition Pairs). Let V be a set. A partition pair on V is a pair hΣ, Ri, where Σ is a partition on V and R is a reflexive relation on Σ. Given a set V , each preorder relation P on V induces a corresponding partition pair hV≡P , P i, where ≡P is the equivalence relation ≡P = {(u, v) | u ≺P v ∧ v P u}, and P = {(α, β) ∈ V≡P | ∃u ∈ α, ∃v ∈ β.(u P v)}. In particular, given a labelled graph G = hV, E, Σi, we denote by hV≡S , Si the partition pair on V corresponding to the maximum simulation preorder S of G consistent w.r.t. Σ. Definition 5. Let hΣ, Ri, hΠ, P i be two partition pairs on V : hΠ, P i ⊑ hΣ, Ri ⇔ Π is finer than Σ and P ⊆ R(Π) where R(Π) denotes the relation on Π induced by R ⊆ Σ × Σ, i.e.: ∀α, β ∈ Π((α, β) ∈ R(Π) ⇔ ∃α′ , β ′ ((α′ , β ′ ) ∈ R ∧ α ⊆ α′ ∧ β ⊆ β ′ )) Given two sets of nodes α, β ⊆ V we write α →∃ β to denote that there exists a node a ∈ α which reaches a node b ∈ β, i.e., (a, b) ∈ E. Similarly, α →∀ β denotes that each node in α reaches a node in β. Definition 6 (Stability). Let G = hV, Ei be a graph, let hΣ, Ri be a partition pair on V . hΣ, Ri is said stable w.r.t. the transition relation of the graph E iff: [ [ ∀α, β ∈ Σ(α →∃ β ⇒ {δ | δ ∈ R(α)} →∀ {δ | δ ∈ R(β)} Definition 7 (Coarsest Partition Pair Problem (CPPP)). Let G = hV, E, Σi be a labelled graph, and consider the identity relation I on Σ. The coarsest partition pair problem asks to determine the coarsest partition pair hΠ, P i ⊑ hΣ, Ii stable w.r.t. E. Lemma 1 (CPPP as Simulation Problem). Let G = hV, E, Σi be a labelled graph. The coarsest partition pair problem is well defined and admits as unique solution the partition pair hV≡S , Si, corresponding to the maximum simulation preorder on G con- sistent w.r.t. Σ. Proof. We show that the unique solution to the CPPP is the partition pair hV≡S , Si ⊑ hΣ, Ii corresponding to the maximum simulation preorder S on G = hV, Ei consis- tent w.r.t. Σ. We start by proving that hV≡S , Si is stable w.r.t. E. Let α, β ∈ V≡S and ′ ′ assume that α →∃ β. Then, there S exist two nodes s ∈ α, s ∈ β such ′that s → s . Consider an arbitrary node p ∈ {δ | δ ∈ S(α)}. Since S s S p and s → s , there exists a node p′ suchSthat p → p′ and s′ S p′ . Hence, S p ′ ∈ {δ | δ ∈ S(β)}. S Our arbitrary choice of p ∈ {δ | δ ∈ S(α)} guarantees that {δ | δ ∈ S(α)} →∀ {δ | δ ∈ S(β)}, i.e. hV≡S , Si is stable w.r.t E. To conclude our thesis, assume by absurd that there exists a partition pair hΠ, P i ⊑ hΣ, Ii stable w.r.t. E and such that ¬(hΠ, P i ⊑ hV≡S , Si). Consider the relation rank (s′ ). Let r = rank (s) > rank (s′ ). SinceVrank (s) = r, we can determine a sequence of r nodes s1 , . . . , sr such that s → s1 1≤i1 , denote by γi−1 the only class in Σi−1 such that γi−1 ⊇ γ. 1. In order to prove our inductive step for item (3), consider hΣi=rk>1 , simi=rk>1 i and let α ∈ Σi such that α → β ∧Srank (β) < i − 1. By rankS(β) < i − 1 and Lemma 4 we have βi−1 = β and {δ | (β, δ) ∈ simi } = {δ | (βi−1 , δ) ∈ simi−1 } = simβ. Hence, if rank (β) < (i − 1) S − 1 we can conclude our thesis exploiting the inductive hypothesis for which {δ | (αi−1 , δ) ∈ simi−1 } →∀ S Otherwise, assume rank (β) = i − 1 and suppose by contradiction that simβ. ¬( {δ | (α, δ) ∈ simi } →∀ simβ). Let α∗ ⊃ α be the superclass of α at the moment in which β gets selected at line 13 with rk = i − 1. Within the execution of the most internal foreach-loop at line 14, if α∗ contains some state that does not reach simβ, then α∗ gets split into the two subclasses: α∗1 := α∗ \ simβ and α∗ := α∗ \ α1 . By α →∃ β ⊆ simβ, we have that the statement α ⊆ α∗ is true both before and after such a split. Moreover, the loop at line 19 processes each class γ ∈ sim∗i−1 (α∗ ⊇ α). If γ contains some state that does not reach simβ, then γ gets split into the two subclasses γ1 := γ \simβ and γ := γ \α1 , and γ1 is removed from the simulators of α∗ ⊇ α (line 23). Hence, once β have been considered at line 13 within the i − 1-th execution of the for-loop at line 12, we have that each node belonging to a class simulating α∗ has a successor in simβ. Each subsequent refinement of α∗ or its set of simulators will S maintain this property, and thus we get to the contradiction of our assumption ¬( {δ | (α, δ) ∈ simi } →∀ simβ 2. We now proceed proving the inductive step for item (1). Let i = rk > 1 and consider the hΣi , simi i. Let v ∈ V such that rank (v) < rk − 1. Then, item (1) holds by Lemma 4 and by inductive hypothesis. Let v such that rank (v) = rk − 1. Then, by inductive hypothesis on item (2) we have: [ [ [v]≡S ⊆ α ∈ Σ ∧ {u | v S u} ⊆ {β | (α, β) ∈ sim} (1) Moreover, the inductive step already proved on item (3) ensures that: [ [ (α →∃ β ∧ rank (β) < rk − 1) ⇒ {γ | (α, γ) ∈ sim} →∀ {γ | (β, γ) ∈ sim} (2) Since rank (α) = rk − 1, any class reached by α has rank strictly lower than rk − 1. Hence, equations 1 and 2 guarantee that: [ [ [v]≡S = α ∈ Σ ∧ {u | v S u} = {β | (α, β) ∈ sim} completing our inductive step for item (1). 3. Let i = rk > 1 and consider v such that rank(v) ≥ rk. If rank(v) > rk, than [v]Σi = [v]Σi+1 ⊆ α by inductive hypothesis. If rank(v) = rk, there are two cases to consider. In the first case, [v]Σi = [v]Σi+1 (i.e. the class of v gets not split within the i-th iteration of the loop at line 13) and we are done. In the second case, [v]Σi gets split into α, α1 within the i-th execution of the loop at line 13. By contradiction, assume there exist two states u ∈ α, u′ ∈ α1 such that u ≡S u′ . By definition of α, α1 , u has a successor into a class β of rank r < rk, and u′ has no successor in any class β ′ ∈ sim(β). By inductive hypothesis, this implies that there exists z such that (u, z) ∈ E and u′ has no successor z ′ such that z ≤S z ′ , contradicting our hypothesis u ≡S u′ . ⊔ ⊓ 4.2 Complexity We finally establish the complexity of our simulation algorithm on acyclic graphs. In particular, Theorem 2 shows that our procedure uses O(|E||V≡S |) time and requires O(|V≡S |2 + |V | log(|V≡S |)) bits to compute a simulation preorder on a given acyclic labelled graph G = hV, E, Σi. Therefore, it has optimal performances w.r.t. both time and space on acyclic graphs, outperforming [17, 18, 9]. Theorem 2. The algorithm SolveCPPP(G = hV, Ei, hΣ, Ii) performs O(|V≡S ||E|) steps and uses O(|V≡S |2 + |V | log(|V≡S |)) bits to compute the solution to the coarsest partition pair problem hV≡S , Si. Proof. Let r = max{rank (v) | v ∈ V }. The cost of the while-loop at line 3 is O(r ∗ |E|) = O(|V≡S ||E|). The cost of the loop at line 12, excluded the execution of the innermost if-statement at line 21, is: [ r O(Σ1=1 Σβ∈V≡ ,rank (β)=i−1 (|pre(β)| ∗ |Σi+1 | + |pre( {δ | (β, δ) ∈ sim}|) = S = O(|V≡S ||E|) In fact, consider a class β such that rank (β) = i − 1. It is possible to distinguish with S marks the classes that reach (resp.S do not reach/reach with all their nodes) the set {δ | (β, δ) ∈ sim} at the cost O(pre( {δ | (β, δ) ∈ S sim}). Moreover, the same cost allows one to appropriately mark each node in pre( {δ | (β, δ) ∈ sim}. Then, fixed β, rank (β) = i−1, the cost S of executing lines 14–24 without considering the innermost if-statement, is O(|pre( {δ | (β, δ) ∈ sim}| + Σα∈pre(β) |sim(α)|). The innermost if-statement at lines 21–23 is executed only upon the creation of a new class γ1 and cost globally O(|V≡S ||E|). In fact, each execution of lines 21–23 for the creation of the new classes γ1 , γ \ γ1 from γ, requires only to scan the nodes in γ and the classes in sim(γ), sim−1 (γ). As far as space complexity is concerned we refer to bit complexity without consider- ing the space required by the graph G, since it is never modified (see, e.g., [15, 2, 3]). In particular, Σ is stored through an array of length |V | associating to each node its class. Hence, it requires O(|V | log(|V≡S |)) bits. notRanked is a |V | array of bits, labeling with 1 the nodes which do not have a rank. rank is stored in a |V | array of lists, where the ith list keeps the classes at rank i. Hence it requires O(|V | + |V≡S | log(|V≡S |)) bits. Finally, the relation sim is stored in a bit matrix whose size grows up to O(|V≡S |2 ) bits. ⊔ ⊓ 5 Conclusion We presented an algorithm for computing the maximum simulation quotient and the simulation preorder on acyclic graphs. Our algorithm is based on a characterization of the simulation problem as coarsest partition pair problem and on the notion of rank. The notion of rank is a standard one in well-founded set theory (see, e.g., [19]) and has been exploited in [7, 8] to define an algorithm for computing the maximum bisimulation. The algorithm presented in [7, 8] has a linear time complexity in the acyclic case, while in the general case it allows one to focus the computation on subgraphs of the given graph. The algorithm we presented here has optimal space/time performances on acyclic graphs. On the one hand, its generalization to the general case is still an open problem. Unfortunately, when moving from acyclic to cyclic graphs the notion of simulation is not “compositional”, since loops allow to simulate paths of arbitrary length. On the other hand, another open problem concerns the possibility of developing a linear time simulation algorithm for the acyclic case. Again, there seems to be an in- trinsic higher complexity in the notion of simulation w.r.t. bisimulation which does not allow to reach the linear time complexity. In particular, while the notion of bisimulation on acyclic graphs corresponds to equality on well-founded sets, simulation in terms of set theory is a sort of recursive inclusion. References 1. Bard Bloom and Robert Paige. Transformational design and implementation of a new efficient solution to the ready simulation problem. Science of Computer Programming, 24(3):189–220, June 1995. 2. D. Bustan and O. Grumberg. Simulation based minimization. In D.A. McAllester, editor, Proc. 17th Int’l Conference on Automated Deduction (CADE’00), volume 1831 of LNCS, pages 255–270. Springer, 2000. 3. D. Bustan and O. Grumberg. Simulation based minimization. ACM Transactions on Com- putational Logic (TOCL), 4:181–206, 2003. 4. E. Clarke and E. Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logics. Logic of Programs, pages 52–71, 1981. 5. E. Clarke, O. Grumberg, and D. Peled. Model Checking. Elsevier/MIT press, 2001. 6. R. Cleaveland and L. Tan. Simulation revisited. In T. Margaria and W. Yi, editors, Proc. 7th Int’l Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’01), volume 2031 of LNCS, pages 480–495. Springer, 2001. 7. A. Dovier, C. Piazza, and A. Policriti. A fast bisimulation algorithm. In G. Berry, H. Comon, and A. Finkel, editors, Proceedings of Computer Aided Verification (CAV’01), volume 2102 of LNCS, pages 79–90. Springer, 2001. 8. A. Dovier, C. Piazza, and A. Policriti. An efficient algorithm for computing bisimulation equivalence. Theoretical Computer Science, 311:221–256, 2004. 9. R. Gentilini, C. Piazza, and A. Policriti. From bisimulation to simulation: Coarsest partition problems. J. Autom. Reasoning, 31(1):73–103, 2003. 10. M. R. Henzinger, T. A. Henzinger, and P. W. Kopke. Computing simulations on finite and infinite graphs. In 36th Annual Symposium on Foundations of Computer Science (FOCS’95), pages 453–462. IEEE Computer Society Press, 1995. 11. K. Laiseaux, S. Graf, J. Sifakis, A. Bouajjani, and S. Bensalem. Preserving abstractions for the verification of concurrent systems. Formal Methods in System Design, 6(1):11–44, 1995. 12. R. Milner. A calculus of communicating systems. In G. Goos and J. Hartmanis, editors, Lecture Notes on Computer Science, volume 92. Springer, 1980. 13. O. Grumberg and D.E. Long. Model checking and modular verification. ACM Transactions on Programming Languages and systems, 16(3):843–871, May 1994. 14. R. Paige and R. E. Tarjan. Three partition refinement algorithms. SIAM Journal on Comput- ing, 16(6):973–989, 1987. 15. C. H. Papadimitriou. Computational complexity. Addison-Wesley Publishing Company, Inc., 1994. 16. D. Park. Concurrency on automata and infinite sequences. Theoretical Computer Science, pages 167–183, 1981. 17. F. Ranzato and F. Topparo. A new efficient simulation equivalence algorithm. In Proceedings of Logics in Computer Science (LICS’07), pages 171–180, 2007. 18. F. Ranzato and F. Topparo. Saving space in a time efficient simulation algorithm. In Proceed- ings of Int. Conference on Application of Concurrency to System design (ACSD’09), pages 60–69, 2009. 19. J. E. Rubin. Set Theory for the Mathematician. New York: Holden-Day, 1967. 20. R. van Glabbeek and B. Ploeger. Correcting a space-efficien simulation algorithm. In Pro- ceedings of Int. Conference on Computer Aided Verification (CAV’08), pages 517–529, 2008.