Decision Diagrams for Petri Nets: which Variable Ordering?

Elvio Gilberto Amparore1, Susanna Donatelli1, Marco Beccuti1, Giulio Garbi1, Andrew Miner2

1 Università di Torino, Dipartimento di Informatica, Italy — {amparore,susi,beccuti}@di.unito.it
2 Iowa State University, Iowa, USA — asminer@iastate.edu

Abstract. The efficacy of decision diagram techniques for state space generation is known to be heavily dependent on the variable order. Ordering can be done a-priori (static) or during the state space generation (dynamic). We focus our attention on static ordering techniques. Many static decision diagram variable ordering techniques exist, but it is hard to choose which method to use, since only fragmented information about their performance is available. In the work reported in this paper we used the models of the Model Checking Contest, 2016 edition, to provide an extensive comparison of 14 different algorithms, in order to better understand their efficacy. The comparison is based on the size of the decision diagram of the generated state space. The state space is generated using the Saturation method provided by the Meddly library.

Keywords: decision diagrams, static variable ordering, heuristic optimization, saturation.

1 Introduction

The Binary Decision Diagram (BDD) [10] is a well-known data structure that has been extensively used in industrial hardware verification, thanks to its ability to encode complex Boolean functions on very large domains. In the context of discrete event dynamic systems in general, and of Petri nets in particular, BDDs and their extensions (e.g. Multi-way Decision Diagrams, MDDs) have been proposed to efficiently generate and store the state space of complex systems. Indeed, symbolic state space generation techniques exploit Decision Diagrams (DDs) because they allow entire sets of states to be encoded and manipulated at once, instead of storing and exploring each state explicitly.

The size of a DD representation is known to be strongly dependent on the variable order: a good ordering can significantly change the memory consumption and the execution time needed to generate and encode the state space of a system. Unfortunately, finding the optimal variable ordering is NP-complete [9]. Therefore, efficient DD generation usually relies on heuristics for the selection of (sub)optimal orderings. In this paper we only consider static variable ordering, i.e. once the variable ordering l is selected, the MDD construction starts without the possibility of changing l. Several papers in the literature study the topic of variable ordering; an overview of these works can be found in [25] and in the recent work [19]. In particular, the latter considers a new set of variable ordering algorithms based on bandwidth-reduction methods [27], and observes that they can be successfully applied to variable ordering. We also consider the work published in [18] (based on the ideas in [28]), which describes state-of-the-art variable ordering methods specialized for Petri nets.

The motivation of this work was to understand how these different variable ordering algorithms behave. We also wanted to investigate whether the availability of structural information on the Petri net model could make a difference. As far as we know there is no extensive comparison of these specific methods. In particular we have addressed the following research objectives:
1. Build an environment (a benchmark) in which different algorithms can be checked on a vast number of models.
2. Investigate whether structural information like P-semiflows can be exploited to define better algorithms for variable ordering.
3. Identify the right metrics to compare variable ordering algorithms in the fairest manner.

To achieve these objectives we have built a benchmark in which 14 different algorithms for variable ordering have been implemented and compared on the state space generation of the Petri nets taken from the models of the Model Checking Contest (both colored and uncolored), 2016 edition [16]. The implementation is part of RGMEDD [6], the model checker of GreatSPN [5], and uses MDD saturation [12]. The ordering algorithms are either taken from the literature (both in their basic form and with a few new variations) or were already included in GreatSPN.

Figure 1, left, depicts the benchmark workflow. Given a net system S = (N, m0), all ordering algorithms in A are run (box 1), then the reachability set RS_l of the system is computed for each ordering l ∈ L (box 2), and the algorithms are ranked according to some MDD metrics MM(RS_l) (box 3). The best algorithm a* is then the best algorithm for solving the PN system S = (N, m0) (box 4), and its state space RS_l could be the one used to check properties. This workflow allows us to: 1) provide indications on the best performing algorithm for a given model, and 2) compare the algorithms in A on a large set of models, to possibly identify the algorithm with the best average performance. The problem of defining a ranking among algorithms (or of identifying the "best" algorithm) is non-trivial and will be explored in Section 3.

Figure 1, right, shows a high-level view of the approach used to compare variable ordering algorithms in the benchmark. Columns represent algorithms, and rows represent model instances, that is to say a Petri net model with an associated initial marking. A square in position (j, k) represents the state space generation for the j-th model instance done by GreatSPN using the variable ordering computed by algorithm a_k. A black square indicates that the state space was generated within the given time and memory limits.

[Figure 1 appears here: on the left, the four-box workflow (run each ordering algorithm a ∈ A on the model; for each ordering l ∈ L reorder V and compute the MDD RS_l; rank the algorithms according to MDD metrics of {RS_l}; select the best algorithm a* for ⟨N, m0⟩); on the right, the model instances × algorithms matrix, with rows grouped into instances solved by no algorithm in A, by some algorithms, and by all algorithms.]
Fig. 1. Workflow for analysis and testing of static variable ordering algorithms.
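To make the workflow concrete, the following C++ sketch mirrors boxes 1–4. It is only an illustration under our own naming (OrderFn, BuildFn and best_algorithm are hypothetical placeholders, not the actual RGMEDD/GreatSPN interface), with the MDD peak node count standing in for the generic MDD metric MM(RS_l).

    // A minimal sketch of the benchmark workflow of Figure 1 (boxes 1-4).
    #include <functional>
    #include <limits>
    #include <string>
    #include <utility>
    #include <vector>

    struct RSResult { bool completed = false; size_t peak_nodes = 0; };

    using OrderFn = std::function<std::vector<int>()>;                 // one ordering algorithm
    using BuildFn = std::function<RSResult(const std::vector<int>&)>;  // one saturation run

    // Box 4: pick the algorithm a* whose ordering gave the smallest MDD metric
    // (here simply the peak node count among the runs that completed).
    std::string best_algorithm(const std::vector<std::pair<std::string, OrderFn>>& A,
                               const BuildFn& build_rs) {
        std::string best;
        size_t best_peak = std::numeric_limits<size_t>::max();
        for (const auto& [name, algo] : A) {
            RSResult rs = build_rs(algo());       // boxes 1-2: compute l, then RS_l
            if (rs.completed && rs.peak_nodes < best_peak) {
                best_peak = rs.peak_nodes;        // box 3: rank by the MDD metric
                best = name;
            }
        }
        return best;
    }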
In the analysis of the results from the benchmark we shall distinguish among: model instances for which no variable ordering was good enough to allow GreatSPN to generate the state space (only white squares in the model instance row, as for the first two rows in the figure); model instances for which at least one variable ordering was good enough to generate the state space (at least one black square in the row); and model instances for which GreatSPN generates the state space with all variable orderings (all black squares in the row), which we shall consider "easy" instances.

In the analysis it is also important to distinguish whether we evaluate ordering algorithms w.r.t. all possible instances or on a representative set of them. Figure 1, right, highlights that instances are not independent, since they are often generated from the same "model", that is to say the same Petri net N, by varying the initial marking m0 or some other parameter (like the cardinality of the color classes). As we shall see in the experimental part, collecting measures over all instances, in which all instances have the same weight, may lead to a distortion of the observed behaviour, since the number of instances per model can differ significantly. A measure "per model" is therefore also considered.

This work would not have been possible without the models made available by the Model Checking Contest, the functions of the Meddly MDD library, and the GreatSPN framework. We review them in the following.

Model Checking Contest. The Model Checking Contest [16] is a yearly scientific event whose aim is to provide a comparison among the different available verification tools. The 2016 edition employed a set of 665 PNML instances generated from 65 (un)colored models, provided by the scientific community. The participating tools are compared on several examination goals, i.e. state space, reachability, LTL and CTL formulas. The MCC team has designed a scoring system to evaluate tools, which we shall employ as one of the considered benchmark metrics for evaluating the algorithms, since evaluating the orderings can be reduced to evaluating the same tool, GreatSPN, in as many variations as the number of ordering algorithms considered.

Meddly library. Meddly (Multi-terminal and Edge-valued Decision Diagram LibrarY) [8] is an open-source library implementation of Binary Decision Diagrams (BDDs) and several variants, including Multi-way Decision Diagrams (MDDs, implemented "natively") and Matrix Diagrams (MxDs, implemented as MDDs with an identity reduction rule). Users can construct one or more forests (collections of nodes) over the same or different domains (collections of variables). Several "apply" operations are implemented, including customized and efficient relational product operations and saturation [12] for generating the set of states (as an MDD) reachable from an initial set of states according to a transition relation (as an MxD). Saturation may be invoked either with an already known ("pre-generated") transition relation, or with a transition relation that is built "on the fly" during saturation [13], although the latter is currently a prototype implementation. The transition relation may be specified as a single monolithic relation that is then automatically split [23], or as a relation partitioned by levels or by events [14]; the latter is usually preferred, since the relation for a single Petri net transition tends to be small and easy to construct.
GreatSPN framework. GreatSPN is a well-known collection of tools for the design and analysis of Petri net models [5, 6]. The tools are aimed at the qualitative and quantitative analysis of Generalized Stochastic Petri Nets (GSPN) [1] and Stochastic Symmetric Nets (SSN) through the computation of structural properties, state space generation and analysis, analytical computation of performance indices, fluid and diffusion approximations, and symbolic CTL model checking, all available through a common graphical user interface [4]. The state space generation [7] of GreatSPN is based on Meddly. In this paper we use the collection of variable ordering heuristics implemented in GreatSPN. This collection has been enlarged to include all the variable ordering algorithms described in Section 2.

The paper is organized as follows: Section 2 reviews the considered algorithms; Section 3 describes the benchmark (models, scores and results); Section 4 considers a parameter estimation problem for optimizing one of the best methods found (Sloan); and Section 5 concludes the paper, outlining possible future directions of work.

2 The set A of variable ordering algorithms

In this section we briefly review the algorithms considered by the benchmark. Although our target model is that of Petri nets, we describe the algorithms in a more general form (as some of them were not defined specifically for Petri nets). We therefore consider the following high-level description of the model.

Let V be the set of variables, which translates directly to MDD levels. Let E be the set of events in the model. Events are connected to input and output variables. Let V^in(e) and V^out(e) be the sets of input and output variables of event e, respectively. Let E^in(v) and E^out(v) be the sets of events in which variable v participates as an input or as an output variable, respectively.

For some models, structural information is known in the form of disjoint variable partitions named structural units. Let Π be the set of structural units. With Π(v) we denote the unit of variable v, and with V(π) the set of variables in unit π ∈ Π. In this paper we consider three different types of structural unit sets. Let Π_PSF be the set of units corresponding to the P-semiflows of the net, obtained ignoring the place multiplicities. For some models, structural units are known because the models are obtained from the composition of smaller parts; we mainly refer to [17] for the concept of Nested Units (NU). Let Π_NU be this set of structural units, which is defined only for a subset of models. Finally, structural units can be derived using a clustering algorithm; let Π_Cl be such a set of units. We discuss clustering in Section 2.5.

Following these criteria, we subdivide the set of algorithms A into: the set A_Gen of algorithms that do not use any structural information; the set A_PSF of algorithms that require Π_PSF; and the set A_NU of algorithms that require Π_NU. Since clustering can be computed on almost any model, we consider the methods that use Π_Cl as part of A_Gen. In our context, the set of MDD variables V corresponds to the place set of the Petri net, and the set of events corresponds to the transition set of the Petri net. Let l : V → N be a variable order, i.e. an assignment of consecutive integer values to the variables in V.
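The following C++ sketch fixes one possible concrete representation of this interface, which the later sketches in this section reuse. All type and field names are our own illustrative choices, not those of GreatSPN.

    // Illustrative representation of the high-level model description above.
    #include <set>
    #include <vector>

    using Var   = int;   // a variable (a Petri net place), i.e. an MDD level
    using Event = int;   // an event (a Petri net transition)

    struct Model {
        int num_vars = 0, num_events = 0;
        std::vector<std::set<Var>>   V_in, V_out;  // V^in(e), V^out(e), indexed by event
        std::vector<std::set<Event>> E_in, E_out;  // E^in(v), E^out(v), indexed by variable
        std::vector<int> unit;                     // Π(v): structural unit of v, or -1 if none
    };

    // A variable order l : V -> N, stored as the level assigned to each variable.
    using Order = std::vector<int>;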
2.1 Force-based orderings

The FORCE heuristic, introduced in [3], is an n-dimensional graph layering technique based on the idea that variables form a hyper-graph, such that variables connected by the same event are subject to an "attractive" force, while variables not directly connected by an event are subject to a "repulsive" force. It is a simple method that can be summarized as follows:

Algorithm 1 Pseudocode of the FORCE heuristic.
    Function FORCE:
        Shuffle the variables randomly.
        repeat K times:
            for each event e ∈ E:
                compute the center of gravity cog_e = (1/|e|) Σ_{v∈e} l(v)
            for each variable v ∈ V:
                compute the hyper-position p(v) = (1/|E(v)|) Σ_{e∈E(v)} cog_e
            Sort the vertices according to their p(v) value.
            Weight the obtained variable order using the metric function f(l).
        return the variable order with the best metric among the K generated permutations.

Algorithm 1 gives the general skeleton of the FORCE algorithm. One of the most interesting aspects of this method is that it can be seen as a factory of variable orders, which generates K different orders. The algorithm starts by shuffling the variable set, then it iterates K times trying to achieve convergence. The version of FORCE that we tested uses K = 200. In our experience, FORCE does not converge stably to a single permutation on most models, and it generates a new permutation at every iteration.

After having produced a candidate variable order l, a metric function f that maps an order l to a real value is used to estimate the quality of that order. The chosen order is then the one that maximises (or minimises) the target metric function. In our benchmark we tested the following metric functions:

– Point-Transition Spans: the PTS score is given by the (normalized) sum of all distances between the centers of gravity cog_e and their connected variables:
  $$\mathrm{PTS}(l) = \frac{1}{|E| \cdot |V|} \sum_{e \in E} \sum_{v \in V(e)} \bigl|\, p(v) - cog_e \,\bigr|$$

– Normalized Event Span: the NES score [26] is based on the idea of measuring the sum of the event spans of each event e. The event span span_l(e) of event e for the variable order l is defined as the largest distance in l between every pair of variables v, v′ ∈ V(e). The metric is then defined as a normalized sum of these spans:
  $$\mathrm{NES}(l) = \frac{1}{|E|} \sum_{e \in E} \frac{span(e)+1}{|V|}$$

– Weighted Event Span: WES [26] is a modified version of NES where events closer to the top of the variable order weigh more. This modification is based on the assumption that the Saturation algorithm performs better when most events are close to the bottom of the order:
  $$\mathrm{WES}^i(l) = \frac{1}{|E|} \sum_{e \in E} \frac{span(e)+1}{|V|} \cdot \Bigl(\frac{2 \cdot top(e)}{|V|}\Bigr)^i, \quad \text{with } i = 1.$$

In addition to the evaluation metrics, structural information of the model can be used to establish additional centers of gravity. We tested three variations of the FORCE method (a code sketch of the basic loop follows this list):

– FORCE-*: events are used as centers of gravity, as described in Algorithm 1, where * is one among PTS, NES or WES.
– FORCE-P: P-semiflows are used as centers of gravity instead of the events. The method is only tested on those models that have P-semiflows. It uses the WES metric to select among the K generated orderings.
– FORCE-NU: structural units are used as centers of gravity, along with the events. This intuitively tries to keep together those variables that belong to the same structural unit. Again, WES is used as the metric function.
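As a concrete rendering of Algorithm 1, the following sketch implements the FORCE loop with the NES metric (minimized here, since smaller spans are better). It assumes the illustrative Model/Order types of Section 2; details such as the fixed seed and the tie handling are our own assumptions, not those of the GreatSPN implementation.

    // A minimal FORCE sketch: K iterations of center-of-gravity / hyper-position
    // updates, keeping the permutation with the best (lowest) NES score.
    #include <algorithm>
    #include <limits>
    #include <numeric>
    #include <random>
    #include <vector>

    double nes(const Model& m, const Order& l) {
        double sum = 0.0;
        for (Event e = 0; e < m.num_events; e++) {
            int lo = m.num_vars, hi = -1;
            for (Var v : m.V_in[e])  { lo = std::min(lo, l[v]); hi = std::max(hi, l[v]); }
            for (Var v : m.V_out[e]) { lo = std::min(lo, l[v]); hi = std::max(hi, l[v]); }
            if (hi >= lo) sum += double(hi - lo + 1) / m.num_vars;  // (span(e)+1)/|V|
        }
        return sum / m.num_events;
    }

    Order force(const Model& m, int K = 200) {
        std::vector<Var> perm(m.num_vars);
        std::iota(perm.begin(), perm.end(), 0);
        std::mt19937 rng(42);                         // arbitrary fixed seed
        std::shuffle(perm.begin(), perm.end(), rng);  // shuffle the variables randomly

        Order l(m.num_vars), best_l(m.num_vars);
        for (int i = 0; i < m.num_vars; i++) l[perm[i]] = i;
        double best = std::numeric_limits<double>::infinity();

        for (int k = 0; k < K; k++) {
            // centers of gravity cog_e of the events under the current order l
            std::vector<double> cog(m.num_events, 0.0);
            for (Event e = 0; e < m.num_events; e++) {
                int n = 0;
                for (Var v : m.V_in[e])  { cog[e] += l[v]; n++; }
                for (Var v : m.V_out[e]) { cog[e] += l[v]; n++; }
                if (n > 0) cog[e] /= n;
            }
            // hyper-positions p(v), averaging the cogs of the connected events
            std::vector<double> p(m.num_vars);
            for (Var v = 0; v < m.num_vars; v++) {
                double s = 0; int n = 0;
                for (Event e : m.E_in[v])  { s += cog[e]; n++; }
                for (Event e : m.E_out[v]) { s += cog[e]; n++; }
                p[v] = (n > 0) ? s / n : l[v];
            }
            // sort the variables by hyper-position and rebuild the order l
            std::sort(perm.begin(), perm.end(),
                      [&](Var a, Var b) { return p[a] < p[b]; });
            for (int i = 0; i < m.num_vars; i++) l[perm[i]] = i;
            if (double f = nes(m, l); f < best) { best = f; best_l = l; }
        }
        return best_l;  // best of the K generated permutations
    }

The PTS and WES variants would differ only in the metric function evaluated inside the loop.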
The set A of algorithms considered in the benchmark includes: FORCE-PTS, FORCE-NES and FORCE-WES in A_Gen; the method FORCE-P in A_PSF; and the method FORCE-NU in A_NU, for a total of 5 variations of this method.

2.2 Bandwidth-reduction methods

Bandwidth-reduction (BR) methods are a class of algorithms that try to permute a sparse matrix into a band matrix, i.e. one where most of the non-zero entries are confined to a (hopefully) small band near the diagonal. It is known [11] that reducing the event span in the variable order generally improves the compactness of the generated MDD. Therefore, event span reduction can be seen as the equivalent of bandwidth reduction on a sparse matrix. A first connection between bandwidth-reduction methods and variable ordering has been tested in [19] and in [22] on the model bipartite graph. In these works the considered BR methods were: the Reverse Cuthill-McKee algorithm [15]; the King algorithm [20]; and the Sloan algorithm [27]. The choice was motivated by their ready availability in the Boost C++ and ViennaCL libraries. In particular Sloan, which is the state-of-the-art method for bandwidth reduction, showed promising performance as a variable ordering method. In addition, Sloan almost always outperforms the other two BR methods [19], but for completeness of our benchmark we have decided to replicate the results. We concentrate our review on the Sloan method only, due to its effectiveness and its parametric nature.

The goal of the Sloan algorithm is to condense the entries of a symmetric square matrix A around its diagonal, thus reducing the matrix bandwidth and profile [27]. It works on symmetric matrices only, hence it is necessary to impose some form of translation of the model graph into a form that is accepted by the algorithm. The work in [22] adopts the symmetrization of the dependency graph of the model, i.e. the input matrix A for the Sloan algorithm has (|V|+|E|) × (|V|+|E|) entries. We follow a different approach, where the size of A is U, with |V| ≤ U ≤ |V|+|E|. Every event e generates entries in A: when |V^in(e)| × |V^out(e)| < T, with T a threshold value, all the cross-products of the V^in(e) vertices with the V^out(e) vertices are nonzero entries in A. If instead |V^in(e)| × |V^out(e)| ≥ T, a pseudo-vertex v_e is added, and all V^in(e) × {v_e} and {v_e} × V^out(e) entries in A are set to be nonzero. Usually U will be equal to |V|, or just slightly larger. This choice better represents the variable–variable interaction matrix, while avoiding degenerate cases where a highly connected event could generate a dense matrix A. In our implementation, T is set to 100. The matrix is finally made symmetric using A' = A + A^T. As we shall see in Section 3, the computational cost of Sloan remains almost always bounded. The sketch below illustrates this translation.
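The following sketch shows this event-to-matrix translation under the same assumed Model type: small events contribute their input × output cross-product directly, large ones are routed through a pseudo-vertex, and the result is symmetrized as A' = A + A^T. The sparse-pattern representation is a simplification of ours.

    // Build the (sparse, symmetric) variable-variable interaction matrix for Sloan.
    #include <set>
    #include <utility>

    std::set<std::pair<int,int>> interaction_matrix(const Model& m, size_t T = 100) {
        std::set<std::pair<int,int>> A;   // nonzero pattern of A
        int U = m.num_vars;               // next free pseudo-vertex index
        for (Event e = 0; e < m.num_events; e++) {
            if (m.V_in[e].size() * m.V_out[e].size() < T) {
                for (Var u : m.V_in[e])               // direct cross-product entries
                    for (Var v : m.V_out[e]) A.insert({u, v});
            } else {                                  // avoid a dense block: use v_e
                int ve = U++;
                for (Var u : m.V_in[e])  A.insert({u, ve});
                for (Var v : m.V_out[e]) A.insert({ve, v});
            }
        }
        std::set<std::pair<int,int>> As = A;          // symmetrize: A' = A + A^T
        for (const auto& [r, c] : A) As.insert({c, r});
        return As;                                    // the final matrix is U x U
    }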
A second relevant characteristic of Sloan is its parametric priority function P(v′), which guides variable selection in the greedy strategy. A very compact pseudocode of Sloan is given in Algorithm 2; a more detailed one can be found in [21].

Algorithm 2 Pseudocode of the Sloan algorithm.
    Function Sloan:
        Select a vertex u of the graph.
        Select v as the most-distant vertex to u with a graph visit.
        Establish a gradient from 0 in v to d in u using a depth-first visit.
        Initialize the visit frontier Q = {v}
        repeat until Q is empty:
            Remove from the frontier Q the vertex v′ that maximizes P(v′).
            Add v′ to the variable ordering l.
            Add the unexplored adjacent vertices of v′ to Q.

The method follows two phases. In the first phase it searches for a pseudo-diameter of the graph of matrix A, i.e. two vertices v, u that have an (almost) maximal distance. Usually, a heuristic approach based on the construction of the rooted level structure of the graph is employed. The method then performs a visit, starting from v, exploring in sequence all the vertices in the visit frontier Q that maximize the priority function

$$P(v') = -W_1 \cdot incr(v') + W_2 \cdot dist(v, v')$$

where incr(v′) is the number of unexplored vertices adjacent to v′, dist(v, v′) is the distance between the initial vertex v and v′, and W1 and W2 are two integer weights. The weights control how much Sloan prioritises the visit of the local cluster (W1) and how much the selection should follow the gradient (W2). Since the two weights control a linear combination of factors, in our analysis we shall consider only the ratio W1/W2. In Section 4 we will concentrate on the best selection of this ratio.

We use the names SLO, CM and KING to refer to the Sloan, Cuthill-McKee and King algorithms in the A_Gen set. GreatSPN uses the Boost C++ implementations of these methods.

2.3 P-semiflows chaining algorithm

In this subsection we propose a new heuristic algorithm exploiting the Π_PSF set of structural units obtained by the P-semiflows computation. A P-semiflow is a positive, integer, left annuller of the incidence matrix of a Petri net, and it is known that, in any reachable marking, the sum of tokens in the net places, weighted by the P-semiflow coefficients, is constant and equal to the weighted sum of the initial marking (P-invariant). The main idea of the algorithm is to keep the places shared between two Π_PSF units (i.e. P-semiflows) as close as possible in the final MDD variable ordering, since their markings cannot vary arbitrarily.

The pseudo-code is reported in Algorithm 3. The algorithm takes as input the Π_PSF set and returns as output a variable ordering (stored in the ordered list l). Initially, the unit π_i sharing the highest number of places with another unit is removed from Π_PSF and saved in π_curr; all its places are added to l. Then the main loop runs until Π_PSF becomes empty. The loop comprises the following operations. The unit π_j sharing the highest number of places with π_curr is selected. All the places of π_j that are in l but not in C (the set of currently discovered common places) are removed from l. The common places of π_curr and π_j not present in C are appended to l. Then the places present only in π_j are appended to l. After these steps, C is updated with the common places of π_curr and π_j, and π_j is removed from Π_PSF. Finally π_curr becomes π_j, completing the iteration. For instance, with π_1 = {a, b, c} and π_2 = {c, d}, the algorithm produces the order a, b, c, d, with the shared place c adjacent to both units. This algorithm is named P and belongs to the A_PSF set.

Algorithm 3 Pseudocode of the P-semiflows chaining algorithm.
    Function P-semiflows(Π_PSF):
        l = ∅ is the ordered list of places.
        C = ∅ is the set of currently discovered common places.
        Select a unit π_i ∈ Π_PSF maximizing |V(π_i) ∩ V(π_j)| over the pairs i ≠ j
        Π_PSF = Π_PSF \ {π_i}
        π_curr = π_i
        Append V(π_curr) to l
        repeat until Π_PSF is empty:
            Select the unit π_j ∈ Π_PSF maximizing |V(π_curr) ∩ V(π_j)|
            Remove (l ∩ V(π_j)) \ C from l
            Append (V(π_curr) ∩ V(π_j)) \ C to l
            Append V(π_j) \ (C ∪ V(π_curr)) to l
            Add V(π_curr) ∩ V(π_j) to C
            π_curr = π_j
            Π_PSF = Π_PSF \ {π_j}
        return l
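A runnable C++ rendering of Algorithm 3 is given below, representing each unit simply as a set of place indices. Tie-breaking and the choice of data structures are our own simplifications, and we assume at least one unit is given.

    // P-semiflows chaining: keeps the places shared between consecutive
    // units adjacent in the resulting order.
    #include <algorithm>
    #include <set>
    #include <vector>

    using Unit = std::set<int>;

    static size_t overlap(const Unit& a, const Unit& b) {
        size_t n = 0;
        for (int p : a) n += b.count(p);
        return n;
    }

    std::vector<int> psf_chaining(std::vector<Unit> psf) {
        std::vector<int> l;       // the ordered list of places
        std::set<int> in_l, C;    // places currently in l / common places C
        auto append = [&](int p) {
            if (in_l.insert(p).second) l.push_back(p);
        };

        // pick the unit sharing the most places with another unit
        size_t bi = 0, best = 0;
        for (size_t i = 0; i < psf.size(); i++)
            for (size_t j = i + 1; j < psf.size(); j++)
                if (size_t o = overlap(psf[i], psf[j]); o > best) { best = o; bi = i; }
        Unit cur = psf[bi];
        psf.erase(psf.begin() + bi);
        for (int p : cur) append(p);

        while (!psf.empty()) {
            // select the unit sharing the most places with the current one
            size_t bj = 0; best = 0;
            for (size_t j = 0; j < psf.size(); j++)
                if (size_t o = overlap(cur, psf[j]); o >= best) { best = o; bj = j; }
            Unit next = psf[bj];
            psf.erase(psf.begin() + bj);

            // Remove (l ∩ V(π_j)) \ C from l
            for (int p : next)
                if (in_l.count(p) && !C.count(p)) {
                    l.erase(std::remove(l.begin(), l.end(), p), l.end());
                    in_l.erase(p);
                }
            // Append (V(π_curr) ∩ V(π_j)) \ C, then V(π_j) \ (C ∪ V(π_curr))
            for (int p : next) if (cur.count(p) && !C.count(p)) append(p);
            for (int p : next) if (!cur.count(p) && !C.count(p)) append(p);
            // C = C ∪ (V(π_curr) ∩ V(π_j))
            for (int p : next) if (cur.count(p)) C.insert(p);
            cur = next;
        }
        return l;
    }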
2.4 The Noack and the Tovchigrechko greedy heuristics

The Noack [24] and the Tovchigrechko [28] methods are greedy heuristics that build up the variable order sequence by picking, at every iteration, the variable that maximizes an objective function. A detailed description can be found in [18]. A pseudo-code is given in Algorithm 4.

Algorithm 4 Pseudocode of the Noack/Tovchigrechko heuristics.
    Function NOACK-TOV:
        S = ∅ is the set of already selected places.
        for i from 1 to |V|:
            compute the weight W(v) = f(v, S) for each v ∉ S.
            find the v that maximizes W(v).
            l(i) = v.
            S ← S ∪ {v}.
        return the variable order l.

The main difference between the Noack and the Tovchigrechko methods is the weight function f(v, S), defined as:

$$f_{\mathrm{Noack}}(v,S) \;=\; \sum_{\substack{e \in E^{out}(v)\\ k_1(e) \wedge k_2(e)}} \bigl(g_1(e) + z_1(e)\bigr) \;+\; \sum_{\substack{e \in E^{in}(v)\\ k_1(e) \wedge k_2(e)}} \bigl(g_2(e) + c_2(e)\bigr)$$

$$f_{\mathrm{Tov}}(v,S) \;=\; \sum_{\substack{e \in E^{out}(v)\\ k_1(e)}} g_1(e) \;+\; \sum_{\substack{e \in E^{out}(v)\\ k_2(e)}} c_1(e) \;+\; \sum_{\substack{e \in E^{in}(v)\\ k_1(e)}} g_2(e) \;+\; \sum_{\substack{e \in E^{in}(v)\\ k_2(e)}} c_2(e)$$

where the sub-terms are defined as:

$$g_1(e) = \frac{\max\bigl(0.1,\; |S \cap V^{in}(e)|\bigr)}{|V^{in}(e)|}, \qquad g_2(e) = \frac{1 + |S \cap V^{in}(e)|}{|V^{in}(e)|}$$

$$c_1(e) = \frac{\max\bigl(0.1,\; 2 \cdot |S \cap V^{out}(e)|\bigr)}{|V^{out}(e)|}, \qquad c_2(e) = \frac{\max\bigl(0.2,\; 2 \cdot |S \cap V^{out}(e)|\bigr)}{|V^{out}(e)|}$$

$$z_1(e) = \frac{2 \cdot |S \cap V^{out}(e)|}{|V^{out}(e)|}, \qquad k_1(e) = |V^{in}(e)| > 0, \qquad k_2(e) = |V^{out}(e)| > 0$$

Little technical information is known about the criteria that were followed for the definition of the f_Noack and f_Tov functions. An important characteristic is that both functions use different criteria for input and output event conditions, i.e. they do not work on the symmetrized event relation, unlike the Sloan method.

The set A of algorithms considered in the benchmark includes four variations of these two algorithms. The variants NOACK and TOV are the standard implementations, as described in [18]. In addition, we tested whether these methods could benefit from an additional weight term that favours structural units; the two methods NOACK-NU and TOV-NU employ this technique. In this case, the function f(v, S) is modified to add a weight of 0.75 · |S ∩ V(Π(v))| to variable v. In our experiments we tried various values for the weight, but we generally observed that these modified methods do not benefit from this additional information.
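To make the formulas above concrete, the following sketch implements the greedy loop of Algorithm 4 with the f_Tov weight (f_Noack differs only in the summed terms). It again assumes the illustrative Model/Order types of Section 2, and is a prototype-level transcription rather than the GreatSPN code.

    // Greedy Tovchigrechko ordering: repeatedly pick the variable with the
    // largest weight f_Tov(v, S) among those not yet selected.
    #include <algorithm>
    #include <set>
    #include <vector>

    double f_tov(const Model& m, Var v, const std::set<Var>& S) {
        auto in_S = [&](const std::set<Var>& vs) {    // |S ∩ vs|
            int n = 0;
            for (Var u : vs) n += S.count(u);
            return n;
        };
        double w = 0.0;
        for (Event e : m.E_out[v]) {
            double nin = m.V_in[e].size(), nout = m.V_out[e].size();
            if (nin  > 0) w += std::max(0.1, (double)in_S(m.V_in[e])) / nin;  // g1, if k1
            if (nout > 0) w += std::max(0.1, 2.0 * in_S(m.V_out[e])) / nout;  // c1, if k2
        }
        for (Event e : m.E_in[v]) {
            double nin = m.V_in[e].size(), nout = m.V_out[e].size();
            if (nin  > 0) w += (1.0 + in_S(m.V_in[e])) / nin;                 // g2, if k1
            if (nout > 0) w += std::max(0.2, 2.0 * in_S(m.V_out[e])) / nout;  // c2, if k2
        }
        return w;
    }

    Order noack_tov(const Model& m) {
        Order l(m.num_vars);
        std::set<Var> S;
        for (int i = 0; i < m.num_vars; i++) {
            Var best_v = -1;
            double best_w = -1.0;
            for (Var v = 0; v < m.num_vars; v++)
                if (!S.count(v)) {
                    double w = f_tov(m, v, S);        // W(v) = f(v, S)
                    if (w > best_w) { best_w = w; best_v = v; }
                }
            l[best_v] = i;                            // l(i) = v
            S.insert(best_v);
        }
        return l;
    }

The NU variants would simply add 0.75 · |S ∩ V(Π(v))| to the returned weight.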
2.5 Markov Clustering heuristic

The MCL heuristic is based on the idea of exploring the effectiveness of clustering algorithms in improving variable ordering techniques. The hypothesis is that, in some models, it could be beneficial to first group places that belong to connected clusters. For our tests we selected the Markov Cluster algorithm [29]. The method works as a modified version of Sloan, where clusters are first ordered according to their average gradient, and then the places belonging to the same cluster appear consecutively in the variable ordering, following the cluster order. This method is named MCL and belongs to the A_Gen set.

3 The Benchmark

The considered model instances are those of the Model Checking Contest, 2016 edition [16], which consists of 665 PNML instances. We discarded 257 instances that our tool was not capable of solving within the imposed time and memory limits, because either the net was too big or the RS MDD was too large under any considered ordering. Thus, we considered for the benchmark the set I made of 386 instances, belonging to a set M of 62 models. These 386 instances were run with each of the 14 tested algorithms, with a time limit of 75 minutes, 16 GB of memory, and a decision diagram cache of 2^26 entries. In the 386 instances of I two sub-groups are identified: the set I_PSF ⊂ I of instances for which P-semiflows are available, with 328 instances generated from a subset M_PSF of 55 models; and the set I_NU ⊂ I of instances for which nested units are available, with 65 instances generated from a subset M_NU of 15 models.

The overall tests were performed on OCCAM [2] (Open Computing Cluster for Advanced data Manipulation), a multi-purpose flexible HPC cluster designed and maintained by a collaboration between the University of Torino and the Torino branch of the National Institute for Nuclear Physics. OCCAM counts slightly more than 1100 CPU cores, including three different types of computing nodes: standard Xeon E5 dual-socket nodes, large Xeon E5 quad-socket nodes with 768 GB of RAM, and multi-GPU NVIDIA nodes. Our experiments took 221 hours of computation using 3 standard Xeon E5 dual-socket nodes.

Scores. Typically, the most important parameter that measures the performance of a variable ordering is the MDD peak size, measured in nodes. The peak size represents the maximum MDD size reached during the computation, and it therefore represents the memory requirement; it is also directly related to the time cost of building the MDD. For these reasons we preferred to use the peak size alone rather than a weighted measure of time, memory and peak size, which would make the interpretation of the results more complex. The peak size is, however, a quantity that is strictly related to the model instance: different instances will have unrelated peak sizes, often of different magnitudes. To treat instances in a balanced way, some form of normalized score is needed. We consider three different score functions: for all of them the score of an algorithm is first normalized against the other algorithms on the same instance, which gives a score per instance, and then aggregated over all instances. Let i be an instance, solved by the algorithms A = {a1, ..., am} with peak nodes P_i = {p_a1(i), ..., p_am(i)}. The scores of an algorithm a for an instance i are:

– The Mean Standard Score of instance i: $\mathrm{MSS}_a(i) = \frac{p_a(i) - \mu_A(i)}{\sigma_A(i)}$, where μ_A(i) and σ_A(i) are the mean and standard deviation of the peak nodes over all the algorithms that solve instance i.
– The Normalized Score of instance i: $\mathrm{NS}_a(i) = 1 - \frac{\min\{p \in P_i\}}{p_a(i)}$, which rescales the peak nodes taking the minimum as the scaling factor.
– The Model Checking Contest score¹ of instance i: MCC_a(i) = 48 if a terminates on i within the given time bound, plus 24 if p_a(i) = min{p ∈ P_i}.

The final score used for ranking the algorithms over a set I′ ⊆ I is then the average over I′ of the scores per instance:

– The Mean Standard Score of algorithm a: $\mathrm{MSS}_a = \frac{1}{|I'|} \sum_{i \in I'} \mathrm{MSS}_a(i)$
– The Normalized Score of algorithm a: $\mathrm{NS}_a = \frac{1}{|I'|} \sum_{i \in I'} \mathrm{NS}_a(i)$
– The Model Checking Contest score of a: $\mathrm{MCC}_a = \frac{1}{|I'|} \sum_{i \in I'} \mathrm{MCC}_a(i)$

MSS requires a certain number of samples to be significant; therefore, we apply it only to those model instances where all the tested algorithms terminated within the time bound. For the instances where not all the algorithms terminated, we apply NS.
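The per-instance scores can be computed as in the following sketch, where the peak node counts of the algorithms that did not solve the instance are encoded as zero (an encoding of ours) and at least one algorithm is assumed to have solved it; the MCC score is the simplified version mentioned above.

    // Per-instance scores MSS_a(i), NS_a(i) and (simplified) MCC_a(i).
    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <vector>

    struct Scores { double mss, ns, mcc; };

    Scores instance_scores(const std::vector<double>& peaks, size_t a) {
        double n = 0, mean = 0, var = 0, pmin = std::numeric_limits<double>::max();
        for (double p : peaks)
            if (p > 0) { n++; mean += p; pmin = std::min(pmin, p); }
        mean /= n;                                   // μ_A(i) over solving algorithms
        for (double p : peaks)
            if (p > 0) var += (p - mean) * (p - mean);
        double sigma = std::sqrt(var / n);           // σ_A(i)
        Scores s{0, 0, 0};
        double pa = peaks[a];
        if (pa > 0) {                                // algorithm a solved instance i
            s.mss = (pa - mean) / sigma;             // Mean Standard Score
            s.ns  = 1.0 - pmin / pa;                 // Normalized Score
            s.mcc = 48 + (pa == pmin ? 24 : 0);      // MCC: 48, plus 24 if best peak
        }
        return s;
    }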
We decided to test the MCC score to check whether it is a good or a biased metric when compared to the standard score.

¹ We actually use a simplified version where answer correctness is not considered.

Table 1. Performance of the ordering algorithms using the MCC 2016 models.

                            |          Num. instances          |        Average scores
Algorithm a   A      | applied  solved  optimal  uniq. |  NS_a    MSS*_a   NS*_a   MCC_a
FORCE-PTS     A_Gen  |   386     259      23       1   |  0.699   -0.071   0.552   33.63
FORCE-NES     A_Gen  |   386     267      27       0   |  0.657   -0.275   0.505   34.88
FORCE-WES1    A_Gen  |   386     263      23       0   |  0.663   -0.228   0.505   34.13
CM            A_Gen  |   386     290      69       2   |  0.586   -0.076   0.449   40.35
KING          A_Gen  |   386     284      36       2   |  0.635   -0.032   0.504   37.55
SLO           A_Gen  |   386     340      71       5   |  0.534   -0.125   0.471   46.69
NOACK         A_Gen  |   386     312      51       0   |  0.535   -0.052   0.425   41.96
TOV           A_Gen  |   386     325      75       2   |  0.524   -0.043   0.435   45.07
MCL           A_Gen  |   386     250      27       4   |  0.767    0.676   0.640   32.76
FORCE-P       A_PSF  |   328     230      33       0   |  0.604   -0.292   0.435   36.07
P             A_PSF  |   328     258      26       3   |  0.729    0.623   0.656   39.00
FORCE-NU      A_NU   |    65      38      11       1   |  0.711   -0.364   0.506   31.01
NOACK-NU      A_NU   |    65      47       3       0   |  0.777    0.049   0.692   34.70
TOV-NU        A_NU   |    65      45       1       0   |  0.773    0.122   0.672   32.86

Table 1 gives the general summary of the benchmark results. For each algorithm we report its requirement class (A_Gen, A_PSF, A_NU) and the number of instances where: the algorithm is applied; the algorithm finishes within the time and memory bounds (solved); the algorithm finds the best ordering among the others (optimal); and the algorithm is the only one capable of solving an instance (uniq.). The last four columns report: the NS score on all instances (NS_a); the MSS score on the completed instances (MSS*_a); the NS score on the completed instances (NS*_a); and the MCC_a score per instance. Remember that for NS and MSS a lower score is better, while for MCC a higher score is better.

From the table it emerges that TOV and SLO are the methods that perform best, with a clear margin. Other methods, like FORCE-P and FORCE-NU, have the worst average behaviour, even if they perform well when they are capable of solving an instance. Considering the column of "optimal" instances, some methods seem to perform well, like CM, but as we shall see later in this section this is a bias caused by the uneven number of instances per model (i.e. some models have more instances than others). To our surprise, both the MCL and P methods show only a mediocre average performance, even if there is a certain number of instances where they perform particularly well. In general, most instances are solved by more than one algorithm. In the rest of the section we go into model details, first considering the performance of the algorithms, and then ranking the methods considering either instances (I, I_PSF, I_NU) or models (M, M_PSF, M_NU).

Efficiency of variable ordering algorithms. Table 2 reports the times required to generate the variable orderings. The table reveals several problems and inefficiencies of some of these methods in our implementation.

Table 2. Time distribution for the tested variable ordering methods, ordered by E[t] (d.n.f. = did not finish).

Method       t<0.01  0.01–1  1–10   >10   d.n.f. |   E[t]      σ[t]
KING           559      44     1      0      0   |   0.006     0.056
CM             558      45     1      0      0   |   0.006     0.059
MCL            522      80     2      0     22   |   0.016     0.129
FORCE-NES      371     220    13      0      4   |   0.090     0.423
FORCE-WES1     367     222    15      0      4   |   0.095     0.432
FORCE-PTS      368     221    15      0      4   |   0.098     0.442
SLO            510      87     5      2      0   |   0.211     3.602
FORCE-NU       409     131    38     26      0   |   0.708     2.235
TOV            434     142    21      7      0   |   0.854     9.092
NOACK          435     142    20      7      0   |   0.869     9.200
NOACK-NU       433     144    19      8      0   |   0.879     9.505
TOV-NU         430     146    20      8      0   |   0.927     9.849
FORCE-P        217     281    78     28     74   |   1.016     2.474
P              510      54    14     26    103   |  14.621   159.891
In particular, our implementations of the Noack and Tovchigrechko methods are simple prototypes, and have a cost of at least O(|V|²). The P algorithm also shows a severe performance problem, with a cost that does not match its theoretical complexity. Surprisingly, the cost of Sloan is quite high: the library documentation states that its cost is O(log(m) · |E|) with m = max{degree(v) | v ∈ V}, but the numerical analysis seems to suggest that the library implementation could have some hidden inefficiency.

3.1 Results of the benchmark

Figure 2 shows the benchmark results, separated by instance class and algorithm class. The plots on the left (1, 3 and 5) report the results on the instances that are solved by all algorithms within the given time and memory limits ("easy instances" hereafter), while those on the right report the results for all instances solved by at least one algorithm ("all instances" hereafter). In the left plots we report the three tested metrics, while in the right plots we discard the MSS, since the available samples for each instance may vary and could be too low for the Gaussian assumption. To fit all scores in a single plot we have rescaled the score values to the [0, 1] range. Algorithms are sorted by their NS score, best one on the left. The top row (plots 1 and 2) considers the A_Gen methods on all the I instances. The center row considers the A_Gen ∪ A_PSF methods on the I_PSF instances. The bottom row considers the A_Gen ∪ A_NU methods on the I_NU instances. Plots 1, 3 and 5 consider 187, 145 and 11 instances, respectively, which are the "easy" instances.

All three scores seem to provide (almost) consistent rankings. The only exception is plot 5; we suspect that this discrepancy can be explained by the small number of available instances (i.e. 11). The MCC metric is not very significant […]

[Figure 2 appears here: a 3 × 2 grid of score plots (plots 1–6), arranged as described above; the plot content is not recoverable from the extracted text.]

We shall now refer to the parametric variation of Sloan with W1 = 1, W2 = 16 as SLO:1/16.