Machine Learning and Disk-based Methods for Qualitative Verification of Markov Decision Processes

Mohammadsadegh Mohagheghi1[0000-0001-8059-3691] and Khayyam Salehi2[0000-0002-3379-798X]

1 Department of Computer Science, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran
2 Department of Computer Science, Shahrekord University, Shahrekord, Iran
mohagheghi@vru.ac.ir

Abstract. State explosion is a well-known challenge of model checking. In this paper, we propose several approaches to improve the standard techniques for verifying the qualitative reachability properties of Markov decision processes. For the first approach, we use two heuristics to reduce the total number of iterations of the standard iterative methods. The second approach stores the information of a model on a secondary hard disk and uses main memory for a backward implementation of the standard methods. The third approach uses a machine learning technique to classify the state space of a model into the related classes. This approach needs no memory overhead, its running time is much less than that of the standard approaches, and it can be used to cope with the state explosion problem.

Keywords: Probabilistic model checking, Qualitative reachability, Markov decision processes, Machine learning.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Probabilistic model checking is a formal verification technique used to verify the correctness of computer systems with probabilistic aspects [1]. Markov decision processes (MDPs) are a well-known formalism for modeling systems that have both non-deterministic and probabilistic behaviors [11, 9, 18]. The main challenge of model checking is the state space explosion problem: the number of states of the model grows exponentially in the number of modules of the underlying system [1, 6]. Several approaches have been proposed to cope with this problem. Symbolic model checking is a well-established approach that uses a compact data structure to store the information of a model. Binary decision diagrams (BDDs) are among the most successful compact data structures and are widely used in model checking tools [6]. Multi-terminal BDDs (MTBDDs) have been developed and are widely used for the verification of probabilistic systems [17]. However, the iterative computations required in probabilistic model checking are computationally expensive when BDD-based data structures are used. As an alternative, sparse matrix representation is used as an explicit approach that stores the state values and transition probabilities of the model. A hybrid approach uses an MTBDD for storing the transitions and an explicit array for the state values [17].

In this paper, we focus on the verification of qualitative reachability properties of MDP models and propose several techniques to improve the performance of the standard iterative algorithms for this class of properties. A qualitative property states that the extremal (minimal or maximal) probability of reaching a set of states is 0 or 1 [1, 9]. Some graph-based computations are needed to compute the related sets of states. While BDD-based implementations for computing qualitative reachability properties of MDPs are available in several state-of-the-art model checkers [10, 12], their running time is the main drawback of these techniques.
To alleviate this drawback, we propose several improvements for computing the qualitative reachability properties of MDPs. The forward and backward approaches to the standard iterative algorithms for computing qualitative reachability properties are compared in [16]. While the backward approach is much faster than the forward one, its memory overhead is the main obstacle to this approach. As the first contribution of the paper, we propose several improvements to the forward approach. The general idea of these improvements is to reduce the number of iterations of the standard algorithms. We use the sparse explicit data structure to implement these improvements and consider those models that are small enough to load all of their information into main memory. Furthermore, we propose two approaches for the case of larger models that cannot be loaded into main memory. As the second contribution of the paper, we use a disk-based approach that stores the transition probability information of a model on disk and loads its graph information into main memory. In this case, we apply the backward approach to reduce the running time. As the third contribution, we apply machine learning approaches to compute the sets of states for which the extremal reachability probability is 0 or 1. For each class of MDP models, we consider several small models for training a classifier and then use the classifier to assign the states of a large model to the related sets. Although this technique does not guarantee finding the exact sets, our experiments show promising results with nearly zero error. The main benefit of this approach is its scalability to very large models.

1.1 Related Works

The first algorithms for qualitative verification of reachability properties of MDP models were proposed in [7]. The time complexity of these algorithms is linear or quadratic in the size of the model for minimal or maximal reachability probabilities, respectively [1, 7]. BDD-based and sparse explicit implementations of these algorithms have been proposed and studied in [17]. The worst-case time complexity of qualitative verification of maximal reachability probabilities is improved in [4]. Decomposing an MDP into its strongly connected components (SCCs) and computing the reachability probabilities of the states of each SCC has been proposed in [13]. In this case, qualitative verification is performed implicitly for each SCC. Although this technique shows promising running times, it focuses on the computation of quantitative reachability properties and does not guarantee correct results in the case of qualitative verification. The backward and forward approaches to the standard algorithms for qualitative verification of MDPs are studied and compared in [15]. The backward approach with the sparse explicit data structure is supported by the STORM model checker [8]. A linear-time heuristic is proposed in [16] to approximate the qualitative reachability properties in the case of maximal probabilities. In Section 3, we use this heuristic to reduce the total number of iterations of the forward approach. Disk-based techniques have been used in [14] for the verification of quantitative properties of very large DTMC and MDP models. In [3], a learning-based approach is used for statistical model checking of MDPs. This approach reduces the number of visited states of a model and is implemented in the explicit engine of PRISM. In [20], machine learning is employed to approximate the probabilities of an unknown MDP model.
In [19], machine learning is used to extrapolate the optimal actions of MDP models. Although the approach proposed in [19] shows some promising results, it is limited to a single case study and its generality for other models is not discussed.

2 Preliminaries

In this section, the main concepts and definitions of probabilistic model checking and qualitative verification of reachability properties are reviewed. More details about probabilistic model checking and its variants are available in [1, 9, 6].

2.1 Probability Distribution and Markov Decision Process

A probability distribution on a finite set X is a function Pr : X → [0, 1] such that Σ_{x∈X} Pr(x) = 1. We use D(X) to denote the set of all probability distributions on X.

Definition 1. (Markov Decision Process). A Markov decision process (MDP) is a tuple M = (S, s₀, Act, P, G) where:
– S is a finite set of states.
– s₀ ∈ S is the initial state.
– Act is a finite set of actions. For every state s ∈ S, one or more elements of Act are defined as enabled actions and are denoted by Act(s).
– P : S × Act → D(S) is a probabilistic transition function. We write (s, α, s′) for a transition from the source state s to the destination s′ by the action α, and P(s, α, s′) for the probability of this transition. For each state s ∈ S and enabled action α ∈ Act(s), a probability distribution is defined.
– G ⊂ S is the (non-empty) set of goal states.

The size of M is defined as the number of its states and transitions and is denoted by |M|.

MDPs are widely used in mathematics, engineering, economics, and management to model decision making under uncertainty in stochastic environments [11, 18]. The actions of an MDP model the non-deterministic aspects of a system and its environment. The probabilistic transition function models the stochastic and probabilistic behavior of the environment.

In formal verification, a high-level language is used for modeling computer systems. Every model includes one or several modules. For each module, some variables over bounded domains are defined. A valuation of the variables of a module determines a state of the module in the system. The PRISM language is the standard modeling language for probabilistic systems [12]. The interpreter of the model checker constructs the resulting MDP (states, actions, and transition probabilities) according to the definitions of its modules. More details about the PRISM language are available in [17, 12].

2.2 Semantics of MDPs

For an MDP M in a state s ∈ S, a transition is performed in two steps:
– M non-deterministically selects an enabled action α ∈ Act(s).
– M randomly selects the destination state s′ ∈ S with probability P(s, α, s′).

For any state s ∈ S and enabled action α ∈ Act(s), we use Post(s, α) for the set of α-successors of s, Post(s) for the set of all its possible successors, and Pre(s) for the set of its predecessor states:

  Post(s, α) = {s′ ∈ S | P(s, α, s′) > 0},    (1)
  Post(s) = ∪_{α ∈ Act(s)} Post(s, α),    (2)
  Pre(s) = {s′ ∈ S | s ∈ Post(s′)}.    (3)

A path in M represents a possible run and is defined as a non-empty (finite or infinite) sequence of states and actions of the form π = s₀ →^{α₀} s₁ →^{α₁} ... where for every i ≥ 0 we have s_i ∈ S, α_i ∈ Act(s_i), and s_{i+1} ∈ Post(s_i, α_i). We use Paths_s to denote the set of all infinite paths of M that start from a state s ∈ S, and FPaths_s for the subset of finite paths of Paths_s. We also use π(i) to denote the (i+1)-th state in the path π, i.e., π(i) = s_i.
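To make the preceding definitions concrete, the following minimal Python sketch encodes the transition function P as a nested dictionary and derives the Post and Pre sets of equations (1)-(3) from it. The two-state model and all identifiers are illustrative placeholders, not part of the paper's implementation.

# A toy MDP transition function: P[s][a][t] = probability of s --a--> t.
# The model and all names here are illustrative only.
P = {
    "s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s1": 1.0}},
    "s1": {"a": {"s1": 1.0}},
}

def post(s, a=None):
    """Post(s, a) = {t | P(s, a, t) > 0}; Post(s) unions over all enabled actions."""
    actions = [a] if a is not None else list(P[s].keys())
    return {t for act in actions for t, p in P[s][act].items() if p > 0}

def pre_map():
    """Pre(s) for every state, built by one forward pass over all transitions."""
    pre = {s: set() for s in P}
    for s in P:
        for t in post(s):
            pre[t].add(s)
    return pre

print(post("s0", "a"))    # {'s0', 's1'} (set printing order may vary)
print(pre_map()["s1"])    # {'s0', 's1'}

Storing only the forward map P corresponds to the forward representation discussed later in the paper; materializing the Pre sets for all states is exactly the memory overhead that the backward approach pays.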
To reason about the probabilistic behavior of an MDP M, one should resolve its non-deterministic choices. To do so, a deterministic function is used that maps each finite path of M to an action α ∈ Act. This function is called an adversary (also scheduler or policy) and is defined as follows [9, 6]:

Definition 2. (Adversary). An adversary of an MDP M is a function σ : FPaths → Act that for every finite path π = s₀ →^{α₀} s₁ →^{α₁} ... →^{α_{i−1}} s_i selects an enabled action α_i ∈ Act(s_i).

An adversary σ is called memoryless if it depends only on the last state of the path. We use FPath^σ_s for the set of all finite paths of the form s →^{σ(s)} s₁ ... s_{n−1} →^{σ(s_{n−1})} s_n that start from s. In this paper, we only use memoryless adversaries, which are sufficient for the analysis of reachability properties. We use Adv_M for the set of all (memoryless) adversaries of M.

2.3 Reachability Probabilities

Reachability properties are the main class of properties used in the probabilistic model checking of MDPs; the computation of most other properties is usually reduced to the computation of reachability ones. For MDPs, a reachability probability is defined as the extremal (minimal or maximal) probability of reaching a goal state. For an adversary σ ∈ Adv_M and a state s ∈ S, we define reach^σ_s(G) as the set of all finite paths that start from s, end in G, and select actions according to the adversary σ. Formally:

  reach^σ_s(G) = {π ∈ FPath^σ_s | last(π) ∈ G and ∀i < |π| : π(i) ∉ G}    (4)

where last(π) is the last state of π. For an adversary σ and a path π ∈ FPath^σ_s, a probability measure pr^σ(π) is defined as the product of the probabilities of the transitions along π:

  pr^σ(π) = Π_{i=0}^{n−1} P(s_i, σ(s_i), s_{i+1})    (5)

This probability measure is used to formally define the extremal reachability probabilities. For any state s ∈ S, the minimal and maximal probabilities of reaching G from s over Adv_M are denoted by Pr^min_s(♦G) and Pr^max_s(♦G) and are formally defined as:

  Pr^min_s(♦G) = inf_{σ ∈ Adv_M} Pr(reach^σ_s(G)),    (6)
  Pr^max_s(♦G) = sup_{σ ∈ Adv_M} Pr(reach^σ_s(G))    (7)

where Pr(reach^σ_s(G)) is the total probability of reaching G from s under the adversary σ and is formally defined as:

  Pr(reach^σ_s(G)) = Σ_{π ∈ reach^σ_s(G)} pr^σ(π)    (8)

Graph-based computations are used to detect the sets of states of an MDP for which the extremal reachability probability is exactly 0 or 1. Properties that relate to the computation of these sets are called qualitative reachability properties. Properties that relate to the computation of the reachability probabilities of the remaining states are called quantitative reachability properties.

2.4 Qualitative Reachability Properties

The following sets are defined for the extremal qualitative reachability probabilities:

  S^0_min = {s ∈ S | Pr^min_s(♦G) = 0},
  S^1_min = {s ∈ S | Pr^min_s(♦G) = 1},
  S^0_max = {s ∈ S | Pr^max_s(♦G) = 0},
  S^1_max = {s ∈ S | Pr^max_s(♦G) = 1}.

The computation of these sets is necessary for the qualitative verification of MDPs. For quantitative reachability probabilities, these sets are not strictly needed, but they can improve the precision of the computed values [9]. For the case of extremal expected rewards (where the maximal or minimal expectation of the accumulated reward until reaching a goal state is considered), the computation of the S^0_max and S^1_max sets is needed to guarantee the convergence of the computed values.
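As a small illustration of the probability measure of equation (5), the sketch below evaluates pr^σ(π) for a finite path under a memoryless adversary, represented as a plain state-to-action dictionary. The model, the adversary, and all names are hypothetical.

# Path probability pr^sigma(pi): the product of transition probabilities
# along a finite path, with actions chosen by a memoryless adversary sigma.
# All values below are made up for illustration.
P = {
    "s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s1": 1.0}},
    "s1": {"a": {"s1": 1.0}},
}
sigma = {"s0": "a", "s1": "a"}    # one fixed action per state (memoryless)

def path_probability(path):
    """pr^sigma(pi) = product over i of P(s_i, sigma(s_i), s_{i+1})."""
    prob = 1.0
    for s, s_next in zip(path, path[1:]):
        prob *= P[s][sigma[s]].get(s_next, 0.0)
    return prob

print(path_probability(["s0", "s0", "s1"]))    # 0.5 * 0.5 = 0.25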
Graph-based iterative methods, which only consider the graph structure of the underlying model, are used to compute these sets. Iterative numerical methods are used for the computation of quantitative properties [1, 6]. Algorithm 1 and Algorithm 2 describe the standard iterative algorithms for computing S^0_max and S^1_max [9].

Algorithm 1 The standard algorithm for computing S^0_max
Input: MDP M = (S, s₀, Act, P, G)
Output: The set S^0_max = {s ∈ S | Pr^max_s(♦G) = 0}
1: R := G;
2: repeat
3:   R′ := R;
4:   R := R′ ∪ {s ∈ S | Post(s) ∩ R′ ≠ ∅};
5: until R = R′;
6: return S \ R;

In each iteration, Algorithm 1 adds a state s ∈ S \ R to R if at least one state s′ ∈ Post(s) has been added to R in a previous iteration. The algorithm continues until no new state is added to R. The final set R is the fixed point of the computation and contains all states that can reach at least one of the goal states; for each s ∈ R we have Pr^max_s(♦G) > 0. The remaining states (S \ R) are those that cannot reach any goal state and correspond to the definition of S^0_max.

For Algorithm 2, two nested loops are used to compute the S^1_max set. The outer loop starts with S and iteratively removes those states s ∈ S for which Pr^max_s(♦G) < 1. These outer-loop iterations induce a sequence of sets R_i where S = R₀ ⊃ R₁ ⊃ ... ⊃ R_n = S^1_max. To compute each R_i, the inner loop starts from G (set in line 4 of Algorithm 2) and iteratively adds a state s′ ∈ S to R_i if s′ can reach one of the goal states with probability one via the states of R_{i−1}. The states remaining after reaching the fixed point (the states in S \ R_n) do not belong to S^1_max.

Algorithm 2 The standard algorithm for computing S^1_max
Input: MDP M = (S, s₀, Act, P, G)
Output: The set S^1_max = {s ∈ S | Pr^max_s(♦G) = 1}
1: R := S;
2: repeat
3:   R′ := R;
4:   R := G;
5:   repeat
6:     R″ := R;
7:     R := R″ ∪ {s ∈ S | ∃α ∈ Act(s). (Post(s, α) ⊆ R′ ∧ Post(s, α) ∩ R″ ≠ ∅)};
8:   until R = R″;
9: until R = R′;
10: return R;

The correctness of the standard algorithms for qualitative reachability properties is shown in [1, 7]. For an MDP M with |S| states, the time complexities of Algorithms 1 and 2 are in O(|M|) and O(|M| · |S|), respectively. These bounds suppose that for each state s ∈ R′ the algorithms can determine every state of Pre(s) in constant time [1]. For this purpose, a standard approach is to store the information of the model in a backward manner, i.e., for each state s ∈ S the method stores the list of states in Pre(s). The main drawback of this backward approach is its memory overhead, which limits it to small models. On the other hand, a forward implementation of these algorithms is proposed and used in PRISM [12]. The forward approach needs no additional memory, but it may increase the number of iterations and the running time of the computations. The memory consumption and running time of the forward and backward approaches to the standard algorithms for qualitative verification of MDPs are compared in [16]. In this paper, we only consider the maximal qualitative reachability properties. The standard algorithms for the minimal case can be found in [1, 9]. All techniques proposed in this paper can also be applied to minimal reachability properties.
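The following Python transcription of Algorithms 1 and 2 may help clarify their fixed-point structure. It is a readability-oriented sketch under the forward representation (no precomputed Pre lists), not the paper's C implementation; the post and enabled helpers and the toy model in the demo are assumptions of this illustration.

# Direct transcriptions of Algorithms 1 and 2 (forward representation).
def prob0_max(S, G, post):
    """S^0_max: states from which no adversary reaches G with probability > 0."""
    R = set(G)
    while True:
        R_new = R | {s for s in S if post(s) & R}    # line 4 of Algorithm 1
        if R_new == R:
            break
        R = R_new
    return set(S) - R

def prob1_max(S, G, enabled, post):
    """S^1_max: states from which some adversary reaches G with probability 1."""
    R_outer = set(S)
    while True:
        R = set(G)
        while True:                                   # inner loop, lines 5-8
            added = {s for s in S
                     if any(post(s, a) <= R_outer and post(s, a) & R
                            for a in enabled(s))}
            R_new = R | added
            if R_new == R:
                break
            R = R_new
        if R == R_outer:
            return R
        R_outer = R

# Tiny demo on an illustrative two-state model.
P = {"s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s1": 1.0}},
     "s1": {"a": {"s1": 1.0}}}
post = lambda s, a=None: {t for act in ([a] if a else P[s])
                          for t, p in P[s][act].items() if p > 0}
enabled = lambda s: list(P[s])
print(prob0_max(P, {"s1"}, post))             # set(): every state can reach s1
print(prob1_max(P, {"s1"}, enabled, post))    # {'s0', 's1'}

Note that each sweep scans all of S; heuristic h1 in the next section shortens these sweeps by letting states added in the current sweep enable further additions immediately.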
3 The Proposed Methods for Accelerating Qualitative Verification of MDPs

In this section, we propose three approaches to improve the performance of the standard algorithms for computing qualitative reachability properties of MDPs.

3.1 Reducing Iterations of the Forward Approach

We use two heuristic techniques to reduce the number of iterations of the forward implementation of the standard algorithms for qualitative verification of reachability properties of MDPs. Although the idea behind these heuristics has been used in previous works for other classes of MDP properties, to the best of our knowledge they have not been applied to the computation of qualitative reachability properties.

The first heuristic (called h1 in the remainder of the paper) merges the separate sets of the standard algorithms into one set. One drawback of Algorithms 1 and 2 (as proposed in [1, 9]) is that they separate the set computed in the current iteration from the set of the previous iteration. Algorithm 1 uses two sets R and R′, although it could keep all added states in a single set R. In the latter case, any state added in an iteration can be used to add (possibly) further states in the same iteration. In Algorithm 2, the sets R and R″ of the inner loop can likewise be merged into one set.

The second heuristic (called h2) reduces the total number of iterations by using a good state ordering for selecting the remaining states in each algorithm. Several state ordering techniques have been proposed in previous works for quantitative reachability properties [5] or discounted accumulated rewards [21]. In both cases, the proposed methods use the Pre(s) sets of the states. To avoid the memory overhead of storing the Pre sets, we use a forward approach for state ordering. The idea is to traverse the graph of the model in a forward manner by performing a breadth-first search. The visited states of the model (the nodes of the related graph) are stored in a list, and the iterative methods use this list in reverse order. The method is shown in Algorithm 3. It uses an array StateOrder to store the selected states. In each step, it selects a state from the beginning of the array and adds its not-yet-selected successor states to the end of the array. The algorithm terminates when all states of the model have been added to the StateOrder array.

Algorithm 3 Forward method for state ordering
Input: MDP M = (S, s₀, Act, P, G)
Output: A state ordering array StateOrder[]
1: l := 0;
2: h := 1;
3: StateOrder[l] := s₀;
4: while l < h do
5:   s := StateOrder[l];
6:   for all s_i ∈ Post(s) do
7:     if s_i ∉ StateOrder then
8:       StateOrder[h] := s_i;
9:       h := h + 1;
10:    end if
11:  end for
12:  l := l + 1;
13: end while
14: return StateOrder;

The main advantage of our proposed state ordering is that it does not store any additional information about the model and therefore has no memory overhead. Note that the backward approach does not need to compute a state ordering explicitly: it already considers the states for addition to the R sets in the right order. The iterative methods should use the states of the StateOrder array in reverse order (from the end to the beginning of the array). In general, this state ordering can reduce the number of iterations of the iterative methods because it accelerates the propagation of the related information between the states of the model. The ordering is used in line 4 of Algorithm 1 and line 7 of Algorithm 2, where the algorithms update the R sets in each iteration.
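A minimal Python sketch of the forward state ordering of Algorithm 3 follows. The post helper and the toy model are illustrative assumptions of this sketch, and the visiting order of successors within a set is not fixed.

# Breadth-first forward state ordering (heuristic h2, Algorithm 3).
def forward_state_order(s0, post):
    order = [s0]                    # the StateOrder array
    seen = {s0}                     # set membership for the test on line 7
    l = 0
    while l < len(order):           # len(order) plays the role of h
        for succ in post(order[l]):
            if succ not in seen:
                seen.add(succ)
                order.append(succ)
        l += 1
    return order

# Illustrative model and run.
P = {"s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s2": 1.0}},
     "s1": {"a": {"s1": 1.0}},
     "s2": {"a": {"s0": 1.0}}}
post = lambda s: {t for act in P[s] for t in P[s][act]}
print(forward_state_order("s0", post))    # e.g. ['s0', 's1', 's2']

The iterative algorithms then scan the returned array in reverse, which, as the paper notes, accelerates the propagation of the R-set information between the states.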
3.2 Disk-based Backward Technique for Qualitative Verification

To compensate for the memory overhead of the backward approach, secondary storage can be used. The idea of this approach is shown in Algorithm 4. It first uses the MTBDD-based approach to construct the model and directly stores the information of the model on the hard disk. It then computes the backward graph and loads it into main memory. This structure does not include the transition probabilities and is used to compute the S^0_max and S^1_max sets. Finally, the information of the remaining states (S \ (S^0_max ∪ S^1_max)) is loaded into main memory if it is needed for quantitative verification. The main advantage of this approach is that it avoids any memory overhead while using the backward approach for qualitative verification. However, this approach applies only to those models whose backward graph can be loaded into main memory.

Algorithm 4 The disk-based approach
Input: An MTBDD representation of an MDP M = (S, s₀, Act, P, G)
Output: The sets S^1_max and S^0_max
1: Store the information of M from the MTBDD on the hard disk;
2: Compute the backward graph of M and load it into main memory;
3: Apply the backward technique for computing the sets S^1_max and S^0_max;
4: Free the memory of the backward graph;
5: Load the information of M for the remaining states into main memory;

3.3 Using Machine Learning for Qualitative Verification of MDPs

Machine learning plays an important role in everyday life. Classification is the act of categorizing a given set of data into classes. In machine learning, classification is a supervised learning task in which a model infers patterns from the training data and labels new observations based on those patterns [2]. Many algorithms are used for classifying new data, including Naive Bayes, stochastic gradient descent, k-nearest neighbors, decision trees, random forests, artificial neural networks, and support vector machines.

As the main contribution of this paper, we propose a new approach that uses machine learning techniques to verify the qualitative reachability properties of MDPs. Our technique uses the fact that some parameters define the bounds of the variable domains of the MDP modules. The size of the MDP for a high-level definition changes according to the values of these parameters. Fig. 1 shows the initial definition of the Consensus model. This model includes four modules (N = 4). Each module contains two variables (pc1 and coin1). A counter is defined as a global variable. The model has a parameter (K) that determines the range of the counter variable. According to the values of this parameter, we obtain models of different sizes.

Fig. 1. A high-level definition of an MDP model in the PRISM environment

In general, for each class of models, we can consider small values for its parameters and obtain some small MDPs. A standard method can compute the qualitative reachability properties of these small models. As a result, the set S of states of such a model is classified into the S^0_max, S^1_max, and S^?_max = S \ (S^0_max ∪ S^1_max) sets. We use the small models to train a classifier, which is then used to classify the state space S of a large model into its S^0_max, S^1_max, and S^?_max sets. We consider the variables and parameters of each model as its features and its states as the training samples. In addition, for each variable with a parametric domain (like the counter variable in Fig. 1), we add a feature defined as the difference between the domain bound and the variable's value. This additional feature helps the classifier improve its performance. A sketch of this training and classification pipeline is given below.
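The sketch below outlines the pipeline with Scikit-learn's DecisionTreeClassifier, the classifier the paper reports using in Section 4. The feature layout and the tiny synthetic arrays are placeholders; in practice, each sample row comes from a state of a small training model exported by PRISM and is labeled by a standard qualitative-verification run.

# Hypothetical sketch: train on states of small model instances, then label
# the states of a large instance. Feature columns (illustrative): module
# variables, plus the derived feature "domain bound - counter value".
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Labels: 0 -> S^0_max, 1 -> S^1_max, 2 -> S^?_max (remaining states).
X_train = np.array([
    [0, 0, 4],    # pc, counter, bound - counter (made-up values)
    [1, 2, 2],
    [2, 4, 0],
    [1, 1, 3],
])
y_train = np.array([0, 2, 1, 2])

clf = DecisionTreeClassifier().fit(X_train, y_train)

# States of a larger instance of the same model family are labeled directly,
# with no fixed-point iteration over the large state space.
X_large = np.array([[1, 3, 5], [2, 8, 0]])
print(clf.predict(X_large))

Because prediction is a single pass over the states, the cost grows linearly with the number of states of the large model, which is consistent with the scalability reported in Section 4.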
4 Experimental Results

To show the feasibility and applicability of the proposed approaches, we consider several test-case models. These test cases are selected from the PRISM standard case studies and are widely used in the literature [9]. For the forward approach, we implement our improvements in the PRISM model checker, using its sparse engine, which is implemented in C. We use a machine with an Intel Core i7 processor and 8 GB of RAM. Table 1 shows the experiments for computing the S^0_min and S^0_max sets. The first column of the table gives the model's name. For the Consensus, Zeroconf, and Wlan classes we consider a maximal reachability property, and for the Firewire class we consider a minimal reachability property. More information about these classes is available in [12]. The MDP size of each model is the number of its states and transitions. We report the running time and the number of iterations of Algorithm 1 for the MTBDD-based method as used in PRISM, as well as the running time of the sparse implementation of this algorithm. For our proposed heuristics, we report the running time and the number of iterations of the forward method with each heuristic. The running times in all tables are in seconds.

Table 1. Case study models, number of iterations, and running times for computing the S^0_min and S^0_max sets

Model          Parameter   MDP      MTBDD           sparse   forward with h1   forward with h2
(parameters)   values      size     time    iters   time     time     iters    time    iters
Consensus      6, 18       604K     2.25    979     1.1      0.47     511      0.01    7
(N, K)         8, 18       2378K    10.9    1305    6.3      4.3      681      0.1     8
               10, 10      4725K    14.8    911     7.5      7.1      477      0.3     9
Zeroconf       10          9787K    6       140     3.7      1.6      98       2.14    12
(K)            14          14.4M    7.5     156     3.9      2.14     110      4.9     12
               18          17.8M    10.1    172     8.6      4.4      122      4.9     12
Wlan           4, 2500     7052K    18.4    207     0.7      0.26     76       0.8     65
(K, ttm)       5, 1000     8905K    10.4    399     1.24     0.44     140      1.39    129
               6, 200      18264K   17      783     2.2      0.73     268      2       257
firewire       12, 600     28.35M   30      373     13.6     12.4     368      3.1     5
(delay,        30, 400     115M     161     409     52.7     48       404      9.7     4
deadline)      30, 600     227.7M   345     409     110.3    100.5    404      25.3    4

In Table 2, we show the running times of the standard iterative method and of our heuristics for computing the S^1_min and S^1_max sets. For the disk-based technique, the running times include the time for writing a model to the secondary disk, constructing the backward graph, computing the qualitative reachability properties using the backward graph, and finally loading the information of the remaining part of each model (the S^? sets) into main memory. For the S^0 sets, all running times are below 1 second and we do not report them separately.
Table 2. Case study models, number of iterations, and running times for computing the S^1_min and S^1_max sets

Model          Parameter   MTBDD           sparse   forward with h1   forward with h2   Disk-based
(parameters)   values      time    iters   time     time     iters    time     iters    time
Consensus      6, 18       323     94.6K   141      85       56.5K    2.6      1118     7.2
(N, K)         8, 18       1513    169K    1934     1359     101K     35.3     1788     17.7
               10, 10      1738    88K     1767     1551     52.5K    113      1486     24
Zeroconf       10          121     2357    104      30.1     1200     66.1     213      4.9
(K)            14          184.3   2525    74.9     36.6     1385     52.4     209      5.1
               18          128.4   2665    97.9     58.6     1481     107.8    192      6.5
Wlan           4, 2500     58.3    414     9.2      3.3      226      3.1      65       2.2
(K, ttm)       5, 1000     30.8    798     17.2     5.5      418      4.1      206      2.7
               6, 200      39.1    1566    29.2     9.1      802      8        256      2.9
firewire       12, 600     33      261     11.2     9.7      255      0.96     4        6.3
(delay,        30, 400     74.2    169     14.3     11.4     156      2.1      3        17.9
deadline)      30, 600     189     253     54.6     51.5     244      5.2      4        31.4

The results of Tables 1 and 2 show that our proposed heuristics for the forward implementation of the standard algorithms improve their performance for computing qualitative reachability properties of MDPs. The best results are obtained for the Consensus and firewire cases. For the wlan and zeroconf models, our heuristics reduce the number of iterations by one or two orders of magnitude. In most cases, the disk-based method outperforms the others. Note that this approach is most useful for large models, where the information of a model cannot be loaded into main memory.

To apply machine learning to the computation of qualitative reachability properties of MDPs, we use the Scikit-learn package of Python. For each class of models, we consider two or three small models for training and use the decision tree classifier. For each training MDP, PRISM is used to separate the states into the S^1_max, S^0_max, and S^?_max classes. We use the exportstates command of PRISM to store the state-property information of the states in the related files. The inputs to the training phase are these state-separation and state-property files. To test the precision of this technique, we consider several test cases for each class of models. The results of this technique are shown in Table 3.

Table 3. Experimental results of the proposed machine learning approach for computing the S^0_max and S^1_max sets

Model          Param   Sample         Reading        Training   Test        Test    Precision   False
(parameters)   value   parameters     samples time   time       parameter   time                detects
Consensus      6       K=3, 4         0.9            0.1        K=50        82      100%        2
(N)            8       K=3, 4         2.4            0.27       K=35        17.1    100%        2
Zeroconf       -       K=2, 3         6.2            0.62       K=14        2.14    99.8%       26,070
Wlan           5       ttm=50, 100    43             5.9        ttm=900     18.7    100%        0
(K)            6       ttm=50, 100    89.5           11.3       ttm=600     26.5    100%        0
firewire       12      ddl=200, 250   7.2            2.2        ddl=800     57      100%        0
(delay)        30      ddl=200, 250   48.4           11         ddl=600     145.8   100%        0

To compare the running times of the proposed methods for computing the S^1_max sets, we consider several models of the Consensus example with N=6 and different values of the K parameter in the range [20, 100]. The results are shown in Fig. 2, where the horizontal axis shows the value of K and the vertical axis shows the running time on a logarithmic scale. The results show that our proposed heuristics reduce the running times by at least two orders of magnitude.

Fig. 2. The running times for computing the S^1_max sets for the Consensus case studies.
5 Conclusion and Future Work

In this paper, we proposed several techniques to improve the performance of the iterative computations for the qualitative reachability properties of MDPs. We first proposed two heuristics for the forward implementation of the standard algorithms; the running times of our experiments show promising results for these heuristics. Because of the memory limitations of the explicit methods, we proposed a disk-based method that saves memory for the graph-based computations. As the main contribution, we proposed a method that uses machine learning to classify the state space of a model into the related classes. The experiments show promising results for this technique. For future work, we plan to extend our machine learning method to other problems of probabilistic model checking. For example, this technique can be used to approximate the optimal policy of a large MDP model, which in turn can improve the current techniques for statistical model checking.

References

1. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press (2008)
2. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
3. Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Křetínský, J., Kwiatkowska, M., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: International Symposium on Automated Technology for Verification and Analysis, pp. 98–114. Springer (2014)
4. Chatterjee, K., Henzinger, M.: Faster and dynamic algorithms for maximal end-component decomposition and related graph problems in probabilistic verification. In: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1318–1336. SIAM (2011)
5. Ciesinski, F., Baier, C., Größer, M., Klein, J.: Reduction techniques for model checking Markov decision processes. In: 2008 Fifth International Conference on Quantitative Evaluation of Systems, pp. 45–54. IEEE (2008)
6. Clarke, E.M., Henzinger, T.A., Veith, H., Bloem, R.: Handbook of Model Checking, vol. 10. Springer (2018)
7. De Alfaro, L.: Formal Verification of Probabilistic Systems. Ph.D. thesis, Stanford University (1997)
8. Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A Storm is coming: A modern probabilistic model checker. In: International Conference on Computer Aided Verification, pp. 592–600. Springer (2017)
9. Forejt, V., Kwiatkowska, M., Norman, G., Parker, D.: Automated verification techniques for probabilistic systems. In: International School on Formal Methods for the Design of Computer, Communication and Software Systems, pp. 53–113. Springer (2011)
10. Hahn, E.M., Li, Y., Schewe, S., Turrini, A., Zhang, L.: IscasMC: a web-based probabilistic model checker. In: International Symposium on Formal Methods, pp. 312–317. Springer (2014)
11. Katoen, J.P.: The probabilistic model checking landscape. In: Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, pp. 31–45 (2016)
12. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: International Conference on Computer Aided Verification, pp. 585–591. Springer Berlin Heidelberg (2011)
13. Kwiatkowska, M., Parker, D., Qu, H.: Incremental quantitative verification for Markov decision processes. In: 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), pp. 359–370. IEEE (2011)
14. Mehmood, R.: Disk-based Techniques for Efficient Solution of Large Markov Chains. Ph.D. thesis, University of Birmingham (2004)
15. Mohagheghi, M.: Improving pre-computation for verification of Markov decision processes. International Journal of New Technologies in Science and Engineering 5(5) (2018)
16. Mohagheghi, M., Salehi, K.: Improving graph-based methods for computing qualitative properties of Markov decision processes. Indonesian Journal of Electrical Engineering and Computer Science 17(3), 1571–1578 (2020)
17. Parker, D.A.: Implementation of Symbolic Model Checking for Probabilistic Systems. Ph.D. thesis, University of Birmingham (2003)
18. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons (2014)
19. Rataj, A., Woźna-Szcześniak, B.: Extrapolation of an optimal policy using statistical probabilistic model checking. Fundamenta Informaticae 157(4), 443–461 (2018)
20. Tappler, M., Aichernig, B.K., Bacci, G., Eichlseder, M., Larsen, K.G.: L*-based learning of Markov decision processes. In: International Symposium on Formal Methods, pp. 651–669. Springer (2019)
21. Wingate, D., Seppi, K.D.: Prioritization methods for accelerating MDP solvers. Journal of Machine Learning Research 6, 851–881 (2005)