Formal Semantics for Probabilistic Verification of Stochastic Regular Expressions Sinem Getir1 , Esteban Pavese1 , and Lars Grunske1 Software Engineering, Humboldt University Berlin, Germany Abstract. Modelling and verification of software systems is an effective phase of system development, as it can uncover failures in design early in the development process. There is an increasing need for languages and processes that allow for the specification of uncertainty that allow, for example, the modelling of the unknown behaviour of a user, or the stochastic failure rate of hardware components. In this paper we introduce a formal semantics on Stochastic Regular Expressions (SREs) over probabilistic action logics for quantitative ver- ification. We provide the recursive calculation of the language generated by an SRE, enhanced to reuse local results for global verification of sys- tem specifications. Furthermore, we demonstrate how to model systems with SREs and how to perform reachability analysis with Probabilistic Action-based Computational Tree Logic (PACTL*). Keywords: stochastic regular expressions, probabilistic formalism, probabilistic model checking, probabilistic verification 1 Introduction The analysis of systems with a probabilistic behaviour plays an important role in several applications, such as software engineering, speech recognition, digital communications and computational biology among others. On the other hand, regular expressions have spread through all of theoretical computer science and enjoy plentiful applications in the field of natural language processing, including parsing, deep language models, model inference and machine translation. Several studies have been conducted in probabilistic version of regular ex- pressions that is studied as probabilistic concurrent Kleene algebra in [12] and extended with additional Kleene theorems in the application of quantitative rea- soning on database queries [3]. Kartzow and Weidner [7,17] define a Monadic Second-Order Logic (MSOL) and a constraint logic with temporal properties for data analysis for probabilistic regular expressions. Additionally, Weidner specified Probabilistic Regular Ex- pressions for infinite strings with ω-properties in his thesis [16]. In this paper, we define a semantics of stochastic regular expressions in the context of probabilistic model checking, employing a probabilistic extension of Action-based Computation Tree Logic (ACTL*) to reason about temporal prop- erties quantitatively. Probabilistic model checking [8] is a technique developed in order to automatically perform such assessments, and has been successfully applied in recent years [10]. Probabilistic models used for model checking can be at different levels of abstraction, such as Markov Chains [2], Markov Decision Processes [2] and Stochastic Petri Nets [11] among others. In contrast to state- based representations, we introduce an approach and focus on stochastic regular expressions as an input model for probabilistic model checking applications. 2 Stochastic Regular Expressions (SRE) In this section, we briefly recall SRE [14] and action logic [1]. 2.1 Syntax of SRE The syntax of a Stochastic Regular Expression (SRE) E over an alphabet Σ is defined recursively as follows: X E := α Ei [ni ] E1 : E2 E ∗f (1) i with α ∈ Σ ∪ {ε}, ni ∈ N0 , f ∈ [0, 1] ⊂ R and every term Ei is a SRE, such that: 1. Atomic Action α: α is an atomic action that belongs to the alphabet Σ. P 2. Choice i Ei [ni ]: One of the provided terms Ei [ni ] is probabilistically chosen according to calculated probabilities from occurence values. ni denotes the occurrence value or choice rate for each term, such that the i-th term is chosen with probability Pninj . Occurrence value or choice rate is defined as j the number of cases that the node is chosen statistically. 3. Concatenation E1 : E2 : The terms E1 and E2 are successively interpreted. 4. Kleene Closure E ∗f : The term E is repeated for an indefinite number of times, subject to a binomial distribution. Each iteration occurs with a prob- ability of f . The termination probability is 1 − f . 5. Plus Closure E +f : The +Closure is a syntactic sugar that is omitted here, but can be easily emulated with E : E ∗f . Without loss of generality, the empty string ε is not included in the alphabet. However we include the empty string  as an atomic action. On the other hand,  can be derived from an expression like α∗0.0 , α ∈ Σ. A derivation of a conventional regular expression E is the set of sentences, or strings over the alphabet, derivable from it. This defines the language L(E) of E. This notion of language derivation is similarly applicable to SRE, except that each string has a probability value associated with it, and hence the language itself is associated with a probability distribution of its members as explained in the following within its denotational semantics. 2.2 Denotational semantics of SREs Intuitively SREs can be understood as an expression that defines a specific prob- ability function over its language strings such that JEK : s ∈ Σ∗ → [0, 1]. Such probabilistic language (p-language) is previously described for discrete events systems in [9]. In the following, we provide a trivial example on the p-language to explain the relationship between the SRE and the p-language. The semantics of SREs are described by the p-language [9] in the style of denotational semantics [15]. The probability function for an SRE E is denoted by JEK, and its application to a particular string s is denoted JEKs, which rep- resents an acceptance probability associated with string s in the language L(E). The probability function recursively calculates the occurrence probability of an arbitrary string s ∈ Σ∗ in a SRE model. By definition, if an arbitrary string s has a probability value greater than 0, it is accepted as a word of SRE. Meaning that if s ∈ L(E) then JEKs > 0. Example 1. Let L be a probabilistic language [9] describing the Bernoulli process where each experiment has two outcomes a and b with probabilities p and 1 − p respectively. Then L is defined on the alphabet Σ = {a, b} as L(s) = p#(a,s) .(1 − p)#(b,s) where #(a, s) is representing the number of occur- rences in word s. In the following we formally define some additional notions which will be referred to throughout this paper. Definition 1 (Words of a SRE). W ords(E) = {w | w ∈ L(E)} Definition 2 (Word length). A length of a word w = w0 w1 ...wn is the num- ber of included characters and denoted as |w|= n through the paper. Definition 3 (Probability of a word in the language). The probability Ei (ni ) | E1 : E2 | E ∗f | P function JEKs for every possible SRE term E (α | E +f ) is calculated recursively, where s = α1 , ..αn ∈ Σ∗ : – Atomic actions: JαKs = 1, if α = s JαKs = 0, if α 6= s (2) – Choice rX z X  nk  Ei ni s = P · JEk Ks (3) ni k Since every term might recognize s, the overall probability for a choice expression is the sum of all the term probabilities with respect to s. – Concatenation: n X JE1 : E2 Ks = (JE1 Kα1 ..αi · JE2 Kαi+1 ..αn ) i=0 In the summation, s is decomposed into two (possibly empty) substrings, each of which may be consumed by a concatenated expression. Even though one term may recognize its substring argument, if the other term does not recognize its respective substring, then that term returns a prob- ability of 0, and the overall probability for that instance of decomposition is 0. – Kleene closure: JE ∗f Kε = 1 − f n X JE ∗f Ks = (f · JEKα1 ..αi · JE ∗f Kαi+1 ..αn ) (4) i=1 +f · JEKs · JE ∗f Kε The first formula accounts for empty strings, as the only way an iterated expression should recognize an empty string is by not iterating; in other words terminating without executing (The termination probability is therefore 1-f ). The other formula recursively defines the general case. Here, one iteration of E will consume some portion of s, and the rest of s is consumed by further iterations. It is assumed that an iteration of a loop always consumes some non-empty string. Otherwise, the semantic model would have to account for Kleene closure iterating indefinitely on an argument, which is not an useful behaviour. All SRE probability functions presented above are well formed probabilitity func- tions. Interested readers can find the details of probability functions and the proof of well formness in [14]. 2.3 Action based Computation Tree Logic (ACTL*) ACTL* is introduced in [4] where the comparison for state labelled and transition labelled systems is studied. The syntax of ACTL* is described recursively on action labelled transition systems as follows; where ϕ is a formula executed on the runs of the system: ϕ := T rue | ¬ϕ | ϕ ∧ ϕ0 | ∃ϕ | ϕ U ϕ0 | Xa ϕ | X ϕ (5) We biefly recall the definitions required for the ACTL* semantics. Definition 4 (Labelled transition systems ). A labelled transition system is a tuple (S, Act, →) where: . S is a finite set of states . Act is a finite, non-empty set of actions . → is the transition relation denoted in ⊆ S × (Act ∪ )× S and any element of → is called a transition. Definition 5. A sequence (s0 , α0 , s1 )(s1 , α1 , s2 ).. ∈ → ∞ is called a path from s0 . A run ρ = (s, Φ) is a pair from s ∈ S, where Φ is a path from s, first state of ρ is s (f irst(ρ) = s) and path(ρ) = Φ. If a run θ is a suffix for run ρ, then we denote as θ ≤ ρ. The semantics is given by satisfaction relations based on the definitions above: ρ |= T rue always ρ |= ¬ϕ iff ρ 6|= ϕ ρ |= ϕ ∧ ϕ0 iff ρ |= ϕ and ρ |= ϕ0 ρ |= ∃ψ iff there exists a run θ ∈ run(f irst(ρ)) such that θ |= ψ ρ |= ψ U ψ 0 iff there exists a θ with ρ ≤ θ such that θ |= ψ0 and for all ρ ≤ η ≤ θ : η |= ψ0 ρ |= X ϕ iff there exists s, α, s0, θ such that ρ = (s, (s, α, s0))θ and θ |= ϕ ρ |= Xa ϕ iff there exists s, s0, θ such that ρ = (s, (s, a, s0))θ and θ |= ϕ (6) 3 Semantics of Stochastic Regular Expression Trees with Probabilistic Action based Computation Tree Logic Our goal is to reason about temporal properties on SRE models probabilistically. A very common logic Probabilistic computation tree logic (PCTL*) [6] and vari- ants are defined on the state and path formulas. However Stochastic Regular Expressions do not have the explicit notation of a state. Therefore we prefer to extend the ACTL* logic semantically expressed on the runs of the system as provided in subsection 2.3. The extended syntax of a Probabilistic ACTL* (PACTL*) is defined as fol- lows where ϕ is a word and Φ is a SRE formula. Definition 6 (Syntax of PACTL*). Φ = ¬Φ | Φ ∧ Φ0 | PP (ϕ) (7) ϕ = true | Xa ϕ | X ϕ | ϕ U ϕ0 (8) where P ⊆ [0, 1] 3.1 Semantics of PACTL* Every SRE term E is defined as a node that specifies operation type (choice, concatenation, kleene closure or action), choice rate, kleene probability and subnode(s). Formally E is an action or a tuple based on its type: (N , R), (N ), (N , k) if the opetaion types are choice, concatenation, kleene closure (T = + |:| ∗ or action a ∈ Σ,) respectively. where N = {E1 , E2 , ..., En } is the finite set of subnodes in the operating order (E1 : E2 6= E2 : E1 ), R = {r1 , r2 , ..rn } is the set of corresponding choice rates (r ∈ N) and k ∈ [0, 1] is a kleene probability. We construct every SRE node in a bottom up fashion by parsing the given string by using operator presedence. Hence, we avoid the ambiguity of the parsed trees and remain them unique. (The order of operator predence is ∗, :, +). We also restrict one operation type per SRE node which allows unambiguity. The semantics is defined a satisfaction relation for a word w and a SRE term E as follows: E |= ¬Φ iff E 6|= Φ (9) 0 0 E |= Φ ∧ Φ iff E |= Φ ∧ E |= Φ (10) E |= PP (ϕ) iff P r{w |= ϕ | w ∈ W ords(E)} ∈ P (11) w |= true always (12) w |= Xa ϕ iff w[0] = a and w[1] |= ϕ (13) w |= X ϕ iff w[1] |= ϕ (14) w |= ϕ U ϕ0 iff for some i ≤ |w|, w[i] |= ϕ0 and w[j] |= ϕ, ∀j < i. (15) The set of words for an SRE can be calculated recursively on SRE node E:  {a},   S if T = a W ords(Ei ), if T = +     Ei ∈N W ords(E) =   (W ords(E0 ) · W ords(E1 ))... · W ords(En ), ∀Ei ∈ N if T =:   ∞  W ords(Esub )k , where Esub ∈ N and E = (Esub )∗ S  if T = ∗ k (16) A system is specified as an SRE tree that includes a root node and finite set of nodes. An SRE tree is formally defined as; TE = (Eroot , E, Σ), where Eroot is the root node, E is the finite set of all nodes and Σ is the alphabet. Hence, verifying the the root node Eroot will result in verifying the system. 4 Example: System Specification with SRE Tree We provide an example automata and a corresponding SRE system specifica- tion in the following paragraphs. Let us assume that we have a system that is composed of some web services aiming to achieve a message protocol. The sub- components Service 1(S1 ) and Service 2(S2 ) are executing the login of sytem and message sending and logging out from the system respectively. The system can be defined as a stochastic regular expression tree as follows: TE = (Eroot , E, Σ) Eroot = S E = {S, S1 , S2 , E1 , E2 , E3 , E4 , E5 , E6 } Σ = {start, login, authenticationF ail, logout, sendM sg, msgF ail, terminate, success, retry} S = S1 : S2 S1 = start : login S2 = authenticationF ail[15] + logout[20] + E1 [65] E1 = sendM sg : E2 E2 = msgF ail[5] + E3 [95] E3 = E4 : E5 E4 = E6∗0.25 E5 = logout : terminate E6 = success : retry   Let a PACTL* formula P[0.3,0.4] true U (XmsgFail )true for the analysis on TE . The formula indicates the reachability analysis of the action “msgFail” on the root node Eroot = S. The words reaching the “msgFail’ from S are then recursively calculated: W ords(S)[reaching“msgFail”] ⊂ W ords(S) = {start.login.sendM sg.msgF ail (0.0325), start.login.sendM sg.succes.retry.msgF ail (0.008125), start.login.sendM sg.succes.retry.succes.retry.msgF ail (0.00203125), ....} The union is then ∞ Y 0.0325 × (0.25)i = 0.0325 ∈ [0.3, 0.4] (17) i=0   S |= P[0.3,0.4] true U (XmsgFail )true (18) Visually, we provide the corresponding probabilistic automata in Figure 1. The proof of equivalence between probabilistic Rabin automata [13] and the p- language, on which stochastic regular expression’s semantics denoted, is provided in [5]. SREs are calculated in a bottom up way by remaining the probabilistic cal- culations of strings. Such techique enables to reach every calculation on each node locally. The idea is to calculate all information on every SRE term and compose the solutions based on the operations. Service2 logout, 0.2 Service1 success, 0.95 login, 1 sendM sg, 0.65 logout, 0.75 start 0 1 2 3 4 retry, 0.25 authenticationF ail, 0.15 msgF ail, 0.05 terminate, 1 5 6 7 Fig. 1. Corresponding probabilistic automata to system TE 5 Conclusion We described a formal semantics for model checking of SREs that enjoys various applications in computer science. We studied the stochastic regular expressions with action based probabilistic logic in the model checking context and used stochastic regular expressions as an input model. Our initial attempt to reacha- bility analysis with strings is also presented which is promising and convenient for parallel and incremental computation especially in the domain of component based systems or modular systems. The further investigation is to extend the reachibility analysis for the application of full PACTL* on SRE trees and evalu- ate our approach with set of models. Furthermore, we are planning to use SRE model checking for the incremental computation of local changes that can occur in the model. References 1. Baier, C., Cloth, L., Haverkort, B.R., Kuntz, M., Siegle, M.: Model checking markov chains with actions and state labels. IEEE Trans. Software Eng. 33(4), 209–224 (2007) 2. Baier, C., Katoen, J.P.: Principles of Model Checking (Representation and Mind Series). The MIT Press (2008) 3. Bollig, B., Gastin, P., Monmege, B., Zeitoun, M.: A probabilistic kleene theorem. In: Chakraborty, S., Mukund, M. (eds.) Automated Technology for Verification and Analysis. pp. 400–415. Springer Berlin Heidelberg, Berlin, Heidelberg (2012) 4. De Nicola, R., Vaandrager, F.: Action versus state based logics for transition sys- tems. In: Guessarian, I. (ed.) Semantics of Systems of Concurrent Processes. pp. 407–419. Springer Berlin Heidelberg, Berlin, Heidelberg (1990) 5. Garg, V.K., Kumar, R., Marcus, S.I.: A probabilistic language formalism for stochastic discrete-event systems. IEEE Transactions on Automatic Control 44(2), 280–293 (Feb 1999) 6. Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Aspects of Computing 6(5), 512–535 (Sep 1994) 7. Kartzow, A., Weidner, T.: Model checking constraint LTL over trees. CoRR abs/1504.06105 (2015) 8. Katoen, J.P.: The probabilistic model checking landscape. In: Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science. pp. 31–45. LICS ’16, ACM, New York, NY, USA (2016) 9. Kumar, R., Garg, V.K.: Control of stochastic discrete event systems modeled by probabilistic languages. IEEE Trans. Automat. Contr. 46(4), 593–606 (2001) 10. Kwiatkowska, M., Norman, G., Parker, D.: Probabilistic model checking in prac- tice: Case studies with PRISM. ACM SIGMETRICS Performance Evaluation Re- view 32(4), 16–21 (2005) 11. Marsan, M.A.: Stochastic petri nets: An elementary introduction. In: Rozenberg, G. (ed.) Advances in Petri Nets 1989. pp. 1–29. Springer Berlin Heidelberg, Berlin, Heidelberg (1990) 12. McIver, A., Rabehaja, T.M., Struth, G.: Probabilistic concurrent kleene algebra. In: Proceedings 11th International Workshop on Quantitative Aspects of Program- ming Languages and Systems, QAPL 2013, Rome, Italy, March 23-24, 2013. pp. 97–115 (2013) 13. Rabin, M.O.: Probabilistic automata. Information and Control 6(3), 230–245 (1963) 14. Ross, B.J.: Probabilistic pattern matching and the evolution of stochastic regular expressions. Applied Intelligence 13(3), 285–300 (Nov 2000) 15. Stoy, J.E.: Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. MIT Press, Cambridge, MA, USA (1977) 16. Weidner, T.: Probabilistic Logic, Probabilistic Regular Expressions, and Constraint Temporal Logic. PhD dissertation, University Leipzig (2012) 17. Weidner, T.: Probabilistic Regular Expressions and MSO Logic on Finite Trees. In: Harsha, P., Ramalingam, G. (eds.) 35th IARCS Annual Conference on Founda- tions of Software Technology and Theoretical Computer Science (FSTTCS 2015). Leibniz International Proceedings in Informatics (LIPIcs), vol. 45, pp. 503–516 (2015)