Formal Semantics for Probabilistic Verification
         of Stochastic Regular Expressions

              Sinem Getir1 , Esteban Pavese1 , and Lars Grunske1

           Software Engineering, Humboldt University Berlin, Germany


      Abstract. Modelling and verification of software systems is an effective
      phase of system development, as it can uncover failures in design early
      in the development process. There is an increasing need for languages
      and processes that allow for the specification of uncertainty that allow,
      for example, the modelling of the unknown behaviour of a user, or the
      stochastic failure rate of hardware components.
      In this paper we introduce a formal semantics on Stochastic Regular
      Expressions (SREs) over probabilistic action logics for quantitative ver-
      ification. We provide the recursive calculation of the language generated
      by an SRE, enhanced to reuse local results for global verification of sys-
      tem specifications. Furthermore, we demonstrate how to model systems
      with SREs and how to perform reachability analysis with Probabilistic
      Action-based Computational Tree Logic (PACTL*).

      Keywords: stochastic regular expressions, probabilistic formalism,
      probabilistic model checking, probabilistic verification


1    Introduction
The analysis of systems with a probabilistic behaviour plays an important role
in several applications, such as software engineering, speech recognition, digital
communications and computational biology among others. On the other hand,
regular expressions have spread through all of theoretical computer science and
enjoy plentiful applications in the field of natural language processing, including
parsing, deep language models, model inference and machine translation.
    Several studies have been conducted in probabilistic version of regular ex-
pressions that is studied as probabilistic concurrent Kleene algebra in [12] and
extended with additional Kleene theorems in the application of quantitative rea-
soning on database queries [3].
    Kartzow and Weidner [7,17] define a Monadic Second-Order Logic (MSOL)
and a constraint logic with temporal properties for data analysis for probabilistic
regular expressions. Additionally, Weidner specified Probabilistic Regular Ex-
pressions for infinite strings with ω-properties in his thesis [16].
    In this paper, we define a semantics of stochastic regular expressions in the
context of probabilistic model checking, employing a probabilistic extension of
Action-based Computation Tree Logic (ACTL*) to reason about temporal prop-
erties quantitatively. Probabilistic model checking [8] is a technique developed
in order to automatically perform such assessments, and has been successfully
applied in recent years [10]. Probabilistic models used for model checking can be
at different levels of abstraction, such as Markov Chains [2], Markov Decision
Processes [2] and Stochastic Petri Nets [11] among others. In contrast to state-
based representations, we introduce an approach and focus on stochastic regular
expressions as an input model for probabilistic model checking applications.


2     Stochastic Regular Expressions (SRE)

In this section, we briefly recall SRE [14] and action logic [1].


2.1   Syntax of SRE

The syntax of a Stochastic Regular Expression (SRE) E over an alphabet Σ is
defined recursively as follows:

                                  X
                       E := α          Ei [ni ] E1 : E2   E ∗f                   (1)
                                   i


with α ∈ Σ ∪ {ε}, ni ∈ N0 , f ∈ [0, 1] ⊂ R and every term Ei is a SRE, such that:

 1. Atomic Action α: α is an atomic action that belongs to the alphabet Σ.
            P
 2. Choice i Ei [ni ]: One of the provided terms Ei [ni ] is probabilistically chosen
    according to calculated probabilities from occurence values. ni denotes the
    occurrence value or choice rate for each term, such that the i-th term is
    chosen with probability Pninj . Occurrence value or choice rate is defined as
                                j
    the number of cases that the node is chosen statistically.
 3. Concatenation E1 : E2 : The terms E1 and E2 are successively interpreted.
 4. Kleene Closure E ∗f : The term E is repeated for an indefinite number of
    times, subject to a binomial distribution. Each iteration occurs with a prob-
    ability of f . The termination probability is 1 − f .
 5. Plus Closure E +f : The +Closure is a syntactic sugar that is omitted here,
    but can be easily emulated with E : E ∗f .

Without loss of generality, the empty string ε is not included in the alphabet.
However we include the empty string  as an atomic action. On the other hand,
 can be derived from an expression like α∗0.0 , α ∈ Σ.
    A derivation of a conventional regular expression E is the set of sentences, or
strings over the alphabet, derivable from it. This defines the language L(E) of
E. This notion of language derivation is similarly applicable to SRE, except that
each string has a probability value associated with it, and hence the language
itself is associated with a probability distribution of its members as explained in
the following within its denotational semantics.
2.2     Denotational semantics of SREs
Intuitively SREs can be understood as an expression that defines a specific prob-
ability function over its language strings such that JEK : s ∈ Σ∗ → [0, 1]. Such
probabilistic language (p-language) is previously described for discrete events
systems in [9]. In the following, we provide a trivial example on the p-language
to explain the relationship between the SRE and the p-language.
    The semantics of SREs are described by the p-language [9] in the style of
denotational semantics [15]. The probability function for an SRE E is denoted
by JEK, and its application to a particular string s is denoted JEKs, which rep-
resents an acceptance probability associated with string s in the language L(E).
The probability function recursively calculates the occurrence probability of an
arbitrary string s ∈ Σ∗ in a SRE model. By definition, if an arbitrary string s
has a probability value greater than 0, it is accepted as a word of SRE. Meaning
that if s ∈ L(E) then JEKs > 0.
    Example 1. Let L be a probabilistic language [9] describing the Bernoulli
process where each experiment has two outcomes a and b with probabilities p
and 1 − p respectively. Then L is defined on the alphabet Σ = {a, b} as
L(s) = p#(a,s) .(1 − p)#(b,s) where #(a, s) is representing the number of occur-
rences in word s.
    In the following we formally define some additional notions which will be
referred to throughout this paper.
Definition 1 (Words of a SRE). W ords(E) = {w | w ∈ L(E)}
Definition 2 (Word length). A length of a word w = w0 w1 ...wn is the num-
ber of included characters and denoted as |w|= n through the paper.
Definition 3 (Probability of a word in the language).                 The probability
                                                           Ei (ni ) | E1 : E2 | E ∗f |
                                                        P
function JEKs for every possible SRE term E (α |
E +f ) is calculated recursively, where s = α1 , ..αn ∈ Σ∗ :

 – Atomic actions:

                                    JαKs = 1, if α = s
                                    JαKs = 0, if α 6= s                           (2)

 – Choice
                         rX         z    X  nk 
                               Ei ni s =    P     · JEk Ks                        (3)
                                              ni
                                               k

      Since every term might recognize s, the overall probability for a choice
      expression is the sum of all the term probabilities with respect to s.

 – Concatenation:
                                    n
                                    X
                    JE1 : E2 Ks =         (JE1 Kα1 ..αi · JE2 Kαi+1 ..αn )
                                    i=0
      In the summation, s is decomposed into two (possibly empty) substrings,
      each of which may be consumed by a concatenated expression. Even
      though one term may recognize its substring argument, if the other term
      does not recognize its respective substring, then that term returns a prob-
      ability of 0, and the overall probability for that instance of decomposition is 0.

 – Kleene closure:

                                                          JE ∗f Kε = 1 − f
                                   n
                                   X
                      JE ∗f Ks =         (f · JEKα1 ..αi · JE ∗f Kαi+1 ..αn )       (4)
                                   i=1
                                                       +f · JEKs · JE ∗f Kε

      The first formula accounts for empty strings, as the only way an iterated
      expression should recognize an empty string is by not iterating; in other words
      terminating without executing (The termination probability is therefore 1-f ).
      The other formula recursively defines the general case. Here, one iteration of
      E will consume some portion of s, and the rest of s is consumed by further
      iterations. It is assumed that an iteration of a loop always consumes some
      non-empty string. Otherwise, the semantic model would have to account for
      Kleene closure iterating indefinitely on an argument, which is not an useful
      behaviour.

All SRE probability functions presented above are well formed probabilitity func-
tions. Interested readers can find the details of probability functions and the
proof of well formness in [14].


2.3     Action based Computation Tree Logic (ACTL*)

ACTL* is introduced in [4] where the comparison for state labelled and transition
labelled systems is studied. The syntax of ACTL* is described recursively on
action labelled transition systems as follows; where ϕ is a formula executed on
the runs of the system:

                   ϕ := T rue | ¬ϕ | ϕ ∧ ϕ0 | ∃ϕ | ϕ U ϕ0 | Xa ϕ | X ϕ              (5)

      We biefly recall the definitions required for the ACTL* semantics.

Definition 4 (Labelled transition systems ). A labelled transition system
is a tuple (S, Act, →) where:

  . S is a finite set of states
  . Act is a finite, non-empty set of actions
  . → is the transition relation denoted in ⊆ S × (Act ∪ )× S and any element
    of → is called a transition.
Definition 5. A sequence (s0 , α0 , s1 )(s1 , α1 , s2 ).. ∈ → ∞ is called a path from
s0 . A run ρ = (s, Φ) is a pair from s ∈ S, where Φ is a path from s, first state
of ρ is s (f irst(ρ) = s) and path(ρ) = Φ. If a run θ is a suffix for run ρ, then
we denote as θ ≤ ρ.

The semantics is given by satisfaction relations based on the definitions above:

ρ |= T rue always
ρ |= ¬ϕ iff ρ 6|= ϕ
ρ |= ϕ ∧ ϕ0 iff ρ |= ϕ and ρ |= ϕ0
ρ |= ∃ψ iff there exists a run θ ∈ run(f irst(ρ)) such that θ |= ψ
ρ |= ψ U ψ 0 iff there exists a θ with ρ ≤ θ such that θ |= ψ0 and for all ρ ≤ η ≤ θ : η |= ψ0
ρ |= X ϕ iff there exists s, α, s0, θ such that ρ = (s, (s, α, s0))θ and θ |= ϕ
ρ |= Xa ϕ iff there exists s, s0, θ such that ρ = (s, (s, a, s0))θ and θ |= ϕ      (6)


3     Semantics of Stochastic Regular Expression Trees with
      Probabilistic Action based Computation Tree Logic

Our goal is to reason about temporal properties on SRE models probabilistically.
A very common logic Probabilistic computation tree logic (PCTL*) [6] and vari-
ants are defined on the state and path formulas. However Stochastic Regular
Expressions do not have the explicit notation of a state. Therefore we prefer to
extend the ACTL* logic semantically expressed on the runs of the system as
provided in subsection 2.3.
   The extended syntax of a Probabilistic ACTL* (PACTL*) is defined as fol-
lows where ϕ is a word and Φ is a SRE formula.

Definition 6 (Syntax of PACTL*).

                                Φ = ¬Φ | Φ ∧ Φ0 | PP (ϕ)                           (7)
                            ϕ = true | Xa ϕ | X ϕ | ϕ U ϕ0                         (8)

where P ⊆ [0, 1]


3.1   Semantics of PACTL*

Every SRE term E is defined as a node that specifies operation type (choice,
concatenation, kleene closure or action), choice rate, kleene probability and
subnode(s).
    Formally E is an action or a tuple based on its type: (N , R), (N ), (N , k)
if the opetaion types are choice, concatenation, kleene closure (T = + |:| ∗
or action a ∈ Σ,) respectively. where N = {E1 , E2 , ..., En } is the finite set of
subnodes in the operating order (E1 : E2 6= E2 : E1 ), R = {r1 , r2 , ..rn } is the set
of corresponding choice rates (r ∈ N) and k ∈ [0, 1] is a kleene probability.
    We construct every SRE node in a bottom up fashion by parsing the given
string by using operator presedence. Hence, we avoid the ambiguity of the parsed
trees and remain them unique. (The order of operator predence is ∗, :, +). We
also restrict one operation type per SRE node which allows unambiguity.

   The semantics is defined a satisfaction relation for a word w and a SRE term
E as follows:

      E |= ¬Φ iff E 6|= Φ                                                         (9)
                 0                    0
      E |= Φ ∧ Φ iff E |= Φ ∧ E |= Φ                                             (10)
      E |= PP (ϕ) iff P r{w |= ϕ | w ∈ W ords(E)} ∈ P                            (11)
      w |= true always                                                           (12)
      w |= Xa ϕ iff w[0] = a and w[1] |= ϕ                                       (13)
      w |= X ϕ iff w[1] |= ϕ                                                     (14)
      w |= ϕ U ϕ0 iff for some i ≤ |w|, w[i] |= ϕ0 and w[j] |= ϕ, ∀j < i.        (15)


The set of words for an SRE can be calculated recursively on SRE node E:


              
              {a},
              
               S
                                                                            if T = a
                    W ords(Ei ),                                            if T = +
              
              
              
              
                Ei ∈N
W ords(E) =
              
               (W ords(E0 ) · W ords(E1 ))... · W ords(En ), ∀Ei ∈ N       if T =:
              
               ∞
               W ords(Esub )k , where Esub ∈ N and E = (Esub )∗
              S
                                                                           if T = ∗
                k
                                                                                 (16)


A system is specified as an SRE tree that includes a root node and finite set of
nodes. An SRE tree is formally defined as; TE = (Eroot , E, Σ), where Eroot is the
root node, E is the finite set of all nodes and Σ is the alphabet. Hence, verifying
the the root node Eroot will result in verifying the system.


4   Example: System Specification with SRE Tree

We provide an example automata and a corresponding SRE system specifica-
tion in the following paragraphs. Let us assume that we have a system that is
composed of some web services aiming to achieve a message protocol. The sub-
components Service 1(S1 ) and Service 2(S2 ) are executing the login of sytem
and message sending and logging out from the system respectively. The system
can be defined as a stochastic regular expression tree as follows:
       TE = (Eroot , E, Σ)
       Eroot = S
       E = {S, S1 , S2 , E1 , E2 , E3 , E4 , E5 , E6 }
       Σ = {start, login, authenticationF ail, logout, sendM sg,
       msgF ail, terminate, success, retry}
       S = S1 : S2
       S1 = start : login
       S2 = authenticationF ail[15] + logout[20] + E1 [65]
       E1 = sendM sg : E2
       E2 = msgF ail[5] + E3 [95]
       E3 = E4 : E5
       E4 = E6∗0.25
       E5 = logout : terminate
       E6 = success : retry
                                                          
    Let a PACTL* formula P[0.3,0.4] true U (XmsgFail )true for the analysis on
TE . The formula indicates the reachability analysis of the action “msgFail” on
the root node Eroot = S. The words reaching the “msgFail’ from S are then
recursively calculated:

W ords(S)[reaching“msgFail”] ⊂ W ords(S) = {start.login.sendM sg.msgF ail (0.0325),
start.login.sendM sg.succes.retry.msgF ail (0.008125),
start.login.sendM sg.succes.retry.succes.retry.msgF ail (0.00203125), ....}

The union is then
                                  ∞
                                  Y
                      0.0325 ×         (0.25)i = 0.0325 ∈ [0.3, 0.4]          (17)
                                 i=0
                                                               
                          S |= P[0.3,0.4] true U (XmsgFail )true              (18)

    Visually, we provide the corresponding probabilistic automata in Figure 1.
The proof of equivalence between probabilistic Rabin automata [13] and the p-
language, on which stochastic regular expression’s semantics denoted, is provided
in [5].
    SREs are calculated in a bottom up way by remaining the probabilistic cal-
culations of strings. Such techique enables to reach every calculation on each
node locally. The idea is to calculate all information on every SRE term and
compose the solutions based on the operations.
                                                           Service2
                                              logout, 0.2

          Service1
                                                          success, 0.95
                   login, 1         sendM sg, 0.65                            logout, 0.75
start      0                    1                    2                    3                  4

                                                           retry, 0.25

    authenticationF ail, 0.15                            msgF ail, 0.05                          terminate, 1


                                5                    6                                       7


               Fig. 1. Corresponding probabilistic automata to system TE


5   Conclusion
We described a formal semantics for model checking of SREs that enjoys various
applications in computer science. We studied the stochastic regular expressions
with action based probabilistic logic in the model checking context and used
stochastic regular expressions as an input model. Our initial attempt to reacha-
bility analysis with strings is also presented which is promising and convenient
for parallel and incremental computation especially in the domain of component
based systems or modular systems. The further investigation is to extend the
reachibility analysis for the application of full PACTL* on SRE trees and evalu-
ate our approach with set of models. Furthermore, we are planning to use SRE
model checking for the incremental computation of local changes that can occur
in the model.


References
 1. Baier, C., Cloth, L., Haverkort, B.R., Kuntz, M., Siegle, M.: Model checking markov
    chains with actions and state labels. IEEE Trans. Software Eng. 33(4), 209–224
    (2007)
 2. Baier, C., Katoen, J.P.: Principles of Model Checking (Representation and Mind
    Series). The MIT Press (2008)
 3. Bollig, B., Gastin, P., Monmege, B., Zeitoun, M.: A probabilistic kleene theorem.
    In: Chakraborty, S., Mukund, M. (eds.) Automated Technology for Verification
    and Analysis. pp. 400–415. Springer Berlin Heidelberg, Berlin, Heidelberg (2012)
 4. De Nicola, R., Vaandrager, F.: Action versus state based logics for transition sys-
    tems. In: Guessarian, I. (ed.) Semantics of Systems of Concurrent Processes. pp.
    407–419. Springer Berlin Heidelberg, Berlin, Heidelberg (1990)
 5. Garg, V.K., Kumar, R., Marcus, S.I.: A probabilistic language formalism for
    stochastic discrete-event systems. IEEE Transactions on Automatic Control 44(2),
    280–293 (Feb 1999)
 6. Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal
    Aspects of Computing 6(5), 512–535 (Sep 1994)
 7. Kartzow, A., Weidner, T.: Model checking constraint LTL over trees. CoRR
    abs/1504.06105 (2015)
 8. Katoen, J.P.: The probabilistic model checking landscape. In: Proceedings of the
    31st Annual ACM/IEEE Symposium on Logic in Computer Science. pp. 31–45.
    LICS ’16, ACM, New York, NY, USA (2016)
 9. Kumar, R., Garg, V.K.: Control of stochastic discrete event systems modeled by
    probabilistic languages. IEEE Trans. Automat. Contr. 46(4), 593–606 (2001)
10. Kwiatkowska, M., Norman, G., Parker, D.: Probabilistic model checking in prac-
    tice: Case studies with PRISM. ACM SIGMETRICS Performance Evaluation Re-
    view 32(4), 16–21 (2005)
11. Marsan, M.A.: Stochastic petri nets: An elementary introduction. In: Rozenberg,
    G. (ed.) Advances in Petri Nets 1989. pp. 1–29. Springer Berlin Heidelberg, Berlin,
    Heidelberg (1990)
12. McIver, A., Rabehaja, T.M., Struth, G.: Probabilistic concurrent kleene algebra.
    In: Proceedings 11th International Workshop on Quantitative Aspects of Program-
    ming Languages and Systems, QAPL 2013, Rome, Italy, March 23-24, 2013. pp.
    97–115 (2013)
13. Rabin, M.O.: Probabilistic automata. Information and Control 6(3), 230–245
    (1963)
14. Ross, B.J.: Probabilistic pattern matching and the evolution of stochastic regular
    expressions. Applied Intelligence 13(3), 285–300 (Nov 2000)
15. Stoy, J.E.: Denotational Semantics: The Scott-Strachey Approach to Programming
    Language Theory. MIT Press, Cambridge, MA, USA (1977)
16. Weidner, T.: Probabilistic Logic, Probabilistic Regular Expressions, and Constraint
    Temporal Logic. PhD dissertation, University Leipzig (2012)
17. Weidner, T.: Probabilistic Regular Expressions and MSO Logic on Finite Trees.
    In: Harsha, P., Ramalingam, G. (eds.) 35th IARCS Annual Conference on Founda-
    tions of Software Technology and Theoretical Computer Science (FSTTCS 2015).
    Leibniz International Proceedings in Informatics (LIPIcs), vol. 45, pp. 503–516
    (2015)