Automatic Partitions Extraction to Distribute the Runtime Verification of a Global Specification Angelo Ferrando? DIBRIS, University of Genova, Italy angelo.ferrando@dibris.unige.it Abstract. Trace expressions are a compact and expressive formalism, initially devised for runtime verification of agent interactions in multia- gent systems (MAS), which has been successfully employed to model real protocols, and to generate monitors for mainstream multiagent system platforms, and generalized to support runtime verification of different kinds of properties and systems. In this paper we present the trace ex- pression formalism and its use in the runtime verification context focus- ing on future works related to the distribution aspects. Key words: Distributed Runtime Verification, Multiagent Systems, Distributed Monitoring, Trace expressions 1 Introduction and Motivations Runtime verification (RV) is a software verification technique that complements formal static verification (as model checking) and testing. In RV dynamic check- ing of the correct behavior of a system can be performed by a monitor which is generated from a formal specification of the properties to be verified. A possible way to specify the monitor behavior is through the set of all correct traces (finite or infinite sequences of events) which can be generated during the system execu- tion. Sets of traces may be represented in many different ways. In this extended abstract we present trace expressions [3, 4], a formalism inspired to session types [8, 12] which can be used to design monitors for the RV of centralized and decen- tralized software systems. Distributed runtime verification (DRV), as described by S. Rajsbaum in his keynote speech at the SSS 2015 Symposium [10], tackles the problem of building a decentralized runtime monitor for a distributed system and involves designing a distributed algorithm that coordinates the monitors in order for them to correctly verify the dynamic behavior of the whole system. Once the formal specification of the global pattern of events is given, however, distributing the monitoring activity can be resorted to decomposing the global specification into “sub-specifications”, involving less events than the global one, which can be monitored in an independent way from each other and such that ? Prof. Viviana Mascardi and Davide Ancona are the author’s advisors. the union of their independent monitoring gives the same guarantees as the mon- itoring of the whole system w.r.t. the original specification. The trace expression formalism can be used to construct protocols representing sequences on generic events. Since we focus on its distributed aspects, we limit the set of the events to the subset corresponding to the interaction events, where interactions are intended as exchange of messages among agents in a MAS. Trace expressions are already used for centralized runtime verification (no distribution) [4] and for the automatic generation of protocol-driven agents (total distribution) [2] through projection on each single agent involved in the protocol. This work lays the foundations to fill the gap between these two projects obtaining a middle way approach between the centralization (bottleneck) and the total distribution (inefficient) implementations. 2 Trace Expressions A trace expression τ represents a set of possibly infinite event traces, and is defined on top of the following operators: (1)  (empty trace), denoting the singleton set {} containing the empty event trace ; (2) ϑ:τ (prefix ), denoting the set of all traces whose first event e matches the event type1 ϑ (e ∈ ϑ), and the remaining part is a trace of τ ; (3) τ1 ·τ2 (concatenation), denoting the set of all traces obtained by concatenating the traces of τ1 with those of τ2 ; (4) τ1 ∧τ2 (intersection), denoting the intersection of the traces of τ1 and τ2 ; (5) τ1 ∨τ2 (union), denoting the union of the traces of τ1 and τ2 ; (6) τ1 |τ2 (shuffle), denoting the set obtained by shuffling the traces of τ1 with the traces of τ2 ; (7) ϑτ (filter ), a derived operator denoting the set of all traces contained in τ , when deprived of all events that do not match ϑ. Event types represent sets of generic events; in the following when we consider interaction types, which are a special case of event types related to the exchange of messages among agents, we denote them as α and we represent the interaction events contained inside as es→r meaning that the sender s sends the message e to the receiver r. To support recursion without introducing an explicit construct, trace expressions are regular (a.k.a. rational or cyclic) terms: they correspond to trees where nodes2 are either the leaf , or the node (corresponding to the prefix operator) ϑ with one child, or the nodes ·, ∧, ∨, and | all having two children. According to the standard definition of rational trees, their depth is allowed to be infinite, but the number of their subtrees must be finite. The semantics of trace expressions is specified by the transition relation δ ⊆ T × E × T, where T and E denote the set of trace e expressions and of events, respectively. As it is customary, we write τ1 → τ2 to mean (τ1 , e, τ2 ) ∈ δ. If the trace expression τ1 specifies the current valid state of the system, then an event e is considered valid iff there exists a transition e τ1 → τ2 ; in such a case, τ2 will specify the next valid state of the system after 1 To be more general, trace expressions are built on top of event types (chosen from a set ET), rather than of single events; an event type denotes a subset of E. 2 Since the filter is a derived operator it can be rewritten and there is not a corre- spondence on the tree. event e. Otherwise, the event e is not considered to be valid in the current state represented by τ1 . Example 1. Let E = {e1 , . . . , e7 }, and ϑi , i = 1, . . . , 7, be the event types such that e ∈ ϑi iff e = ei (that is, Jϑi K = {ei }); then the trace expression τ1 = ((ϑ1 :|ϑ2 :)∨(ϑ3 :|ϑ4 :))·(ϑ5 :ϑ6 :|ϑ7 :) denotes the following set of event traces:   e1 e2 e5 e6 e7 , e1 e2 e5 e7 e6 , e1 e2 e7 e5 e6 , e2 e1 e5 e6 e7 , e2 e1 e5 e7 e6 , e2 e1 e7 e5 e6 , e3 e4 e5 e6 e7 , e3 e4 e5 e7 e6 , e3 e4 e7 e5 e6 , e4 e3 e5 e6 e7 , e4 e3 e5 e7 e6 , e4 e3 e7 e5 e6 Example 2. Let the set of interaction types ET = {αping , αpong } and E = {pingA→B , pongB→A }, Jαping K = {pingA→B } and Jαpong K = {pongB→A }; then the trace expression P ingP ong = (αping :P ingP ong · αpong :)∨ denotes the fol- lowing set of interaction events: n n {pingA→B pongB→A , pingA→B pingA→B pongB→A pongB→A , ..., pingA→B pongB→A }. 2.1 Projection The projection function was first introduced in [1]. Given the finite set AGS of all the agents that could play a role in the MAS and an interaction type α (an event type containing only interaction events), senders(α) is the set of all the agents in AGS that could play the role of sender in actual interactions having type α, and receivers(α) is the set of all the agents in AGS that could play the role of receiver in interactions of type α. The involves predicate holds on one interaction type α and one set of agents Ags, involves(α, Ags), iff (senders(α) ∩ Ags 6= ∅) ∨ (receivers(α) ∩ Ags 6= ∅). We define two auxiliary functions f irst : T → P(ET) (last : T → P(ET)), f irst(τ ) = {ϑ1 , ϑ2 , ..., ϑn } (last(τ ) = {ϑ1 , ϑ2 , ..., ϑn }) iff ∀ϑi .∃e∈ϑi s.t. e is one of the first (last) events which can be consumed by the δ transition re- lation called on τ . In this way we can introduce the derived sets involved(α) = Ags ⇐⇒ senders(α) ∪ receivers(α) = Ags and involved() = ∅, involved(τ ) = involved(α1 )∪...∪involved(αn )∪involved(τ1 )∪...∪involved(τm ), with f irst(τ ) = e {α1 , ..., αn } and ∀1≤i≤m .τ → τi , for some e. Projection can be described as a function Π : T × P(AGS) → T. Π is driven by the syntax of the trace expression to project; since Π is defined on cyclic terms, the simplest way to define it is by coinduction as follows: (i) Π(, Ags) = , (ii) Π(α : τ, Ags) = α : Π(τ, Ags) if involves(α, Ags), (iii) Π(α : τ, Ags) = Π(τ, Ags) if ¬involves(α, Ags), (iv) Π(τ 0 op τ 00 , Ags) = Π(τ 0 , Ags) op Π(τ 00 , Ags), where op ∈ {·, ∧, ∨, |}. 3 Trace Expression Distribution As already anticipated, we consider trace expressions to model interaction proto- cols inside a MAS. With respect to the centralized runtime verification approach proposed in [4], we want to reason about the trace expression distribution in order to obtain a set of trace expressions where each one represents the proto- col subset related to a fixed set of agents. The main problem of this approach is to understand which agents can be monitored separately and which must be monitored together (unsplittable) in order to guarantee that the distributed monitoring gives the same results as the centralized one. This notion is strongly related to the correctness definition in the Choreographies research field [9] where a choreography can be projected on each single entity (agent for us) only if it satisfies three conditions (for trace expressions the first two are enough). We report these two conditions adapted for the trace expression formalism. Definition 1. The two conditions which a trace expression τ must satisfy in order to be correctly projected on each single agent involved in its event types are: 1. Connectedness for sequence. For each sub-expression α:τ1 , with f irst(τ1 ) = {α1 , α2 , ..., αn }, we have ∀αi ∈f irst(τ1 ) .involved(α) ∩ involved(αi )6=∅. For each sub-expression τ1 ·τ2 , with last(τ1 ) = {α1 , α2 , ..., αn } and f irst(τ2 ) = {α10 , α20 , ..., αn0 }, we have ∀α∈last(τ1 ) .∀α0 ∈f irst(τ2 ) .involved(α) ∩ involved(α0 ) 6= ∅. 2. Unique point of choice. For each sub-expression τ1 ∨τ2 , with f irst(τ1 ) = {α1 , α2 , ..., αn } and f irst(τ2 ) = {α10 , α20 , ..., αn0 }, we have ∀α∈f irst(τ1 ) .∀α0 ∈f irst(τ2 ) .involved(α) ∩ involved(α0 ) 6= ∅ and involved(τ1 ) = involved(τ2 ). Without loss of generality we consider only trace expressions where all inter- action types represent finite sets of events. Thus, we can have interaction types like α = {i1A→B , i2B→A , i3A→B , ...} and not interaction types like α = {i1A→B , i2C→D , ...} or α = {i1A→B , i2A→D , ...}. We can observe that the two conditions are too restrictive when we want to distribute our protocols on sets of agents. In particular, we can easily find protocols that cannot be projected on a single agent but can be projected on a set of agents. Considering for instance the trace expression τ = ...(αping :αpong :)... where ... means that τ contains also other terms and Jαping K = {pingA→B }, Jαpong K = {pongC→D }, can immediately note that it does not satisfy the first condition (the connectedness for sequence). Since τ can be a very large trace expression we would like to distribute it anyway. Even if we cannot distribute it on each single agent {{A}, {B}, {C}, {D}}, in this particular case a possible choice for the distribution could be {{A}, {B, C}, {D}} obtaining three different projections of the global protocol in three separated parts where B and C are projected and monitored together. Example 3. Given the trace expression τ = (αm1 :)∨(αm2 :) where AGS = {A, B, C, D}, Jαm1 K = {msg1A→B } and Jαm2 K = {msg2C→D }. Π(τ, {A}) = Π(τ, {B}) = Π((αm1 :)∨(αm2 :)) = (αm1 :Π())∨(Π()) = (αm1 :)∨() = αm1 : and Π(τ, {C}) = Π(τ, {D}) = Π((αm1 :)∨(αm2 :)) = (Π())∨(αm2 :Π()) = ()∨(αm2 :) = αm2 : The two local versions of the protocol have lost the unique point of choice information. 3.1 Automatic Partitions Extraction An important feature in order to distribute a trace expression is the automatic generation of the set of all possible partitions of agents that preserve the seman- tics during the projection phase. Since we know that a trace expression must satisfy the two conditions (Definition 1) to be correctly projected on each single agents involved in the protocol, we can use this information to guide our analysis. From an high level point of view, given a trace expression τ , for each subterm of τ we can check if the two conditions are satisfied; if a condition is not satisfied on a singular agent, we try to evaluate it on a set of agents containing the agent. Considering for instance the first condition (connectedness for sequence), if we find a subterm α1 : α2 : ..., with involved(α1 ) ∩ involved(α2 ) = ∅, it does not satisfy the condition, but we can generate the set of all possible correct partitions {{a, b}|a ∈ involved(α1 ), b ∈ involved(α2 )} forcing in this way the connected- ness property. A similar thing should be done for the other condition. Once we have obtained the set with all correct partitions we can choose the one that best meets our distribution requirements. We are working on the implementation of the corresponding algorithm focus- ing on the correctness and completeness proofs. The intuition is that forcing the preservation of the two conditions (Definition 1) we preserve all critical points of the trace expression, where the critical points are all points of the protocol where it could be necessary monitor some agents together. 4 Related and Future Work In this extended abstract we have presented a possible approach for the distri- bution of a trace expression on a set of agents. Following the conditions deriving from Choreographies (Definition 1) we can extract the partitions of agents which allow us to obtain the semantic preservation during the projection phase. Once we have the set of all possible valid partitions we can use them to generate the monitors to perform the distributed runtime verification of our MAS. We can find works on distributed runtime monitoring like [7, 6, 11] but w.r.t. DRV of MASs we were not able to find any related work, except for the one which spun off from [1], namely [5]. Another relevant work is [13] where an interesting distributed approach to process mining is presented. Considering the conformance checking, that work is extremely related to ours and it can be useful to study the possible connections. For instance using trace expressions instead of Petri nets for the conformance checking of event logs. Future work will be to implement the algorithm corresponding to this intu- ition using the programming language Prolog whereby we have already imple- mented the semantics of trace expressions and the projection function. Once we will have implemented the distribution algorithm we will be able to project the global protocol on each single partition in order to automatically generate dis- tributed monitors. A possible implementation could be using a MASs framework based on logic programming, in this way we will be able to generate monitors di- rectly by the projected trace expressions. We will try our approach first on some well known communicative protocols (as Contract Net Protocol, Alternating Bit Protocol, and so on) and after on a real case study concerning the distributed runtime verification of a MAS. References 1. D. Ancona, D. Briola, A. El Fallah Seghrouchni, V. Mascardi, and P. Taillibert. Ef- ficient verification of MASs with projections. In Engineering Multi-Agent Systems - Second International Workshop, EMAS 2014, Revised Selected Papers, Lecture Notes in Computer Science, 2014. 2. D. Ancona, D. Briola, A. Ferrando, and V. Mascardi. Global protocols as first class entities for self-adaptive agents. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2015, pages 1019–1029, 2015. 3. D. Ancona, S. Drossopoulou, and V. Mascardi. Automatic generation of self- monitoring MASs from multiparty global session types in Jason. In DALT 2012, volume 7784 of LNAI, pages 76–95. Springer International Publishing, 2012. 4. D. Ancona, A. Ferrando, and V. Mascardi. Theory and Practice of Formal Methods: Essays Dedicated to Frank de Boer on the Occasion of His 60th Birthday, chapter Comparing Trace Expressions and Linear Temporal Logic for Runtime Verification, pages 47–64. Springer International Publishing, Cham, 2016. 5. D. Briola, V. Mascardi, and D. Ancona. Distributed runtime verification of JADE multiagent systems. In D. Camacho, L. Braubach, S. Venticinque, and C. Badica, editors, IDC 2014, volume 570 of Studies in Computational Intelligence, pages 81–91. Springer International Publishing, 2014. 6. T.-C. Chen, L. Bocchi, P.-M. Deniélou, K. Honda, and N. Yoshida. Asynchronous Distributed Monitoring for Multiparty Session Enforcement, pages 25–45. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012. 7. A. Francalanza, A. Gauci, and G. Pace. Runtime monitoring of distributed systems (extended abstract). Technical report, University of Malta, 2010. WICT. 8. K. Honda, V. T. Vasconcelos, and M. Kubo. Language primitives and type dis- cipline for structured communication-based programming. In Proceedings of the 7th European Symposium on Programming: Programming Languages and Systems, ESOP ’98, pages 122–138, London, UK, UK, 1998. Springer-Verlag. 9. I. Lanese, C. Guidi, F. Montesi, and G. Zavattaro. Bridging the gap between interaction-and process-oriented choreographies. In 2008 Sixth IEEE International Conference on Software Engineering and Formal Methods, pages 323–332. IEEE, 2008. 10. S. Rajsbaum. Distributed runtime verification – where combinatorics, fault- tolerance and formal methods meet. Keynote Talk at the SSS 2015, August 2015, 2015. 11. T. Scheffel and M. Schmitz. Three-valued asynchronous distributed runtime veri- fication. In Twelfth ACM/IEEE International Conference on Formal Methods and Models for Codesign, MEMOCODE 2014, Lausanne, Switzerland, October 19-21, 2014, pages 52–61, 2014. 12. K. Takeuchi, K. Honda, and M. Kubo. An interaction-based language and its typing system. In In PARLE?94, volume 817 of LNCS, pages 398–413. Springer- Verlag, 1994. 13. W. M. P. van der Aalst. Distributed Process Discovery and Conformance Checking, pages 1–25. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.