Stochastic Performance Analysis of Distributed Activities Toqeer Israr, Gregor v. Bochmann Department of Electrical Engineering and Computer Science University of Ottawa, 800 King Edward Ave. Ottawa Ontario K1N 6N5 Canada {tisra051, bochmann}@eecs.uottawa.ca Abstract. This paper analyzes stochastic performance of a distributed global activity, composed of sub-activities sequenced serially, probabilistically, or concurrently. We provide general formulas with which we calculate the per- formance of a composite activity based on the performance of the constituent sub-activities and the control structure. To do this, we model each sub-activity as a Partially Ordered Specification (POS), where each sub-activity is character- ized by independent input events, dependent output events and the stochastic minimum delays between these events. This technique allows two or more sub- activities to be combined hierarchically. Proofs of correctness for these formu- las are given and a simple example is discussed throughout the paper. Keywords: software performance, stochastic, modeling, partial order, collabo- rations, UML Activity Diagrams, distributed services, web services 1 Introduction Many workflows consist of distributed systems, with multiple communicating components running on different processors or in different processes. For example, a login workflow may start with a request from a Web Client that invokes a process in a Retailer Server, which in turn makes calls to some “third party” Servers – components involved would be the Client, Retailer Server and the “third party” Servers. A number of models have been employed during the system design and devel- opment process of such systems. These modeling paradigms include UML Activity Diagram [5], UML Activities [5], Use Case Maps, the Process Definition Language, Business Process Modeling Language, Business Process Execution Language, Web Services Choreography Description Language, and Petri Nets. Most, if not all, of the mentioned notations can potentially be decomposed in sub-activities and further sub- sub-activities. These notations assume a single system component to be allocated to a basic activity in the decomposition. However, this does not hold true anymore as even the most basic activity may involve several components. Clearly, there is a need to model when a component starts and ends its execution in a given workflow. With the interaction of multiple components in a given workflow (activity), it is imperative sometimes to be able to calculate the performance of the components i.e. the delay from the beginning to the end of a component’s participation. Proceedings of NiM-ALP 2013 24 To satisfy these needs to model activities, Bochmann et al. in [1] and [2] have proposed a new modeling paradigm based on UML Activities and their orderings whilst we represented this model in [3] as a partially ordered set of inputs and outputs, called Partially Ordered Specification. While the aforementioned paradigms model the activities and analyze the correct- ness of the required communication protocols in [1] and [2], we analyzed the per- formance of such activities in [3]. Using partial orders, we introduced a Partially Ordered Specification (POS) to model the temporal relationships among the sub- activities within a given activity. Inspired by Performance Evaluation and Review Technique (PERT), with POS, we calculate the fixed completion time for each com- ponent (actor) of a global activity based on fixed performance characteristics of the sub-activities. In this paper, we extend this work by assuming that delays are distributions, and analyze the performance of a global scenario based on the stochastic performance properties of the constituent sub-activities. We start off by reviewing the composition rules for stochastic distributions in Sec- tion 2. In Section 3, we discuss the modeling paradigm based on activities as well as Partial Order Systems. We also describe the rules of strong and weak sequencing as well as provide performance analysis for fixed delays from [3]. In Section 4, we pro- pose formulas and provide proofs for calculating the performance of composite activi- ties with distributions. 2 Composition of Distributions Graph-based models are common modeling paradigms to represent system behav- iours as UML Activities. An Activity is comprised of actions (nodes) and sequencing operators (edges) such as sequence, alternative, concurrency, and loops to define the relationship between these actions, as illustrated in Fig 1. These activities may have constant or distribution time delays. We analyzed in [3] the performance of a global activity, based on fixed time delays of constituent sub-activities. However, when activity delays are statistically varying and characterized by a time distribution, the analysis of a graph model can be quiet challenging. If an event-precedence graph with random activity times is in se- ries/parallel/alternative in nature, the distribution function of the completion of the graph can be calculated by combining the distribution functions of the individual activities using multiplication and convolution [6]. However, the following initial assumptions are made:  Parallel activities do not need to have identical distributions.  Infinite resources are available for the activities and hence there are no re- source contention issues. We assume the duration of an activity, Ai, has an associated delay distribution (probability density function) of fi(x). The cumulative distributive function (CDF) for the delay of activity Ai is then defined by: Fi(t) = i(x) dx The function Fi(t), associated with each activity Ai, represents the probability that that activity Ai finishes by time t. While each sub-activity is assigned a cumulative Proceedings of NiM-ALP 2013 25 distribution function (CDF) for the completion time of the activity it models, these sub-activities may be composed to form a composite activity with a resultant CDF for the completion time of the entire set of considered activities. We consider activities composed with the following 3 operators: series, alternative and concurrency. We assume all activities start with a single activity, called an initial activity. Activities are in series when a single activity succeeds the initial activity such as shown in Figure 1a. Alternate activities may exist when a single activity may execute amongst multiple activities such as shown in Figure 1b. Concurrent activi- ties exist when multiple activities may execute simultaneously succeeding the initial activity. Various compositions of these sub-activities A1...An, each activity Ai with inde- pendent CDF Fi(t) can be abstracted by a global activity G with a CDF, FG(t). A1 F1(t) A1 F1(t) p1 F1(t) A1 p2 S0 A2 F2(t) A0 A2 F2(t) F2(t) A2 pn ... ... An F (t) An F (t) n n a) series b) alternative c) concurrency Fig. 1. Various Operators 2.1 Series If any two sub-activities, Aj and Ak, with CDF of Fj(t) and Fk(t) are combined in series such as shown in Figure 1a, the CDF of the combined activities is [6]: FSUM(t) = Fj(t) ⊗ Fk(t) (1) where ⊗ represents convolution, defined as: Fj(t) ⊗ Fk(t) = Fk( t – x ) dFj(x) (2) 2.2 Alternatives If there is a probability pi for the execution of activity Ai, such as in Figure 1b, the model represents a scenario with multiple possible paths. Then the distribution for the global activity G is [6]: FG(t) = ∑ i(t) Fi(t) (3) 2.3 Concurrency If the sub-activities in the global activity G execute concurrently, such as in Figure 1c, then the following two cases can be considered. Suppose there is a parallel search for an item in a distributed database, where the earliest concurrent search to finish, will terminate the overall search. We would be Proceedings of NiM-ALP 2013 26 interested in the time it would take for the earliest search to finish. A scenario with the earliest or “minimum” CDF of a global activity G composed of parallel sub- activities Ai, can be modeled by [6]: minFG(t)= 1 – ∏ 1 Fi(t)) (4) Now suppose there are parallel sub-activities, Ai, composing the global activity G again, but the completion of this global activity G requires all of the sub-activities to complete. This can be accomplished by calculating the delay of the activity with the maximum time delay by [6]: maxFG(t)= ∏ Fi(t) (5) Note: For the activity distributions in (4) and (5), the Fi(t) need not to be the same for different i. 3 Modeling Distributed Activities Based on [2], we introduced in [3] a Partially Ordered Specification (POS), which allows modeling of a UML activity with a partial ordered set of input and output events, as shown in Figure 2. This modeling paradigm shows the dependencies be- tween various events and is used in analyzing performance of activities with various operators. For a given activity, we suggested the input and the output of each in- R1 R2 initiating input volved role to be modeled by a starting event i1 i2 event and an ending event [3]. These events, belonging to various components, form direct indirect a partially ordered set, where a causal dependency dependency relationship may exists between some o1 o2 of these events, shown by arcs “” in terminating events the figure. The ending Figure 2 – Activity represent as a POS events are not ordered relative to one another, but each ending event has a depend- ency on its corresponding starting event (local sequencing). An initiating event, be- longing to an initiating role (represented by dark circles “●”), is a specialized starting event in this partially ordered set of events, for which there are no other events in that set which precedes that event. Similar holds for the terminating events corresponding to terminating roles. Figure 2 illustrates an activity, with input events i1 and i2 and output events o1 and o2. As i1 and o1 are input and output events of the same role R1, o1 must occur after i1 due to local sequencing. Furthermore, since i1 is the only initiating event, all events in the activity, including, i2, must occur after i1. Due to the relationship i1  i2 and i2  o2, there is an indirect dependency from i1 to o2, shown by the dashed arrow “-->.” Terminating events o1 and o2 are not ordered and may occur in parallel. 3.1 General Formulas for Standard Sequencing Operators with Fixed Delays We introduced Nominal Execution Time Delay (NETD) written as Δixom, between input ix and output om, where NETD is the the delay between the time instance of the occurrence of input event ix and the occurrence of a dependent output event om, pro- vided all the other events on which om depends have occurred long time ago [3]. Proceedings of NiM-ALP 2013 27 Based on NETD, we derived the following general performance formula which can then be applied to sequencing operators of sub-activities to yield the performance metrics of a global activity: tom = max x (tix + Δixom) (6) where tom is the time of the output event om, tix is the time of the input event ix on which om depends, and Δixom is the NETD from input event ix to the output event om. We assumed shared resources are not involved in these activities. Furthermore, we assumed that there is a dependency, and hence a delay, from each input to each output of an activity. An activity may be comprised of sub-activities sequenced with strong or weak se- quencing, parallel operators, alternatives and interruptions. In this paper, we limit our scope to strong and weak sequencing. Strong sequencing, sometimes called global sequencing, between two activities A1 and A2 means that all sub-activities of A1 must be completed before any sub-activity of A2 may start. In contrast, weak sequencing between A1 and A2 (only) means that each system component locally applies sequencing to the local sub-activities of A1 and A2, that is, a component may start with sub-activities that belong to A2 as soon as it has completed all its local sub-activities that are part of A1. Strong sequencing im- plies weak sequencing, but not inversely. In particular, if a component is not involved in A1, it may start with sub-activities of A2 even before A1 begins its execution. A strong sequence is modeled in Figure 3.0 between two sub-activities, A and B, and as discussed, all the initiating events in activity B may occur, only if all the ter- minating events of the previous activity, activity A, have occurred. This is repre- sented by a Final Action event - when all the terminating events have occurred, de- noted by AOF (Final Output of sub-activity A). The time of all the initiating events for the next activity, activity B, is the time of the Final Action event of the previous activity, activity A. Strong Sequence POS Equivalent AI1 AIk A A ΔAIxAOy ΔAIxAOF AO1 AOk' s AOF BI1 BIh B B ΔBIsBOm BO1 BOh' Fig. 2. Strong Sequencing We also proposed and proved for two sub-activities, sub-activity A with k inputs and k’ outputs and sub-activity B with h inputs and h’ outputs, that are strongly se- quenced, with known NETD for each sub-activity (ΔAIxAOy and ΔBIsBOm), that the NETD for the composite activity (ΔAIxBOm) is given by the formula: Proceedings of NiM-ALP 2013 28 ΔAIxBOm = maxy=1..k’(ΔAIxAOy) + maxs=1..h(ΔBIsBOm) (7) Weak Sequence POS Equivalent AIc AIc’ AIa AIa’ A A FAIxAOy AOc AIc’ AOa AOa’ w FAIxBOm BIc BIc’ BIb BIb’ B B FBIgBOm BOc BOc' BOb BOb’ Fig. 3. Weak Sequencing Similarly for two activities A and B weakly sequenced which have roles c..c’ com- mon to both A and B, it was shown that the NETD for the composition is: ΔAIxBOm = maxs=c..c’ (ΔAIxAOs + ΔBisBOm ) (8) 4 General Formulas for Standard Sequencing Operators with Delay Distributions So far we somewhat summarized the modeling and performance analysis of com- posite activities as described in [3]. The remainder of this paper is new material. In the following, we analyze the performance of activities composed of strong and weak sequences where the NETDs of activities are characterized by time distributions. 4.1 Strong Sequence Proposition: If two sub-activities, sub-activity A with k inputs and k’ outputs and sub-activity B with h inputs and h’ outputs, are strongly sequenced and the NETD for each sub- activity, ΔAixAOy and ΔBIsBOm, are known and characterized by cumulative distributive functions (CDF) FAixAOy(t) and FBIsBOm(t), respectively, then the CDF, FAIxBOm(t), of the NETD for the composite activity is: FAIxBOm(t) = ∏ ´ FAixAOy(t) ⊗ ∏ FBIsBOm(t) (9) Proof: Figure 2 shows strong sequencing between two activities A and B and its resultant Proceedings of NiM-ALP 2013 29 POS. We assume FAixAOy(t) and FBIsBOm(t) are known, either given or measured using the testing methodology described in [3]. The delay from the Final Action event (AOF) relative to the input event x of A, is the maximum of all the paths’ delays from event x to event AOF: ΔAIxAOF = maxy=1..k’(ΔAIxAOy) (10) Since the maximum CDF is required of all the delays involved, we use (5) and obtain: FAIxAOF(t) = ∏ FAixAOy(t) (11) The NETD for activity B for an input s to an output m is ΔBIsBOm. To calculate the maximum delay to produce the output m, the maximum is taken over all the given inputs of activity B: ΔBIjBOm = maxs=1..h(ΔBIsBOm) (12) Using (5), this formulas leads to: FBIjBOm(t) = ∏ FBIsBOm(t) (13) The delays for activity A and B are represented by (11) and (13), respectively. Since activity A and activity B are in sequence, using (1), the total time delay FAIxBOM is the convolution of FAIxAOF(t) and FBIjBOm(t): FAIxBOM(t)= FAIxAOF(t) ⊗ FBIjBOm(t) (14) =∏ FAixAOy(t) ⊗ ∏ FBIsBOm(t) (15) 4.2 Weak Sequence Weak sequencing between two activities A and B is illustrated in Figure 3. It is as- sumed that roles c…c’ are participating in both activities A and B, while roles a…a’ participate only in activity A and not in B and roles b…b’ participate only in activity B and not in A. To calculate the NETD between any two events, we assume all the remaining in- put events have occurred long time ago and any executions precipitated by these input events would have also completed long time ago as discussed in the testing methodol- ogy of [3]. Roles a…a’ are involved only in activity A and no dependency exist from activity B to the output of roles a…a’. Hence, the NETD for roles a…a’ is that of activity A, FAIxAOy. Similar is true for activity B. We are interested in the NETD between the output events of activity B relative to the input events of activity A, given the NETDs are delay distributions. Proposition: If two activities, A and B, are weakly sequenced then the NETD between the out- put events of activity B relative to the input events of activity A is: FAIxBOm(t) = ∏ ´ FAIxAOs(t) ⊗ FBisBOm(t)) (16) Proof: The NETD of activity A and B is ΔAIxAOy and ΔBigBOm, respectively, which have the CDFs FAIxAOy(t) and FBIgBOm(t), respectively. Proceedings of NiM-ALP 2013 30 From the right side of Figure 3, it’s evident that the NETD of the composed activ- ity is the maximum of the NETD of both activities A and B in series with the common role s, i.e. s being the ending role of activity A and also the starting role of activity B. Activity A, with input event x in series with activity B with output event m, is represented by ΔAIxBOm with the CDF FAIxBOm(t). As two activities are in series, equa- tion (1) can be used to calculate the NETD for a single s: FsingleAIxBOm(t) = FAIxAOs(t) ⊗ FBisBOm(t) (17) And to take the maximum over the inputs s=c…c’, we obtain: FAIxBOm(t) = ∏ ´ FAIxAOs(t) ⊗ FBisBOm(t) ) (18) 5 Conclusion In this paper, we have considered the delay distributions of composite activities sequenced with strong and weak sequencing. Similar to analysis done in [4] for fixed delays, we plan to consider other composition operators such as parallel, alternatives, loops, etc for delay distributions. We have implemented a tool that takes as input an Activity Diagram including sub-activities with defined performance characteristics and provides as output the NETDs of the global collaboration for fixed delays. We plan on extending this tool to support delay distributions. We believe that this approach to performance modeling of distributed systems is useful in many fields of application, including distributed work flow management systems, service composition for communication services, e-commerce applications, and Web Services. References 1. Bochmann, G.V. Deriving component designs from global requirements, in: Proceedings on International Workshop on Model Based Architecting and Construction of Embedded Systems (ACES), Toulouse, 2008, pp 55-69 2. Castejón, H.N, Bræk, R., and Bochmann, G. v., “Realizability of Activity-based Service Specifications”. Journal of Software and Systems Modeling(to be published), 2011 3. Israr, Bochmann, Performance Modeling of Distributed Activity Services, Proceedings of the 2nd ACM/SPEC International Conference on Performance Engineering, Karlsruhe, Germany, 2011, pp 475-480 4. Israr, Bochmann, Performance Modeling of Distributed Collaboration Services with Inde- pendent Inputs/Outputs, Proceedings of 5th International Workshop on Non-functional Properties in Modeling: Analysis, Languages, Processes, Miami, USA, 2013 5. OMG, Unified Modeling Language (UML), Version 2.1.1, February 2007 6. R.A. Sahner and K.S. Trivedi, Performance and reliability analysis using directed acyclic graphs. IEEE Trans. Software Eng., (1987), pp. 1105–1114. Proceedings of NiM-ALP 2013 31