Estimating costs of a service Christian Gierds and Jan Sürmeli Institut für Informatik Humboldt-Universität zu Berlin Unter den Linden 6, 10099 Berlin, Germany {gierds|suermeli}@informatik.hu-berlin.de Abstract. When designing a publicly available Web service, a service designer has to take care of costs and revenue caused by this services. In the very beginning possible partners might only be vaguely known, or the service behavior contains arbitrary repetitions. Then the estimation of costs for running this service is difficult and decisions based on them can hardly be made. We propose a static analysis of the service’s behavior. We over-approximate possible runs and therefore costs of the service. Our approach provides a basis for reasoning about nonfunctional properties as shown for costs. Key words: Nonfunctional properties, state equation, open nets, service behavior 1 Motivation and approach A service provider in a service-oriented architecture (SOA) [1] has only limited knowl- edge about the future interaction partners of its published service. Then determining nonfunctional properties such as estimating the costs of a service run (or any other performance measure) in the interaction with a service requester is complex, because they vary depending on the requester’s behavior. A service broker may assure certain functional properties such as proper termination, thus giving starting points for narrowing down the possible interactions. For approximation of a service’s costs we propose a static analysis approach on a constrained set of interaction partners. For two services A and B, question is: How much does it cost the provider of A to interact with B? As a prerequisite, we assume that each action a of A has a distinct cost. However prior to execution, it is impossible to give one discrete overall cost: Both A and B may contain non-determinism resulting in different runs of their interaction. So we propose an interval for the overall cost instead. Naively, we can explore the whole state space to determine the minimal and maximal overall costs, taking only into account the maximal runs, that is those that are either infinite or end in a deadlock state. This however can be very complex and may be impossible, if the state space is infinite or too big to fit into memory. We thus propose a simplification of the previous question: If both A and B termi- nate, how much did the interaction cost the provider of A? If we further assume that termination always means reaching a final state ω of A, the question boils down to: How much does it cost the provider of A to reach ω from its initial state α when A and B are coupled? We identify three core challenges: (i) As for the overall cost, the cost of reaching ω is not one distinct number but a cost interval C = (cmin , cmax ), because there typically exists not only one, but a set R of runs from α to ω, which can even be infinite. (ii) We find the nature of R not only being dependent on the behavior of A, but also on the behavior of B. (iii) ω is not necessarily reachable in the interaction of A and B, resulting in an undefined cost interval. In this paper, we will tackle those three challenges with classical Petri net analysis. We translate the model of the composition A ⊕ B into a system of linear equations – the state equation [2] – and use linear optimization to minimize and maximize costs. The resulting bounds X = (xmin , xmax ) will over-approximate C. Although we lose some precision with this technique, we expect the proposed approach to make cost estimation before execution feasible in practice. If B is not given as a model, but only as a set Ψ of constraints, we build the state equation of A alone and augment it by the constraints in Ψ . Again, we use linear optimization which yields a valid over-approximation X 0 = (x0min , x0max ) of C. We observe that X 0 is not as tight as X: X 0 is valid for a whole set of partner services BΨ (induced by Ψ ) in contrast to only one explicitly given partner service B. However, computation of X 0 can be done in a less time critical phase, whereas X is computed after B has been selected and also requires a complete behavioral model of B. Furthermore, we attend the special case that Ψ = ∅, resulting in the complete set of partners of A, similarly to the work in [3]. The third challenge is harder to come by. In this paper we aim at ensuring, that if A reaches ω, then the found cost interval applies. The found cost intervals for each final marking can be composed for a global overview over costs. To avoid complexity from this source, we draft a method to directly find minimal and maximal costs for given sets of final markings. The rest of the paper is structured as follows: In Sect. 2 we introduce the necessary formal notions. Section 3 describes the approach itself together with the above announced specializations and generalizations. Finally we will conclude and will give an outlook for application. 2 Notation Let f : A → B be a function, then for A0 ⊆ A, f (A0 ) is defined as {f (a) | a ∈ A0 }. For the rest of this paper, let us assume a set C of message channels and message exchange to be asynchronous, messages may even overtake each other. Sending a message over channel a is denoted by !a and receiving a message over channel a is denoted by ?a. Then E = {!c | c ∈ C} ∪ {?c | c ∈ C} denotes the set of all sending and receiving events, respectively. Furthermore, each service uses a channel in at most one direction: sending or receiving (viz. a service cannot unsent a message). Now an open net N = (P, T, F, α, Ω, ev ) is a classical P/T-net [4] (P, T, F, α). ev : T → (E ∪ {τ }) is a labeling function indicating for each transition t if it is either communicating (ev (t) ∈ E with ev (t) =?c ⇒!c ∈ / ev (T )) or internal to the net (ev (t) = τ ). Additionally we define a set of final markings Ω, which indicate proper termination. The interface E(N ) = ev (T ) \ {τ } of an open net N comprises all labels used by the transitions of N . Examples for open nets are depicted in Figs. 1 and 2. As usual, places are depicted by circles, transitions by rectangles and the flow relation by arcs. Tokens are black dots, labels are written inside the transitions, omitting τ . initial ?login t1 t2 ?add !invoice p3 t6 p5 p2 p1 ?checkout t8 p4 t7 p6 t5 done t3 !added !shippingData t4 ?abort aborted Fig. 1. Open net shop with Ω = {[aborted], [done]} !login !abort initial’ !checkout done' !add ?added ?invoice shippingData Fig. 2. Open net customer with Ω = {[done0 ]} The open net shop models an online shop. After logging in, a customer can decide to add products by sending messages via channel add which are confirmed via added. The customer can either decide to send an abort message or to checkout. After receiving a message via checkout, the shop sends information about shipping and an invoice. The two final markings [done] and [aborted] specify the two expected results of the interaction. The open net customer is a partner of shop: Their interfaces are compatible. t We assume the standard firing semantics for transitions, thus µ → − µ0 means a step 0 from µ to µ by firing t. A transition sequence t0 t1 t2 . . . is called firing sequence of N if there exists a sequence of markings µ0 µ1 µ2 . . . , such that µ0 = α and for each ti i = 0, 1, 2 . . . , µi −→ µi+1 is a step of N . Note that for each firing sequence, there exists exactly one corresponding sequence of markings. We thus say that a finite firing sequence t0 t1 t2 . . . tn ends in µn+1 . We call a finite firing sequence terminating, if it ends in a final marking. The Parikh vector [5] of a transition sequence r is denoted as occ(r). r|T 0 denotes the restriction of r to elements of T 0 , which we canonically extend to Parikh vectors. The behavior of an open net is the set of all its firing sequences, denoted as Beh(N ). The set of all firing sequences ending in a marking µ is denoted as Beh(N, µ). ?a !b a b ?b !a ?b Fig. 3. Two open nets and their composition The composition of two partners N and N 0 to the open net N ⊕ N 0 is realized by introducing buffer places and corresponding arcs as depicted in Fig. 3. Inspecting the structure of N ⊕ N 0 , we find that a firing sequence of N ⊕ N 0 corresponds to one firing sequence of N as well as one of N 0 , allowing us to analyze N in isolation and draw conclusions for the behavior of N in N ⊕ N 0 . Thus, let N, N 0 be partners and µ, µ0 be markings of N, N 0 respectively. Then, r ∈ Beh(N ⊕ N 0 , µ + µ0 ) implies r|T ∈ Beh(N, µ) and r|T 0 ∈ Beh(N 0 , µ0 ). We now introduce costs for actions of services into our formal model. Each transition t has a globally fixed integer cost, denoted as cost(t). The costPof a transition sequence only depends on its Parikh vector: cost(r) = cost(occ(r)) = t∈T occ(r)(t) · cost(t). Let µ be a marking of an open net N . If Beh(N, µ) 6= ∅, then the minimal and maximal cost of µ in N are defined as min({cost(r) | r ∈ Beh(N, µ)}) and max({cost(r) | r ∈ Beh(N, µ)}), denoted as ⊥(N, µ) and >(N, µ) respectively. Otherwise, the minimal and maximal cost are undefined. Let T 0 ⊆ T , then ⊥(N, µ)|T 0 and >(N, µ)|T 0 denote the minimal and maximal cost for µ by only taking into account transitions in T 0 . As an example, consider that the transitions t1 , . . . , t8 of the open net shop in Fig. 1 have costs of 10, 5, 1, 10, 5, 20, 20, 1 respectively. Then, the firing sequence t1 t2 t3 t2 t3 t4 has a cost of 32. For marking [p3 ] the cost interval is (15, ∞). 3 Cost estimation The state equation [2] is a proven tool in Petri net analysis: Let as usual • x (x• ) denote the preset (postset) of a node x ∈ P ∪ T . The P × T -matrix IN , the incidence matrix of N , is defined as follows: In (p, t) = −1 if p ∈ • t \ t• , In (p, t) = 1 if p ∈ t• \ • t and In (p, t) = 0, otherwise. Let y be a vector with y(p) = µ(p) − α(p), p ∈ P . Then, X(N, µ) denotes all non-negative integer solutions of the state equation IN · x = y of N with respect to marking µ. The state equation for the open net shop in Fig. 1 is displayed in Table 1. Table 1. State equation of the open net shop (Fig 1) w.r.t. marking [done]. t1 t2 t3 t4 t5 t6 t7 t8 y initial -1 -1 p1 +1 -1 -1 p2 +1 -1 p3 +1 -1 p4 +1 -1 p5 +1 -1 p5 +1 -1 done +1 +1 abort +1 The Parikh vector of any firing sequence ending in µ is a solution of the state equation. As an example, consider open net shop in Fig. 1: t1 t2 t3 t2 t3 t5 t6 t7 t8 is a firing sequence ending in the marking [done]. Its Parik vector is 1 2 2 0 1 1 1 1 , which is a solution of the state equation. The reverse does not hold for the general case: By solving the state equation tokens can be "borrowed" from and "given back" to places, leading to an effect of 0, although no firing sequence with this solution as a Parikh vector exists. Obviously, X(N, µ) is a valid over-approximation of occ(Beh(N, µ)). Because the cost of a firing sequence is only dependent on its Parikh vector, the state equation is a valid mechanism to estimate costs for a given marking. In this section, we present two approaches: First, we show how for two given services the costs for one provider can be estimated by solving the state equation of the composition of their models. Then, we explain how this approach can be generalized to the setting where the second partner is only represented by a set of constraints. For the second approach, we can shift the analysis effort to a time-uncritical phase. Ensuing, we have a critical look at both approaches, collecting results for the open net shop in Fig. 1. Finally, we draft how our general approach can be extended from estimating costs for a given state to a given (even infinite) set of states. 3.1 Estimating costs for one provider in a specific composite In the following let N (with its transition set T ) and N 0 be partners and ω, ω 0 be final markings of N and N 0 , respectively. From the property of the state equation follows: Lemma 1. occ(Beh(N ⊕ N 0 , ω + ω 0 ))|T ⊆ X(N ⊕ N 0 , µ)|T . Directly from set algebra follows that the state equation of N ⊕ N 0 provides a valid over-approximation for the costs of the provider of N in N ⊕ N 0 : Theorem 1. Beh(N ⊕ N 0 , ω + ω 0 ) 6= ∅ implies min(cost(X(N ⊕ N 0 , ω + ω 0 )|T )) ≤ ⊥(N ⊕ N 0 , ω + ω 0 )|T ≤ >(N ⊕ N 0 , ω + ω 0 )|T ≤ max(cost(X(N ⊕ N 0 , ω + ω 0 )|T )). Thus, the costs for the provider of N can be estimated by solving the linear problem for each final marking of Ω × Ω 0 . From our experience, the number of final markings is very small, if not 1. As an example, evaluating the state equation for the composite shop⊕customer, we find two cost intervals: (56, 74) for the final marking [done+done0 ] and (20, 38) for [aborted + done0 ], which we can compose to an overall cost estimation of (20, 74). We observe, that this is the actual cost interval. Obviously, the approach is symmetrical: The costs for the provider of N 0 can be estimated analogously. We will now generalize our approach from a completely known open net N 0 to a set of constraints, describing a set of services. 3.2 Pre-estimating costs for one provider Assume now that not an open net N 0 is given but a set of constraints of the following syntax and semantics: A constraint is an inequality or equation l∗r where ∗ ∈ {≤, ≥, =}, consisting of an integer linear combination l of event labels and an integer value r. An open net N 0 solves l ∗ r if for any terminating firing sequence in N 0 , its event occurrence vector solves l ∗ r. N 0 solves the set Ψ of constraints, if N 0 solves each ψ ∈ Ψ . We denote the set of open nets solving Ψ with BΨ . As an example, the open net shop in Fig. 1 solves the constraints ?login = 1 and ?add−!added = 0. [3] shows how such a set of constraints can be obtained from an open net by static analysis. Intuitively, we take the state equation of N and augment it correspondingly with the given set of constraints Ψ : Because every message has been consumed in a final marking of the composite, we find that for each sent message of a partner of N , N has received the message and vice versa. Thus, for each message x sent (received) by N 0 , a transition receiving P (sending) x of N has fired once. This induces a set of constraints Ψb(N ) = { t∈T l(ev (t)) · t ∗ r | l ∗ r ∈ Ψ } on the transition occurrence of N . As an example, consider the open net shop in Fig. 1 and assume Ψ = {!add ≤ 10} as well as N 0 ∈ BΨ . Then, in any terminating firing sequence of N ⊕ N 0 the only transition with label ?add, namely t2 fires only up to 10 times, thus Ψb(N ) = {t2 ≤ 10}. Therefore, the state equation of N with respect to ω together with Ψb(N ), its solutions denoted as XΨ (N, ω), over-approximates all firing sequences ending in ω + ω of N ⊕ N 0 if N 0 ∈ BΨ . Lemma 2. N 0 ∈ BΨ implies Beh(N ⊕ N 0 , ω + ω 0 )|T ⊆ XΨ (N, ω). Analogously to our approach with full knowledge of N 0 , we can conclude that for a given Ψ , the costs for the provider of N for N interacting with an N 0 ∈ BΨ can be over-approximated by using the state equation: Theorem 2. N 0 ∈ BΨ and Beh(N ⊕N 0 , ω +ω 0 ) 6= ∅ implies min(cost(XΨ (N, ω))) ≤ ⊥(N ⊕ N 0 , ω)|T ≤ >(N ⊕ N 0 , ω)|T ≤ max(cost(XΨ (N, ω))). Thus, solving the linear program for each ω ∈ Ω yields a valid cost estimation. Expecting |Ω| to be small, this approach seems feasible for use in practice. Furthermore, we draft in Sect. 3.4 how a valid cost estimation can directly be computed for a set of final markings. 3.3 A critical look We have collected a small number of results in Table 2. The table shows for the open net shop in Fig. 1 how the actual cost intervals (cmin , cmax ) and the estimated cost intervals (xmin , xmax ) for a selected final marking ω correspond for partners specified by Ψ . The results were gained manually; there does not exist an implementation yet. In the example, the estimated cost interval is identical with the actual cost interval for any Ψ . We also see that open net customer in Fig. 2 solves {1 ≤ !add ≤ 10}. Composing the second and fifth row in the table, we find an estimated cost interval of (20, 116), which is not as tight as the former result (20, 74), but still valid. In the general case however, results are not necessarily this precise. A factor are t-invariants, transition  vectors x, such that IN · x = 0. For the open net shop in Fig. 1, 0 1 1 0 0 0 0 0 is a t-invariant. Obviously, if y is a solution of state equation S and x is a t-invariant, then for any n ∈ N holds that y +n·x is also a solution of S. One can create an example with a t-invariant that never fires, intuitively an unmarked loop structure. Thus, if a t-invariant x exists and cost(x) > 0 (cost(x) < 0), cost estimation yields unbounded for the upper (lower) bound, unless  there is an additional constraint bounding it. For the t-invariant x = 0 1 1 0 0 0 0 0 , cost(x) is greater than zero. Constraining it through the constraint set {1 ≤ !add ≤ 10} bounds the cost interval because x can not be added arbitrarily often due to bounding the occurrence t2 . It is up to future work to find a way to handle t-invariants and to use them as a starting point for narrowing down interesting Ψ . Table 2. Minimal/maximal costs for the provider of open net shop (Fig. 1) Ψ ω cmin cmax xmin xmax {} [done] 56 ∞ 56 unbounded {0 ≤ !add ≤ 10} [done] 56 116 56 116 {!abort = 1} [done] undefined undefined infeasible infeasible {} [aborted] 20 ∞ 20 unbounded {0 ≤ !add ≤ 10} [aborted] 20 80 20 80 {!abort = 1} [aborted] 20 ∞ 20 unbounded 3.4 Costs for sets of markings Solving one linear program for each ω ∈ Ω is not feasible for big or even infinite Ω. For the case that we can express or approximate the given set of markings as a set of linear constraints, we build a system of linear inequalities similar to the state equation and use the same techniques as described above. 4 Conclusion In this paper, we presented an approach to estimate the costs for the provider of a service A for interacting with a service B. Thereby, we concentrated on two scenarios: Either the partner B is known in detail, then the estimation process boils down to static analysis of the composed system A ⊕ B. Or, B is not known and only narrowed down by a set of constraints Ψ . In this case, we analyze the behavior of A in isolation, constraining it according to Ψ . The downside of static analysis is a loss of precision, for which we identify t-invariants as an important factor. Still, we find that our approach tackles the first and the second challenge introduced in Sect. 1. For the case of a not reachable final marking, our approach might yield a cost interval although it is not defined. We consider this a low price to pay in contrast to the state space explosion problem. 5 Related and future work In [6, 7] the authors analyze BPMN [8] models. For minimal and maximal costs [6] uses the Dijkstra algorithm with complexity O(n log n). Our approach is based on the simplex algorithm which is known to normally have linear runtime, although worst-case complexity is exponential. In the case of acyclic services, the state equation approach is even sufficient, so we also find strict bounds. The approach in [7] is pattern-based, an approach not applicable to arbitrary graph-based formalisms such as BPMN or open nets. In this paper, we only considered a single fixed cost for each atomic action of a service. As a start it is intuitive that a certain step has always the same costs. A natural extension would be allow intervals [6] or even stochastic values [9]. However, as long as the cost functions are still linear, the extension is canonical. Additionally, costs might be history-dependent. An action might have some fixed and some execution costs, such that a consecutive execution of this action might be less expensive. Furthermore it would be interesting to use the approach as a decision help for service discovery. As a base we can imagine a cost profile stored in a service repository, including cost estimations and acceptable cost intervals, inducing a concept of compatibility under costs. References 1. Papazoglou, M.P.: Web Services: Principles and Technology. Pearson - Prentice Hall, Essex (July 2007) 2. Lautenbach, K.: Liveness in Petri Nets. St. Augustin: Gesellschaft für Mathematik und Datenverarbeitung Bonn, Interner Bericht ISF-75-02.1 (1975) 3. Sürmeli, J.: Profiling services with static analysis. In Freytag, T., Eckleder, A., eds.: AWPN. (2009) 4. Reisig, W.: Petri Nets: An Introduction. Volume 4 of Monographs in Theoretical Computer Science. An EATCS Series. Springer (1985) 5. Parikh, R.: On context-free languages. J. ACM 13(4) (1966) 570–581 6. Magnani, M., Montesi, D.: BPMN: How much does it cost? An incremental approach. In: BPM. (2007) 80–87 7. Sampath, P., Wirsing, M.: Computing the cost of business processes. In: UNISCON. (2009) 178–183 8. OMG: Business Process Model and Notation 2.0. (August 2009) 9. van Hee, K.M., Verbeek, H.M.W., Stahl, C., Sidorova, N.: A framework for linking and pricing no-cure-no-pay services. T. Petri Nets and Other Models of Concurrency 2 (2009) 192–207