Safira: Implementing the Set Algebra for Service Behavior Kathrin Kaschner Universität Rostock, Institut für Informatik, 18051 Rostock, Germany safira@service-technology.org Abstract. An important property of interacting services is their observ- able behavior. There exist different formalisms to describe service behavior (e.g. BPEL, BPMN, Petri nets, automata). Based on an extension of automata, in previous work we proposed a compact representation to characterize the behavior of sets of services and introduced a set algebra on it. In this paper, we present with Safira a tool which implements the fundamental set operations for such set of services. 1 Introduction The observable behavior is an important aspect of interacting services. For studying correct interaction, the concept of operating guidelines [1] was introduced. An operating guideline is an annotated automaton which represents the set of all correctly interacting partners of a given service. In [2] the concept of annotated automata has been extended such that fundamental set operations can be realized for sets of services. These operations have a number of useful applications, including reasoning about substitutability, behavioral constraints, and organizing a service registry [2]. This paper is devoted to our tool Safira which implements the fundamental set operations complement, intersection and union for sets of services. Section 2 introduces the basic formalisms. In Sect. 3 we define the algorithms of these set operations and highlight interesting issues of their implementation in Safira. Section 4 gives an overview of how to obtain and use Safira. Section 5 presents experimental results. Finally, we conclude the paper and give directions to future work in Sect. 6. 2 Background To model the behavior of a single service, we use service automata [1]. A service automaton is a finite state automaton whose edges are labeled with asynchronous message events. As an example, Fig. 1(a) and Fig. 1(b) show two service automata. In the graphical representation, sending events are preceded by “!” and receiving events are preceded by “?”. Multiple labels on one edge are a short hand notation for multiple transitions – each one being labeled with one of the events. Initial states have an incoming arc from nowhere. Final states are double-lined. The service automaton V models a simple vending machine expecting a customer to insert coins (?c) and to press a button for iced tea (?t), or orange juice (?o). If enough money has been inserted, the vending machine returns the beverage t0 ϕ(q0) = !t ∨ !o ∨ !c v0 q0 !c ?* !c ϕ(q1) = !c ?c !t, !o t1 ϕ(q2) = !t ∨ !o v1 q1 q ?t ?o !t !c ?* 2 ϕ(q3) = ?b ∧ ?r ?* !t, !o ϕ(q4) = true v2 v3 t2 !r !b q4 q3 ϕ(q5) = final !b !r ?b, ?r * ?b, ?r v4 v5 t3 * q5 χ = true (a) V (b) T (c) Q Fig. 1: The service automata in (a) and (b) model the behavior of a vending machine and a single customer. The extended annotated service automaton in (c) represents a set of customers for the vending machine in (a). (!b). If not enough money has been inserted, the customer does not receive the beverage. Instead, the money is returned (!r). The service automaton T represents the behavior of a customer who orders iced tea. Note that sending events of T are receiving events of V and vice versa. For the finite representation of a (possibly infinite) set of service automata, we use extended annotated automata [2]. They are also finite automata whose edges are labeled with asynchronous message events. To be able to characterize a set of services, we additionally annotate each state with a local Boolean formula φ and add a global Boolean formula χ to the automaton. In comparison to service automata, an extended annotated automaton does not have final states. Instead, the proposition final in the local formulas may define final states of the represented service automata. Each service automaton S belonging to the set of service automata character- ized by the extended annotated automaton A fulfills the following requirements. First, S is a subautomaton of A (including an initial state). Further, each state q of S satisfies the Boolean formula φ of its corresponding state in A and the global formula χ is evaluated to true. Thereby, φ determines which outgoing edges must be present in state q and χ defines which combinations of states are allowed in S. The interested reader is referred to [2]. Figure 1(c) depicts an extended annotated service automaton Q which char- acterizes the set of all customers of the vending machine V . An edge labeled with “*” means that there is a transition for each label of Q. “?*” is a placeholder for all receiving events of Q. The customer T in Fig. 1(b) is a subautomaton of Q. The corresponding states are (t0 , q0 ), (t1 , q2 ), (t2 , q3 ) and (t3 , q4 ). Furthermore, all states of T satisfy the local formulas of the corresponding states in A. For example, t0 evaluates the formula !c∨?t∨?o of q0 to true, because there is a leaving edge labeled with !c. Since the global formula χ is equal to true, there are no additional constraints to the states of T . Thus, T is characterized by Q. As all possible customers of V are represented, Q is also called an operating guideline [1] of V . 3 Set Operations on extended annotated automata In this section, we introduce the algorithms for the fundamental set operations. This means, given two extended annotated service automata representing sets M1 and M2 of services, we show how to compute an extended annotated service automaton that represents the complement M1 , the intersection M1 ∩ M2 and the union M1 ∪ M2 . Since the result of each operation is again an extended annotated automaton, arbitrary nested structures are possible. Due to the page limit we only sketch the algorithms (see [2] for a detailed description), but focus on the interesting details of their actual implementation in our tool Safira. At the end of each subsection, we also discuss the complexity of the operation. 3.1 Complement Theory. The most challenging of the three operations is the complement. To compute the complement, we first normalize the extended annotated automaton by applying two transformations. Both transformations modify the shape of the extended annotated automaton, but do not change the represented set of service automata. Having a normalized extended annotated automaton the actual complement operation turns out to be very simple. The first transformation totalizes the extended annotated automaton A. That means, in each state of the totalized A there exists at least one outgoing edge for each label of A. To compute a total extended annotated automaton without changing its semantics, we insert missing edges with label x but explicitly forbid their usage by adding a conjunction with ¬x to the corresponding local Boolean formula. Each inserted edge is connected to a trap state t that contains a self-loop for every label of A. Figure 2(a) illustrates this procedure for state q1 of the extended annotated automaton Q in Fig. 1(c). The set of the labels is {!t, !o, !c, ?b, ?r}. State q1 has only three outgoing edges labeled with !c, ?b and ?r. Therefore, we insert two new edges labeled with !t and !o connecting q1 with the trap state qt and set the Boolean formula of state q1 to !c ∧ ¬!t ∧ ¬!o. In the second transformation, we complete the extended annotated automaton such that for each state q and all labels y the disjunction of the formulas of y all states q 0 with q − → q 0 is equivalent to true. In Fig. 2(b), we illustrate the completion of state q1 of the annotated automaton Q in Fig. 1(c). State q1 has only one !c-labeled edge leading to state q3 . Since the Boolean formula φ(q3 ) =?b∧?r is not equivalent to true we add an edge labeled with !c to a new state q30 . State q30 has for every label an outgoing edge to the trap state qt . The local formula of q30 is set to ¬(?b∧?r). Now, the disjunction of the local formulas of all ?c-successors is equivalent to true. The remaining successor states of state q1 are treated likewise. To avoid a change in the semantics of Q by inserting the new states q30 and q40 we explicitly forbid their usage by setting the global formula to χ = true ∧ ¬q30 ∧ ¬q40 . That means, as soon as a service automaton S covers one of these new states, the global formula is evaluated to false. Consequently, S is correctly classified as a non represented service automaton. After applying the two transformations to an extended annotated automaton A, every service automaton with the same interface as A is a subautomaton of A. Only the global formula χ decides wether or not the service is represented by A. !t, !o q1 qt ϕ(q3) = ?b ∧ ?r q1 qt ϕ(q1) = !c ∧ ¬!t ∧ ¬!o !c * ϕ(q4) = final ?* !c ϕ(qt) = true ?* !c ?* * * ϕ(q30) = ¬?b ∨ ¬?r q4 q3 * q4 q3 q40 q30 ϕ(q40) = ¬final (a) making total (b) making complete Fig. 2: In (a), the procedure of making the automaton total is shown for state q1 of Q (from Fig. 1(c)). (b) illustrates the procedure of completion for the same state. The dashed parts are those that are inserted in the course of a transformation. Consequently, the complement operation for a total and complete automaton turns out to be very simple: only the global formula χ has to be negated. Figure 3(b) depicts the extended annotated automaton Q that represents the complement set of service automata represented by Q in Fig. 1(c). Figure 3(a) shows a service automaton T 0 which is represented by Q. As both the button for iced tea and orange juice are pressed, it is not a customer of Q. Implementation. Overall, Safira follows the procedure described above. As both transformations can be executed independently for every node, we store the nodes in a list, which is traversed. The totalization of the automaton can easily be done by checking the outgoing edges for every node. In contrast, the completion is more complicated, as boolean formulas have to be evaluated. Therefore, we integrated the open-source SAT solver Minisat1 as a library into our tool Safira. To use Minisat, every question concerning Boolean formulas must be converted into a satisfaction problem. Thus, to proceed with the completion, we negate the disjunction f of the formulas of all y-successors for each node q and ask Minisat, wether the formula ¬f is satisfiable. If the answer is ‘yes’, then f is not equivalent to true and an additional y-edge for q is inserted. Although, the satisfaction problem is NP-complete, the experimental results in the next section show that the complement can be computed efficiently. As there usually are only a small number of labels and the length of the formulas is rather small, the satisfaction problems we formulate for the complement generation are not challenging Minisat. Our experimental results in Sect. 5 show that the complement of an extended annotated automaton can be computed. The applications mentioned in the first section of this paper require a further use of the resulting automaton. Thus, Safira also implements an optimization algorithm aiming at reducing the size of the result. The main idea of this optimization is to merge added states with the same Boolean formula. This can be done, because all successors of each added state lead to the same state – the trap state qt . To decide if a newly computed state can be merged with another state, we build a special decision tree during the completion operation. Each inner node of the tree contains an assignment over the labels of the automaton and has two outgoing edges labeled with ‘yes’ and ‘no’. Each leaf represents a state, which was already added to the automaton by the completion operation. Figure 4 depicts the decision tree which was built during 1 See the website of Minisat at http://minisat.se. !t,!o ϕ(q0) = !t∨!o∨!c q0 q8 q7 t0 !c ϕ(q1) = !c∧¬!t∧¬!o ?* !t,!o !c q9 ϕ(q2) = (!t∨!o)∧¬!c !c * * q1 q2 !c * ϕ(q3) = ?b∧?r∧¬!c∧¬!t∧¬!o t1 !c ?* !t,!o q10 * qt ϕ(q4) = true !t !c ?* !t,!o !t,!o,!c ϕ(q5) = final∧¬!c∧¬!t∧¬!o * ϕ(qt) = true * t2 q10 q4 q3 ?b,?r q11 ϕ(q7) = ¬!c∧¬!t∧¬!o !o !t,!o ϕ(q8) = ¬!c∨!t∨!o * ?b,?r t3 * * ϕ(q9) = (¬!t∧¬!o)∨!c qt q5 ϕ(q10) = ¬?b∨¬?r∨!c∨!t∧∨!o ?b, ?r !t,!o,!c ϕ(q11) = ¬final∨!c∨!t∨!o t4 * χ = qt ∨ q7 ∨ q8 ∨ q9 ∨ q10 ∨ q11 (a) T 0 (b) Q Fig. 3: The extended annotated service automaton Q in (b) is the Complement of Q in Fig. 1(c). For reducing the number of edge crossings, we depicted two copies of state q10 and the trap state qt . (a) depicts a service automaton T 0 , represented by Q. the complement generation for the vending machine Q. Each leaf represents one of the states q7 , ..., q11 which was inserted into the automaton by the completion operation (cf. Figure 3(b)). Consider a leaf l representing state p with its local Boolean formula g. The decision tree is constructed such that the assignment β of each inner node n along the path leading to l holds the following condition: β is a satisfying assignment of g if the outgoing edge of n leading to l is labeled with ‘yes’ and is a non-satisfying assignment, otherwise. As an example, see Figure 4. The assignments β0 and β1 both do not satisfy the Boolean formula of state q7 . At the beginning of the completion operation we start with an empty decision tree. When we insert the first new state q0 to the automaton, we also add q0 to the decision tree. Thus, at this moment the decision tree contains only a leaf which represents q0 . Then, we proceed as follows: Suppose, we want to connect an existing state of the automaton with a new state q by edge e. Let f be the local Boolean formula of q. To find out if there already exists a state which is annotated with the same Boolean formula, we traverse the decision tree. At each inner node n, we check if the assignment β of n satisfies f . If this is the case, we follow the outgoing edge labeled with ‘yes’. Otherwise, we follow the outgoing edge labeled with ‘no’. Arriving at a leaf representing a state p, we check if its formula h is equivalent to f . If the answer is ‘yes’, no new state has to be inserted. Instead, the new edge e is directed to the existing state p. Otherwise, there is an assignment α that satisfies f , but not h or vice versa. In the decision tree, at the place of the leaf representing q, we insert a new inner node, containing α. Its outgoing edges are then directed to two leafs representing q and p, respectively. In this manner, the decision tree is built successively during the completion operation. Note that for the totalization of the automaton, no additional nodes are necessary. Therefore the decision tree is not needed during this procedure. β0: ¬final, ¬?b, ¬?r, !c, ¬!o, ¬!t yes no β2: ¬final, ¬?b, ¬?r,¬!c, ¬!o, ¬!t β1: ¬final, ¬?b, ¬?r, ¬!c, ¬!o, !t yes no yes no β5: ¬final, ¬?b, ¬?r, ¬!c, !o, ¬!t q10: ¬?b∨¬?r∨!c∨!t∧∨!o q8: ¬!c∨!t∨!o q7: ¬!c∧¬!t∧¬!o yes no q11: ¬final∨!c∨!t∨!o q9: (¬!t∧¬!o)∨!c Fig. 4: The decision tree of Q. Complexity. Due to our optimization, for each state in the given automaton at most one new state (with the negated formula) is added. Thus, in the worst case the size complexity for completion is linear in the number of the states. The time complexity for the completion depends mainly of the shape of the decision tree. Assuming the decision tree is a balanced tree, the automaton is completed in O(n · log n · l), whereas n is the number of the states and l the number of labels. For the transformation to a total automaton, the worst case complexity for both space and time is O(n · l). 3.2 Intersection Theory. The idea of implementing the intersection of two extended annotated automata is to construct the product automaton known from classical automata theory. A product automaton implements the idea that both constituents run synchronized – in every step executing transitions with the same label. The states in the product automaton are annotated with the conjunction of the local Boolean formulas of the constituents [3]. The global formulas are connected by ∧. Implementation. The algorithm for the intersection is well known from classical automata theory. It was just adapted to the specific characteristics of extended annotated automata. Complexity. The size of the resulting automaton mainly depends on the degree of similarity of the given automata. The worst case complexity regarding both space and time of the intersection algorithm is in the product of the number of states of the two involved automata. 3.3 Union Theory. Given the two operations of intersection and complement from the previ- ous subsections, the implementation of union is trivial by using De Morgan’s rule: M ∪ N = M ∩ N. Implementation. We already proofed that the product of two total and com- plete automata is again total and complete [2]. Thus, we do not need to transform the product automaton of M ∩ N to generate the complement. Instead, we only have to negate the global formula of the automaton representing M ∩ N . Complexity. The complexity of the union results by the complexity of the underlying operations complement and intersection. 4 Obtaining and using Safira Safira is free software2 and can be downloaded at http://service-technology. org/safira. It is written in C++ and uses the GNU build system to compile the binary. Thus, it is available for a several platforms, including Linux, Microsoft Windows (using cygwin) and Mac OS X. Safira is a command-line tool implementing the following use cases. We assume that two extended annotated automata are given in files “coffeeVendor.og” and “juiceVendor.og”. – Complement. Call Safira with safira -opdf -t --complement coffeeVendor.og – Intersection. Call Safira with safira -opdf -t --intersection coffeeVendor.og juiceVendor.og – Union. Call Safira with safira -opdf -t --union coffeeVendor.og juiceVendor.og Option -t measures the time for computing the result and option -o triggers the output of Safira. In the examples above, a PDF file is generated which shows the graphical representation of the resulting extended annotated automaton. For a full description of the command line parameters, type safira --help. 5 Experimental results In this section, we study how the algorithms for computing the complement automaton scale in practice. We do not examine the intersection, because the algorithm is of no issue and the size of the result mainly depends on the degree of similarity of the given automaton. To generate the input automata for our experiments , we used the tool Wendy3 . Wendy computes the operating guideline of a given service automaton. The examples are industrial service models and have been extracted from real BPEL processes using the tool BPEL2oWFN4 (except the first one). We executed the examples on a computer with a 1.83 GHz Intel Core 2 Duo processor and 2 GB of memory. The results are listed in Tab. 1. As expected, the number of states of the complement automaton computed by the optimization algorithm A2 (usage of decision tree) is significantly smaller than the number of states of the automaton obtained by the basic algorithm A1 (no usage of decision tree). The memory usage shows a similar result. There is one case (service ”Car Analysis”) in which the results concerning the memory usage diverge. This can be explained by the structure of the resulting decision tree. Compared with the decision trees generated for the other automata, this tree is not well balanced. Moreover, A2 consumes more time than A1 . This can be explained by the additional data structure – the decision tree – which has to be built up and is traversed when a new state of the complement automaton is 2 GNU Affero Public License Version 3, http://gnu.org/licenses/agpl.html. 3 Available at http://service-technology.org/wendy. 4 Available at http://service-technology.org/bpel2owfn. Table 1: Experimental results showing the number of state of the complement automaton, the execution time and the memory usage for both the optimization algorithm A2 and the basic algorithm A1 . states of service automaton complement time [sec] memory [MB] A1 A2 A1 A2 A1 A2 Vending Machine 6 23 12 0.0 0.0 0.5 0.5 Purchase Order 169 887 338 0.1 0.1 1.3 0.8 Car Analysis 733 3,931 987 0.6 6.51 1.9 2.9 SMTP 3,307 19,995 3,460 2.9 29.4 10.9 10.6 Quotation1 7,937 61,571 14,594 14.8 208.0 24.7 15.5 Quotation2 11,265 88,323 22,539 19.2 286.1 132.8 36.1 calculated. The overall time, however, is not a crucial measure because in our application settings the complement automaton is usually calculated at a point in time in which the calculation time is not an issue. The number of states of the complement automaton is more critical. In most settings, the complement automaton is used as an input for complex, nested operations such as union and intersection, for instance. 6 Conclusion and Future Work In this paper, we presented the tool Safira implementing the set operations complement, intersection, and union. With the help of experimental results, we demonstrated that the algorithms used for the calculation of the complement scale well. The implementation of the set operations is an important step towards their applications: substitutability, behavioral constraints, and organizing a service registry. For their realization, the decision problem membership S ∈ M for a service S and a set M of services, and also the emptiness check M = ∅ still need to be implemented. For both, we already introduced the theory [2]. The complexity for the membership problem is linear in the size of S so that we do not expect problems during the implementation. In contrast, we could proof that the emptiness check is NP-complete. Consequently, we have to find heuristics to decide this problem efficiently in practice. References 1. Lohmann, N., Massuthe, P., Wolf, K.: Operating guidelines for finite-state services. In: ICATPN 2007. LNCS 4546, Springer (2007) 321–341 2. Kaschner, K., Wolf, K.: Set algebra for service behavior: Applications and construc- tions. In: BPM 2009. LNCS 5701, Springer (2009) 193–210 3. Lohmann, N., Massuthe, P., Wolf, K.: Behavioral constraints for services. In: BPM 2007. LNCS 4714, Springer (2007) 271–287