Process Mediation for Semantic Web Services

Emilia Cimpian
Semantic Technology Institute Innsbruck
emilia.cimpian@sti2.at

1 Research Problem

In business environments it is common for one process to need to communicate with another process in order to fulfill its goal. For example, the simple action of paying a bill to a service provider can be seen as two communicating processes: one is defined by the client and consists of the steps the client takes to pay the bill; the other belongs to the service and consists of the sequence of activities it performs in order to obtain the payment. If a bank is also involved (which is currently the case in most situations of this type), there are three different processes, performed by three different entities, that together produce the final result: the bill has been paid.

The problem addressed by this thesis is how two (or more) processes can successfully interact in order to accomplish a common goal. The processes considered are semantically defined, and all inputs and outputs of a process also need to be represented using an ontology.

This thesis addresses the problem of solving heterogeneity mismatches between previously defined processes. The assumption made is that the processes should not be adjusted to match the processes they want to interact with, for various reasons: either they are involved in more than one interaction, and the adjustment would damage the other interactions, or the business partner owning the processes simply does not want to change anything. In this case, the communication can be hampered even if all the data is available [2].

This thesis distinguishes between two different heterogeneity problems: process model heterogeneity and communication heterogeneity. In the first case, the processes are incompatible, that is, no automatic solution can be developed for overcoming the heterogeneity problem.
In this situation the input of a business expert is needed, and the process mediator has to provide semi-automatic support for the domain expert. In the second case the processes are compatible, and the mismatch exists only at the message exchange level. In this case the process mediator can provide a completely automatic mediation solution. An example of a mismatch that cannot be automatically solved is when one process expects a message that the other one will never send. In this case the domain expert can select a third process that will generate that message, or create the message manually.

The first step in process mediation is to determine the nature of the problem, that is, whether it can be solved automatically or not. The heterogeneity problems that can be automatically solved are called solvable (or communication) mismatches, while the ones that require the intervention of a domain expert are called unsolvable (or process model) mismatches [1].

2 Related Work

Process mediation is still a poorly explored research field in the context of Semantic Web Services. The existing work presents only visions of mediator systems able to resolve process heterogeneity problems in a (semi-)automatic manner, without presenting sufficient details about their architectural elements. Still, these visions represent starting points and valuable references for future concrete implementations.

Two integration tools, Contivo¹ and CrossWorlds², seem to be the most advanced in this field. Contivo is an integration framework which uses metadata representing messages organized by semantically defined relationships. One of its functionalities is the ability to generate transformation code based on the semantics of the relationships between data elements, and to use this code for transforming the exchanged messages. However, Contivo is limited by the use of a purpose-built vocabulary and of pre-configured data models and formats.
CrossWorlds is an IBM integration tool meant to facilitate B2B collaboration through the integration of business processes. It may be used to implement various e-business models, including enhanced intranets (improving operational efficiency within a business enterprise), extranets (facilitating electronic trading between a business and its suppliers) and virtual enterprises (allowing enterprises to link to outsourced parts). The drawback of this approach is that different applications need to implement different collaboration and connection modules in order to interact. As a consequence, integrating a new application always requires additional effort.

¹ http://www.contivo.com/
² http://www.sars.ws/hl4/ibm-crossworlds.html

3 Contributions

The main contribution of this thesis is the development of a semantic process mediation solution. This overall accomplishment consists of a number of smaller contributions:

1. Identification and formalization of a set of atomic problems that can be automatically solved by a mediator (solvable or communication mismatches), as well as identification of a set of problems that cannot be automatically overcome (unsolvable or process model mismatches).
2. Development of a run-time process mediator able to address the solvable mismatches.
3. Development of a design-time process mediator that allows the domain expert to handle the unsolvable mismatches.
4. Development of a comprehensive architecture for process mediation.

Because of space constraints, this extended summary of the thesis contains only details of the formalization and of the algebra developed in the thesis.

3.1 Notations and Definitions

The service mediator performs an automatic analysis of the two processes involved in a communication. The internal decisions taken inside any of the processes are not relevant in this case, since the mediator operates at the level of the messages sent and received during the actual communication.
In this sense the mediator can be considered to operate on one particular branch of each process involved in the communication. That is, if one of the processes can perform one activity or another depending on a condition, the run-time mediator sees only the result of evaluating that condition, namely the activity that is actually performed.

Furthermore, in a semantic environment the messages are important only from the point of view of the semantic information they carry. This information consists of instances of concepts defined in an ontology used in the description of the process (in the process model). If the process description specifies that message $M_1$ contains instance $I_1$ of concept $C_1$, the mediator understands this as "$M_1$ consists of an instance of $C_1$", or in other words "an instance of $C_1$ is being sent or received". These two formulations are further simplified to "message $C_1$". If a message $M_1$ consists of multiple instances of multiple concepts ($C_1, C_2, \dots, C_n$), it is referred to as "message $C_1$ and $C_2$ and ... and $C_n$". This definition still holds if multiple instances of the same concept are part of the same message, in which case the message refers to every one of these instances.

The notation used for denoting that message $C$ is to be sent by a process is $S(C)$, while a message that should be received by a process is represented by $R(C)$. For denoting that a message should be either sent or received by a process, the notation used is $A(C)$ (an action for handling the message $C$). If the message carries more than one instance, of types $C_1, C_2, \dots, C_n$, this is denoted by $A(C_1 + C_2 + \dots + C_n)$.

The order of messages is represented by the symbol $\succ$. The message sequence of a process $P$ is represented as $MS(P)$. If $P$ exchanges $n$ messages during a communication, then:

$$MS(P) = A(C_1) \succ A(C_2) \succ \dots \succ A(C_n)$$

For representing the communication between two processes $P_1$ and $P_2$, the notation $\frac{MS(P_1)}{MS(P_2)}$ is used.
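As an informal illustration of the notation introduced so far, the following Python sketch models actions and message sequences. The names `Action`, `send`, `receive` and `render`, as well as the bill-payment concepts, are hypothetical and not taken from the thesis. Concepts are represented only by their names, and `>` is an ASCII stand-in for the ordering symbol.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Action:
    """One action A(C1 + ... + Cn); kind is 'S' (send) or 'R' (receive)."""
    kind: str
    concepts: FrozenSet[str]

    def __str__(self) -> str:
        # Concepts are sorted only to make the printed form deterministic.
        return f"{self.kind}({' + '.join(sorted(self.concepts))})"


def send(*concepts: str) -> Action:
    """S(C1 + ... + Cn): a message the process is to send."""
    return Action("S", frozenset(concepts))


def receive(*concepts: str) -> Action:
    """R(C1 + ... + Cn): a message the process should receive."""
    return Action("R", frozenset(concepts))


def render(ms: List[Action]) -> str:
    """Print a message sequence MS(P) as A(C1) > A(C2) > ... > A(Cn)."""
    return " > ".join(str(action) for action in ms)


# Assumed example: the client side of a bill payment.
ms_client = [send("Order"), receive("Invoice"), send("Payment", "Account")]
print(render(ms_client))
```

A set of concept names is used for each message because, in the notation above, a message refers to its concepts regardless of how many instances of each concept it carries.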
The fractions representing a communication can be decomposed into multiple fractions, respecting the message sequences of the processes involved in the communication. If

$$MS(P_1) = MS_1(P_1) \succ MS_2(P_1) \quad \text{and} \quad MS(P_2) = MS_1(P_2) \succ MS_2(P_2)$$

then

$$\frac{MS(P_1)}{MS(P_2)} = \frac{MS_1(P_1)}{MS_1(P_2)} \succ \frac{MS_2(P_1)}{MS_2(P_2)}$$

Furthermore, the following terms are defined:

Definition 1. An Atomic Send/Receive (Atomic S/R) is that particular fragment of a communication consisting of one process sending a message and the other process receiving it.

$$\text{Atomic S/R}(C) = \frac{S(C)}{R(C)} \quad \text{or} \quad \text{Atomic S/R}(C) = \frac{R(C)}{S(C)}$$

Definition 2. A Projection of a process, denoted by $\pi(P)$, is a derived process obtained from $P$ as the result of transformations performed by the run-time Process Mediator.

The communication between two processes is equivalent to the communication between one process and the projection of the other process. This equivalence is denoted by the symbol $\approx$:

$$\frac{MS(P_1)}{MS(P_2)} \approx \frac{MS(\pi(P_1))}{MS(P_2)}$$

Definition 3. There is a Match between two given processes if the communication between them can be represented as a sequence of Atomic S/R. The notation used for denoting that two processes $P_1$ and $P_2$ match is $Match(P_1, P_2)$.

Definition 4. Two processes are considered to be Compatible if there is a Match between them or if every mismatch is at the message sequence level. The notation used for denoting that two processes $P_1$ and $P_2$ are compatible is $Compatible(P_1, P_2)$.

Both the Match and the Compatible relationships are symmetric.

3.2 Process Mediation - Lemmas and Theorems

A set of lemmas can be defined for obtaining the projection of a process, given its message sequence. An example of such a lemma is:

Lemma 1. For a given process $P$ where $MS(P) = MS_1(P) \succ S(C) \succ MS_2(P)$, a process $P'$ such that $MS(P') = MS_1(P) \succ MS_2(P)$ is a projection of $P$ (i.e., $P' = \pi(P)$).
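The definitions and Lemma 1 can be illustrated with a short Python sketch. It rests on assumptions that are not stated in this summary: each message carries a single concept, a Match is checked by aligning the two message sequences position by position, and Lemma 1 is applied without any side conditions the full thesis may impose. The function names `is_atomic_sr`, `match` and `drop_send` are hypothetical.

```python
from typing import List, Tuple

# An action is a (kind, concept) pair; kind is "S" (send) or "R" (receive).
Action = Tuple[str, str]


def is_atomic_sr(a: Action, b: Action) -> bool:
    """Definition 1: one process sends C and the other receives the same C."""
    return a[1] == b[1] and {a[0], b[0]} == {"S", "R"}


def match(ms1: List[Action], ms2: List[Action]) -> bool:
    """Definition 3 (assumed reading): the communication is a sequence of
    Atomic S/R, checked here by pairing the two sequences position by position."""
    return len(ms1) == len(ms2) and all(
        is_atomic_sr(a, b) for a, b in zip(ms1, ms2)
    )


def drop_send(ms: List[Action], index: int) -> List[Action]:
    """Lemma 1: removing one S(C) from MS(P) gives the message sequence
    of a projection of P."""
    if ms[index][0] != "S":
        raise ValueError("Lemma 1 only removes a send action")
    return ms[:index] + ms[index + 1:]


# Assumed example: P1 sends a message B that P2 never expects.
ms_p1 = [("S", "A"), ("S", "B"), ("R", "C")]
ms_p2 = [("R", "A"), ("S", "C")]

print(match(ms_p1, ms_p2))    # False: the sequences differ in length
ms_proj = drop_send(ms_p1, 1)  # projection of P1 without S(B)
print(match(ms_proj, ms_p2))  # True: S(A)/R(A) followed by R(C)/S(C)
```

In this example the two processes do not match directly, but a projection of the first one matches the second, which is the situation in which the run-time mediator can resolve the mismatch on its own.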
A total of 8 lemmas are defined for governing the creation of projections, based on the message exchange sequences of all the processes involved in the communication. All of them define the conditions under which messages can be interchanged in order to create projections.

Furthermore, the thesis defines and proves several theorems on process interoperability, given the relationships between the projections of the processes. The most general one is:

Theorem 1. Any two processes $P_1$ and $P_2$ are compatible if and only if there exist two projections $P_1^n$ and $P_2^m$ that match, where $P_1^i = \pi(P_1^{i-1})$ and $P_2^j = \pi(P_2^{j-1})$, with $1 \le i \le n$ and $1 \le j \le m$.

As part of this thesis, a run-time process mediator was developed that applies the projections described above to each process with respect to the process it communicates with. The appropriate projections are determined based on the exchange patterns of the processes involved in the communication, which involves a detailed analysis of the processes and the evaluation of the rules that govern the message ordering.

For dealing with the heterogeneity problems that cannot be automatically solved, a design-time process mediator which provides support to the domain expert was also developed.

4 Evaluation

The approach and prototypes developed in this thesis will be evaluated using a twofold methodology. Firstly, the thesis will consider a real use-case scenario developed as part of the SUPER project; this type of evaluation will show that the approach is applicable in a real scenario. Secondly, in order to prove the correctness and completeness of the formal model developed in this thesis, the model will be evaluated against the existing workflow data patterns, based on data visibility, interaction, transfer and routing [3].

5 Work Plan

The most important steps in accomplishing the objectives of this thesis have already been performed:

1. Identification of the types of mismatches that can be automatically solved.
2. Formalization of the operations that can be automatically performed by a mediator without breaking the communication.
3. Development of the proof-of-concept design-time and run-time prototypes needed for process mediation.
4. Identification of a real use-case scenario and detailed analysis of the problems raised by the scenario.

However, important phases needed for the completion of the thesis are still ongoing, such as:

1. Development of a comprehensive architecture for process mediation which will allow the integration of the two prototypes previously developed, providing complete solutions for process mediation;
2. Evaluation of the prototypes based on the available scenario;
3. Evaluation of the completeness and correctness of the approach based on the existing workflow data patterns.

References

1. C. Bussler. B2B Integration: Concepts and Architecture. Springer, 2003.
2. E. Cimpian, A. Mocan, and M. Stollberg. Mediation enabled semantic web services usage. In Proceedings of the First Asian Semantic Web Conference, September 2006.
3. N. Russell, A. H. ter Hofstede, D. Edmond, and W. M. van der Aalst. Workflow data patterns. Technical report, Workflow Patterns Initiative, 2005.