Repository for Business Processes and Arbitrary Associated Metadata Jussi Vanhatalo12 , Jana Koehler1 , and Frank Leymann2 1 IBM Research GmbH, Zurich Research Laboratory, Säumerstrasse 4, 8803 Rüschlikon, Switzerland {juv, koe}@zurich.ibm.com 2 Institute of Architecture of Application Systems, Universität Stuttgart, Universitätsstraße 38, 70569 Stuttgart, Germany frank.leymann@informatik.uni-stuttgart.de Abstract. We have published a repository for storing business processes and associated metadata. The BPEL Repository is an Eclipse plug-in originally built for BPEL business processes and other related XML data. It provides a framework for storing, finding and using these documents. Other research prototypes can reuse these features and build on top of it. The repository can easily be extended with new types of XML doc- uments. It provides a Java API for manipulating the XML files as Java objects hiding the serialization and de-serialization from a user. This has the advantage that the user can manipulate the data as more convenient Java objects, although the data is stored as XML files compliant with the standard XML schemas. The data can be queried as Java objects using an object-oriented query language, namely the Object Constraint Lan- guage (OCL). Moreover, the flexible design allows the OCL query engine to be replaced with another engine based on other query language. 1 Introduction Interoperability based on several XML standards is one of the corner stones of Web services. The Business Process Execution Language for Web Services (BPEL) [2] is the defacto industry standard for representing business processes. It is tightly related to other XML standards, such as the Web Service Definition Language (WSDL) and XML Schema. In addition, arbitrary metadata repre- sented in XML format may be associated to business processes depending on the context and applications that use the data. XML data is commonly used by different applications. However, currently managing the documents and searching information from their contents is labo- rious and inefficient. It is beneficial to store the data in a repository that takes care of data access and executes queries. Although it is important for interoper- ability to exchange data in XML format across organizations and systems, it is often more convenient for a developer to manipulate the data as Java objects, instead of XML. Our goal was to build a business process repository that stores data as documents compliant to the XML standards, but allows applications to be implemented directly on the Java representation of the data model. We have implemented the BPEL Repository, which is an Eclipse plug-in built to store business processes together with other XML data. It provides a frame- work for storing, finding and using these documents. Other research prototypes can reuse these features and build on top of it. The repository can easily be extended with additional XML schemas because of its flexible architecture. By default it supports the common Web service standards, such as BPEL, WSDL and XML schema, and it can easily be extended to support other XML schemas for business processes and other data. The object-oriented approach frees developers from the burden of the under- lying XML data model and allows them to concentrate on the object model of their application, which they usually know well. The Eclipse Modeling Frame- work (EMF) [10] is used to hide data serialization and de-serialization from the user. The framework takes care of representing the XML data as EMF objects that are Java objects. As a novel feature, it is possible to query the XML files as EMF objects using an object-oriented query language, namely the Object Con- straint Language (OCL) [7] that is part of the UML specification. Native XML databases support typically XQuery as their query language. A major advantage of OCL over an XQuery is its ability to navigate through the data model and follow all the associations of an object model. In contrast, XQuery forces the user to formulate the queries based on the tree structure of the underlying XML schema. In contrast to our file system based solution, there are other business process repositories [12] [14] that are built on top of a database system. However, their database schemas are created manually. Flexibility is an advantage of the BPEL Repository, because EMF is used to automatically generate support for new and modified XML schemas. In research projects, data structures are often modified and new ones are introduced. The automatization makes adapting these changes easier. Nevertheless, repositories based on a database have typically better per- formance and scalability than our solution. The BPEL Repository was recently published in IBM alphaWorks [16] with special licensing terms for academic use. The software has been integrated with a change management system called CHAMPS [3], [13]. 2 Solution The architecture of the BPEL Repository is presented in Figure 1. All compo- nents are plug-ins on the Eclipse platform. The core component of the solution is the Repository API, which provides an application programming interface for external software to build on. The Repository User Interface (UI) is an example implementation that uses the Repository API. However, it is also a useful graphical user interface to man- age the contents of the repository. The user interface is integrated in the Eclipse workbench and built on the Standard Widget Toolkit (SWT) and JFace libraries. The Data Handler is a sub-component that takes care of the data access on a file system. It abstracts the choice of the storage medium from the other ' ) + ( #%& ' $ #%& ' ' ( #%& ) $ #%& , ' * !" ! # Fig. 1. Repository architecture showing components and technologies used. components. The data access component could be replaced with another one storing data in a database by using a technology such as the Service Data Objects [10]. The Data Handler uses the Eclipse Modeling Framework to serialize EMF objects into XML files and de-serialize the files back to EMF objects. Thus, all repository components manipulate data as EMF objects rather than of XML. 2.1 Flexibility of Manipulating Data as EMF Objects In the repository, data is represented as EMF objects. Therefore, all data must have an EMF model. However, the EMF model can be automatically generated from an XML schema, a UML class diagram or Java classes [4], [10]. As in the business process management context data is often stored as XML conforming standardized XML schemas, it is trivial to obtain EMF models for XML files. The repository can be extended to support a new data type by plugging in the EMF model of this new data type. The components providing EMF models for the repository are shown on the left-hand side of the Repository API in Figure 1. The Default EMF Extensions plug-in contains EMF models for BPEL, WSDL and XML schema standards. Thus, the repository supports the respective file types by default. This compo- nent can be replaced by another component supporting a different version of these standards or completely different file types. Because the Eclipse plug-in mechanism is used, this does not require any modifications in the other parts of the repository. Similarly, other EMF extensions can be plugged into the repository. In the evolving research community, extensibility is an asset. For instance, in the con- text of combining business process management with semantic Web, the reposi- tory can easily be extended to support a new document type containing metadata related to a business process. The Sample EMF Extension contains an EMF model that is used in the user guide [16] to illustrate, step by step, how to create an EMF model and plug it into the repository. It is also explained how EMF can be used to automatically generate a graphical editor for the instances of an EMF model. First, the data structure can be modeled as a UML class diagram, which is usually much faster than describing the same structure as an XML schema. Next, the UML class diagram is transformed to an EMF model. An editor can be automatically generated for the EMF model. An XML schema can be generated from the EMF model, if desired. In any case, instances of an EMF model can be serialized to interoperable XML files. Instances of the model can be generated for testing purposes with the editor, which takes care of proper syntax. Finally, the EMF model is plugged into the repository, which persists the data and provides capabilities for querying data. A chief advantage is that queries can be formulated using the same object-oriented model as was used to create the data structure in the first place. Thus, the XML representation is used only for interoperability with other systems, and the developers need not bother with the concrete XML syntax. 2.2 Query Engines We used existing query engines with the repository. The repository handles the iteration over the queried objects, but each sub-query is executed in the query engine plugged into to the repository. It is possible to change the query engine to another pre-registered one between queries. The available query engines are shown on the right-hand side of the Repository API in Figure 1. The repository has been tested with two OCL query engines, that query Java objects with an object-oriented query language, namely OCL. If the repository is installed on top of the IBM Rational Software Architect (RSA) product, the OCL engine of the latter can be used. However, as we did not want to limit the repository to a single commercial query engine or a specific query language, the query engine interface has been built generic. Therefore, the IBM OCL engine is plugged into the repository using an adapter. The IBM OCL Engine Adapter is delivered together with the repository. Another OCL engine was built at the University of Kent [1]. It is an open source tool that can be plugged into the repository using the Kent OCL Engine Adapter. The Eclipse Modeling Framework Technology project [11] is building another open source OCL engine that could also be adapted to the repository. We have not yet implemented the corresponding adapter because this OCL engine is still under development. It is straightforward to plug a new query engine into the repository or adapt an existing query engine for it. An example of how to adapt an OCL engine is included in the user guide [16]. The Dummy Query Engine is an example implementation to show how a new query engine can directly be integrated into the repository. Thus, also query engines based on another query languages can be used. The repository is not aware of the query language that is used. The repository merely passes the query and other parameters from the user interface or external software to the query engine selected together with the EMF object that is to be queried. Thus, any query engine that can execute queries on EMF objects can be plugged into the repository. One limitation of the query mechanism is that the performance is only linear compared with the number of documents that are queried. Indexing data or other ways to improve the query performance are not used. However, this performance has been sufficient for research prototypes. For example, querying 100 BPEL files took 3 seconds on a laptop in our performance tests [15]. One way to improve OCL queries would be to map them to a query language, such as XQuery, that is natively supported by a database system. In that case, the repository would also be built directly on top of the database system. Some work on mapping OCL to XQuery already exists [5], [6]. 2.3 Data Structure The data is organized in a tree of organizations. The organizations are mapped to directories in a file system. Each organization may contain a business process and associated metadata grouping the related files together. In addition to these data documents, a descriptor document is stored in each organization. It contains the file type and the role of each data document in the organization. This information is used to make the conversion between EMF objects and XML files. In addition, the role describes how the data document is related to the other documents in the organization. For instance, a WSDL file stored with the repository may contain the public interface or the partner links of the BPEL business process. Queries can be applied to files with a specified role in an organization, a list of organizations, or a list of sub-trees in the organization hierarchy. Related files can be searched based on their roles. Data can be accessed from the file system as XML files and through the repository as EMF objects. Any directory in a file system can act as the root organization of the repository contents. The data in the repository can be moved to another location or a computer as simply as copying the directories. 2.4 Usage Scenario The repository has been deployed with a change management system called CHAMPS. As part of the solution, planning algorithms are used to facilitate the automatization of the change and configuration management. The plans are stored as BPEL files into the BPEL Repository. In addition, the plans are analyzed and the results are stored as metadata associated to the plans. The metadata includes information about the plan such as its number of activities, degree of concurrency, execution duration and correctness [13]. Storing the analysis results as metadata enables reuse of the data, and unnec- essary recomputing of the results can be avoided. The suitable plans are found from the repository by querying the content of the plans and their associated metadata. For example, the plans that are structurally correct can be found by querying the metadata. Among these plans the ones that reach a specified goal can be found with a subsequent query. During the development of the system, trying out different alternatives of the metadata schema was uncomplicated, since the data structure was designed as a UML class diagram and the corresponding EMF classes were used as the basis of the implementation and the OCL queries. The developers were able to avoid completely working with the XML representation of the data, because it was automatically generated and used only as the serialization format behind the scenes. 3 Conclusion The repository has already been proved useful for IBM internal research proto- types [13]. By publishing the repository, we wanted to make it widely available to the research community as we are interested in more user experience with it. We would also be interested in finding out how convenient users find OCL as a query language, because currently OCL is more common as a language to express constraints rather than queries. As next steps, we plan to contribute our experiences gained while building the repository to the IP-SUPER project [8], [9] funded by European Union. The project merges business process management with semantic Web services. As part of the project, we plan to build a business process library, most likely on a database system rather than a file system in order to improve the query performance for more extensive querying purposes. This would also be beneficial, when the repository is used for searches based on an ontology or as a component of a business process execution engine. References 1. Dave Akehurst and Octavian Patrascoiu. Object constraint language library. Web site, June 2004. http://www.cs.kent.ac.uk/projects/ocl/. 2. Tony Andrews, Francisco Curbera, Hitesh Dholakia, Yaron Goland, Jo- hannes Klein, Frank Leymann, Kevin Liu, Dieter Roller, Doug Smith, Satish Thatte, Ivana Trickovic, and Sanjiva Weerawarana. Business Process Ex- ecution Language for Web Services. OASIS Org., 2003. http://www- 106.ibm.com/developerworks/webservices/library/ws-bpel/. 3. Aaron B. Brown, Alexander Keller, and Joseph L. Hellerstein. A model of con- figuration complexity and its application to a change management system. In Proceedings of the 9th International IFIP/IEEE Symposium on Integrated Man- agement (IM 2005), May, 2005. 4. Frank Budinsky, David Steinberg, Ed Merks, Raymond Ellersick, and Timothy J. Grose. Eclipse Modeling Framework. The Eclipse Series. Addison-Wesley Profes- sional, 2003. 5. Ahmed Gaafar and Sherif Sakr. Proposed framework for integrating XML/XQuery and UML/OCL. In Proceedings of the 7th Conference in the UML series (UML2004), pages 241–259, Lisbon, Portugal, 2004. 6. Ahmed Gaafar and Sherif Sakr. Towards a framework for mapping between UML/OCL and XML/XQuery. In Proceedings of the IADIS e-Society 2004 Con- ference (ES2004), pages 241–259, 2004. 7. Object Management Group. OCL 2.0 Specification. OMG, 2005. http://www.omg.org/docs/ptc/05-06-06.pdf. 8. Martin Hepp, Frank Leymann, John Domingue, Alexander Wahler, and Dieter Fensel. Semantic business process management: A vision towards using semantic web services for business process management. In Proceedings of the IEEE ICEBE 2005, pages 535–540, Beijing, China, October 2005. 9. IP-SUPER. Semantics utilised for process management within and between enter- prises. Web site, April 2006. http://www.ip-super.org/. 10. Eclipse Org. Eclipse modeling framework. Web site. http://www.eclipse.org/emf/. 11. Eclipse Org. Eclipse modeling framework technology. Web site, 2006. http://www.eclipse.org/emft/projects/ocl/. 12. Minrong Song, John Miller, and Ismailcem Arpinar. RepoX: XML repository for workflow designs and specifications. Technical Report #UGA-CS-LSDIS-TR-01- 011, University of Georgia, August 2001. 13. Biplav Srivastava, Jussi Vanhatalo, and Jana Koehler. Managing the life cycle of plans. In Proceedings of the 17th Innovative Applications of Artificial Intelligence Conference, pages 1569–1575, Pittsburgh, Pennsylvania, USA, 2005. 14. Tammo van Lessen. Konzipierung und Entwicklung eines Repository für Geschäftsprozesse. Master’s thesis, Institute of Architecture of Application Sys- tems, University of Stuttgart, March 2006. 15. Jussi Vanhatalo. Building and querying a repository of BPEL process specifica- tions. Master’s thesis, Helsinki University of Technology, Institute Eurecom and University of Nice – Sophia Antipolis, September 2004. 16. Jussi Vanhatalo. BPEL Repository. IBM alphaWorks, April 2006. http://www.alphaworks.ibm.com/tech/bpelrepository.