An Extensible Platform for Process Model Search and Evaluation

Christian Ress and Matthias Kunze

Hasso Plattner Institute at the University of Potsdam,
christian.ress@student.hpi.uni-potsdam.de, matthias.kunze@hpi.uni-potsdam.de

Abstract. We present a platform that integrates a number of process model search techniques and provides a uniform interface for query formulation and search result presentation, as well as a framework to evaluate a particular search technique. The platform provides researchers with a common infrastructure to embed their search technique in the form of a dedicated search engine. The demo explains the features of our platform and encourages researchers to contribute their search techniques.

1 Introduction & Background

Business process models are a cornerstone of process-oriented organizations, as they explicitly capture the knowledge needed to carry out the operations of a business and are reused for documentation, analysis, automation, and certification, among others. Modern companies maintain thousands of process models for reference and reuse [2], which requires effective capabilities to search among them. Researchers have proposed an abundance of techniques that focus on the text, structure, or behavior of process models and allow for similarity search, i.e., obtaining approximate matches to a complete model, and querying, i.e., searching precisely by few yet relevant aspects of a model [2].

The majority of these process model search techniques, however, have been presented only theoretically. Evaluations of their quality and performance have been conducted under lab conditions, with a minimal implementation that mainly addresses query processing, i.e., comparing candidate models with the query and deciding on a match.
This is, arguably, due to the effort of providing a complete process model search infrastructure that includes a user interface to formulate a query, a process model repository to store, manage, and retrieve models, and the visual presentation of search results as a response to the query. This functionality is shared among all process model search techniques.

To this end, we have developed a prototypical process model search platform that assumes these tasks and allows for the integration of dedicated search techniques in the form of a search engine plugin architecture. This includes a set of well-defined APIs that integrate a search engine with our platform. Moreover, the platform provides a framework to evaluate a search engine with regard to the quality of a search technique, i.e., the relevance of the provided results, and the performance of its implementation. The platform aims to reduce the time needed to implement and evaluate a particular search technique, and enables the comparison of various techniques, as they can now be deployed in the same runtime environment.

2 Search with your Search Technique

The central concept of our search platform is to enable developers to deploy a dedicated search engine to the platform and use it to search for process models in a straightforward manner. Hence, one of the key features of our search platform is a presentation layer, which lets users specify search queries using BPMN and view ranked search results in a similar visual representation, depicted in Fig. 1.

Fig. 1: Screenshot of the web-search interface.

The presentation layer includes a simple, web-based process model editor that allows formulating queries as regular process models, as this is typical for similarity search and has been proposed for querying [6], too. The editor itself is highly extensible, which allows the formulation of queries in languages devised for search, e.g., BPMN-Q [1].
The input query is provided to the dedicated search engine, which parses it and matches it against stored process models. To this end, BPMN queries are transformed to Petri nets.

The extensible architecture of the search platform makes it possible to integrate a dedicated search engine, provided it is implemented in Java, by providing common interfaces through which the search platform can communicate with the search engine. Currently, we require that search algorithms accept queries in the form of Petri nets, defined using the jBPT Java library (footnote 1), and return search results in a similar format. We resorted to Petri nets, as they provide a common formalism for a variety of business process modeling languages [7].

Our aim is to provide researchers with an opportunity to experiment with their algorithms faster and more easily, and to explore the search results in an environment that is similar to one that end users expect. We do this by providing an API through which search algorithms can expose parameters that can be changed at runtime, and by making these parameters accessible in the search interface. This way, parameters can be configured without modifying source code or static configurations, and without recompilation. The results of a change are visible immediately.

1 http://code.google.com/p/jbpt/

3 Evaluate your Search Technique

Our platform was originally devised to integrate a number of dedicated search techniques in a common infrastructure. However, it turned out that the very same functionality of a search engine can be used to evaluate the underlying search technique and its implementation. That is, experiments are typically carried out by running a well-defined set of queries against a set of candidates, and assessing quality and performance. Thus, we developed a framework that allows running predefined experiments against search engines without the need for a complex evaluation infrastructure.
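The runtime parameters described in Section 2 are also what makes it cheap to repeat an experiment under different settings. As a minimal sketch of how a search engine might expose such a parameter, consider the following Java fragment. It is not the platform's actual API: the names ParameterRegistry, ThresholdEngine, expose, set, and snapshot are hypothetical and chosen for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

/** Hypothetical registry through which an engine exposes tunable parameters. */
class ParameterRegistry {
    /** One numeric parameter, bound to engine state via a getter and a setter. */
    private static final class Entry {
        final DoubleSupplier getter;
        final DoubleConsumer setter;

        Entry(DoubleSupplier getter, DoubleConsumer setter) {
            this.getter = getter;
            this.setter = setter;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    /** Called by the engine to make a parameter visible to the platform. */
    void expose(String name, DoubleSupplier getter, DoubleConsumer setter) {
        entries.put(name, new Entry(getter, setter));
    }

    /** Called by the platform when a user changes a value in the interface. */
    void set(String name, double value) {
        Entry entry = entries.get(name);
        if (entry == null) {
            throw new IllegalArgumentException("Unknown parameter: " + name);
        }
        entry.setter.accept(value);
    }

    /** Current values in registration order, e.g., for rendering a form. */
    Map<String, Double> snapshot() {
        Map<String, Double> values = new LinkedHashMap<>();
        entries.forEach((name, entry) -> values.put(name, entry.getter.getAsDouble()));
        return values;
    }
}

/** Toy engine (an assumption) whose match threshold can be changed at runtime. */
class ThresholdEngine {
    private volatile double threshold = 0.5;
    final ParameterRegistry parameters = new ParameterRegistry();

    ThresholdEngine() {
        parameters.expose("threshold", () -> threshold, v -> threshold = v);
    }

    /** A candidate is accepted if its similarity reaches the current threshold. */
    boolean accepts(double similarity) {
        return similarity >= threshold;
    }
}
```

In this sketch, a call such as engine.parameters.set("threshold", 0.3) takes effect on the next query, without recompilation or restart, which is the behavior the text describes.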
Fig. 2: Screenshot of the interface for precision/recall analysis.

Two methods for evaluating a search technique are provided. Quality assesses the relevance of the matches in a search result with respect to a human assessment and therefore needs a reference data set. For similarity search, such a data set has been introduced in [3]. Performance addresses the resource consumption of the implementation of a search engine and its scalability with regard to large process model collections. Hence, performance can be evaluated without a human-assessed data set, using any process model collection. For this purpose, a number of time measurements and counters are provided through an API, which can be used by a dedicated search engine during the execution of a search run. Features such as support for laps and persistence of counters and timers over multiple search requests are available. All measurements are automatically included in the response to a query, along with statistical measures such as average, median, quantiles, and standard deviation.

To evaluate a search technique, we developed a web-based evaluation interface that allows choosing among a set of quality and performance evaluation methods, e.g., the computation of precision and recall values and the visualization of precision-recall curves. With regard to performance measures, trend analyses can be plotted over a number of search runs for various sizes of the candidate model collection.

Fig. 2 shows an excerpt of the evaluation interface that allows choosing a dedicated search engine and computing precision and recall values for each of the queries. The result is provided in a table for each query, and a visualization shows the queries in a coordinate system. This allows for fast identification of queries with particularly good (upper right quadrant) or poor (lower left quadrant) quality.
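The per-query precision and recall values shown in Fig. 2 follow the standard definitions: precision is the share of returned models that are relevant, and recall is the share of relevant models that were returned. The following Java sketch illustrates the computation for a single query. The class name PrecisionRecall, the convention of returning 0 for empty sets, and the cut-off used to name a quadrant are assumptions made for illustration and are not taken from the platform.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Precision and recall of one search result against a human-assessed reference. */
class PrecisionRecall {
    final double precision;
    final double recall;

    private PrecisionRecall(double precision, double recall) {
        this.precision = precision;
        this.recall = recall;
    }

    /**
     * @param returned ids of the models returned for the query (duplicates ignored)
     * @param relevant ids of the models judged relevant in the reference data set
     */
    static PrecisionRecall of(List<String> returned, Set<String> relevant) {
        Set<String> distinct = new LinkedHashSet<>(returned);
        long hits = distinct.stream().filter(relevant::contains).count();
        // Assumed convention: an empty denominator yields 0 instead of NaN.
        double precision = distinct.isEmpty() ? 0.0 : (double) hits / distinct.size();
        double recall = relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
        return new PrecisionRecall(precision, recall);
    }

    /** Names the region of the precision/recall plot relative to a cut-off value. */
    String quadrant(double cut) {
        boolean highPrecision = precision >= cut;
        boolean highRecall = recall >= cut;
        if (highPrecision && highRecall) {
            return "upper right";
        }
        if (!highPrecision && !highRecall) {
            return "lower left";
        }
        return "mixed";
    }
}
```

For example, if a query returns four models of which two are among three relevant ones, precision is 2/4 and recall is 2/3, so with a cut-off of 0.5 the query falls into the upper right region.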
4 Architecture

The search platform has been implemented as a two-tiered web application, consisting of an HTML5 frontend and a Java backend, depicted in Fig. 3. The web-search and evaluation interfaces are implemented as web applications that run in common web browsers and require no installation. They communicate with the search platform via a JSON API and provide for the interaction with the user. For search, a simple process model editor based on the processWave editor (footnote 2) is provided to formulate the query. Search results are provided as an ordered list along with quality measures, cf. [4]. Evaluation offers predefined experiments, i.e., a set of queries and candidates, runs them against a dedicated search engine, and visualizes the results in terms of quality and performance. Particular techniques to match a query with models from the process model repository are realized in dedicated search engines.

Fig. 3: Architecture of Process Model Search Platform (component labels: Search, Evaluation, Web Interface, Platform Server, Index, Model Cache, Query Processor, Repository, Search Engine, Process Models).

The search and evaluation interfaces communicate with the platform server. As an evaluation comprises running reference searches, the platform server does not distinguish between search and experiment. That is, the experiment framework is implemented completely client-side.

The search platform server integrates different dedicated search engines. Such an engine comprises at least a query processor that decides whether a candidate matches the query and scores its relevance. A custom index enables efficient search. To facilitate the implementation of these components, the search platform provides shared components that can be accessed by the components of a search engine, i.e., a model cache that underpins a custom index and a persistence interface to the repository that manages stored process models.
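To illustrate the role of the query processor, the following Java sketch shows one way to decide on a match, score its relevance, and produce the ordered result list. It is a sketch under stated assumptions and not the platform's interface: QueryProcessor, Hit, Ranker, and LabelOverlap are hypothetical names, models are reduced to sets of activity labels instead of jBPT Petri nets, and the Jaccard overlap is a simple stand-in rather than one of the behavioral techniques from [5,6].

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.Set;

/** Hypothetical contract of a query processor for models of type M. */
interface QueryProcessor<M> {
    /** Empty if the candidate does not match, otherwise its relevance score. */
    OptionalDouble score(M query, M candidate);
}

/** One entry of the ordered search result. */
final class Hit {
    final String modelId;
    final double score;

    Hit(String modelId, double score) {
        this.modelId = modelId;
        this.score = score;
    }
}

final class Ranker {
    /** Scores every stored model and returns the matches, best first. */
    static <M> List<Hit> search(QueryProcessor<M> processor, M query, Map<String, M> repository) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, M> entry : repository.entrySet()) {
            OptionalDouble score = processor.score(query, entry.getValue());
            if (score.isPresent()) {
                hits.add(new Hit(entry.getKey(), score.getAsDouble()));
            }
        }
        // Descending by score; ties are broken by model id for a stable order.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparing(h -> h.modelId));
        return hits;
    }
}

/** Stand-in matcher (an assumption): Jaccard overlap of activity labels. */
final class LabelOverlap implements QueryProcessor<Set<String>> {
    private final double minScore;

    LabelOverlap(double minScore) {
        this.minScore = minScore;
    }

    @Override
    public OptionalDouble score(Set<String> query, Set<String> candidate) {
        Set<String> union = new HashSet<>(query);
        union.addAll(candidate);
        if (union.isEmpty()) {
            return OptionalDouble.empty();
        }
        Set<String> common = new HashSet<>(query);
        common.retainAll(candidate);
        double jaccard = (double) common.size() / union.size();
        return jaccard >= minScore ? OptionalDouble.of(jaccard) : OptionalDouble.empty();
    }
}
```

This sketch scans the whole repository for every query. A custom index, as mentioned above, would serve to narrow the set of candidates before scoring.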
The model cache increases startup speed, as it preserves data that was expensively precomputed when models were loaded.

Because the search platform server is accessed strictly through a generic JSON API, it could also be used as a web service for process search and be integrated with other services or applications.

2 http://processwave.org

5 Maturity & Show Case

The search platform has been implemented as a prototype to elaborate on the requirements on search engines and their integration with a common platform. Since dedicated search methods require their own query processor and indexes, such a platform provides users with a uniform search interface that covers various perspectives of process model search, including similarity search and querying.

Currently, we have implemented similarity search and querying based on successor relations, cf. [5,6]. As matching is conducted on net-based formalisms, search results are currently presented using their Petri net representations. This shall be extended in future work. Also, the quality experiments are limited to human assessment of similarity, cf. [3], as other reference data was not available.

In the demo, we address researchers who are interested in process model search and may even have proposed a search technique of their own. We will introduce the platform and its architecture in a brief presentation before turning to a live demo that comprises two parts.

1. We demonstrate the search and evaluation capabilities of the platform by means of example queries and their results. This includes a discussion of search result quality metrics and how they support users in understanding a search result. For evaluation, we show how various measures and diagrams give insight into the quality and performance of a search technique.

2.
In a quick walkthrough tutorial, we explain the requirements for a custom search engine and, using a simple example, the steps required to integrate it into the platform.

A screencast demonstrating our search and evaluation platform can be found at http://vimeo.com/ress/process-search-demo. The platform is publicly available under the MIT open source license, along with a short tutorial on how to use it and integrate a custom search engine, at http://bitbucket.org/ress/process-search.

References

1. A. Awad. BPMN-Q: A Language to Query Business Processes. In EMISA, volume 119, pages 115–128, 2007.
2. R. Dijkman, M.L. Rosa, and H.A. Reijers. Managing Large Collections of Business Process Models—Current Techniques and Challenges. Comput Ind, 63(2):91, 2012.
3. R. Dijkman, M. Dumas, B. Dongen, R. Käärik, and J. Mendling. Similarity of Business Process Models: Metrics and Evaluation. Inform Syst, 36(2):498–516, 2011.
4. M. Guentert, M. Kunze, and M. Weske. Evaluation Measures for Similarity Search Results in Process Model Repositories. In ER ’12, pages 214–227. Springer, 2012.
5. M. Kunze, M. Weidlich, and M. Weske. Behavioral Similarity—A Proper Metric. In BPM ’11, pages 166–181. Springer, 2011.
6. M. Kunze and M. Weske. Local Behavior Similarity. In BPMDS ’1, volume 113 of LNBIP, pages 107–120. Springer, 2012.
7. N. Lohmann, E. Verbeek, and R. Dijkman. Petri Net Transformations for Business Processes—A Survey. In Transactions on Petri Nets and Other Models of Concurrency II, pages 46–63. Springer, 2009.