Process and Deviation Exploration with Inductive visual Miner

Sander J.J. Leemans, Dirk Fahland, and Wil M.P. van der Aalst
Eindhoven University of Technology, the Netherlands
{s.j.j.leemans, d.fahland, w.m.p.v.d.aalst}@tue.nl

Abstract. Process mining aims to extract information from recorded process data, which can be used to gain insights into the process. This requires applying a discovery algorithm and setting its parameters, after which the discovered process model should be evaluated. Both steps may need to be repeated several times until a satisfying model is found; we refer to this as process exploration. Existing commercial tools usually do not provide models having executable semantics, which prevents accurate evaluation of the discovered map, while most academic tools lack exploration features and, given the repetitive nature of process exploration, are tedious to use. In this paper, we describe a novel process exploration tool: the Inductive visual Miner. It aims to bridge this gap between commercial and academic tools, by combining the executable semantics of academic tools with the exploration support of commercial tools. It also adds animation and deviation visualisation capabilities.

Keywords: Process mining, process exploration, deviation analysis

1 Process Exploration

To gain insights into business processes based on factual knowledge, recorded event data can be analysed using process mining. Process mining aims to extract information from recorded process data, stored in an event log, and starts with discovering a process model from the event log. However, many process discovery algorithms exist, their parameters have to be set, and the question at hand might require focusing on specific parts of the event log. The implications of these choices are, although well-studied for academic approaches, unclear for the average user, which makes it difficult to obtain a model that is suitable to answer the question at hand.
In this paper we focus on process exploration, which is the process of repeatedly trying settings until a satisfactory model is discovered [4]. The first step in process exploration is to select a process discovery algorithm and to set its parameters. Moreover, the scope of the exploration needs to be set by applying filters and choosing a perspective; e.g., one can focus on the control-flow or the resource perspective.

Copyright © 2014 for this paper by its authors. Copying permitted for private and academic purposes.

In the second step of the exploration cycle, one needs to apply the algorithm in the selected scope to the event log to obtain a process model. Before conclusions can be drawn and insights can be gained, the model should be evaluated. For instance, compliance-related questions, such as whether the four-eyes principle was adhered to, can only be answered if the model represents a large part of the behaviour in the event log, and future-related questions should only be answered using models that are likely able to represent future behaviour. Evaluation of a model with respect to an event log can only be done accurately if the behaviour that the model allows is well-defined, i.e. if it has executable semantics, and different parts of the model might have different problems. Often, general questions, such as what a process looks like, lead to more specific questions, such as where in the process delays or deviations occur, or to questions that need to be answered using other perspectives on the event log. Alternatively, the evaluation may show that the question cannot be answered with the discovered process model. Then, the parameters need to be set again and a new model must be discovered; process exploration is a highly iterative process.

Figure 1: Exploration cycle (set scope, discover process map, evaluate process map, use process map).
After a user has found a suitable model, that model can be used, for instance, for automatic enactment of models in systems [5], for automatic prediction [7] and for compliance checking [6]. The full process exploration cycle is shown in Figure 1. All of these uses for process models require that the model can be processed automatically, for which it needs to have executable semantics.

Current commercially available process exploration tools offer plenty of options to set the scope of the exploration, but usually do not produce models having executable semantics; such models cannot be used for automated evaluation or further use. There is a plethora of academic tools available to set the scope of the exploration, to discover a process model and to evaluate it, but given the iterative nature of process exploration, using them repeatedly is tedious. In this paper, we introduce a tool, Inductive visual Miner (IvM), that aims to bridge this gap between commercial and academic tools. It supports the steps of process exploration by chaining existing academic tools and streamlining their use. Moreover, it improves on evaluation by a new notation and the addition of animation and quick node selection filtering. Thus far, such capabilities only existed for tools having no or just weak semantics or without formal guarantees (Fuzzy Miner, Disco, BPM|One, Celonis, Perceptive, etc.).

IvM has been implemented as a plug-in of the ProM framework; it can be obtained by installing ProM 6.4 from http://promtools.org and, using the ProM package manager, installing the plug-in Inductive visual Miner. Example event logs can be obtained from http://www.processmining.org/logs/start; a screencast is available at http://vimeo.com/user29103154/inductivevisualminer.

In the remainder of this paper, we explain the implementation of IvM, highlight the deviation visualisation and give an example.
For a detailed comparison with existing exploration approaches, please refer to [4].

2 Inductive visual Miner: Implementation

The architecture of IvM resembles a chain of analysis and visualisation tasks, shown in Figure 2. To encourage exploration, a user can change any parameter at any time. IvM then discards the current computation and restarts the chain from the first task that is influenced by the parameter change. For instance, if the user selects or deselects a node, only the tasks ‘filter node selection’ and ‘animate’ are redone. As the align task in particular can take some time, intermediate visual results are shown to the user until the next task is finished.

Figure 2: Chain of tasks (prepare log, filter activities, discover, align, filter node selection, animate), their parameters (bottom: perspective, activity threshold, noise threshold, selected nodes) and their visual results (top: model, enriched model, highlighted model, animated model). If a user changes a parameter, the necessary tasks restart immediately.

In the prepare log task, the events in the log are classified using the provided perspective classifier. Next, in the filter activities task, given a threshold value, the most frequent activities are kept and the events of other activities are filtered out. The Inductive Miner - infrequent (IMi) [3] discovery algorithm is applied in the discover task. IMi takes as an input parameter the amount of noise filtering to be applied to paths and produces a process tree. In the align task, the traces of the event log are aligned to find the best matching runs through the model (needed in case of deviations between model and log) [1]. This provides the information needed to enrich the model with how often model elements were executed in the event log. The filter node selection task filters the aligned traces to keep only those that go through a selected node.
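
The restart behaviour described above can be illustrated with a small sketch. It is not taken from the IvM source code: the task names follow the text, while the parameter keys (`perspective`, `activity_threshold`, `noise_threshold`, `selected_nodes`) and the function `tasks_to_rerun` are hypothetical names introduced here for illustration only. The sketch assumes that each parameter is read by exactly one task and that every task depends on all tasks before it in the chain.

```python
from typing import List, Optional, Tuple

# Task chain in the order given in the text. The second element is the
# (hypothetically named) parameter that the task reads, or None if the
# task has no parameter of its own.
CHAIN: List[Tuple[str, Optional[str]]] = [
    ("prepare log", "perspective"),
    ("filter activities", "activity_threshold"),
    ("discover", "noise_threshold"),
    ("align", None),
    ("filter node selection", "selected_nodes"),
    ("animate", None),
]


def tasks_to_rerun(changed_parameter: str) -> List[str]:
    """Return the first task that reads the changed parameter and all later tasks."""
    for index, (_, parameter) in enumerate(CHAIN):
        if parameter == changed_parameter:
            # Everything downstream of the first affected task is redone;
            # results of the tasks before it can be kept.
            return [name for name, _ in CHAIN[index:]]
    raise KeyError(f"unknown parameter: {changed_parameter}")


if __name__ == "__main__":
    # Selecting or deselecting a node only redoes the last two tasks.
    print(tasks_to_rerun("selected_nodes"))
    # Changing the noise threshold also forces a new discovery and alignment.
    print(tasks_to_rerun("noise_threshold"))
```

Under these assumptions, a change to the selected nodes avoids recomputing the comparatively expensive align task, whereas a change to the perspective restarts the whole chain.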
The final task, animate, computes when traces passed model elements; this information is used to show a quick animated preview of the traces in the log on the model¹. If the log contains no timestamps, random timestamps are inserted for demonstration purposes.

Once the model is available, it can be exported to ProM for further analysis, both as a Petri net and as a process tree; a user can perform their own evaluation without waiting for the evaluation of IvM to finish. At any point during the exploration, the model can be saved in bitmap (png) and vector (pdf, svg) image formats. The full animation of the complete log can be exported to bitmap (avi) and vector (svg) based movie formats once it is computed.

¹ At time of writing, we limited the quick preview to 50 traces for performance reasons.

Deviations. Deviations are a crucial part of the evaluation: they show precisely what parts of the model deviate with respect to the log. Deviations are visualised to show which parts of the model fit well and which parts do not. This is important for drawing reliable conclusions. Two types of deviations have been identified [1]: if a trace contains an event that is not allowed by the model, it is a log move; if the model requires an event that is not present in the trace, it is a model move. Log and model moves are identified by the align task, which chooses a run through the process model such that the number of such deviating moves is minimal. As shown in Figure 3, IvM visualises both of them using dashed red edges; such an edge that circumvents an activity represents a model move, while a self-edge represents a log move.

Figure 3: Model with the result of the align task. The edge circumventing C denotes a model move; the self-edge on the right a log move.

Example. Figure 4 shows the initial model with default values for all parameters.
Looking at this model, the question arose what the happy flow of the process was, i.e. the most frequently taken path. After a few iterations, parameters were settled: using only the 50% most frequent activities and applying noise filtering of 20%, a happy flow of 6 activities was uncovered. Before exporting this model for further analysis, the deviation visualisation was turned on, resulting in the model shown in Figure 4b. This shows that the fourth and fifth activity are often skipped.

Figure 4: Screenshots of IvM applied to the ‘A’ activities of [2]: (a) default parameters; (b) after a few iterations, with deviations and animation.

3 Conclusion

In this paper, we discussed the cycle of process exploration, consisting of repeatedly setting parameters, discovering a process model and evaluating it. We identified a gap between existing commercial and academic process exploration tools: commercial tools usually do not provide models having executable semantics, which prevents accurate evaluation of the discovered map, while most academic tools lack features such as seamless zooming and animation, and thus do not support the repetitive nature of process exploration well.

We introduced a process exploration tool, Inductive visual Miner (IvM), that aims to bridge this gap. When started, IvM immediately applies a chain of analysis and visualisation tasks to show the user not only a model, but also the traces of the event log animated on it, and where the log and model deviate from one another. IvM encourages the user to interact by allowing parameters to be set at any time: computations will be restarted as necessary in the background. IvM is not as feature-rich as some of the commercial tools, but shows that it is possible to use powerful techniques with formal guarantees in a user-friendly package. We hope that IvM will inspire commercial vendors to consider models with executable semantics and to support deviation analysis.
Extensions to IvM can be made in all tasks; for instance, other process tree discovery algorithms can be plugged in instead of IMi. In the future, we would like to include approximation algorithms to speed up the computation of the alignments. To allow for even better evaluation, several extensions are possible, such as global quality measures (fitness, precision and generalisation) and identification of traces in the animation. Furthermore, several other filters, such as filters on specific activities, timestamps, resources and data, could be included to give a user more freedom in setting the scope.

References

1. Adriansyah, A.: Aligning Observed and Modeled Behavior. Ph.D. thesis, Eindhoven University of Technology (2014)
2. van Dongen, B.: BPI Challenge 2012 Dataset (2012), http://dx.doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
3. Leemans, S., Fahland, D., van der Aalst, W.: Discovering block-structured process models from event logs containing infrequent behaviour. In: Business Process Management Workshops. pp. 66–78 (2013)
4. Leemans, S., Fahland, D., van der Aalst, W.: Exploring processes and deviations. In: Business Process Management Workshops (2014), to appear
5. Meyer, A., Pufahl, L., Fahland, D., Weske, M.: Modeling and enacting complex data dependencies in business processes. In: BPM. Lecture Notes in Computer Science, vol. 8094, pp. 171–186. Springer (2013)
6. Ramezani, E., Fahland, D., van der Aalst, W.: Where did I misbehave? Diagnostic information in compliance checking. In: BPM. Lecture Notes in Computer Science, vol. 7481, pp. 262–278. Springer (2012)
7. Wynn, M., Rozinat, A., van der Aalst, W., ter Hofstede, A., Fidge, C.: Process mining and simulation. In: Modern Business Process Automation, pp. 437–457. Springer (2010)