Declarative Process Discovery with MINERful in ProM Claudio Di Ciccio1 , Mitchel H. M. Schouten2 , Massimiliano de Leoni2 , and Jan Mendling1 1 Vienna University of Economics and Business, Austria claudio.di.ciccio@wu.ac.at, jan.mendling@wu.ac.at 2 Eindhoven University of Technology, The Netherlands m.h.m.schouten@student.tue.nl, m.d.leoni@tue.nl Abstract. Declarative process models consist of a set of constraints exerted over the execution of process activities. D ECLARE is a declarative process modelling language that specifies a set of constraint templates along with their graphical no- tation. The automated discovery of D ECLARE models aims at finding those con- straints that are verified throughout a given event log. In this paper, we present a fast scalable tool for mining D ECLARE models in ProM. Its usage is described with its application on a use case, based on a publicly available real-life bench- mark. Keywords: Process Mining; Process Discovery; Declarative Processes 1 Overview Process Mining [1] is the area of research embracing the automated discovery, confor- mance checking and enhancement of business process models. All involved techniques are evidence-based, as the input always comprises a collection of computer-recorded information that track the executions of process instances, namely event logs. Indeed, process discovery pertains to the inference of process models stemming from event logs. Over the last years, the declarative process modelling approach has flanked the clas- sical procedural one [5]. Declarative approaches only depict the behavioural constraints under which a process instance can unfold in its execution: as long as the constraints are not violated, the process instance is considered as valid. The declarative approach is a complementary strategy to the procedural models, which specify what are the next allowed activities at each stage of the process execution. Declarative process models are effective in a context of high flexibility for business processes [2]. The reason intu- itively lies in the fact that fewer constraints allow for more possible executions. On the contrary, more flexibility implies a higher number of alternative paths to depict in the procedural models. D ECLARE [2] is a declarative process modelling language. It specifies an extensible set of constraint templates that are parametric with respect to the process activities. A list of constraint templates used in the remainder of the paper are listed in Table 1, where a, b, and c are example activities. Examples of D ECLARE constraints are Init(a), and Response(b, c). The first one states that every instance must start with the execution Copyright c 2015 for this paper by its authors. Copying permitted for private and academic purposes. Constraints Description Init(a) a should be the first activity in a trace AtMostOne(a) a should be executed at most once CoExistence(a, b) If one of the activities a or b is executed, the other one also has to be executed Response(a, b) When a is executed, b has to be executed after a AlternateResponse(a, b) When a is executed, b has to be executed after a and no other a can be executed in between Precedence(a, b) b has to be preceded by a AlternatePrecedence(a, b) b has to be preceded by a and another b cannot be executed between a and b AlternateSuccession(a, b) Combination of AlternateResponse(a, b) and AlternatePrecedence(a, b) ChainSuccession(a, b) a is immediately followed by b NotChainSuccession(a, b) a is not allowed to be immediately followed by b Table 1: Table listing the constraints mentioned in this paper. of activity a. The second constraint imposes that if activity b is performed, then c must be performed eventually in the future. Init is named existence constraint template as it constrains the execution of one activity in process instances. Response is named relation constraint template instead, because it constrains the interplay of two activities. Among the pair of constrained activities, there always are at least an activation and a target. The activation is an event whose occurrence constrains the possibility of other events (targets) to occur before or afterwards. For example, for the constraint “every request is eventually acknowledged”, each request is an activation. This activation is eventually associated with either a fulfilment or a violation, depending on whether or not the activation is matched with a target event that satisfies the constraint. Using again the example of requests that need acknowledgements, if the request occurs, this activation is associated with a fulfilment if the acknowledgement event is later observed; otherwise, the activation is associated with a violation. For Response(b, c), b is the activation and c is the target. The full list of D ECLARE constraint templates can be found in [2]. This paper reports on the implementation of MINERful, a technique to mine D E - CLARE process models from an existing event log [3]. Compared with other existing techniques, MINERful has shown the best scalability with respect to the input size, in terms of number of traces, length of traces and activities of the process. Readers are referred to[3] for more details about this comparison. In particular, the implementation presented in this paper has been realised in ProM,3 an extensible framework that pro- vides support to develop and exploit a wide variety of process mining techniques in a standardised environment. To use MINERful with ProM, it is necessary to download the ProM Nightly build4 and, subsequently, install the DeclareMinerFul package through the ProM’s Package Manager. 3 http://www.processmining.org/tools/prom 4 http://www.promtools.org/prom6/nightly 2 Usage of the Tool on a Use Case In this paper we will demonstrate the functionalities of MINERful using the publicly available real-life event log Road Traffic Fine Management Process.5 The event log records executions of instances of the process enacted in an Italian local police office for managing fines for road traffic violations. It contains 150,370 traces and 561,470 events for 11 different process activities. 2.1 Parameters The MINERful plug-in uses an event log as input. In the remainder, we will adopt the following example event log: {ha, b, a, ci , ha, b, b, a, c, b, ai , ha, c, ci , ha, b, ci}. The application of the MINERful plug-in for the D ECLARE-model discovery can be customised through four parameters, namely: Support. It is the number of fulfilments divided by either (i) the number of traces in the log, in the case of existence constraints like Init(a), or (ii) the number of occur- rences of the activations (in the case of relation constraints like Response(b, c)). In the example log, the support of Init(a) is 1.0, because all traces start with a, whereas the support of Response(b, c) is 0.8, as 4 b’s out of 5 fulfil the constraint. Confidence. It is the product of the support and the fraction of traces in the log where either (i) the constrained activity occurs (existence constraints), or (ii) the activation occurs (relation constraints). The confidence of Init(a) is 1.0 · 1.0 = 1.0 and the confidence of Response(b, c) is 0.8 · 0.75 = 0.6, since b occurs in 3 traces out of 4. Interest Factor. It is the product of confidence and the fraction of traces in the log where either (i) the constrained activity occurs (existence constraints), or (ii) the tar- get occurs (relation constraints). The interest factor of Init(a) is 1.0 · 1.0 · 1.0 = 1.0, and the interest factor of Response(b, c) is 0.8 · 0.75 · 1.0 = 0.6, since c occurs in all traces. Skip Negative Constraints. When the process is characterised by parts with a rigid structure, the discovered model may blow up in term of presence of negative con- straints. Therefore, analysts are provided with an option to not considering negative constraints, thus increasing the readability of the discovered models. 2.2 Output Initially, the MINERful plug-in was executed skipping the negative constraints and using the following values for the other parameters: (i) support = 0.50, (ii) confi- dence = 0.00, (iii) interest factor = 0.00. The resulting declarative process model can be seen in Fig. 1. The output view consists of two panels. The panel on the left-hand side contains the mined declarative process model. The user is free to relocate activities and constraints to manually improve the readability. The panel on the right-hand side allows the user to adjust the four parameters mentioned in Section 2.1 (see the area delimited by a black 5 http://dx.doi.org/10.4121/uuid:270fd440-1057-4fb9-89a9-b699b47990f5 Fig. 1: The resulting output screen from MINERful’s where the model has been discov- ered by setting support to 0.5 and setting confidence and interest factor to 0. rectangle in the figure). After the adjustment, the user can click on button Regenerate Model to mine a new model with the new values set for those parameters. The model in Fig. 1 has been obtained by assigning value 0 to all parameters, ex- cept for support. This configuration has produced a cluttered declarative process model with many constraints, i.e. the model is probably overfitting the event log. The increase of the value of any parameter would generate a model with fewer constraints, thus probably reducing the overfitting problems and, also, improving the readability of the declarative process model. Of course, an excessive increase of any parameter may have a detrimental effect on the precision of the discovered model: the model may underfit the event log, allowing for too much behaviour. The declarative process model shown in Fig. 2 derives from the application of the following parameters: (i) support = 0.70, (ii) confidence = 0.30, (iii) interest factor = 0.00. A screencast illustrating the function- ing of the MINERful plug-in is available at https://svn.win.tue.nl/repos/ prom/Documentation/DeclareMinerFul/screencast.mp4. 2.3 Tool Maturity The process models were discovered using a laptop equipped with an Intel Core i3 with 4GB of RAM. With this modest hardware, the MINERful plug-in was able to mine the model in less than 30 seconds using a real-size event log with 561,470 events belonging to 150,370 traces. This indicates that the MINERful plug-in has reached a Fig. 2: The resulting output screen from MINERful’s where the model has been discov- ered by setting support, confidence and interesting factor to 0.7, 0.3 and 0, respectively. The resulting model is certainly simpler but may be underfitting, hence not very precise. large degree of maturity as it performs extremely well in terms of scalability. Also, the plug-in is integrated with the entire repertoire of techniques that are already available in ProM (see, e.g., [4]): the mined model can thus be later used for conformance checking, bottleneck analysis, improvement, and more. Acknowledgements. The work of Dr. Di Ciccio and Dr. de Leoni has received fund- ing from the EU Seventh Framework Programme under grant agreement 318275 (GET Service) and grant agreement 603993 (CORE), respectively. References 1. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Busi- ness Processes. Springer (2011) 2. van der Aalst, W.M.P., Pesic, M.: DecSerFlow: Towards a truly declarative service flow lan- guage. In: WS-FM. pp. 1–23 (2006) 3. Di Ciccio, C., Mecella, M.: On the discovery of declarative control flows for artful processes. ACM Trans. Manage. Inf. Syst. 5(4), 24:1–24:37 (2015) 4. Maggi, F.M.: Declarative process mining with the Declare component of ProM. In: BPM Demos. CEUR Workshop Proceedings (2013) 5. Pichler, P., Weber, B., Zugal, S., Pinggera, J., Mendling, J., Reijers, H.A.: Imperative versus declarative process modeling languages: An empirical investigation. In: BPM Workshops. pp. 383–394 (2011)