Log Skeleton Filter and Browser

Eric Verbeek
Department of Mathematics and Computer Science
Eindhoven University of Technology
Eindhoven, The Netherlands
Email: h.m.w.verbeek@tue.nl

Abstract: In the area of process mining, an event log is a crucial artifact. However, existing event log visualizers do not really help us in gaining many useful insights into the underlying process, that is, into the process that generated this event log. This paper introduces the Log Skeleton Filter and Browser tool, which allows us to gain many such insights. This tool can handle noisy event logs, and still show the structure of this event log and its underlying process, that is, the skeleton that supports the log and the process.

I. INTRODUCTION

In the area of process mining [1], event logs are crucial. For process discovery, one needs an event log to discover a process model from. For process conformance, one needs an event log to replay on the process model at hand. For process enhancement, one needs an event log to enhance the process model at hand.

However, an event log visualizer that allows us to gain insights into the process model that underlies the event log is not yet available. Typically, the existing event log visualizers visualize an event log by visualizing traces in some way. Although this may be useful in itself, it does not really help us in understanding the underlying process model.

At the same time, the two top submissions of the process discovery contest of 2017 [2] both used an approach which involved a human. The winner of that contest used an interactive approach, where an experienced user would interactively construct a process model from an event log using some tools. The runner-up approach used an earlier version of the tool presented in this paper. Both submitted approaches performed much better than any submitted approach that included only automated discovery. But if we need to keep a human in the loop, then we should provide that human with tools to support them.

Therefore, this paper presents an event log visualization tool called Log Skeleton Filter and Browser that can help us in understanding the structural properties of the log, which in turn can help in understanding the system that generated the log. This tool takes the activities as present in the event log together with some constraints between them as main artifacts, builds a so-called log skeleton from these activities and constraints, and visualizes this log skeleton. The tool allows us to filter the event log before creating the log skeleton, and to browse the current log skeleton that was created from the filtered event log.

The remainder of this paper is organized as follows. Section II introduces the three main parts of the tool using a noisy example event log. Section III introduces the part of the tool that visualizes the event log at hand using a so-called log skeleton. Section IV introduces the browser controls, which we can use to change the view on the current log skeleton. Section V introduces the filter controls, which we can use to filter the event log before the log skeleton is derived from it. Section VI concludes the paper.

II. THE TOOL

Fig. 1 showcases the Log Skeleton Filter and Browser using an event log from [3] containing 12 activities and 10% noise. The center part of the Log Skeleton Filter and Browser visualizes the selected log skeleton, which is explained in Section III. The rightmost part contains the browser controls, which are explained in Section IV. The leftmost part contains the filter controls, which are explained in Section V.

Fig. 1. An example event log viewed with the Log Skeleton Filter and Browser.

The tool has been implemented as a visualizer plug-in for event logs in ProM [4]. In this paper we have used the ProM Nightly Build¹ of December 3rd, 2018, with version 6.9.86 of the LogSkeleton package installed.

¹ See http://www.promtools.org/doku.php?id=nightly.

III. LOG SKELETONS

A. Activities

The log skeleton shown in Fig. 1 shows in total 14 different activities: the 12 activities as found in the log (b, ..., k, E, and S) and two artificial activities which denote the start (|>) and the end ([ ]) of a trace.

Activity S occurs 977 times in the entire event log, and occurs either not at all or once (0..1) in every trace. Furthermore, S is considered to be equivalent to E, which means that they tend to occur equally often in every trace. The artificial activities |> and [ ] are also equivalent to E, and E serves as the ringleader for this equivalence class. As S occurs 977 times and E occurs 975 times, we know that they do not occur equally often in every trace. The selected noise level (5%, in this case) determines how much difference is accepted. Basically, this noise level guarantees that the accumulated occurrence count difference of both activities over all traces is at most 50 (5% of 1000 traces). Clearly, if the noise level is set to 0%, then no differences are allowed for them to be equivalent.

Equivalent activities use the same background color. As a result, if two activities share the same background color, they are considered to be equivalent. As an example, in Fig. 1 f, g, h, i, and k are equivalent, with f as their ringleader. This suggests that in the process model that underlies the event log these activities occur equally often.

B. Constraints

The log skeleton shown in Fig. 1 shows 5 different kinds of Declare [5] constraints between the activities: response (precedence) (a blue arrow at the blue tail (head) of an arc), not response (not precedence) (a red arrow with a red bar at the red head (tail) of an arc), and not co-existence (an ocher bar at the head and/or tail of an ocher line). However, in contrast with the original Declare constraints, the constraints also take the noise level into account.

As examples:

• E is a response of k in 99% (.99) of all occurrences of k. This means that 99% of all occurrences of k are followed in the same trace by some occurrence of E.
• S is a precedence of f in 98% (.98) of all occurrences of f. This means that 98% of all occurrences of f are preceded in the same trace by some occurrence of S.
• k is a not-response of E in 99% (.99) of all occurrences of E. This means that 99% of all occurrences of E are not followed in the same trace by some occurrence of k.
• f is a not-precedence of S in 99% (.99) of all occurrences of S. This means that 99% of all occurrences of S are not preceded in the same trace by some occurrence of f.
• c is a not-co-existence of d for all occurrences of c and vice versa. This means that c and d never occur in the same trace.

Before the (not) response and (not) precedence relations are visualized, a transitive reduction is performed on them. As a result, although no explicit response constraint from f to i is shown, implicitly it is there because of the response constraints from f to g and from g to i. For the not response and not precedence relations, prior to this reduction, the identity and not-co-existence relations are removed.

Our way of visualizing the constraints differs from the way they are visualized by Declare. In Declare, a dot is used to indicate the 'point of view' for some constraint. In our tool, this role is simply taken by the symbol (arrow, bar, or arrow with bar) put either on the tail or the head. As a result, the symbols replace the dots, but also indicate the types of the constraints. This saves space (and clutter) on the arcs.

The log skeleton also shows that several activities do not occur in the same trace: f on the one hand, and b, c, and d on the other hand, and c on the one hand and d on the other hand. This indicates that there are two choices in the process model: a first between f, ..., i, and k on the one hand, and b, ..., e, and j on the other hand, and a second between c and e on the one hand and d on the other hand. Note that the numbers of occurrences of these activities also indicate this, as 455 (f) + 520 (b) ≈ 977 (S), and 254 (c) + 274 (d) ≈ 520 (b).

Although the example at hand does not show this, the not co-existence constraint can also take noise into account. Assume, for the sake of convenience, that the not co-existence constraint between b and f has .98 at the b-end and .95 at the f-end of the constraint. Then 98% of all occurrences of b occur in traces where f does not occur, and 95% of all occurrences of f occur in traces where b does not occur. Note that as a result the symmetry of the not co-existence constraint may be broken. For example, if we select a noise level of 3%, then b will have a not co-existence constraint to f, but not vice versa.

C. Configuration

At the bottom of the log skeleton, its configuration is shown. This shows us the main settings that were used to obtain the shown log skeleton from the event log at hand.

IV. THE BROWSER

The browser controls at the right hand side of the log skeleton provide us with means to change the visualization of the current log skeleton. The visualization can be changed by selecting different activities to show, selecting different constraints to show, moving a slider for the allowed noise level, and setting a number of visualization options that may be handy. Furthermore, the browser comes with a View Log Skeleton in New Window button that allows us to open a new window with the current visualization. This allows us to, say, compare different log skeletons easily.

A. View Activities

This allows us to select which activities are shown. By default, all activities are pre-selected.

B. View Constraints

This allows us to select which constraints are shown. By default, the (not) response/precedence constraints are all pre-selected, and the not co-existence constraints are pre-selected only if their number does not exceed 100. The reason for the latter is that the layout algorithm used for the log skeletons may become very slow if many constraints are present. Because of the possible transitive reduction on the (not) response/precedence constraints, the chances of having many of these constraints are considerably lower than the chances of having many not co-existence constraints.

C. Noise Level

This allows us to vary the noise level from 0% to 20%. Allowing a noise level of 50% or more makes hardly any sense, as this could allow multiple contradicting constraints between two activities. As such, we have chosen a maximal noise level of 20%, as this is still some distance away from this problematic 50% level. Fig. 2 shows the usefulness of this slider, as it shows the log skeleton obtained from the event log with noise level set to 0. Obviously, the log skeleton shown by Fig. 1 provides us with more insights about the event log than this log skeleton.

Fig. 2. The example event log viewed with the Log Skeleton Filter and Browser with noise level set to 0.

D. Options

1) Use Hyper Arcs (may be slow...): This allows us to visualize a clique of identical (not) response/precedence constraints by a single hyper (not) response/precedence constraint. In some cases this may significantly reduce clutter in the log skeleton as shown. As indicated, this option may make the tool very slow, as it uses a simple recursive algorithm to detect maximal cliques of identical constraints.

2) Use False Constraints: This allows us to ignore the not co-existence constraints (which have no real direction) when laying the log skeleton out.

3) Use Edge Colors: This allows us to use colors for the constraint edges. If selected, the tail/head of a response/precedence constraint is colored blue, the tail/head of a not precedence/not response constraint is colored red, and the tail/head of a not co-existence constraint is colored ocher. If the constraints are there because of the noise level, the color will be lighter.

4) Use Equivalence Class: This allows us to show only the not co-existence constraints for the ringleaders of the equivalence classes. In Fig. 1, this option is selected. If not selected, the not co-existence relation between k and e will also be shown, and many others as well. As equivalent activities typically share the same not co-existence constraints, there is no need to show them all.

5) Use Head/Tail Labels: This allows us to position the tail and head labels (like .98) on the constraint heads and tails. If selected, they are positioned on the tail and head. If not selected, they are positioned near the middle of the edge, like .98→.97 for the response/precedence edge between f and g (see Fig. 1).

6) Show Neighbors: This allows us to visualize the neighbors of selected activities as well. A neighbor of an activity is any other activity that has some selected constraint from or to that activity. This allows us to quickly inspect the constraints of a selected collection of activities. As an example, in the example log skeleton, if we only select |> with this option selected, then S would also be shown (albeit without border), which indicates that S typically occurs first. We can then extend the selection with S, which causes b and f to be shown, etc.

V. THE FILTER

The filter controls at the left hand side allow us to filter the log prior to creating the log skeleton from it. Typically, these controls will result in a different log skeleton. Unlike the browser controls, which take effect as soon as we change them, the filter controls require us to select the Apply Settings button, which will create and show the new log skeleton using the filter settings as provided.

A. Required Activities

This allows us to select the traces that contain all of the selected activities. If one or more of the selected activities does not occur in a trace, the entire trace will be filtered out.

B. Forbidden Activities

This allows us to select the traces that contain none of the selected activities. If one or more of the selected activities occurs in a trace, the entire trace will be filtered out.

This can be combined with the required activities. Only traces that contain all of the selected required activities and none of the selected forbidden activities will be filtered in.

C. Activity Splitters

This allows us to split activities into multiple activities, which can be useful in case of recurrent activities. In the leftmost field of a row, we can specify the activity we want to split. In the rightmost field, we can specify the activity we want to split over.

As an example, assume that we want to split f over b. Every occurrence of f is then renamed to either f.0 (if in the trace b does not precede this occurrence of f) or f.1 (if in the trace b does precede this occurrence of f). We can also split an activity f over itself: the first occurrence of f in a trace is renamed f.0, and all others in that trace are renamed f.1.

VI. CONCLUSION

This paper has introduced the Log Skeleton Filter and Browser tool, which can be used to gain insights into the process underlying the event log at hand. Using the tool, we can derive important relations between activities, which allows us to understand the process much better.

The tool can handle noisy event logs. From a noisy event log, a log skeleton was obtained that was a spitting image of the log skeleton that was obtained from the noise-free sibling event log. Of course, the numbers of occurrences of activities in the event log were different, and some constraints required a positive noise level to be detected, but with an appropriate noise level the same constraints were detected, and the same equivalence classes for the activities.

The tool has been used by the author to gain insights into the 10 training event logs of the process discovery contest of 2019 [6]. See, for example, this screencast², which explains (to some extent) how the model was discovered for training log 4 using the tool, which included some splitting of recurrent activities. Fig. 3 shows some insights for this log of this contest, which indicate that a and a choice between ai and (some occurrence of) aj can be executed concurrently, after which the activities ak and v can be executed concurrently.

² See https://www.win.tue.nl/~hverbeek/downloads/ICPM2019Demo.mp4

Fig. 3. Some insights for the process discovery contest of 2019.

However, for some other event logs, using only this tool did not provide sufficient insights. These event logs typically contained recurrent activities, which could not be split successfully with the tool, as it can only split a recurrent activity if there is a certain activity in-between the occurrences of the recurrent activity. For the process discovery contest of 2017, this was sufficient, but for the contest of 2019 it sometimes was not. This shows that other ways of splitting recurrent activities could be very interesting as future work.

In the end, the models discovered by the author from the training event logs of the process discovery contest of 2019 were able to classify 898 of the 900 test traces correctly. Hence, the models achieved a classification accuracy of 99.78% on the final test logs. This made the author's submission to this contest the best-classifying submission, and also shows that many useful insights were obtained using the tool on the training event logs. The two traces that were misclassified belong to models 9 and 10, which indeed contain recurrent activities.

ACKNOWLEDGMENT

The author would like to thank the organizers of the process discovery contests, as without these contests this tool would not have existed.

REFERENCES

[1] W. M. P. v. d. Aalst, Process Mining: Data Science in Action, 2nd ed. Springer-Verlag, 2016.
[2] J. Carmona, M. de Leoni, B. Depaire, and T. Jouck. (2017) Process discovery contest 2017. [Online]. Available: http://www.win.tue.nl/ieeetfpm/doku.php?id=shared:process_discovery_contest
[3] L. Maruster, A. J. M. M. Weijters, W. M. P. v. d. Aalst, and A. v. d. Bosch, "A rule-based approach for process discovery: Dealing with noise and imbalance in process logs," Data Min. Knowl. Disc., vol. 13, no. 1, pp. 67–87, 2006.
[4] H. M. W. Verbeek, J. C. A. M. Buijs, B. F. v. Dongen, and W. M. P. v. d. Aalst, "ProM 6: The process mining toolkit," in Proc. of BPM Demonstration Track 2010, vol. 615. CEUR-WS.org, 2010, pp. 34–39. [Online]. Available: http://ceur-ws.org/Vol-615/paper13.pdf
[5] W. M. P. v. d. Aalst, M. Pesic, and H. Schonenberg, "Declarative workflows: Balancing between flexibility and support," Computer Science - Research and Development, vol. 23, pp. 99–113, 2009.
[6] J. Carmona, M. de Leoni, and B. Depaire. (2019) Process discovery contest 2019. [Online]. Available: https://icpmconference.org/process-discovery-contest
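
The renaming rule of the activity splitter (Section V-C) can be illustrated with a small sketch. This is not the ProM implementation: it assumes that a trace is simply a list of activity names, and the function name split_activity and its parameters are hypothetical. It only mirrors the rule as stated in the text, where an occurrence becomes ".1" if the activity split over occurred strictly earlier in the same trace, and ".0" otherwise.

```python
from typing import List


def split_activity(trace: List[str], target: str, splitter: str) -> List[str]:
    """Rename each occurrence of `target` to `target.0` or `target.1`.

    Hypothetical helper: an occurrence becomes `target.1` if `splitter`
    occurred strictly earlier in the trace, and `target.0` otherwise.
    """
    seen_splitter = False  # has `splitter` occurred before the current event?
    result = []
    for activity in trace:
        if activity == target:
            suffix = 1 if seen_splitter else 0
            result.append(f"{target}.{suffix}")
        else:
            result.append(activity)
        # Update only after renaming, so an occurrence never precedes itself.
        # This also covers splitting an activity over itself.
        if activity == splitter:
            seen_splitter = True
    return result


if __name__ == "__main__":
    print(split_activity(["S", "f", "b", "f", "E"], "f", "b"))
    print(split_activity(["f", "g", "f", "f"], "f", "f"))
```

Under these assumptions, splitting f over b in the trace S, f, b, f, E renames the first f to f.0 and the second to f.1, and splitting f over itself renames only the first occurrence in a trace to f.0.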