Taking on Infrequent Behavior in Event Logs using Hypothesis Tests

Patrizia Schalk¹, Lisa Petrak¹
¹ University of Augsburg, Universitätsstraße 6a, 86159 Augsburg

Abstract
When an event log is generated from the real-life data of an existing process, there is a high probability that, apart from events that happen frequently and therefore should be represented in the process model, the log also features infrequent behavior that would make a model generated from this log hardly readable. Many process discovery algorithms deal with such outliers by filtering infrequent behavior just before or during the creation of an appropriate process model. Less common methods make use of statistical tests to detect infrequent behavior in a preprocessing step. This paper aims to simplify the use of hypothesis tests introduced by Petrak et al. and proposes an approach to delete infrequent behavior without generating unwanted side effects for the process model. Furthermore, we solve a problem regarding the use of hypothesis tests for a process that contains loops and compare the results of hypothesis tests to well-known discovery techniques.

Keywords
Process Mining, Process Discovery, Filtering Outliers, Detection of Infrequent Behavior, Hypothesis Test, Directly Follows Graph, Loop Shortening

1. Introduction

Finding mathematical models for real-life processes is a highly relevant task, since such models can easily be analyzed in a way that shows exactly where the process has its weaknesses and where it can be optimized. However, constructing such a process model by hand leads to results that describe what the process should look like instead of how it actually looks. Therefore, it is highly desirable to create such models automatically and analyze them afterwards, so that no bias is present in the model. To automatically construct such a model, a record of the behavior of the process, e.g. in the form of an event log, is needed. However, there are several challenges that make it hard to construct a model from an event log, especially if such a model has to meet some criteria like being readable. One example of such a challenge is the existence of outliers in the event log – data that is present in the log but is so infrequent that it does not belong to the important parts of a process. Related to the existence of outliers is the existence of noise – data in the event log that cannot occur in the process. There are several causes for noise, such as human error ("I meant to log a different task than I actually did") or technical failure ("A part of the system shut down and didn't log a status during a period of time"). While noise is erroneous behavior and should not be featured in the event log, outliers can very well be part of the actual behavior of the process. However, removing edge cases from the event log promises to make the remaining behavior, and therefore a suitable process model, clearer and more readable. Finding such a reasonable model despite the existence of outliers in the event log is the task of process discovery.
Most such techniques handle noise by using heuristics or by setting thresholds for the maximum amount of behavior that may be deleted [1, 2, 3, 4, 5, 6]. Fewer discovery techniques make use of statistical theory to detect infrequent behavior in event logs [7].

As mentioned before, a discovered model needs to meet some quality measures. For example, we want a process model that is able to replay most of the behavior present in the event log (but not everything, since infrequent behavior should be excluded) – this trait is measured by the fitness of a process model. However, fitness alone is not enough for a good process model, since we can accomplish perfect fitness by simply allowing every behavior, even behavior that has nothing to do with the process. Therefore, another quality measure is the precision, which ensures that the process model cannot replay much more behavior than is present in the event log. The notion of generalization describes how open the model is for reasonable extensions, i.e. behavior that was not recorded in the log but can happen in the process. Finally, a process model should be simple and easy to read, so that a human can understand and analyze it – this trait is measured by the simplicity of the discovered process model. More information on quality measures for process models can be found in [8].

In this paper, we continue research on the question of how to discover a "good" process model for the main behavior of an event log by filtering out infrequent behavior. The idea to use hypothesis tests to find such outliers in an event log was presented in [7], where one-sided hypothesis tests are used to decide whether the direct neighborhood of two events should be classified as main or infrequent behavior. However, the approach of said paper creates unwanted side effects by deleting single events from a trace and thereby creating new direct neighborhood relations that might simply be wrong. We present a new approach without this problem that is graphical, therefore easy to follow, and gives users more control over the result they wish to get.

We first lay the foundation for our approach by defining the needed notions in Section 2 and revisit the general idea of using hypothesis tests to find infrequent behavior in Section 3, simplifying the techniques described in [7] and investigating a running example that is small and easy to follow. In Section 4 we propose to use a Directly Follows Graph to represent the directly-follows relations between events after we have detected infrequent direct neighbors using hypothesis tests, and show how we can delete infrequent behavior without risking side effects. We address a problem concerning loops that feature many iterations in the event log in Section 5 and propose a simple solution that is based on the well-studied Chinese Postman Problem [9]. Section 6 evaluates the results by comparing the accomplished fitness, precision, generalization and simplicity with those of well-known discovery techniques, namely the Inductive Miner infrequent [6] and the Directly-Follows Miner [5], with which our approach shares similar ideas. Section 7 concludes the paper.

2. Basic Definitions

We denote the set of natural numbers {1, 2, 3, . . .} by N and define N_0 := N ∪ {0}. For an arbitrary set T we call m : T → N_0 a multiset over T. For any a ∈ T, m(a) is the number of occurrences of the element a in the multiset m. We write a ∈ m if this number is greater than zero, i.e. a ∈ m :⇔ m(a) > 0.
We also write a finite multiset m over T with elements a_1, . . . , a_n in the form [a_1^{m(a_1)}, . . . , a_n^{m(a_n)}].

Definition 1 (Event, Trace, Event log [10]). Let T be a set of activity names. A sequence of activities σ = ⟨t_1, . . . , t_n⟩ ∈ T* is called a trace. An activity a ∈ T is contained in σ if it occurs in σ at any position, i.e. a ∈ σ :⇔ σ = ⟨t_1, . . . , t_n⟩ ∧ ∃i ∈ {1, . . . , n} : t_i = a. For an arbitrary trace σ = ⟨t_1, . . . , t_n⟩ ∈ T* with n ≥ 1 we define first(σ) := t_1 and last(σ) := t_n. An event log L : T* → N_0 over T is a multiset of traces. The occurrence of an activity a ∈ T in L is called an event.

Regarding a real-life process, we give every event that takes place a name, which can be an artificial one (like a, b, c, . . .) or a descriptive one (like take_order). A trace then describes a sequence of events that happened for a specific case, e.g. for a customer or a general object.

Definition 2 (Ordering relations [10]). Let L be an event log over a set of activities T. For any a, b ∈ T we introduce the following binary causal relations on T:
• a >_L b if and only if a is directly followed by b somewhere in a trace σ in L, i.e. if a trace σ = ⟨t_1, . . . , t_n⟩ ∈ L and a number i ∈ {1, . . . , n − 1} exist with t_i = a and t_{i+1} = b
• a →_L b if and only if a >_L b and b ≯_L a
• a ←_L b if and only if a ≯_L b and b >_L a
• a #_L b if and only if a ≯_L b and b ≯_L a
• a ‖_L b if and only if a >_L b and b >_L a.

Of these ordering relations, >_L will be of great interest for us, since we want to investigate the neighborhood of two events and decide whether two events directly following each other happen often (i.e. main behavior) or rarely (i.e. infrequent behavior). To do so, we need to know how often two events follow each other, which we can store in the Correlation Matrix:

Definition 3 (Correlation matrix). Let L be an event log over T with Start, End ∉ T. Further, take T_S := T ∪ {Start}, T_E := T ∪ {End} and λ := ⟨⟩, the empty trace. We denote the number of times a ∈ T_S is directly followed by b ∈ T_E in all traces contained in L by |(a, b)|_{>_L}. The correlation matrix for an event log L is defined as the square matrix C^L : T_S × T_E → N_0 with

C^L_{a,b} = \begin{cases} |(a, b)|_{>_L} & \text{if } a, b \in T \\ \sum_{\sigma \in L \setminus \{\lambda\},\ first(\sigma) = b} L(\sigma) & \text{if } a = Start,\ b \in T \\ \sum_{\sigma \in L \setminus \{\lambda\},\ last(\sigma) = a} L(\sigma) & \text{if } a \in T,\ b = End \\ L(\lambda) & \text{if } a = Start,\ b = End. \end{cases}

Recall that L is a multiset and hence, for any σ ∈ L, L(σ) denotes the number of times the trace σ is contained in L.

Apart from the Correlation Matrix, we can store the information about which events directly follow one another in the Directly Follows Graph, which is a simple directed graph containing all events as vertices. Two vertices u and v are connected by an edge (u, v) if somewhere in the event log u is directly followed by v. The weight of the edge shows how often this happens in the event log.

Definition 4 (Directly Follows Graph). Let L be an event log over a set of activities T. The Directly Follows Graph (DFG) for L is a weighted directed graph G = (V, E, w) with
• V := T ∪ {Start, End},
• E := >_L ∪ {(Start, t) | ∃σ ∈ L : t = first(σ)} ∪ {(t, End) | ∃σ ∈ L : t = last(σ)} and
• w : E → N_0, w((a, b)) := C^L_{a,b}.
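To make Definitions 3 and 4 concrete, the following minimal Python sketch builds the correlation matrix and the weighted DFG from an event log given as a dictionary that maps trace tuples to their multiplicities. It only illustrates the definitions above; all names (correlation_matrix, directly_follows_graph, START, END, the small example log) are our own and not part of any process mining library.

from collections import Counter

START, END = "Start", "End"

def correlation_matrix(log):
    """Count how often a is directly followed by b; Start and End record
    the first and last activity of every trace (Definition 3)."""
    C = Counter()
    for trace, freq in log.items():
        if len(trace) == 0:
            C[(START, END)] += freq          # the empty trace lambda
            continue
        C[(START, trace[0])] += freq         # first activity of the trace
        for a, b in zip(trace, trace[1:]):   # all directly-follows pairs in the trace
            C[(a, b)] += freq
        C[(trace[-1], END)] += freq          # last activity of the trace
    return C

def directly_follows_graph(log):
    """Weighted DFG (V, E, w) as in Definition 4."""
    C = correlation_matrix(log)
    V = {START, END} | {a for trace in log for a in trace}
    E = set(C)                               # every observed directly-follows pair
    w = dict(C)
    return V, E, w

# A small, purely illustrative log (trace -> multiplicity):
log = {("a", "b", "c"): 2, ("a", "c", "b"): 1}
C = correlation_matrix(log)
print(C[("Start", "a")], C[("b", "c")])      # 3 2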
An example of a Directly Follows Graph will be given in Figure 1 in Section 4, where we first need the DFG. A useful property of the DFG is that it can easily be translated into a language-equivalent Petri net. Leemans et al. [5] showed that if the DFG is sound, the respective language-equivalent Petri net is also sound.

Definition 5 (Soundness of the Directly Follows Graph [5]). Let G = (V, E, w) be a DFG for an event log L. G is sound if every node v ∈ V is on a path from Start to End, i.e.

∀v ∈ V : ∃u_1, . . . , u_n : u_1 = Start ∧ u_n = End ∧ ∃j ∈ {1, . . . , n} : u_j = v ∧ ∀i ∈ {1, . . . , n − 1} : (u_i, u_{i+1}) ∈ E.

As a running example for this paper we take the set of activities T = {a, b, c, d, e, f, g} and an artificial event log L which is defined as

L = [⟨a, b, c, b⟩^100, ⟨a, c, b⟩^50, ⟨d, b, d⟩^100, ⟨b, e⟩^1000, ⟨d, e⟩^1000, ⟨f, g, f, g, f, g⟩^100].

To construct the Correlation Matrix, we simply count how often an event e1 ∈ T is directly followed by an event e2 in the event log and add this quantity to the Correlation Matrix in row e1 and column e2.

          End    a     b     c     d     e     f     g
Start       0  150  1000     0  1100     0   100     0
a           0    0   100    50     0     0     0     0
b         150    0     0   100   100  1000     0     0
c           0    0   150     0     0     0     0     0
d         100    0   100     0     0  1000     0     0
e        2000    0     0     0     0     0     0     0
f           0    0     0     0     0     0     0   300
g         100    0     0     0     0     0   200     0

The artificial event Start (End) is used to record how often an event e ∈ T is the first (last) event in a trace of L. The Correlation Matrix is very handy for calculating the values needed for the execution of hypothesis tests, which are described in the next section.

3. Hypothesis Tests

Rather recently, Petrak et al. [7] showed that hypothesis tests can be used to determine whether the direct neighborhood of two events observed in the event log should be classified as infrequent behavior, i.e. noise, or not. We shortly revisit this idea, simplify it slightly, and give some simple examples that show how this method differs from simply considering the least frequent direct neighborhood in the event log as noise.

Generally speaking, hypothesis tests can be used to check whether a certain hypothesis is valid with a "sufficiently high" probability. Such a hypothesis makes a statement about a certain property of the objects in a population. As a simple example, take the set of all researchers in the field of process mining as the population and the statement "At least 80% of researchers in the field of process mining like process discovery". Since this is a statement that cannot be proved or falsified without asking every single researcher in the population (which would be a rather costly operation), one could easily doubt the correctness of this statement. So next to the already stated Null Hypothesis H0, we formulate an Alternative Hypothesis H1 stating the contrary: "Less than 80% of researchers in the field of process mining like process discovery". To check which of the two hypotheses is correct, we take a small sample of the population and check how many researchers of this sample like process discovery. We then perform a left-sided hypothesis test to check whether this sample implies that the Null Hypothesis H0 or the Alternative Hypothesis H1 is true.

In general, let p ∈ [0, 1] be an unknown probability (in our example the actual percentage of researchers who like process discovery) and p0 ∈ [0, 1] a parameter given by the user (in our example p0 = 0.8). Using data from a sample of the population, a left-sided hypothesis test can be used to decide whether p ∈ [0, p0[ or p ∈ [p0, 1] is true, i.e. whether the Null Hypothesis H0 : p ≥ p0 or the Alternative Hypothesis H1 : p < p0 is true. Since a hypothesis test comes to this decision based on a sample of the full data, there is a certain probability that the wrong decision will be made.
The probability that this happens can be estimated, making it possible to formulate rather reliable statements about which of the two hypotheses holds. The probability that we choose H1 when in reality H0 is true is called the α-error; the probability that we choose H0 when in reality H1 is true is called the β-error. In the context of this paper, we are especially interested in the α-error, which can be bounded by a given α when using hypothesis tests. We then want to find a value k such that, for a random variable X (which can be understood as the number of elements in a random sample that fulfill the property defined in H0),

P(X ≤ k | H0 is true) ≤ α

holds. The values of k for which this inequality holds can be determined by using the density function of the given probability distribution. However, since this computation is quite time-consuming, we use an approximation for k if said probability distribution is a binomial distribution and the standard deviation σ_n := √(n · p0 · (1 − p0)) satisfies σ_n > 3. In this case, we approximate k by

k = ⌈n·p0 − σ_n · u_{1−α}⌉,    (1)

where u_{1−α} is the (1 − α)-quantile of the standard normal distribution. A more formal description of hypothesis tests can be found in [11].

To detect noise in an event log, Petrak et al. [7] model the neighborhood relation between two events as a binomial distribution and use hypothesis tests to check whether the sighting of two events e1, e2 ∈ T directly following each other is main behavior (H0 is true) or infrequent behavior (H1 is true). To achieve this, they define the population as the pairs (e, e′) ∈ T_S × T_E that can follow each other in the process and where e = e1 or e′ = e2, and view the event log as a sample where some of these direct neighborhoods were randomly drawn. Hence, the population is defined as

P_{(e1,e2)} := {(x, y) ∈ T_S × T_E | (x = e1 ∨ y = e2) ∧ |(x, y)|_{>_L} > 0},

which defines the sample size n as the number of events that can follow e1 or can be followed by e2 in the process:

n := |P_{(e1,e2)}| = Σ_{e ∈ T_E} |(e1, e)|_{>_L} + Σ_{e ∈ T_S} |(e, e2)|_{>_L} − |(e1, e2)|_{>_L}.

With the user-given constants 1 − p0 and α they then execute a right-sided hypothesis test, which means that, for a pair of events (e1, e2), they estimate k by ⌈n·p0 − σ_n · u_{1−α}⌉ and decide for H0 when n − |(e1, e2)|_{>_L} < k. Since this is equivalent to estimating k by the formula in Equation 1 and deciding for H0 when |(e1, e2)|_{>_L} > k, we can instead perform a left-sided hypothesis test with the probability p0 as described at the start of this section. With this idea we can calculate the critical value k for which P(X ≤ k | H0 is true) ≤ α holds and check afterwards whether |(e1, e2)|_{>_L} > k. If this is true, we accept the Null Hypothesis H0 and consider e2 directly following e1 as main behavior. Otherwise, we reject the Null Hypothesis, which means we accept the Alternative Hypothesis H1 and consider e2 directly following e1 as infrequent behavior. However, if σ_n ≤ 3, we need to calculate the critical value by evaluating the binomial distribution and finding a k for which the sum of the probabilities from 0 to k is less than 1 − α. For a more detailed description of this idea, see [7] and [11].

Using these ideas leads to the procedure shown in Algorithm 1. For our running example, we set p0 = 0.05 (so we consider a pair of direct neighbors as infrequent if this behavior affects less than 5% of the population) and α = 0.05 (so we accept that the probability of classifying main behavior as infrequent is at most 5%).
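The computation of the critical value k can be sketched in a few lines of Python. This is only an illustrative sketch of the procedure just described (and of Algorithm 1 below), not the authors' implementation; it uses only the standard library, where NormalDist provides the quantile u_{1−α} for the normal approximation of Equation (1), and the binomial branch mirrors the while-loop of Algorithm 1.

from math import ceil, comb, sqrt
from statistics import NormalDist

def critical_value(n, p0, alpha):
    """Critical value k: normal approximation (Equation 1) if sigma_n > 3,
    otherwise direct evaluation of the binomial distribution (Algorithm 1)."""
    sigma_n = sqrt(n * p0 * (1 - p0))
    if sigma_n > 3:
        u = NormalDist().inv_cdf(1 - alpha)       # (1 - alpha)-quantile u_{1-alpha}
        return ceil(n * p0 - sigma_n * u)
    k, total = 0, 0.0
    while total < 1 - alpha:                      # mirrors the while-loop of Algorithm 1
        total += comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        k += 1
    return k

def is_main(count, n, p0=0.05, alpha=0.05):
    """Accept H0 (main behavior) iff the observed count exceeds the critical value."""
    return count > critical_value(n, p0, alpha)

# The pair (a, c) checked in the first example below: n = 250, |(a, c)|_{>L} = 50.
print(critical_value(250, 0.05, 0.05), is_main(50, 250))   # 7 True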
We show the decision for whether a pair of direct neighbors is main or infrequent behavior on the following two examples.

Algorithm 1 Detecting main and infrequent neighbors
Input: p0 ∈ [0, 1], α ∈ [0, 1], L : T* → N_0
Output: main, infreq ⊆ (T ∪ {Start, End})²
  main ← ∅
  infreq ← ∅
  for (e1, e2) ∈ (T ∪ {Start, End})² do
      n ← Σ_{e ∈ T_E} |(e1, e)|_{>_L} + Σ_{e ∈ T_S} |(e, e2)|_{>_L} − |(e1, e2)|_{>_L}
      σ_n ← √(n · p0 · (1 − p0))
      if σ_n > 3 then
          k ← ⌈n·p0 − σ_n · u_{1−α}⌉
      else
          k ← 0
          sum ← 0
          while sum < 1 − α do
              sum ← sum + (n choose k) · p0^k · (1 − p0)^{n−k}
              k ← k + 1
          end while
      end if
      if |(e1, e2)|_{>_L} > k then        ◁ k > 0, so |(e1, e2)|_{>_L} > 0.
          main ← main ∪ {(e1, e2)}
      else
          infreq ← infreq ∪ {(e1, e2)}
      end if
  end for
  return main, infreq

Check (a, c) ∈ >_L: The relevant entries of the Correlation Matrix are row a and column c. We obtain n = 250 and σ_n ≈ 3.4 > 3, which gives k = 7. Since |(a, c)|_{>_L} = 50 > 7 = k, the pair (a, c) ∈ >_L is main behavior.

It is notable that in this example the direct neighborhood of the events a and c has the lowest frequency, but the result of the hypothesis test implies that a being directly followed by c is main behavior. This is due to the fact that a is only followed by b or c and c is only preceded by a or b. The number of sightings where b precedes c or b follows after a is rather small, so the neighborhood between a and c can be considered main behavior.

Check (b, d) ∈ >_L: The relevant entries of the Correlation Matrix are row b and column d. We obtain n = 2450 and σ_n ≈ 10.7 > 3, which gives k = 105. Since |(b, d)|_{>_L} = 100 ≤ 105 = k, the pair (b, d) ∈ >_L is infrequent behavior.

Executing the hypothesis test for every pair of events leads to the result that the pairs (b, d), (b, End), (d, b), (d, End), (Start, f) and (g, End) of direct neighbors are infrequent. Petrak et al. propose to simply delete these direct neighborhood relations from the footprint and use the footprint that was constructed in this way to mine a process model.

4. Using the results to construct a DFG

We use the approach presented by Petrak et al., but construct a Directly Follows Graph (see Definition 4) instead of a footprint. Gaining a Petri net from a DFG is rather simple, and we can easily check the soundness of a DFG by performing two Depth First Searches, which gives us the possibility to easily construct a DFG that is sound. By applying the hypothesis tests to the event log with Algorithm 1 we immediately get a set M ⊆ (T ∪ {Start, End})² of direct neighbors that are part of the main behavior of the event log and a set I ⊆ (T ∪ {Start, End})² of direct neighbors that are infrequent behavior. Obviously, no direct neighborhood can be both main and infrequent behavior, so M ∩ I = ∅. Let G = (V, E) be the DFG for our event log. Since we only add behavior to M or I if we have seen the direct neighborhood at least once in the event log, M ⊆ E and I ⊆ E hold. On the other hand, due to the definition of E, there is no pair of events in M ∪ I that is not present in E, since E contains every pair of direct neighbors that was observed in the event log. Therefore, we can conclude that M ∪ I = E.
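The soundness check mentioned at the beginning of this section can be realized with two depth-first searches: every node must be reachable from Start and must reach End. The following Python sketch illustrates this idea; the function names are ours, and the edges are assumed to be given as a set of pairs.

def reachable(edges, source, forward=True):
    """All nodes reachable from `source`, following the edges forward or backward (DFS)."""
    adj = {}
    for u, v in edges:
        a, b = (u, v) if forward else (v, u)
        adj.setdefault(a, []).append(b)
    seen, stack = {source}, [source]
    while stack:
        for nxt in adj.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_sound(nodes, edges):
    """Definition 5: every node lies on some path from Start to End."""
    from_start = reachable(edges, "Start", forward=True)
    to_end = reachable(edges, "End", forward=False)
    return all(v in from_start and v in to_end for v in nodes)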
The DFG for our running example is shown in Figure 1; the black edges are those of the set M and the highlighted edges are those of the set I.

Figure 1: The DFG for the event log of our running example L. Blue edges were classified as infrequent behavior, black edges were classified as main behavior by the hypothesis tests.

When deleting edges from the DFG, we need to be careful, since the deletion of the whole set I may lead to a result that is not sound according to Definition 5. This is also the case for the DFG in Figure 1, since without all the highlighted edges, the events f and g do not lie on a path from Start to End. A process model constructed directly from such a DFG is not sound, so the goal is to delete as many infrequent edges as possible without sacrificing the soundness of the DFG. Deleting every highlighted edge except the edges (Start, f) and (g, End) leads to the result shown in Figure 2.

Figure 2: The modified DFG for our running example log L, where infrequent edges that were not necessary for the soundness were deleted.

In general, it is not always clear which set J ⊆ I should be deleted from the DFG, since the deletion of an edge e could lead to a situation where other edges are needed for soundness, whereas said edges would not be needed if e wasn't deleted. Finding the "best" subset of I to delete from the DFG is a difficult problem, since it is not clear when one subset is better than another. If we define a function f : P(I) → N_0 which maps every subset of I to a numerical value, where a high value indicates that the subset is a better candidate for deletion than another candidate with a lower value, we have an optimization problem at hand:

Maximize f(J), where G′ = (V, E ∖ J) is a sound DFG and J ⊆ I.

As a default for f we propose f(J) := |J|, so that edge sets that contain more infrequent edges are preferred for deletion. A naive implementation that solves this problem can be found in Algorithm 2.

Algorithm 2 Deleting infrequent edges from the DFG
Input: G = (V, E), M, I ⊆ E with M ∪ I = E and M ∩ I = ∅, f : P(I) → N_0
Output: G′ = (V, E′)
  J_opt ← ∅
  for each J ⊆ I do
      G′′ ← (V, E ∖ J)
      if f(J) > f(J_opt) and G′′ is sound then
          J_opt ← J
      end if
  end for
  return (V, E ∖ J_opt)

This algorithm needs O(2^{|I|} · (|V| + |E|)) time and is therefore exponential, but since the number of infrequent edges, and therefore the cardinality |I| of I, is rather small in most cases, we accept this naive implementation: it allows for an easy exchange of the user-given function f that should be optimized, and it is easy to read. In our running example, this algorithm leads to the sound DFG shown in Figure 2.
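A direct Python transcription of Algorithm 2 could look as follows. It is a brute-force sketch: it enumerates all subsets J of the infrequent edges, keeps the best one according to the user-given function f (defaulting to f(J) = |J| as proposed above), and reuses the is_sound helper sketched in the previous listing; edges and infrequent are assumed to be Python sets of pairs.

from itertools import combinations

def delete_infrequent_edges(nodes, edges, infrequent, f=len):
    """Algorithm 2: among all subsets J of the infrequent edges whose removal
    keeps the DFG sound, pick one maximizing f(J) and delete it."""
    infrequent = list(infrequent)
    best = frozenset()                              # J_opt <- empty set
    for r in range(len(infrequent) + 1):            # enumerate all subsets of I
        for subset in combinations(infrequent, r):
            J = frozenset(subset)
            if f(J) > f(best) and is_sound(nodes, edges - J):
                best = J
    return nodes, edges - best

On the running example this removes all infrequent edges except (Start, f) and (g, End) and thus reproduces the DFG of Figure 2.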
5. Handling of Loops

As can be seen in the example of Figure 1, loops in the DFG can have a high impact on the population for our hypothesis tests. In this example, executing the loop between f and g several times artificially increases the number of times f was preceded by an event other than Start and the number of times g was followed by an event other than End. Even though the edges (Start, f) and (g, End) are only encountered in the same traces as the edges (f, g) and (g, f), the repetition of this loop leads to the effect that (Start, f) and (g, End) are classified as infrequent behavior. In the example of Figure 1 this is not a problem, since these two infrequent edges are not deleted for the sake of soundness. However, the following simple event log shows that we are not always in such a lucky situation:

L⟳ = [⟨a, b, . . . , b, d⟩^10, ⟨a, b, c, d⟩^40, ⟨a, c, d⟩^100],    where b is repeated 51 times in the first trace.

Executing hypothesis tests on this log leads to the classification of the direct neighborhood between b and d as infrequent.

Check (b, d) ∈ >_{L⟳}: The Correlation Matrix for L⟳ is

          End    a    b    c    d
Start       0  150    0    0    0
a           0    0   50  100    0
b           0    0  500   40   10
c           0    0    0    0  140
d         150    0    0    0    0

and we obtain n = 690 and σ_n ≈ 5.7 > 3, which gives k = 26. Since |(b, d)|_{>_{L⟳}} = 10 ≤ 26 = k, the pair (b, d) ∈ >_{L⟳} is infrequent behavior.

Every other direct neighborhood is classified as main behavior by the hypothesis tests, so we get the DFG for L⟳ that is shown in Figure 3.

Figure 3: The DFG for the event log L⟳ with the only infrequent edge highlighted in blue.

Deletion of the highlighted edge would not violate the soundness of the DFG, but doing so would remove the entire behavior of a being followed by one or more b's, which are followed by a single d. This is a problem, since the neighborhood (b, d) would not be classified as infrequent behavior if the event log featured fewer iterations of the b-loop:

Check (b, d) ∈ >_{L⟳} with a single loop iteration per trace: the row of b in the Correlation Matrix becomes (0, 0, 10, 40, 10), and we obtain n = 200 and σ_n ≈ 3.08 > 3, which gives k = 5. Since |(b, d)|_{>_{L⟳}} = 10 > 5 = k, the pair (b, d) ∈ >_{L⟳} would then be main behavior.

To eliminate this problem, we propose a preprocessing step for the hypothesis tests that reduces the number of loop iterations if there is a loop present in the process. To do so, we alter the traces of our event log that contain such a loop, but only for the hypothesis tests. The idea is to construct a DFG for such a single trace and find a path from Start to End that cycles less frequently through the loop in the trace, if this is possible. To do so, we first introduce the notion of a Trace DFG:

Definition 6 (Trace DFG). Let σ = ⟨t_1, . . . , t_n⟩ be a trace. The Trace DFG for σ is a directed multigraph G = (V, E) with
• V := {Start, End} ∪ {t_1, . . . , t_n} and
• E : (T ∪ {Start, End})² → N_0,

E((t_1, t_2)) = \begin{cases} 1 & \text{if } t_1 = Start \text{ or } t_2 = End \\ |\{i \mid t_i = t_1 \wedge t_{i+1} = t_2\}| & \text{otherwise.} \end{cases}

For a trace σ, the Trace DFG is therefore the DFG of the log [σ], where the edge weights of said DFG are interpreted as edge multiplicities. The Trace DFG for the trace ⟨a, b, c, d⟩, for example, is a simple path; the Trace DFG for the trace ⟨a, b, . . . , b, d⟩ with 51 b's is shown in Figure 4.

Figure 4: The Trace DFG for ⟨a, b, . . . , b, d⟩ ∈ L⟳ (with b repeated 51 times). Edge weights denote the multiplicity of edges; the artificial dashed edge from End to Start (which is not part of the Trace DFG) makes the graph eulerian.

To reduce the number of loop iterations in the Trace DFG, we first introduce a new, artificial edge from End to Start, as shown by the dashed arc in Figure 4. The resulting multigraph G is eulerian, which means there is an eulerian cycle in G.

Theorem 1. Let σ ∈ T* be a trace and G = (V, E) the Trace DFG of σ. Then the graph G′ = (V, E ∪ {(End, Start)}) is eulerian, i.e. it contains an eulerian cycle.

Proof. By construction of the Trace DFG, σ is a path from Start to End that uses every edge in G exactly once. With the artificial edge (End, Start) in G′ this path becomes a cycle that uses every edge in G′ exactly once, i.e. an eulerian cycle.

Since this extended Trace DFG G′ is eulerian, we can find the shortest cycle through G′ that uses every type of edge in G′ at least once, but no more often than the multiplicity of the edges allows. The problem of finding this shortest cycle is known as the Chinese Postman Problem, which was introduced by Edmonds et al. [9] and can be solved in O(|V|³) time. In the example for L⟳, the shortest cycle is (Start, a, b, b, d, End, Start), which cycles only once through the loop at the vertex b. From this cycle we can easily construct the shortened trace ⟨a, b, b, d⟩, which features fewer loop iterations than the original trace and hence does not disrupt the hypothesis tests. We therefore use this trace instead of the original trace to execute the hypothesis tests and find that the neighborhood (b, d) is main behavior. However, when constructing the DFG for L⟳, we calculate the multiplicities based on the original (i.e. not shortened) traces.
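Instead of a full Chinese Postman solver, the shortening can also be illustrated with a simple greedy heuristic: repeatedly cut a cyclic subtrace out of the trace whenever every directly-follows pair inside the cut still occurs somewhere in the remaining trace, so that the Trace DFG keeps every edge type at least once. The following Python sketch is our own simplification for illustration; it is not guaranteed to find the optimal (shortest) cycle of the Chinese Postman formulation, but it reproduces the shortening of the example above.

from collections import Counter

def df_pairs(trace):
    """Multiset of directly-follows pairs inside a single trace."""
    return Counter(zip(trace, trace[1:]))

def shorten_trace(trace):
    """Greedily remove cyclic subtraces trace[i..j] (with trace[i] == trace[j])
    as long as every edge type of the original Trace DFG survives."""
    trace = list(trace)
    changed = True
    while changed:
        changed = False
        pairs = df_pairs(trace)
        for i in range(len(trace)):
            for j in range(i + 1, len(trace)):
                if trace[i] != trace[j]:
                    continue
                cut = df_pairs(trace[i:j + 1])
                # only cut if every removed pair occurrence also appears elsewhere
                if all(pairs[p] > cut[p] for p in cut):
                    trace = trace[:i] + trace[j:]
                    changed = True
                    break
            if changed:
                break
    return tuple(trace)

# The 51 consecutive b's of the trace in Figure 4 collapse to a single loop iteration:
print(shorten_trace(("a",) + ("b",) * 51 + ("d",)))   # ('a', 'b', 'b', 'd')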
6. Evaluation

We implemented the algorithm described in this paper in Python, version 3.8, using the process mining library pm4py [12]. The latter provides functions to calculate the fitness, precision, generalization and simplicity for a given Petri net. To calculate these quality measures, we used the following techniques provided by pm4py:
• Fitness: The alignment-based fitness is computed and the percentage of completely fitting traces is returned, as well as the average fitness value of the single traces,
• Precision: The Align-ETConformance is computed and returned, see [13],
• Generalization: Computed as described in [14],
• Simplicity: Computed as described in [15].
To check the Petri net for soundness, the external tool WOFLAN [16] is used. Aside from pm4py there are other tools, like Entropia [17], that are also interesting for efficient and robust conformance checking. We plan to investigate these tools in the future and use them to further evaluate our results.

To test our implementation and compare it to other mining techniques, we chose some real-life event logs that are often used to evaluate new process discovery techniques. We used
• BPI_Challenge_2012.xes (BPI12), which can be found at [18],
• DomesticDeclarations.xes (DD),
• InternationalDeclarations.xes (ID),
• PermitLog.xes (PL),
• PrepaidTravelCost.xes (PTC),
• RequestForPayment.xes (RFP),
all of which can be found at [19]. Table 1 gives an overview of some important characteristics of these event logs.

Table 1: Characteristics of the event logs that were chosen for the evaluation.

                           BPI12      DD      ID      PL     PTC     RFP
Number of unique traces    4 366      99     753   1 478     202      89
Total number of traces    13 087  10 500   6 449   7 065   2 099   6 886
Number of cycles          43 566     121     585   3 842     113      80
Longest trace length         175      24      27      90      21      20
Shortest trace length          3       1       3       3       1       1
Average trace length       20.04    5.37   11.19   12.25    8.69    5.34

First, we compared our approach with the fixed parameters p0 = 0.05 and α = 0.05 to the following well-known mining approaches that consider outliers in the event log:
• The Inductive Miner infrequent (IMi) [6] creates a process tree out of the DFG for an event log and creates a Petri net from that process tree. In a preprocessing step, all edges that are "too infrequent" are removed by going through all events e and deleting all outgoing edges of e with a frequency less than k times the frequency of the strongest outgoing edge of e, where k is a user-chosen parameter between 0 and 1. We chose 0.2 as the parameter for IMi.
• The Directly Follows Miner (DFM) [5] creates a DFG and scans it for infrequent behavior by determining the edges with the least frequency. Then all traces that feature the direct neighborhood modeled with an infrequent edge are removed from a copy of the event log.
This step is repeated until at most a fraction of t traces has been removed, where t is a user-chosen threshold between 0 and 1. Then, the DFG for the new event log is computed (or the old DFG is adjusted accordingly) and a Petri net is constructed from the DFG. We chose 0.1 as the parameter for DFM.

We also compared our approach to the technique presented in [7], where hypothesis tests are executed and all infrequent edges are deleted from the DFG without the goal of maintaining soundness. For this approach, we chose p0 = 0.95 and α = 0.05, so the hypothesis tests are performed exactly like with the new technique.

Since the DFM and our approach result in a DFG instead of a Petri net, we also need to convert the DFG into a Petri net to compute the quality measures as implemented in pm4py. To do so, we constructed a labeled Petri net N = (P, T, F, l) from a DFG G = (V, E, w) as follows:
• For each node v ∈ V a place v ∈ P is added.
• For each edge (u, v) ∈ V × (V ∖ {End}), a transition t_{u,v} ∈ T with l(t_{u,v}) = v is added.
• For each edge (u, End) ∈ V × {End}, we also add a transition t_{u,End} ∈ T. However, this is a silent transition, i.e. l(t_{u,End}) = τ.
• Finally, for each transition t_{u,v} ∈ T two arcs are added: (u, t_{u,v}) ∈ F and (t_{u,v}, v) ∈ F.

The results of executing the algorithms and the conformance checking methods on the resulting Petri nets can be found in Figure 5. For all evaluated methods and logs, fitness and generalization were similarly good. Simplicity also showed consistently good values, except for a few very good results from the DFM. The percentage of fitting traces was consistently very good for DFM, while the other techniques achieved varyingly good values for different logs. The largest differences between the methods were observed for the precision and the F-score. Here, the results for different logs vary greatly for the DFM, while IMi achieves rather low results. Both of our new approaches achieve very good values most of the time in these two categories.

It is notable that the approach of [7] (which we call HT) always leads to a better precision than our new approach, which is not surprising considering that we leave some infrequent edges in the graph to maintain soundness. For example, if there is a single trace featuring much infrequent behavior, from which almost everything is deleted from the DFG except for an edge e, it is possible to use this edge e in the resulting Petri net, even though it does not fit any behavior seen in the event log. Due to the higher precision, the F-score is also better than in our approach. Further, the simplicity of the net mined by HT is often much higher than in the new approach, simply because more edges are deleted. However, contrary to our new approach, HT never led to a sound net for any of the tested event logs.

IMi scores similarly to our approach, except for precision and therefore F-score. This is due to the fact that IMi classifies edges as infrequent with a rather local argument – the frequency of an edge is compared to the frequencies of all other outgoing edges, but not to the frequencies in the full graph. Hence, IMi may often delete single edges from many traces. For example, for the very simple event log [⟨a, b, c⟩^l], it can easily be forced to delete the edge (b, c) by introducing a trace ⟨x, b, y⟩^{l/k + 1} to the event log. Then, the edge (b, c) in the DFG becomes infrequent compared to the other outgoing edge (b, y) and is therefore deleted.
Then the trace ⟨a, b, c⟩ cannot be replayed, which leads to a worse fitness, but the trace ⟨a, b, y⟩ can be replayed, even though it is not present in the event log.

With the user-chosen threshold that affects the behavior of the DFM, we have direct control over the fitness of the mined model, so this metric always features very good values. However, since DFM deletes only traces and not single edges, it seems odd that the precision of DFM is as low as it is for some event logs. This may be associated with the fact that in the DFG there is exactly one vertex for every event present in the event log, and therefore some traces that share an event name but have nothing else in common lead to a Petri net where traces can be replayed that are not part of the event log. However, our approach shares the same problem concerning the DFG, which makes it unclear why our approaches achieve a much higher precision most of the time. Investigating this behavior further could be an interesting task for future work.

When comparing the simpler version of our method to the one with loop reduction, it can be seen that both versions achieve similar results, which can be explained by the structure of the chosen event logs. It may be interesting to investigate further how the two variants differ on event logs with many loops.

During the execution of the different algorithms, the times required to construct a model were measured. These times do not include the calculation of the metrics that were afterwards computed on the resulting models. In Table 2, all execution times are given in seconds. For all event logs and algorithms evaluated, a result could be calculated within a few seconds.

Table 2: The execution times in seconds for each investigated approach and each chosen event log.

        BPI12      DD        ID        PL        PTC       RFP
HTs     00.87397   00.09229  00.20281  02.73899  00.07453  00.06550
HTl     02.69569   00.10818  00.36392  01.00158  00.10343  00.07757
HT      05.493242  00.09222  00.18860  00.34547  00.05466  00.06677
IMi     12.15239   01.67628  03.43188  07.16169  01.54955  01.86956
DFM     01.99909   00.14701  00.31106  00.68258  00.05291  00.09879

7. Conclusion

In this paper, we revisited the use of hypothesis tests by Petrak et al. [7] to detect infrequent neighborhood relations in a given event log. We proposed a simpler variant of these tests in the form of a left-sided hypothesis test and changed the output to a Directly Follows Graph instead of a footprint, in order to be able to selectively remove infrequent behavior without destroying the soundness of the resulting DFG. Furthermore, we showed theoretically that high loop iteration counts in an event log can lead to undesired results of the hypothesis tests and solved this problem by implementing a preprocessing step that reduces the number of loop iterations to an amount small enough that the sample size used for the hypothesis tests is not bloated by the loop. The proposed techniques are very easy to understand and implement. We compared both the hypothesis tests without loop-shortening and the hypothesis tests with loop-shortening to the results of the old technique using hypothesis tests [7], the Inductive Miner infrequent [6] and the Directly Follows Miner [5], and found that our results for fitness, precision, generalization, simplicity and F-score are rather consistent and high – also compared to these techniques.
Further research should include the evaluation of logs with high loop iteration counts to verify that the shortening of loops is not only reasonable in theory, but also in practice. It would be interesting to see how often loop-shortening is relevant in real-life logs and under which circumstances (concerning the underlying process) loop-shortening becomes necessary. Also, there are some more techniques that filter outliers before mining a process model [20, 21, 22], some of which also use statistical ideas. It would be very interesting to see how our approach compares with these techniques. Since the DFG loses information on concurrency, a preprocessing step to detect concurrency may be a topic of further research – as well as the investigation of the existence of event logs that lead to bad results with our approach due to concurrency.

References

[1] S. J. J. Leemans, D. Fahland, W. M. P. van der Aalst, Discovering Block-Structured Process Models from Event Logs Containing Infrequent Behaviour, in: Business Process Management Workshops - BPM 2013 International Workshops, Beijing, China, August 26, 2013, Revised Papers, volume 171 of Lecture Notes in Business Information Processing, 2013, pp. 66–78.
[2] C. W. Günther, W. M. P. van der Aalst, Fuzzy Mining – Adaptive Process Simplification Based on Multi-perspective Metrics, in: Business Process Management, 2007, pp. 328–343.
[3] A. Weijters, W. van der Aalst, A. Alves De Medeiros, Process mining with the HeuristicsMiner algorithm, BETA publicatie: working papers, Technische Universiteit Eindhoven, 2006.
[4] A. J. M. M. Weijters, J. T. S. Ribeiro, Flexible Heuristics Miner (FHM), in: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2011, pp. 310–317.
[5] S. J. J. Leemans, E. Poppe, M. T. Wynn, Directly Follows-Based Process Mining: Exploration & a Case Study, in: International Conference on Process Mining, ICPM 2019, Aachen, Germany, June 24-26, 2019, IEEE, 2019, pp. 25–32.
[6] S. J. J. Leemans, D. Fahland, W. M. P. van der Aalst, Discovering Block-Structured Process Models from Incomplete Event Logs, in: G. Ciardo, E. Kindler (Eds.), Application and Theory of Petri Nets and Concurrency - 35th International Conference, PETRI NETS 2014, Tunis, Tunisia, June 23-27, 2014. Proceedings, volume 8489 of Lecture Notes in Computer Science, Springer, 2014, pp. 91–110.
[7] L. Petrak, R. Lorenz, Detecting Infrequent Behavior in Event Logs using Statistical Inference, in: Proceedings of the International Workshop on Algorithms & Theories for the Analysis of Event Data 2020, June 24, 2020, volume 2625 of CEUR Workshop Proceedings, 2020, pp. 33–48.
[8] J. Carmona, B. F. van Dongen, A. Solti, M. Weidlich, Conformance Checking - Relating Processes and Models, Springer, 2018. URL: https://doi.org/10.1007/978-3-319-99414-7. doi:10.1007/978-3-319-99414-7.
[9] J. R. Edmonds, E. L. Johnson, Matching, Euler tours and the Chinese postman, Math. Program. 5 (1973) 88–124.
[10] W. M. van der Aalst, B. F. van Dongen, Discovering Petri Nets from Event Logs, in: Transactions on Petri Nets and Other Models of Concurrency VII, Springer, 2013, pp. 372–422.
[11] G. Hornsteiner, Daten und Statistik. Eine praktische Einführung für den Bachelor in Psychologie und Sozialwissenschaften. Berlin/Heidelberg (2012).
[12] A. Berti, S. J. van Zelst, W. van der Aalst, Process Mining for Python (pm4py): bridging the gap between process- and data science, arXiv preprint arXiv:1905.06169 (2019).
[13] A. Adriansyah, J. Munoz-Gama, J. Carmona, B. van Dongen, W. van der Aalst,
Measuring precision of modeled behavior, Information Systems and e-Business Management 13 (2014). doi:10.1007/s10257-014-0234-7.
[14] J. Buijs, B. van Dongen, W. van der Aalst, Quality Dimensions in Process Discovery: The Importance of Fitness, Precision, Generalization and Simplicity, International Journal of Cooperative Information Systems 23 (2014) 1440001. doi:10.1142/S0218843014400012.
[15] F. R. Blum, Metrics in process discovery, 2015.
[16] W. M. P. van der Aalst, Woflan: A Petri-net-based Workflow Analyzer, Systems analysis modelling simulation 35 (1999) 345–357. URL: https://publications.rwth-aachen.de/record/714622.
[17] A. Polyvyanyy, H. Alkhammash, C. D. Ciccio, L. García-Bañuelos, A. A. Kalenkova, S. J. J. Leemans, J. Mendling, A. Moffat, M. Weidlich, Entropia: A Family of Entropy-Based Conformance Checking Measures for Process Mining, CoRR abs/2008.09558 (2020). URL: https://arxiv.org/abs/2008.09558. arXiv:2008.09558.
[18] B. van Dongen, BPI Challenge 2012, 2012. URL: https://data.4tu.nl/articles/dataset/BPI_Challenge_2012/12689204/1. doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f.
[19] B. van Dongen, BPI Challenge 2020, 2020. URL: https://data.4tu.nl/collections/BPI_Challenge_2020/5065541/1. doi:10.4121/uuid:52fb97d4-4588-43c9-9d04-3604d4613b51.
[20] R. Conforti, M. L. Rosa, A. H. t. Hofstede, Filtering Out Infrequent Behavior from Business Process Event Logs, IEEE Transactions on Knowledge and Data Engineering 29 (2017) 300–314. doi:10.1109/TKDE.2016.2614680.
[21] X. Lu, D. Fahland, W. M. van der Aalst, Interactively Exploring Logs and Mining Models with Clustering, Filtering, and Relabeling, in: BPM, 2016.
[22] D. Brons, R. Scheepens, D. Fahland, Striking a new Balance in Accuracy and Simplicity with the Probabilistic Inductive Miner, in: 2021 3rd International Conference on Process Mining (ICPM), 2021, pp. 32–39. doi:10.1109/ICPM53251.2021.9576864.

Figure 5: Comparison of the resulting quality metrics for the approach described in this paper, using hypothesis tests and maintaining soundness (HTs), the same approach using loop-reduction (HTl), the old technique using hypothesis tests (HT), the Inductive Miner infrequent (IMi) and the Directly Follows Miner (DFM).