Taking on Infrequent Behavior in Event Logs using Hypothesis Tests

Patrizia Schalk¹, Lisa Petrak¹
¹ University of Augsburg, Universitätsstraße 6a, 86159 Augsburg

Abstract
When an event log is generated from the real-life data of an existing process, there is a high probability that, apart from events that happen frequently and therefore should be represented in the process model, the log also features infrequent behavior that would make a model generated from this log hardly readable. Many process discovery algorithms deal with such outliers by filtering infrequent behavior just before or during the creation of an appropriate process model. Less common methods make use of statistical tests to detect infrequent behavior in a preprocessing step. This paper aims to simplify the use of hypothesis tests introduced by Petrak et al. and proposes an approach to delete infrequent behavior without generating unwanted side effects for the process model. Furthermore, we solve a problem regarding the use of hypothesis tests for a process that contains loops and compare the results of hypothesis tests to well-known discovery techniques.

Keywords
Process Mining, Process Discovery, Filtering Outliers, Detection of Infrequent Behavior, Hypothesis Test, Directly Follows Graph, Loop Shortening

1. Introduction

Finding mathematical models for real-life processes is a highly relevant task, since such models can easily be analyzed in a way that shows exactly where the process has its weaknesses and where it can be optimized. However, constructing such a process model by hand leads to results that describe what the process should look like instead of how it actually looks. Therefore, it is highly desirable to create such models automatically and analyze them afterwards, so that no bias is present in the model. To automatically construct such a model, a record of the behavior of the process, e.g. in the form of an event log, is needed. However, there are several challenges that make it hard to construct a model from an event log, especially if such a model has to meet some criteria like being readable. One example of such a challenge is the existence of outliers in the event log – data that is present in the log but is so infrequent that it does not belong to the important parts of a process. Related to the existence of outliers is the existence of noise – data in the event log that cannot occur in the process. There are several causes for noise, such as human error ("I meant to log a different task than I actually did") or technical failure ("A part of the system shut down and didn't log a status during a period of time"). While noise is erroneous behavior and should not be featured in the event log, outliers can very well be part of the actual behavior of the process. However, removing edge cases from the event log promises to make the remaining behavior, and therefore a suitable process model, clearer and more readable. Finding such a reasonable model despite the existence of outliers in the event log is the task of process discovery.
Most such techniques handle noise by using heuristics or by setting thresholds for the maximum amount of behavior that may be deleted [1, 2, 3, 4, 5, 6]. Fewer discovery techniques make use of statistical theory to detect infrequent behavior in event logs [7].

As mentioned before, a discovered model needs to meet some quality measures. For example, we want a process model that is able to replay most of the behavior present in the event log (but not everything, since infrequent behavior should be excluded) – this trait is measured by the fitness of a process model. However, fitness alone is not enough for a good process model, since we can accomplish perfect fitness by simply allowing every behavior, even behavior that has nothing to do with the process. Therefore, another quality measure is the precision, which ensures that the process model cannot replay much more behavior than is present in the event log. The notion of generalization describes how open the model is for reasonable extensions, i.e. behavior that was not recorded in the log but can happen in the process. Finally, a process model should be simple and easy to read, so that a human can understand and analyze it – this trait is measured by the simplicity of the discovered process model. More information on quality measures for process models can be found in [8].

In this paper, we continue research on the question of how to discover a "good" process model for the main behavior of an event log by filtering out infrequent behavior. The idea to use hypothesis tests to find such outliers in an event log was presented in [7], where one-sided hypothesis tests are used to decide whether the direct neighborhood of two events should be classified as main or infrequent behavior. However, the approach of said paper creates unwanted side effects by deleting single events from a trace and thereby creating new direct neighborhood relations that might simply be wrong. We present a new approach without this problem that is graphical, therefore easy to follow, and gives users more control over the result they wish to get.

We first lay the foundation for our approach by defining the needed notions in Section 2 and revisit the general idea of using hypothesis tests to find infrequent behavior in Section 3, simplifying the techniques described in [7] and investigating a running example that is small and easy to follow. In Section 4 we propose to use a Directly Follows Graph to represent the directly-follows relations between events after we have detected infrequent direct neighbors using hypothesis tests, and show how we can delete infrequent behavior without risking side effects. We address a problem concerning loops that feature many iterations in the event log in Section 5 and propose a simple solution that is based on the well-studied Chinese Postman Problem [9]. Section 6 evaluates the results by comparing the accomplished fitness, precision, generalization and simplicity with those of well-known discovery techniques, namely the Inductive Miner infrequent [6] and the Directly-Follows Miner [5], with which our approach shares similar ideas. Section 7 concludes the paper.

2. Basic Definitions

We denote the set of natural numbers {1, 2, 3, . . .} by N and define N_0 := N ∪ {0}. For an arbitrary set T we call m : T → N_0 a multiset over T. For any a ∈ T, m(a) is the number of occurrences of the element a in the multiset m. We write a ∈ m if this number is greater than zero, i.e. a ∈ m :⇔ m(a) > 0.
We also write a finite multiset m over T with elements a_1, . . . , a_n in the form [a_1^{m(a_1)}, . . . , a_n^{m(a_n)}].

Definition 1 (Event, Trace, Event log [10]). Let T be a set of activity names. A sequence of activities σ = ⟨t_1, . . . , t_n⟩ ∈ T* is called a trace. An activity a ∈ T is contained in σ if it occurs in σ at any position, i.e. a ∈ σ :⇔ σ = ⟨t_1, . . . , t_n⟩ ∧ ∃i ∈ {1, . . . , n} : t_i = a. For an arbitrary trace σ = ⟨t_1, . . . , t_n⟩ ∈ T* with n ≥ 1 we define first(σ) := t_1 and last(σ) := t_n. An event log L : T* → N_0 over T is a multiset of traces. The occurrence of an activity a ∈ T in L is called an event.

Regarding a real-life process, we give every event that takes place a name, which can be an artificial one (like a, b, c, . . .) or a descriptive one (like take_order). A trace then describes a sequence of events that happened for a specific case, e.g. for a customer or a general object.

Definition 2 (Ordering relations [10]). Let L be an event log over a set of activities T. For any a, b ∈ T we introduce the following binary causal relations on T:
• a >_L b if and only if a is directly followed by b somewhere in a trace σ in L, i.e. if a trace σ = ⟨t_1, . . . , t_n⟩ ∈ L and a number i ∈ {1, . . . , n − 1} exist with t_i = a and t_{i+1} = b
• a →_L b if and only if a >_L b and b ≯_L a
• a ←_L b if and only if a ≯_L b and b >_L a
• a #_L b if and only if a ≯_L b and b ≯_L a
• a ‖_L b if and only if a >_L b and b >_L a.

Of these ordering relations, >_L will be of great interest for us, since we want to investigate the neighborhood of two events and decide whether two events directly following each other happen often (i.e. main behavior) or rarely (i.e. infrequent behavior). To do so, we need to know how often two events follow each other, which we can store in the Correlation Matrix:

Definition 3 (Correlation matrix). Let L be an event log over T with Start, End ∉ T. Further, take T_S := T ∪ {Start}, T_E := T ∪ {End} and λ := ⟨⟩, the empty trace. We denote the number of times a ∈ T_S is directly followed by b ∈ T_E in all traces contained in L by |(a, b)|_{>_L}. The correlation matrix for an event log L is defined as the square matrix C^L : T_S × T_E → N_0 with

C^L_{a,b} = \begin{cases} |(a, b)|_{>_L} & \text{if } a, b \in T \\ \sum_{\sigma \in L \setminus \{\lambda\},\ first(\sigma) = b} L(\sigma) & \text{if } a = Start,\ b \in T \\ \sum_{\sigma \in L \setminus \{\lambda\},\ last(\sigma) = a} L(\sigma) & \text{if } a \in T,\ b = End \\ L(\lambda) & \text{if } a = Start,\ b = End. \end{cases}

Recall that L is a multiset and hence, for any σ ∈ L, L(σ) denotes the number of times the trace σ is contained in L.

Apart from the Correlation Matrix, we can store the information about which events directly follow one another in the Directly Follows Graph, which is a simple directed graph containing all events as vertices. Two vertices u and v are connected by an edge (u, v) if somewhere in the event log u is directly followed by v. The weight of the edge shows how often this happens in the event log.

Definition 4 (Directly Follows Graph). Let L be an event log over a set of activities T. The Directly Follows Graph (DFG) for L is a weighted directed graph G = (V, E, w) with
• V := T ∪ {Start, End},
• E := >_L ∪ {(Start, t) | ∃σ ∈ L : t = first(σ)} ∪ {(t, End) | ∃σ ∈ L : t = last(σ)} and
• w : E → N_0, w((a, b)) := C^L_{a,b}.
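To make Definitions 3 and 4 concrete, the following minimal Python sketch builds the correlation matrix and the weighted DFG from an event log given as a dictionary that maps trace tuples to their multiplicities. It only illustrates the definitions above; all names (correlation_matrix, directly_follows_graph, START, END, the small example log) are our own and not part of any process mining library.

from collections import Counter

START, END = "Start", "End"

def correlation_matrix(log):
    """Count how often a is directly followed by b; Start and End record
    the first and last activity of every trace (Definition 3)."""
    C = Counter()
    for trace, freq in log.items():
        if len(trace) == 0:
            C[(START, END)] += freq          # the empty trace lambda
            continue
        C[(START, trace[0])] += freq         # first activity of the trace
        for a, b in zip(trace, trace[1:]):   # all directly-follows pairs in the trace
            C[(a, b)] += freq
        C[(trace[-1], END)] += freq          # last activity of the trace
    return C

def directly_follows_graph(log):
    """Weighted DFG (V, E, w) as in Definition 4."""
    C = correlation_matrix(log)
    V = {START, END} | {a for trace in log for a in trace}
    E = set(C)                               # every observed directly-follows pair
    w = dict(C)
    return V, E, w

# A small, purely illustrative log (trace -> multiplicity):
log = {("a", "b", "c"): 2, ("a", "c", "b"): 1}
C = correlation_matrix(log)
print(C[("Start", "a")], C[("b", "c")])      # 3 2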
An example of a Directly Follows Graph will be given in Figure 1 in Section 4, where we first need the DFG. A useful property of the DFG is that it can easily be translated into a language-equivalent Petri net. Leemans et al. [5] showed that if the DFG is sound, the respective language-equivalent Petri net is also sound.

Definition 5 (Soundness of the Directly Follows Graph [5]). Let G = (V, E, w) be a DFG for an event log L. G is sound if every node v ∈ V is on a path from Start to End, i.e.

∀v ∈ V : ∃u_1, . . . , u_n : u_1 = Start ∧ u_n = End ∧ ∃j ∈ {1, . . . , n} : u_j = v ∧ ∀i ∈ {1, . . . , n − 1} : (u_i, u_{i+1}) ∈ E.

As a running example for this paper we take the set of activities T = {a, b, c, d, e, f, g} and an artificial event log L which is defined as

L = [⟨a, b, c, b⟩^100, ⟨a, c, b⟩^50, ⟨d, b, d⟩^100, ⟨b, e⟩^1000, ⟨d, e⟩^1000, ⟨f, g, f, g, f, g⟩^100].

To construct the Correlation Matrix, we simply count how often an event e1 ∈ T is directly followed by an event e2 in the event log and add this quantity to the Correlation Matrix in row e1 and column e2.

          End    a     b     c     d     e     f     g
Start       0  150  1000     0  1100     0   100     0
a           0    0   100    50     0     0     0     0
b         150    0     0   100   100  1000     0     0
c           0    0   150     0     0     0     0     0
d         100    0   100     0     0  1000     0     0
e        2000    0     0     0     0     0     0     0
f           0    0     0     0     0     0     0   300
g         100    0     0     0     0     0   200     0

The artificial event Start (End) is used to record how often an event e ∈ T is the first (last) event in a trace of L. The Correlation Matrix is very handy for calculating the values needed for the execution of hypothesis tests, which are described in the next section.

3. Hypothesis Tests

Rather recently, Petrak et al. [7] showed that hypothesis tests can be used to determine whether the direct neighborhood of two events observed in the event log should be classified as infrequent behavior, i.e. noise, or not. We shortly revisit this idea, simplify it slightly, and give some simple examples that show how this method differs from simply considering the least frequent direct neighborhood in the event log as noise.

Generally speaking, hypothesis tests can be used to check whether a certain hypothesis is valid with a "sufficiently high" probability. Such a hypothesis makes a statement about a certain property of the objects in a population. As a simple example, take the set of all researchers in the field of process mining as the population and the statement "At least 80% of researchers in the field of process mining like process discovery". Since this is a statement that cannot be proved or falsified without asking every single researcher in the population (which would be a rather costly operation), one could easily doubt the correctness of this statement. So next to the already stated Null Hypothesis H0, we formulate an Alternative Hypothesis H1 stating the contrary: "Less than 80% of researchers in the field of process mining like process discovery". To check which of the two hypotheses is correct, we take a small sample of the population and check how many researchers of this sample like process discovery. We then perform a left-sided hypothesis test to check whether this sample implies that the Null Hypothesis H0 or the Alternative Hypothesis H1 is true.

In general, let p ∈ [0, 1] be an unknown probability (in our example the actual percentage of researchers who like process discovery) and p0 ∈ [0, 1] a parameter given by the user (in our example p0 = 0.8). Using data from a sample of the population, a left-sided hypothesis test can be used to decide whether p ∈ [0, p0[ or p ∈ [p0, 1] is true, i.e. whether the Null Hypothesis H0 : p ≥ p0 or the Alternative Hypothesis H1 : p < p0 is true. Since a hypothesis test comes to this decision based on a sample of the full data, there is a certain probability that the wrong decision will be made.
The probability that this happens can be estimated, making it possible to formulate rather reliable statements about which of the two hypotheses holds. The probability that we choose H1 when in reality H0 is true is called the α-error; the probability that we choose H0 when in reality H1 is true is called the β-error. In the context of this paper, we are especially interested in the α-error, which can be bounded by a given α when using hypothesis tests. We then want to find a value k such that, for a random variable X (which can be understood as the number of elements in a random sample that fulfill the property defined in H0),

P(X ≤ k | H0 is true) ≤ α

holds. The values of k for which this inequality holds can be determined by using the density function of the given probability distribution. However, since this computation is quite time-consuming, we use an approximation for k if said probability distribution is a binomial distribution and the standard deviation σ_n := √(n · p0 · (1 − p0)) satisfies σ_n > 3. In this case, we approximate k by

k = ⌈n·p0 − σ_n · u_{1−α}⌉,    (1)

where u_{1−α} is the (1 − α)-quantile of the standard normal distribution. A more formal description of hypothesis tests can be found in [11].

To detect noise in an event log, Petrak et al. [7] model the neighborhood relation between two events as a binomial distribution and use hypothesis tests to check whether the sighting of two events e1, e2 ∈ T directly following each other is main behavior (H0 is true) or infrequent behavior (H1 is true). To achieve this, they define the population as the pairs (e, e′) ∈ T_S × T_E that can follow each other in the process and where e = e1 or e′ = e2, and view the event log as a sample where some of these direct neighborhoods were randomly drawn. Hence, the population is defined as

P_{(e1,e2)} := {(x, y) ∈ T_S × T_E | (x = e1 ∨ y = e2) ∧ |(x, y)|_{>_L} > 0},

which defines the sample size n as the number of events that can follow e1 or can be followed by e2 in the process:

n := |P_{(e1,e2)}| = Σ_{e ∈ T_E} |(e1, e)|_{>_L} + Σ_{e ∈ T_S} |(e, e2)|_{>_L} − |(e1, e2)|_{>_L}.

With the user-given constants 1 − p0 and α they then execute a right-sided hypothesis test, which means that, for a pair of events (e1, e2), they estimate k by ⌈n·p0 − σ_n · u_{1−α}⌉ and decide for H0 when n − |(e1, e2)|_{>_L} < k. Since this is equivalent to estimating k by the formula in Equation 1 and deciding for H0 when |(e1, e2)|_{>_L} > k, we can instead perform a left-sided hypothesis test with the probability p0 as described at the start of this section. With this idea we can calculate the critical value k for which P(X ≤ k | H0 is true) ≤ α holds and check afterwards whether |(e1, e2)|_{>_L} > k. If this is true, we accept the Null Hypothesis H0 and consider e2 directly following e1 as main behavior. Otherwise, we reject the Null Hypothesis, which means we accept the Alternative Hypothesis H1 and consider e2 directly following e1 as infrequent behavior. However, if σ_n ≤ 3, we need to calculate the critical value by evaluating the binomial distribution and finding a k for which the sum of the probabilities from 0 to k is less than 1 − α. For a more detailed description of this idea, see [7] and [11].

Using these ideas leads to the procedure shown in Algorithm 1. For our running example, we set p0 = 0.05 (so we consider a pair of direct neighbors as infrequent if this behavior affects less than 5% of the population) and α = 0.05 (so we accept that the probability of classifying main behavior as infrequent is at most 5%).
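The computation of the critical value k can be sketched in a few lines of Python. This is only an illustrative sketch of the procedure just described (and of Algorithm 1 below), not the authors' implementation; it uses only the standard library, where NormalDist provides the quantile u_{1−α} for the normal approximation of Equation (1), and the binomial branch mirrors the while-loop of Algorithm 1.

from math import ceil, comb, sqrt
from statistics import NormalDist

def critical_value(n, p0, alpha):
    """Critical value k: normal approximation (Equation 1) if sigma_n > 3,
    otherwise direct evaluation of the binomial distribution (Algorithm 1)."""
    sigma_n = sqrt(n * p0 * (1 - p0))
    if sigma_n > 3:
        u = NormalDist().inv_cdf(1 - alpha)       # (1 - alpha)-quantile u_{1-alpha}
        return ceil(n * p0 - sigma_n * u)
    k, total = 0, 0.0
    while total < 1 - alpha:                      # mirrors the while-loop of Algorithm 1
        total += comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        k += 1
    return k

def is_main(count, n, p0=0.05, alpha=0.05):
    """Accept H0 (main behavior) iff the observed count exceeds the critical value."""
    return count > critical_value(n, p0, alpha)

# The pair (a, c) checked in the first example below: n = 250, |(a, c)|_{>L} = 50.
print(critical_value(250, 0.05, 0.05), is_main(50, 250))   # 7 True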
We show the decision for whether a pair of direct neighbors is main or infrequent behavior on the following two examples.

Algorithm 1 Detecting main and infrequent neighbors
Input: p0 ∈ [0, 1], α ∈ [0, 1], L : T* → N_0
Output: main, infreq ⊆ (T ∪ {Start, End})²
  main ← ∅
  infreq ← ∅
  for (e1, e2) ∈ (T ∪ {Start, End})² do
      n ← Σ_{e ∈ T_E} |(e1, e)|_{>_L} + Σ_{e ∈ T_S} |(e, e2)|_{>_L} − |(e1, e2)|_{>_L}
      σ_n ← √(n · p0 · (1 − p0))
      if σ_n > 3 then
          k ← ⌈n·p0 − σ_n · u_{1−α}⌉
      else
          k ← 0
          sum ← 0
          while sum < 1 − α do
              sum ← sum + (n choose k) · p0^k · (1 − p0)^{n−k}
              k ← k + 1
          end while
      end if
      if |(e1, e2)|_{>_L} > k then        ◁ k > 0, so |(e1, e2)|_{>_L} > 0.
          main ← main ∪ {(e1, e2)}
      else
          infreq ← infreq ∪ {(e1, e2)}
      end if
  end for
  return main, infreq

Check (a, c) ∈ >_L: The relevant entries of the Correlation Matrix are row a and column c. We obtain n = 250 and σ_n ≈ 3.4 > 3, which gives k = 7. Since |(a, c)|_{>_L} = 50 > 7 = k, the pair (a, c) ∈ >_L is main behavior.

It is notable that in this example the direct neighborhood of the events a and c has the lowest frequency, but the result of the hypothesis test implies that a being directly followed by c is main behavior. This is due to the fact that a is only followed by b or c and c is only preceded by a or b. The number of sightings where b precedes c or b follows after a is rather small, so the neighborhood between a and c can be considered main behavior.

Check (b, d) ∈ >_L: The relevant entries of the Correlation Matrix are row b and column d. We obtain n = 2450 and σ_n ≈ 10.7 > 3, which gives k = 105. Since |(b, d)|_{>_L} = 100 ≤ 105 = k, the pair (b, d) ∈ >_L is infrequent behavior.

Executing the hypothesis test for every pair of events leads to the result that the pairs (b, d), (b, End), (d, b), (d, End), (Start, f) and (g, End) of direct neighbors are infrequent. Petrak et al. propose to simply delete these direct neighborhood relations from the footprint and use the footprint that was constructed in this way to mine a process model.

4. Using the results to construct a DFG

We use the approach presented by Petrak et al., but construct a Directly Follows Graph (see Definition 4) instead of a footprint. Gaining a Petri net from a DFG is rather simple, and we can easily check the soundness of a DFG by performing two Depth First Searches, which gives us the possibility to easily construct a DFG that is sound. By applying the hypothesis tests to the event log with Algorithm 1 we immediately get a set M ⊆ (T ∪ {Start, End})² of direct neighbors that are part of the main behavior of the event log and a set I ⊆ (T ∪ {Start, End})² of direct neighbors that are infrequent behavior. Obviously, no direct neighborhood can be both main and infrequent behavior, so M ∩ I = ∅. Let G = (V, E) be the DFG for our event log. Since we only add behavior to M or I if we have seen the direct neighborhood at least once in the event log, M ⊆ E and I ⊆ E hold. On the other hand, due to the definition of E, there is no pair of events in M ∪ I that is not present in E, since E contains every pair of direct neighbors that was observed in the event log. Therefore, we can conclude that M ∪ I = E.
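The soundness check mentioned at the beginning of this section can be realized with two depth-first searches: every node must be reachable from Start and must reach End. The following Python sketch illustrates this idea; the function names are ours, and the edges are assumed to be given as a set of pairs.

def reachable(edges, source, forward=True):
    """All nodes reachable from `source`, following the edges forward or backward (DFS)."""
    adj = {}
    for u, v in edges:
        a, b = (u, v) if forward else (v, u)
        adj.setdefault(a, []).append(b)
    seen, stack = {source}, [source]
    while stack:
        for nxt in adj.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_sound(nodes, edges):
    """Definition 5: every node lies on some path from Start to End."""
    from_start = reachable(edges, "Start", forward=True)
    to_end = reachable(edges, "End", forward=False)
    return all(v in from_start and v in to_end for v in nodes)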
The DFG for our running example is shown in Figure 1; the black edges are those of the set M and the highlighted edges are those of the set I.

Figure 1: The DFG for the event log of our running example L. Blue edges were classified as infrequent behavior, black edges were classified as main behavior by the hypothesis tests.

When deleting edges from the DFG, we need to be careful, since the deletion of the whole set I may lead to a result that is not sound according to Definition 5. This is also the case for the DFG in Figure 1, since without all the highlighted edges, the events f and g do not lie on a path from Start to End. A process model constructed directly from such a DFG is not sound, so the goal is to delete as many infrequent edges as possible without sacrificing the soundness of the DFG. Deleting every highlighted edge except the edges (Start, f) and (g, End) leads to the result shown in Figure 2.

Figure 2: The modified DFG for our running example log L, where infrequent edges that were not necessary for the soundness were deleted.

In general, it is not always clear which set J ⊆ I should be deleted from the DFG, since the deletion of an edge e could lead to a situation where other edges are needed for soundness, whereas said edges would not be needed if e wasn't deleted. Finding the "best" subset of I to delete from the DFG is a difficult problem, since it is not clear when one subset is better than another. If we define a function f : P(I) → N_0 which maps every subset of I to a numerical value, where a high value indicates that the subset is a better candidate for deletion than another candidate with a lower value, we have an optimization problem at hand:

Maximize f(J), where G′ = (V, E ∖ J) is a sound DFG and J ⊆ I.

As a default for f we propose f(J) := |J|, so that edge sets that contain more infrequent edges are preferred for deletion. A naive implementation that solves this problem can be found in Algorithm 2.

Algorithm 2 Deleting infrequent edges from the DFG
Input: G = (V, E), M, I ⊆ E with M ∪ I = E and M ∩ I = ∅, f : P(I) → N_0
Output: G′ = (V, E′)
  J_opt ← ∅
  for each J ⊆ I do
      G′′ ← (V, E ∖ J)
      if f(J) > f(J_opt) and G′′ is sound then
          J_opt ← J
      end if
  end for
  return (V, E ∖ J_opt)

This algorithm needs O(2^{|I|} · (|V| + |E|)) time and is therefore exponential, but since the number of infrequent edges, and therefore the cardinality |I| of I, is rather small in most cases, we accept this naive implementation: it allows for an easy exchange of the user-given function f that should be optimized, and it is easy to read. In our running example, this algorithm leads to the sound DFG shown in Figure 2.
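A direct Python transcription of Algorithm 2 could look as follows. It is a brute-force sketch: it enumerates all subsets J of the infrequent edges, keeps the best one according to the user-given function f (defaulting to f(J) = |J| as proposed above), and reuses the is_sound helper sketched in the previous listing; edges and infrequent are assumed to be Python sets of pairs.

from itertools import combinations

def delete_infrequent_edges(nodes, edges, infrequent, f=len):
    """Algorithm 2: among all subsets J of the infrequent edges whose removal
    keeps the DFG sound, pick one maximizing f(J) and delete it."""
    infrequent = list(infrequent)
    best = frozenset()                              # J_opt <- empty set
    for r in range(len(infrequent) + 1):            # enumerate all subsets of I
        for subset in combinations(infrequent, r):
            J = frozenset(subset)
            if f(J) > f(best) and is_sound(nodes, edges - J):
                best = J
    return nodes, edges - best

On the running example this removes all infrequent edges except (Start, f) and (g, End) and thus reproduces the DFG of Figure 2.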
5. Handling of Loops

As can be seen in the example of Figure 1, loops in the DFG can have a high impact on the population for our hypothesis tests. In this example, executing the loop between f and g several times artificially increases the number of times f was preceded by an event other than Start and the number of times g was followed by an event other than End. Even though the edges (Start, f) and (g, End) are only encountered in the same traces as the edges (f, g) and (g, f), the repetition of this loop leads to the effect that (Start, f) and (g, End) are classified as infrequent behavior. In the example of Figure 1 this is not a problem, since these two infrequent edges are not deleted for the sake of soundness. However, the following simple event log shows that we are not always in such a lucky situation:

L⟳ = [⟨a, b, . . . , b, d⟩^10, ⟨a, b, c, d⟩^40, ⟨a, c, d⟩^100],    where b is repeated 51 times in the first trace.

Executing hypothesis tests on this log leads to the classification of the direct neighborhood between b and d as infrequent.

Check (b, d) ∈ >_{L⟳}: The Correlation Matrix for L⟳ is

          End    a    b    c    d
Start       0  150    0    0    0
a           0    0   50  100    0
b           0    0  500   40   10
c           0    0    0    0  140
d         150    0    0    0    0

and we obtain n = 690 and σ_n ≈ 5.7 > 3, which gives k = 26. Since |(b, d)|_{>_{L⟳}} = 10 ≤ 26 = k, the pair (b, d) ∈ >_{L⟳} is infrequent behavior.

Every other direct neighborhood is classified as main behavior by the hypothesis tests, so we get the DFG for L⟳ that is shown in Figure 3.

Figure 3: The DFG for the event log L⟳ with the only infrequent edge highlighted in blue.

Deletion of the highlighted edge would not violate the soundness of the DFG, but doing so would remove the entire behavior of a being followed by one or more b's, which are followed by a single d. This is a problem, since the neighborhood (b, d) would not be classified as infrequent behavior if the event log featured fewer iterations of the b-loop:

Check (b, d) ∈ >_{L⟳} with a single loop iteration per trace: the row of b in the Correlation Matrix becomes (0, 0, 10, 40, 10), and we obtain n = 200 and σ_n ≈ 3.08 > 3, which gives k = 5. Since |(b, d)|_{>_{L⟳}} = 10 > 5 = k, the pair (b, d) ∈ >_{L⟳} would then be main behavior.

To eliminate this problem, we propose a preprocessing step for the hypothesis tests that reduces the number of loop iterations if there is a loop present in the process. To do so, we alter the traces of our event log that contain such a loop, but only for the hypothesis tests. The idea is to construct a DFG for such a single trace and find a path from Start to End that cycles less frequently through the loop in the trace, if this is possible. To do so, we first introduce the notion of a Trace DFG:

Definition 6 (Trace DFG). Let σ = ⟨t_1, . . . , t_n⟩ be a trace. The Trace DFG for σ is a directed multigraph G = (V, E) with
• V := {Start, End} ∪ {t_1, . . . , t_n} and
• E : (T ∪ {Start, End})² → N_0,

E((t_1, t_2)) = \begin{cases} 1 & \text{if } t_1 = Start \text{ or } t_2 = End \\ |\{i \mid t_i = t_1 \wedge t_{i+1} = t_2\}| & \text{otherwise.} \end{cases}

For a trace σ, the Trace DFG is therefore the DFG of the log [σ], where the edge weights of said DFG are interpreted as edge multiplicities. The Trace DFG for the trace ⟨a, b, c, d⟩, for example, is a simple path; the Trace DFG for the trace ⟨a, b, . . . , b, d⟩ with 51 b's is shown in Figure 4.

Figure 4: The Trace DFG for ⟨a, b, . . . , b, d⟩ ∈ L⟳ (with b repeated 51 times). Edge weights denote the multiplicity of edges; the artificial dashed edge from End to Start (which is not part of the Trace DFG) makes the graph eulerian.

To reduce the number of loop iterations in the Trace DFG, we first introduce a new, artificial edge from End to Start, as shown by the dashed arc in Figure 4. The resulting multigraph G is eulerian, which means there is an eulerian cycle in G.

Theorem 1. Let σ ∈ T* be a trace and G = (V, E) the Trace DFG of σ. Then the graph G′ = (V, E ∪ {(End, Start)}) is eulerian, i.e. it contains an eulerian cycle.

Proof. By construction of the Trace DFG, σ is a path from Start to End that uses every edge in G exactly once. With the artificial edge (End, Start) in G′ this path becomes a cycle that uses every edge in G′ exactly once, i.e. an eulerian cycle.

Since this extended Trace DFG G′ is eulerian, we can find the shortest cycle through G′ that uses every type of edge in G′ at least once, but no more often than the multiplicity of the edges allows. The problem of finding this shortest cycle is known as the Chinese Postman Problem, which was introduced by Edmonds et al. [9] and can be solved in O(|V|³) time. In the example for L⟳, the shortest cycle is (Start, a, b, b, d, End, Start), which cycles only once through the loop at the vertex b. From this cycle we can easily construct the shortened trace ⟨a, b, b, d⟩, which features fewer loop iterations than the original trace and hence does not disrupt the hypothesis tests. We therefore use this trace instead of the original trace to execute the hypothesis tests and find that the neighborhood (b, d) is main behavior. However, when constructing the DFG for L⟳, we calculate the multiplicities based on the original (i.e. not shortened) traces.
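Instead of a full Chinese Postman solver, the shortening can also be illustrated with a simple greedy heuristic: repeatedly cut a cyclic subtrace out of the trace whenever every directly-follows pair inside the cut still occurs somewhere in the remaining trace, so that the Trace DFG keeps every edge type at least once. The following Python sketch is our own simplification for illustration; it is not guaranteed to find the optimal (shortest) cycle of the Chinese Postman formulation, but it reproduces the shortening of the example above.

from collections import Counter

def df_pairs(trace):
    """Multiset of directly-follows pairs inside a single trace."""
    return Counter(zip(trace, trace[1:]))

def shorten_trace(trace):
    """Greedily remove cyclic subtraces trace[i..j] (with trace[i] == trace[j])
    as long as every edge type of the original Trace DFG survives."""
    trace = list(trace)
    changed = True
    while changed:
        changed = False
        pairs = df_pairs(trace)
        for i in range(len(trace)):
            for j in range(i + 1, len(trace)):
                if trace[i] != trace[j]:
                    continue
                cut = df_pairs(trace[i:j + 1])
                # only cut if every removed pair occurrence also appears elsewhere
                if all(pairs[p] > cut[p] for p in cut):
                    trace = trace[:i] + trace[j:]
                    changed = True
                    break
            if changed:
                break
    return tuple(trace)

# The 51 consecutive b's of the trace in Figure 4 collapse to a single loop iteration:
print(shorten_trace(("a",) + ("b",) * 51 + ("d",)))   # ('a', 'b', 'b', 'd')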
6. Evaluation

We implemented the algorithm described in this paper in Python, version 3.8, using the process mining library pm4py [12]. The latter provides functions to calculate the fitness, precision, generalization and simplicity for a given Petri net. To calculate these quality measures, we used the following techniques provided by pm4py:
• Fitness: The alignment-based fitness is computed and the percentage of completely fitting traces is returned, as well as the average fitness value of the single traces,
• Precision: The Align-ETConformance is computed and returned, see [13],
• Generalization: Computed as described in [14],
• Simplicity: Computed as described in [15].
To check the Petri net for soundness, the external tool WOFLAN [16] is used. Aside from pm4py there are other tools, like Entropia [17], that are also interesting for efficient and robust conformance checking. We plan to investigate these tools in the future and use them to further evaluate our results.

To test our implementation and compare it to other mining techniques, we chose some real-life event logs that are often used to evaluate new process discovery techniques. We used
• BPI_Challenge_2012.xes (BPI12), which can be found at [18],
• DomesticDeclarations.xes (DD),
• InternationalDeclarations.xes (ID),
• PermitLog.xes (PL),
• PrepaidTravelCost.xes (PTC),
• RequestForPayment.xes (RFP),
all of which can be found at [19]. Table 1 gives an overview of some important characteristics of these event logs.

Table 1: Characteristics of the event logs that were chosen for the evaluation.

                           BPI12      DD      ID      PL     PTC     RFP
Number of unique traces    4 366      99     753   1 478     202      89
Total number of traces    13 087  10 500   6 449   7 065   2 099   6 886
Number of cycles          43 566     121     585   3 842     113      80
Longest trace length         175      24      27      90      21      20
Shortest trace length          3       1       3       3       1       1
Average trace length       20.04    5.37   11.19   12.25    8.69    5.34

First, we compared our approach with the fixed parameters p0 = 0.05 and α = 0.05 to the following well-known mining approaches that consider outliers in the event log:
• The Inductive Miner infrequent (IMi) [6] creates a process tree out of the DFG for an event log and creates a Petri net from that process tree. In a preprocessing step, all edges that are "too infrequent" are removed by going through all events e and deleting all outgoing edges of e with a frequency less than k times the frequency of the strongest outgoing edge of e, where k is a user-chosen parameter between 0 and 1. We chose 0.2 as the parameter for IMi.
• The Directly Follows Miner (DFM) [5] creates a DFG and scans it for infrequent behavior by determining the edges with the least frequency. Then all traces that feature the direct neighborhood modeled with an infrequent edge are removed from a copy of the event log.
This step is repeated until at most a fraction of t traces has been removed, where t is a user-chosen threshold between 0 and 1. Then, the DFG for the new event log is computed (or the old DFG is adjusted accordingly) and a Petri net is constructed from the DFG. We chose 0.1 as the parameter for DFM.

We also compared our approach to the technique presented in [7], where hypothesis tests are executed and all infrequent edges are deleted from the DFG without the goal of maintaining soundness. For this approach, we chose p0 = 0.95 and α = 0.05, so the hypothesis tests are performed exactly like with the new technique.

Since the DFM and our approach result in a DFG instead of a Petri net, we also need to convert the DFG into a Petri net to compute the quality measures as implemented in pm4py. To do so, we constructed a labeled Petri net N = (P, T, F, l) from a DFG G = (V, E, w) as follows:
• For each node v ∈ V a place v ∈ P is added.
• For each edge (u, v) ∈ V × (V ∖ {End}), a transition t_{u,v} ∈ T with l(t_{u,v}) = v is added.
• For each edge (u, End) ∈ V × {End}, we also add a transition t_{u,End} ∈ T. However, this is a silent transition, i.e. l(t_{u,End}) = τ.
• Finally, for each transition t_{u,v} ∈ T two arcs are added: (u, t_{u,v}) ∈ F and (t_{u,v}, v) ∈ F.

The results of executing the algorithms and the conformance checking methods on the resulting Petri nets can be found in Figure 5. For all evaluated methods and logs, fitness and generalization were similarly good. Simplicity also showed consistently good values, except for a few very good results from the DFM. The percentage of fitting traces was consistently very good for DFM, while the other techniques achieved varyingly good values for different logs. The largest differences between the methods were observed for the precision and the F-score. Here, the results for different logs vary greatly for the DFM, while IMi achieves rather low results. Both of our new approaches achieve very good values most of the time in these two categories.

It is notable that the approach of [7] (which we call HT) always leads to a better precision than our new approach, which is not surprising considering that we leave some infrequent edges in the graph to maintain soundness. For example, if there is a single trace featuring much infrequent behavior, from which almost everything is deleted from the DFG except for an edge e, it is possible to use this edge e in the resulting Petri net, even though it does not fit any behavior seen in the event log. Due to the higher precision, the F-score is also better than in our approach. Further, the simplicity of the net mined by HT is often much higher than in the new approach, simply because more edges are deleted. However, contrary to our new approach, HT never led to a sound net for any of the tested event logs.

IMi scores similarly to our approach, except for precision and therefore F-score. This is due to the fact that IMi classifies edges as infrequent with a rather local argument – the frequency of an edge is compared to the frequencies of all other outgoing edges, but not to the frequencies in the full graph. Hence, IMi may often delete single edges from many traces. For example, for the very simple event log [⟨a, b, c⟩^l], it can easily be forced to delete the edge (b, c) by introducing a trace ⟨x, b, y⟩^{l/k + 1} to the event log. Then, the edge (b, c) in the DFG becomes infrequent compared to the other outgoing edge (b, y) and is therefore deleted.
Then the trace ⟨a, b, c⟩ cannot be replayed, which leads to a worse fitness, but the trace ⟨a, b, y⟩ can be replayed, even though it is not present in the event log.

With the user-chosen threshold that affects the behavior of the DFM, we have direct control over the fitness of the mined model, so this metric always features very good values. However, since DFM deletes only traces and not single edges, it seems odd that the precision of DFM is as low as it is for some event logs. This may be associated with the fact that in the DFG there is exactly one vertex for every event present in the event log, and therefore some traces that share an event name but have nothing else in common lead to a Petri net where traces can be replayed that are not part of the event log. However, our approach shares the same problem concerning the DFG, which makes it unclear why our approaches achieve a much higher precision most of the time. Investigating this behavior further could be an interesting task for future work.

When comparing the simpler version of our method to the one with loop reduction, it can be seen that both versions achieve similar results, which can be explained by the structure of the chosen event logs. It may be interesting to investigate further how the two variants differ on event logs with many loops.

During the execution of the different algorithms, the times required to construct a model were measured. These times do not include the calculation of the metrics that were afterwards computed on the resulting models. In Table 2, all execution times are given in seconds. For all event logs and algorithms evaluated, a result could be calculated within a few seconds.

Table 2: The execution times in seconds for each investigated approach and each chosen event log.

        BPI12      DD        ID        PL        PTC       RFP
HTs     00.87397   00.09229  00.20281  02.73899  00.07453  00.06550
HTl     02.69569   00.10818  00.36392  01.00158  00.10343  00.07757
HT      05.493242  00.09222  00.18860  00.34547  00.05466  00.06677
IMi     12.15239   01.67628  03.43188  07.16169  01.54955  01.86956
DFM     01.99909   00.14701  00.31106  00.68258  00.05291  00.09879

7. Conclusion

In this paper, we revisited the use of hypothesis tests by Petrak et al. [7] to detect infrequent neighborhood relations in a given event log. We proposed a simpler variant of these tests in the form of a left-sided hypothesis test and changed the output to a Directly Follows Graph instead of a footprint, in order to be able to selectively remove infrequent behavior without destroying the soundness of the resulting DFG. Furthermore, we showed theoretically that high loop iteration counts in an event log can lead to undesired results of the hypothesis tests and solved this problem by implementing a preprocessing step that reduces the number of loop iterations to an amount small enough that the sample size used for the hypothesis tests is not bloated by the loop. The proposed techniques are very easy to understand and implement. We compared both the hypothesis tests without loop-shortening and the hypothesis tests with loop-shortening to the results of the old technique using hypothesis tests [7], the Inductive Miner infrequent [6] and the Directly Follows Miner [5], and found that our results for fitness, precision, generalization, simplicity and F-score are rather consistent and high – also compared to these techniques.
Further research should include the evaluation of logs with high loop iteration counts to verify that the shortening of loops is not only reasonable in theory, but also in practice. It would be interesting to see how often loop-shortening is relevant in real-life logs and under which circumstances (concerning the underlying process) loop-shortening becomes necessary. Also, there are some more techniques that filter outliers before mining a process model [20, 21, 22], some of which also use statistical ideas. It would be very interesting to see how our approach compares with these techniques. Since the DFG loses information on concurrency, a preprocessing step to detect concurrency may be a topic of further research – as well as the investigation of the existence of event logs that lead to bad results with our approach due to concurrency.

References

[1] S. J. J. Leemans, D. Fahland, W. M. P. van der Aalst, Discovering Block-Structured Process Models from Event Logs Containing Infrequent Behaviour, in: Business Process Management Workshops - BPM 2013 International Workshops, Beijing, China, August 26, 2013, Revised Papers, volume 171 of Lecture Notes in Business Information Processing, 2013, pp. 66–78.
[2] C. W. Günther, W. M. P. van der Aalst, Fuzzy Mining – Adaptive Process Simplification Based on Multi-perspective Metrics, in: Business Process Management, 2007, pp. 328–343.
[3] A. Weijters, W. van der Aalst, A. Alves De Medeiros, Process mining with the HeuristicsMiner algorithm, BETA publicatie: working papers, Technische Universiteit Eindhoven, 2006.
[4] A. J. M. M. Weijters, J. T. S. Ribeiro, Flexible Heuristics Miner (FHM), in: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2011, pp. 310–317.
[5] S. J. J. Leemans, E. Poppe, M. T. Wynn, Directly Follows-Based Process Mining: Exploration & a Case Study, in: International Conference on Process Mining, ICPM 2019, Aachen, Germany, June 24-26, 2019, IEEE, 2019, pp. 25–32.
[6] S. J. J. Leemans, D. Fahland, W. M. P. van der Aalst, Discovering Block-Structured Process Models from Incomplete Event Logs, in: G. Ciardo, E. Kindler (Eds.), Application and Theory of Petri Nets and Concurrency - 35th International Conference, PETRI NETS 2014, Tunis, Tunisia, June 23-27, 2014. Proceedings, volume 8489 of Lecture Notes in Computer Science, Springer, 2014, pp. 91–110.
[7] L. Petrak, R. Lorenz, Detecting Infrequent Behavior in Event Logs using Statistical Inference, in: Proceedings of the International Workshop on Algorithms & Theories for the Analysis of Event Data 2020, June 24, 2020, volume 2625 of CEUR Workshop Proceedings, 2020, pp. 33–48.
[8] J. Carmona, B. F. van Dongen, A. Solti, M. Weidlich, Conformance Checking - Relating Processes and Models, Springer, 2018. URL: https://doi.org/10.1007/978-3-319-99414-7. doi:10.1007/978-3-319-99414-7.
[9] J. R. Edmonds, E. L. Johnson, Matching, Euler tours and the Chinese postman, Math. Program. 5 (1973) 88–124.
[10] W. M. van der Aalst, B. F. van Dongen, Discovering Petri Nets from Event Logs, in: Transactions on Petri Nets and Other Models of Concurrency VII, Springer, 2013, pp. 372–422.
[11] G. Hornsteiner, Daten und Statistik. Eine praktische Einführung für den Bachelor in Psychologie und Sozialwissenschaften. Berlin/Heidelberg (2012).
[12] A. Berti, S. J. van Zelst, W. van der Aalst, Process Mining for Python (pm4py): bridging the gap between process- and data science, arXiv preprint arXiv:1905.06169 (2019).
[13] A. Adriansyah, J. Munoz-Gama, J. Carmona, B. van Dongen, W. van der Aalst,
Measuring precision of modeled behavior, Information Systems and e-Business Management 13 (2014). doi:10.1007/s10257-014-0234-7.
[14] J. Buijs, B. van Dongen, W. van der Aalst, Quality Dimensions in Process Discovery: The Importance of Fitness, Precision, Generalization and Simplicity, International Journal of Cooperative Information Systems 23 (2014) 1440001. doi:10.1142/S0218843014400012.
[15] F. R. Blum, Metrics in process discovery, 2015.
[16] W. M. P. van der Aalst, Woflan: A Petri-net-based Workflow Analyzer, Systems analysis modelling simulation 35 (1999) 345–357. URL: https://publications.rwth-aachen.de/record/714622.
[17] A. Polyvyanyy, H. Alkhammash, C. D. Ciccio, L. García-Bañuelos, A. A. Kalenkova, S. J. J. Leemans, J. Mendling, A. Moffat, M. Weidlich, Entropia: A Family of Entropy-Based Conformance Checking Measures for Process Mining, CoRR abs/2008.09558 (2020). URL: https://arxiv.org/abs/2008.09558. arXiv:2008.09558.
[18] B. van Dongen, BPI Challenge 2012, 2012. URL: https://data.4tu.nl/articles/dataset/BPI_Challenge_2012/12689204/1. doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f.
[19] B. van Dongen, BPI Challenge 2020, 2020. URL: https://data.4tu.nl/collections/BPI_Challenge_2020/5065541/1. doi:10.4121/uuid:52fb97d4-4588-43c9-9d04-3604d4613b51.
[20] R. Conforti, M. L. Rosa, A. H. t. Hofstede, Filtering Out Infrequent Behavior from Business Process Event Logs, IEEE Transactions on Knowledge and Data Engineering 29 (2017) 300–314. doi:10.1109/TKDE.2016.2614680.
[21] X. Lu, D. Fahland, W. M. van der Aalst, Interactively Exploring Logs and Mining Models with Clustering, Filtering, and Relabeling, in: BPM, 2016.
[22] D. Brons, R. Scheepens, D. Fahland, Striking a new Balance in Accuracy and Simplicity with the Probabilistic Inductive Miner, in: 2021 3rd International Conference on Process Mining (ICPM), 2021, pp. 32–39. doi:10.1109/ICPM53251.2021.9576864.

Figure 5: Comparison of the resulting quality metrics for the approach described in this paper, using hypothesis tests and maintaining soundness (HTs), the same approach using loop-reduction (HTl), the old technique using hypothesis tests (HT), the Inductive Miner infrequent (IMi) and the Directly Follows Miner (DFM).