            k-process: Model-Conformance-based
                 Clustering of Process Instances


        Florian Richter, Florian Wahl, Alona Sydorova and Thomas Seidl
                               LMU Munich, Germany
                          {richter, seidl}@dbs.ifi.lmu.de
                       {f.wahl,alona.sydorova}@campus.lmu.de




        Abstract. Clustering of objects is a prominent task in many applications.
        The goal is to uncover similarities or correlations for later reasoning.
        In process mining, trace clustering reveals groups of process instances
        which behave similarly in some dimensions, e.g. usage of the same resources,
        time profiles, or activity sequences. Conventional clustering of event
        sequences is usually performed by mapping case attributes into a vector
        space and applying a vector-based clustering method on these data points.
        The choice of a suitable vector space is not trivial and influences the
        mined results. Our novel approach is designed as a process-oriented
        alternative and avoids a vector-space embedding. Combining the simplicity
        of k-means as a clustering strategy with the expressiveness of process
        models as cluster representatives results in a data-driven aggregation of
        instances instead of a tedious vector-space evaluation.

        Keywords: Trace Clustering, Log Partitioning, Process Mining



1     Introduction


Clustering is one of the most fundamental tasks in the wide field of data science.
The domains of its application are as numerous as the number of clustering
techniques. Objects of different domains and of various shapes shall be aggregated
based on the similarity of their object properties. In this work we focus on the
aggregation of event sequences.
    In the prominent field of process mining [1], event sequences are the main
research focus and represent process executions. Large enterprises in particular
store these executions, which comprise their daily operations and data, in the
form of log files. Relying mostly on enterprise-resource-planning software, every
action in the business is logged. The approaches to transform this raw logged
data into valuable business intelligence are covered by the term business process
mining. However, processes occur in many domains, from small-scale scenarios
like serving customers in a restaurant to large-scale applications like customer
support in a global company.
    Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).

    Especially in large-scale applications with many process executions, not all
instances are performed in the same way. Either due to changes over time caused
by changing requirements or due to different actors performing the same task
differently, a common process can split into subprocesses. Each event sequence
can be a member of one subprocess or a candidate for more than one subprocess.
Some applications require the option to treat sequences as outliers or noise. In
this work we perform partition clustering, which assigns every sequence to exactly
one cluster. Although events can contain various information such as their
duration or the resources used, we only consider the sequences of actions and
perform trace clustering. As an example, imagine an airplane boarding process.
The boarding phase is usually very costly, and every delay reduces the income of
the airline. Here it is beneficial to identify potential clusters, i.e. groups of
passengers that take more time due to more complex boarding steps. After
identification of difficult and time-consuming customer groups, the process can
be redesigned by shifting actions to other time frames, e.g. loading wheelchairs
in advance, pre-validating complex travel grants, or putting families with small
children into separate queues.
    Our novel approach k-process clusters sequences of events depending on how
well they fit into a common process model. We utilize the well-known k-means
framework as a baseline and replace its two main steps, centroid determination
and cluster assignment, with appropriate process mining methods. Process models
can be discovered with process mining techniques like the α-Miner [2] and provide
a suitable representation of trace clusters. Fitness computations like token
replay then perform the cluster assignment.


2   Related Work



Clustering is a very well researched domain in data mining. Even the closer field
of sequence clustering, which is very prominent in bioinformatics, is too large
to discuss related work sufficiently here. Therefore we focus only on the
clustering of event sequences. An event sequence represents a consecutively
perceived execution of actions. Although the perception is sequential, events
might also occur concurrently or with repetition. As such event sequences are
called traces, we refer to this domain by the more popular term trace clustering.
In this clustering domain, large sets of traces called event logs are considered.
They are the main starting point of process mining. The goal is to split the whole
log into sublogs such that each sublog describes similar action patterns while the
similarity of behavior between two sublogs is minimized.
    There are several different approaches to trace clustering. The first
established methods rely on a vector-space embedding by mapping features of the
sequences to points in a vector space. The basic approach is a bag-of-events
embedding, which ignores the sequential information and represents the sequence
by the frequencies of each sequence item. Bag-of-motifs or n-gram approaches
increase the dimensionality of the representation in the case of long sequences.
Attached metadata can also be embedded into the vector space. This allows the
full range of clustering methods operating on vector spaces, such as k-means, to
be applied; a minimal sketch of such an embedding is given below.
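
The following sketch illustrates such a bag-of-2-grams embedding, assuming traces
are given as lists of activity labels; the function and variable names are purely
illustrative.

from collections import Counter

def ngram_embedding(traces, n=2):
    """Map each trace (a list of activity labels) to a frequency vector
    over all n-grams observed in the log (bag-of-motifs embedding)."""
    # vocabulary of all n-grams occurring anywhere in the log
    vocab = sorted({tuple(t[i:i + n]) for t in traces for i in range(len(t) - n + 1)})
    vectors = []
    for t in traces:
        counts = Counter(tuple(t[i:i + n]) for i in range(len(t) - n + 1))
        vectors.append([counts[g] for g in vocab])
    return vectors, vocab

# toy usage: two traces over the activities a, b, c
vecs, vocab = ngram_embedding([["a", "b", "c"], ["a", "c", "b"]], n=2)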
    The vector-space embedding of traces was first explored by Greco et al. [11].
Song et al. [13] introduced trace profiles which allow different perspectives and
changes of focus on certain aspects of the traces. Ferreira et al. [10] clustered
traces by building first-order Markov models for each cluster with the
expectation-maximization technique. In [4] Bose et al. introduced a specialized
edit distance on traces and proposed an agglomerative hierarchical clustering
approach. In the same year they developed a second approach, presented in [5],
that uses subtraces of arbitrary length instead of n-grams to represent traces as
feature vectors.
    These techniques usually yield a partitioning of traces into sublogs such that
the clusters have a high intra-similarity and a low inter-similarity, as desired.
However, the process models underlying the sublogs are not considered. Thus these
techniques can be grouped as passive trace clustering, while we now turn to active
trace clustering approaches, first introduced by De Weerdt et al. [6]. ActiTraC,
as presented in [6], takes the process discovery perspective into account. Traces
are not mapped into a vector space but are greedily aggregated into clusters until
the fitness of the model for a cluster drops. Sun et al. [14] also followed the
hierarchical clustering approach with compound trace clustering (CTC). They
developed a top-down trace clustering that splits logs into sublogs such that the
complexity of each sublog matches the average complexity. The log with the highest
model complexity is then split again.
    Our novel approach follows the principle of active trace clustering. However,
the works discussed above are hierarchical top-down approaches or rely on
feature-space embeddings. To the best of our knowledge, there are no model-aware
trace clustering techniques so far that do not use a hierarchy. We propose an
active trace clustering method that allows specifying the number of clusters
without defining any distance functions. We achieve this minimal parameter setting
by using the k-means framework while avoiding a vector-space embedding to keep the
model-awareness.

3     Preliminaries


Now we give a brief but formal introduction to the terminology used in process
mining.
    Every process allows a set of actions to be performed, which are formally
called activities. We refer to the finite set of all valid activities as A. This
is usually an application-dependent set of textual action descriptors. A single
execution of an activity is called an event. Events can be assigned to a common
case using an identifier. For simplicity we assume that cases are identified by
natural numbers.

Definition 1. An event e = (c, a, t) ∈ N × A × N is a tuple consisting of a case
identifier c, an activity a and a timestamp t.

    Each event describes the execution of a certain activity corresponding to the
case c at time t. We can continue with cases and logs:

Definition 2. A log L : N × A × N → N ∪ {0} is a multiset of events.

    Semantically, an event is a unique occurrence of an activity. Nevertheless,
the same case may contain repetitions of a particular activity. If such events
occur consecutively within a very small time frame, they might share the same
event representation. In most applications the temporal granularity is chosen
with regard to the frequency of events, so the resolution of the timestamps
prevents multiple occurrences of identical events. As this is rather a technical
issue, we consider only sets instead of multisets. Finally, we can introduce the
definition of traces:

Definition 3. Let L be a log and c a case identifier. A trace is a finite sequence
of events σ_c = (e_1, e_2, ..., e_n) with π_c(e_i) = c for all 1 ≤ i ≤ n, where
π_c and π_t denote the projections of an event to its case identifier and
timestamp, respectively. Furthermore, the trace is ordered by the event
timestamps, formally π_t(e_i) ≤ π_t(e_j) for all i ≤ j.
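
To make Definitions 1-3 concrete, the following minimal sketch (with illustrative
helper names of our own) represents events as tuples and groups a log, treated as
a set of events as discussed above, into timestamp-ordered traces.

from collections import defaultdict, namedtuple

# An event as in Definition 1: case identifier, activity label, timestamp.
Event = namedtuple("Event", ["case", "activity", "time"])

def traces_from_log(log):
    """Group a log (here a set of events) into one trace per case,
    ordered by timestamp as in Definition 3."""
    by_case = defaultdict(list)
    for e in log:
        by_case[e.case].append(e)
    return {c: sorted(events, key=lambda e: e.time) for c, events in by_case.items()}

# toy usage: two cases sharing the activity alphabet {"A", "B"}
log = {Event(1, "A", 10), Event(1, "B", 12), Event(2, "A", 11)}
traces = traces_from_log(log)   # {1: [(1,A,10), (1,B,12)], 2: [(2,A,11)]}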

    Although a trace consists of consecutive events, the ordering does not enforce
a strict linear structure for trace clustering. In sequence clustering, the
similarity of two sequences is usually determined by the order and occurrence of
their sequence items. In trace clustering, the similarity depends on the process:
two traces might differ in their activity ordering, but if the process allows both
execution orders, we can consider them as similar. In the following we introduce
our novel approach, which puts emphasis on this important aspect by design: using
process mining algorithms includes the process view in the clustering.

4    Algorithm


The central idea of k-process is to use the k-means framework and to replace mean
computation and cluster assignment with techniques from process mining.
    We start by randomly assigning each trace to one of the k cluster candidates,
which partitions the log into k sublogs. Random assignment is a common strategy
for the start of k-means. We use process discovery to determine a representative
model for each cluster candidate, so in each iteration the process models of all
k clusters are recomputed first. As shown in Tab. 1, the process model serves as
the centroid of the particular cluster. In the cluster assignment step we
determine for each trace the model that allows the best replay of this trace.
This is the bottleneck of the method, as conformance checking can be slow for many
traces, especially for a large number of activities. As in k-means, we repeat
these steps until the cluster assignment is stable or a given iteration threshold
is exceeded. We use the Heuristic Miner [3, 15] for model discovery; it generates
workflow nets (WF-nets), a special case of Petri nets. Other discovery algorithms
can also be applied as long as they generate a model usable for conformance
checking. Token replay [3] is used for conformance checking by replaying traces
on a model; it also handles varying trace lengths.
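
The following sketch summarizes this loop, assuming the traces are given as
activity sequences; discover_model and fitness are placeholders for the adjusted
Heuristic Miner and the token-replay fitness and are passed in as callables.

import random

def k_process(traces, k, discover_model, fitness, max_iter=20, initial_assignment=None):
    """Skeleton of the k-process loop. traces is a list of activity sequences,
    discover_model(sublog) returns a process model (e.g. via the Heuristic
    Miner) and fitness(trace, model) scores a replay of the trace on the model."""
    # Start: assign every trace to one of k sublogs (randomly, unless a
    # start heuristic provides an initial assignment, cf. Sect. 5.4)
    assignment = initial_assignment or [random.randrange(k) for _ in traces]
    models = []
    for _ in range(max_iter):
        # Centroid determination: discover one model per sublog
        # (empty sublogs would need re-seeding in practice, omitted here)
        sublogs = [[t for t, c in zip(traces, assignment) if c == i] for i in range(k)]
        models = [discover_model(sublog) for sublog in sublogs]
        # Cluster assignment: move each trace to its best-fitting model
        new_assignment = [max(range(k), key=lambda i: fitness(t, models[i]))
                          for t in traces]
        if new_assignment == assignment:   # assignment is stable: converged
            break
        assignment = new_assignment
    return assignment, models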

Table 1. Comparison of the traditional k-means framework for vector data with our
novel framework k-process incorporating model discovery and conformance checking.

                          k-means                             k-process
Preprocessing             Vector-space embedding              -
Start Assignment          Assign all points to k partitions   Assign all sequences to k sublogs
Centroid Determination    Calculate mean for each partition   Discover a process model for each sublog
Cluster Assignment        Assign each point to the nearest    Assign each sequence to the best
                          cluster centroid                    fitting model
Output after Convergence  Partitioning of points represented  Partitioning of sequences represented
                          by mean points                      by process models
4.1   Adjustments of the Heuristic Miner
Process discovery is usually performed for exploratory analysis, and conformance
checking is mostly applied to a small set of traces to check their validity
against an existing model. We use both methods exhaustively: the many iterations
involve many re-discoveries and a vast number of conformance checks. The cost of
process discovery is largely negligible. Conformance checking, however, reduces
the performance significantly, since each trace has to be replayed k times in each
iteration. Therefore we adjust the Heuristic Miner by reducing the accuracy of the
fitness computation. We do not need a highly accurate fitness score, as we only
need to determine the best fitting model; in the next step the models are
recomputed anyway, which renders the determined fitness scores obsolete after each
iteration. To describe parallel executions and forks in the process model, the
Heuristic Miner uses AND- and XOR-relations. Token replay can be performed faster
if the model only contains AND-conjunctions, so we discard all XOR-disjunctions.
    For a given multiset of activity pairs (a_i, a_j), where a_j occurs directly
after a_i, a logical expression can be computed that describes the relations
between activities. The output of the Heuristic Miner is a so-called causal
matrix, which contains input and output expressions for each considered activity.
To efficiently obtain a WF-net that meets the aforementioned requirement, we use
a simple and straightforward approximation of the relations between activities.
First, we consider only the output expressions and ignore the input expressions,
which saves a lot of computation time. Furthermore, we only use expressions in
conjunctive normal form (CNF). To approximate an appropriate logical expression
for an output set that fulfills these requirements, we use the following
algorithm: given the output set O of an activity X, we select the biggest subset
S whose elements are pairwise in an AND-relation, formally:

    ∀a, b ∈ S : X ⇒_L a ∧ b
    ∀Z ⊆ O : (∀a, b ∈ Z : X ⇒_L a ∧ b) ⇒ ((S = Z) ∨ (|Z| ≤ |S|))

For the definition of an AND-relation see [3, 15].
    In the expression we generate, the elements of S are in an AND-relation. If
this set is empty, all elements are in a XOR-relation and we are done. Otherwise
we obtain a conjunction over the activities of S, in which each disjunction
initially consists of a single activity. Then each element e ∈ (O \ S) is checked
for whether it can be added to a disjunction. An activity is added to a
disjunction if the number of elements of this disjunction that are in XOR-relation
to this activity, divided by the total number of elements of this disjunction, is
higher than a threshold. Formally, an element e is added to a disjunction D when

    |{a ∈ D : X ⇒_L a ∨ e}| / |D| > 0.5.
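
The sketch below follows this description, assuming predicates and_related(x, a, b)
and xor_related(x, a, e) derived from the Heuristic Miner's relations; the
exhaustive search for the largest AND-related subset is kept naive for clarity.

from itertools import combinations

def approximate_output_cnf(x, outputs, and_related, xor_related, threshold=0.5):
    """Approximate the output expression of activity x over its output set
    as a CNF, i.e. a list of disjunctions (lists of activities)."""
    outputs = list(outputs)
    # largest subset S of the outputs whose members are pairwise AND-related
    s = []
    for size in range(len(outputs), 1, -1):
        for candidate in combinations(outputs, size):
            if all(and_related(x, a, b) for a, b in combinations(candidate, 2)):
                s = list(candidate)
                break
        if s:
            break
    if not s:                            # no AND-relation at all:
        return [outputs]                 # a single XOR-disjunction
    cnf = [[a] for a in s]               # each AND-element starts its own disjunction
    for e in outputs:
        if e in s:
            continue
        for disjunction in cnf:          # e may join several disjunctions, cf. Fig. 1
            share = sum(xor_related(x, a, e) for a in disjunction) / len(disjunction)
            if share > threshold:
                disjunction.append(e)
    return cnf                           # e.g. [["B", "C"], ["D", "C"]] for activity A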
    We thus obtain an output expression in CNF for every activity that should be
considered in the process model. On this basis, a WF-net can easily be generated,
as sketched below. First we create a labeled transition for each activity. Then
we translate the output expressions into places and arcs. Fig. 1 illustrates an
exemplary process model. The determined expressions are A → (B ∨ C) ∧ (D ∨ C),
B → F, C → F, D → E, E → G, F → G, and G → None.
    As all expressions are in CNF, we can translate the conjunctions directly into
AND-splits. An AND-split in a WF-net is represented by arcs from a transition to
places, so we create a place for each element of the conjunction (which is a
disjunction) and insert an arc from each origin activity to each of its new
places. The next step is to insert arcs from the places to the activities of the
disjunction. Finally, we insert source and sink places.
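
The translation can be sketched as follows; the dictionary format of the output
expressions is our own convention, and connecting the source place to all
activities without incoming arcs and activities without outputs to the sink place
is a simplification of the full construction.

def cnf_to_wf_net(output_cnf):
    """Build transitions, places and arcs of an approximated WF-net from
    per-activity output expressions in CNF, e.g.
    {"A": [["B", "C"], ["D", "C"]], "B": [["F"]], ..., "G": []}."""
    transitions = set(output_cnf)
    places, arcs = ["source", "sink"], []
    for activity, conjunction in output_cnf.items():
        if not conjunction:                     # no outputs: connect to the sink
            arcs.append((activity, "sink"))
        for i, disjunction in enumerate(conjunction):
            place = f"p_{activity}_{i}"         # one place per disjunction (AND-split)
            places.append(place)
            arcs.append((activity, place))      # arc from the transition to the place
            arcs.extend((place, target) for target in disjunction)
    has_input = {target for _, target in arcs}
    for activity in transitions - has_input:    # no incoming arc: start activity
        arcs.append(("source", activity))
    return transitions, places, arcs

# usage with the expressions of Fig. 1
net = cnf_to_wf_net({"A": [["B", "C"], ["D", "C"]], "B": [["F"]], "C": [["F"]],
                     "D": [["E"]], "E": [["G"]], "F": [["G"]], "G": []})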




Fig. 1. A process model is transformed into an approximated WF-net by replacing all
XOR-disjunctions.


5     Experimental Evaluation


We used real-world event logs from the Business Process Intelligence (BPI)
Challenges of the years 2012 [7], 2015 [8], and 2017 [9]. The datasets can be
downloaded from the 4TU.Centre for Research Data
(https://data.4tu.nl/repository/collection:event_logs_real). In the following,
these datasets are denoted by L12, L15 and L17. Table 2 gives an overview of the
size of these event logs.

                      Table 2. Overview of the real-life event logs

           event log   number of traces   number of activities   number of events
              L12                13,087                     24            262,200
              L15                 5,649                    500            262,628
              L17                31,509                     26          1,202,267


    The event logs L12 and L17 describe the same loan application process of a
Dutch financial institute and have 6 activities in common. For our experiments we
used excerpts from both of these event logs, each containing 100 traces. We then
merged them into one event log and applied k-process in order to see whether the
merged log is split into the two original event logs again. For each application
we sampled the excerpts anew at random. In the following we denote the merged,
randomly sampled event log by L12 ∪ L17.
    The event log L15 consists of 5 event logs, each from a different Dutch
municipality. We processed it in the same way as the event logs above: the
excerpts also contain 100 traces, and we tried to split the merged event log into
5 event logs with the help of k-process. In the following, L15 denotes this
randomly sampled merged event log.

5.1   Examination of the clustering goodness measure

We know the ground-truth clustering of the event log L12 ∪ L17, which is the split
into the two original event logs of the years 2012 and 2017. We randomly assigned
the traces of L12 ∪ L17 to one of the two possible clusters to obtain a randomly
generated clustering. Then we compared this clustering with the ground truth to
determine the F1-measure and calculated the Silhouette-Coefficient [12] of the
generated clustering. Figure 2 (a) shows the achieved F1-measure of the random
clustering against the Silhouette-Coefficient of the clustering for each
application. One can clearly see that the Silhouette-Coefficient correlates
linearly with the F1-measure, which is also visible from the linear regression
line in Figure 2 (a). The Pearson coefficient of 0.995 indicates a nearly perfect
positive linear correlation.
    For the second experiment we used the event log of the first municipality from
the event log L15, denoted by L15,1. We created a copy of L15,1 with reversed
order of events in each trace and denote this new log by L15,1̃. Analogously to
the first experiment, we merged the two event logs (L15,1 ∪ L15,1̃) and generated
random clusterings. Figure 2 (b) shows the achieved F1-measure of the random
clustering against the Silhouette-Coefficient of this clustering for multiple
applications. Again the Silhouette-Coefficient correlates linearly with the
F1-measure, even if the Pearson coefficient of 0.990 is slightly lower than in the
first experiment. Both experiments show that the Silhouette-Coefficient with the
Levenshtein distance as distance function, sketched below, is a suitable indicator
for the goodness of trace clusterings.
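
A minimal sketch of this goodness measure, assuming traces are given as activity
sequences and at least two clusters are non-empty:

def levenshtein(s, t):
    """Edit distance between two activity sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution / match
        prev = cur
    return prev[-1]

def silhouette_coefficient(traces, labels):
    """Average Silhouette-Coefficient [12] over all traces, using the
    Levenshtein distance between activity sequences."""
    clusters = {c: [t for t, l in zip(traces, labels) if l == c] for c in set(labels)}
    scores = []
    for t, l in zip(traces, labels):
        own = [x for x in clusters[l] if x is not t]
        if not own:                                  # singleton cluster contributes 0
            scores.append(0.0)
            continue
        a = sum(levenshtein(t, x) for x in own) / len(own)
        b = min(sum(levenshtein(t, x) for x in members) / len(members)
                for c, members in clusters.items() if c != l)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)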

                (a) L12 ∪ L17                           (b) L15,1 ∪ L15,1̃

Fig. 2. F1-measure and Silhouette-Coefficient of randomly generated clusterings
for L12 ∪ L17 and L15,1 ∪ L15,1̃.


5.2   Comparison between k-process and the n-grams approach

To compare the proposed k-process method with the n-grams approach, we applied
both trace clustering techniques to the event logs L12 ∪ L17 and L15. For the
n-grams approach we used n = 2, since this setting produced the best results.
    We clustered L12 ∪ L17 using the following parameter setting for the Heuristic
Miner: Min-Occurrence-Threshold = 0.3; Dependency Relation Threshold = 0.5.
Figure 3 (a) shows the average values of the Silhouette-Coefficient after each
iteration of the n-grams approach and of k-process. The increase of the
Silhouette-Coefficient values clearly shows how k-process iteratively improves the
clustering. In contrast, the n-grams approach was not able to improve the
clustering: its average Silhouette-Coefficient stagnates around 0.31. We also
applied k-process to L15 with the following parameters for the Heuristic Miner:
Min-Occurrence-Threshold = 0.1; Dependency Relation Threshold = 0.3;
AND-Threshold = 0.1. As one can see in Figure 3 (b), k-process produced better
results than the n-grams approach for this event log as well.




                (a) L12 ∪ L17                               (b) L15

Fig. 3. Average values of the Silhouette-Coefficient after each iteration of the
n-grams approach and k-process applied to L12 ∪ L17 and L15.




5.3   Sensitivity of the parameters of the Heuristic Miner

In this section we show the sensitivity of the parameters Min-Occurrence-Threshold
and Dependency Relation Threshold of the Heuristic Miner. These parameters
determine whether the Heuristic Miner generates an appropriate model that is
neither over- nor underfitted, so that the reassignment step can work properly.
With a good parameter setting this works fine, as we have seen in the last
section. But it is not obvious how to select these parameters, even though they
can affect the whole clustering result. For instance, in the last section we
clustered the event log L12 ∪ L17 with the parameters Min-Occurrence-Threshold =
0.3 and Dependency Relation Threshold = 0.5 and almost always obtained a good
clustering. If we change the parameter values to Min-Occurrence-Threshold = 0.2
and Dependency Relation Threshold = 0.7, the Heuristic Miner generates less
appropriate models, so that k-process does not reach results as good as with the
old parameter setting. Figure 4 (a) shows the average values of the
Silhouette-Coefficient of the clusterings after each iteration of k-process for
the old parameter setting used in Sect. 5.2 and for the new parameter
configuration. The clusterings produced with the new configuration are slightly
worse than those with the old one. To counter this parameter sensitivity we
introduce and evaluate a start heuristic in the following section.




                   (a)                                       (b)

Fig. 4. (a) Average values of the Silhouette-Coefficient of the clusterings after
each iteration of k-process applied to L12 ∪ L17, using the old parameter setting
from Sect. 5.2 and the new parameter setting. (b) The Silhouette-Coefficient for
the application of k-process on L12 ∪ L17 with and without a start heuristic
(using the new parameter setting), and for the application of the n-grams
approach.




5.4   k-process with start heuristic

Sometimes the process models generated for the individual clusters do not differ
significantly from each other. In this case, k-process tends to put all the traces
into one big cluster while the other clusters remain empty. To prevent this we
introduce a start heuristic, so that the initial clustering no longer follows a
random distribution. We also show that using a start heuristic lowers the
sensitivity of the parameters of the Heuristic Miner, i.e. even if the parameter
setting for the Heuristic Miner is not optimal, k-process still produces good
clusterings. We use the n-grams approach with k-means to pre-cluster the traces,
thereby ensuring that there is some essential difference between the initial
clusters. Then we apply k-process to improve this initial clustering, as sketched
below.
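
Reusing the ngram_embedding and k_process sketches given above, the combination
can be wired as follows; kmeans stands for any vector-based k-means implementation
and is an assumed placeholder.

def k_process_with_start_heuristic(traces, k, discover_model, fitness, kmeans):
    """Initialise k-process with an n-gram pre-clustering instead of a
    purely random start assignment."""
    vectors, _ = ngram_embedding(traces, n=2)   # bag-of-2-gram embedding
    initial = kmeans(vectors, k)                # pre-clustering as start assignment
    return k_process(traces, k, discover_model, fitness,
                     initial_assignment=initial)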
    To test this hypothesis we applied k-process once without and once with a
start heuristic to the event log L12 ∪ L17. We used the parameter setting for the
Heuristic Miner which did not produce very good results without a start heuristic:
Min-Occurrence-Threshold = 0.2; Dependency Relation Threshold = 0.7. As a start
heuristic we used the n-grams approach with n = 2. Figure 4 (b) shows the results
for k-process applied to L12 ∪ L17 with this non-optimal parameter setting, once
with a random clustering initialization and once with the start heuristic. One can
clearly see that the clustering results produced in combination with the start
heuristic are much better. We remark that the n-grams approach alone does not
produce such a good clustering (see Figure 4 (b)).

6   Conclusion


Clustering is a very prominent data exploration step. Trace clustering in
particular helps to identify subprocesses and variants in various process
applications. Here we proposed a novel partitioning method based on the well-known
k-means framework, which consists of finding representatives for partition
candidates and re-determining these representatives until convergence is reached.
Instead of a vector-space-based k-means, we showed that it is possible to use
process model discovery and conformance checking to establish process-aware trace
clustering. The quality of the results is better than that of a clustered
vector-space embedding. Owing to the k-means framework, the core steps of
k-process can be replaced with other discovery and conformance checking methods.
This is a major benefit, as it allows adapting the approach to particular
applications when needed. Process analysts, who are familiar with process mining
techniques, only need to operate the discovery algorithms instead of performing
exhaustive feature engineering to find a suitable vector-space embedding.
    For future work it would be interesting to experiment with other discovery
and conformance checking methods. As both core methods are applied many times,
faster methods may be preferable to methods producing higher-quality models in
some scenarios. Adapting online conformance checking methods might be a promising
direction here. In the beginning the accuracy is not of great importance, and we
only need a coarse partitioning of the traces. A strategy to be investigated is
to start with a low-accuracy, high-performance assignment and to switch gradually
to more accurate methods when the final clustering has almost been reached
according to cluster metrics like the silhouette coefficient. A hybrid that starts
with fast but low-quality candidate clustering for the initial steps and ends with
more complex methods as a refinement might provide exactly this improvement. For
some applications, online clustering might also be needed, as it is interesting
to detect deviating subprocesses shortly after they emerge. In many dynamic
applications the process clusters change too quickly, and an advantage can only
be gained if the trace clusters are identified fast enough. Then it is possible
to analyze them in time and derive useful insights to improve the process. The
strategies to determine the initial clustering should also be investigated, as
the initialization determines the number of iterations and the final result,
analogously to k-means. Extensive research has been done in this field, and we
have to find out which strategies are suitable for k-process, since processes
behave differently from well-researched vector spaces.

References

 1. van der Aalst, W.: Process Mining: Data Science in Action. Springer (2016)
 2. van der Aalst, W., Weijters, T., Maruster, L.: Workflow mining: Discovering
    process models from event logs. IEEE Transactions on Knowledge and Data
    Engineering 16(9), 1128-1142 (2004)
 3. van der Aalst, W.M.: Process mining. In: Process Mining, Data Science in Action,
    Second Edition. Springer (2016)
 4. Bose, R.J.C., van der Aalst, W.M.: Context aware trace clustering: Towards
    improving process mining results. In: Proceedings of the 2009 SIAM International
    Conference on Data Mining. pp. 401-412. SIAM (2009)
 5. Bose, R.J.C., van der Aalst, W.M.: Trace clustering based on conserved patterns:
    Towards achieving better process models. In: International Conference on Business
    Process Management. pp. 170-181. Springer (2009)
 6. De Weerdt, J., vanden Broucke, S., Vanthienen, J., Baesens, B.: Active trace
    clustering for improved process discovery. IEEE Transactions on Knowledge and
    Data Engineering 25(12), 2708-2720 (2013)
 7. van Dongen, B.: BPI Challenge 2012 (2012),
    https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
 8. van Dongen, B.: BPI Challenge 2015 (2015),
    https://doi.org/10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1
 9. van Dongen, B.: BPI Challenge 2017 - Offer log (2017),
    https://doi.org/10.4121/uuid:7e326e7e-8b93-4701-8860-71213edf0fbe
10. Ferreira, D., Zacarias, M., Malheiros, M., Ferreira, P.: Approaching process
    mining with sequence clustering: Experiments and findings. In: International
    Conference on Business Process Management. pp. 360-374. Springer (2007)
11. Greco, G., Guzzo, A., Pontieri, L., Saccà, D.: Discovering expressive process
    models by clustering log traces. IEEE Transactions on Knowledge and Data
    Engineering 18(8), 1010-1027 (2006)
12. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and
    validation of cluster analysis. Journal of Computational and Applied Mathematics
    20, 53-65 (1987)
13. Song, M., Günther, C.W., van der Aalst, W.M.: Trace clustering in process
    mining. In: International Conference on Business Process Management.
    pp. 109-120. Springer (2008)
14. Sun, Y., Bauer, B., Weidlich, M.: Compound trace clustering to generate accurate
    and simple sub-process models. In: International Conference on Service-Oriented
    Computing. pp. 175-190. Springer (2017)
15. Weijters, A., van der Aalst, W.M., de Medeiros, A.A.: Process mining with the
    heuristics miner-algorithm. Technische Universiteit Eindhoven, Tech. Rep.
    WP 166, 1-34 (2006)