Generating Performance Improvement Suggestions by using Cross-Organizational Process Mining

Onur Yilmaz and Pinar Karagoz
{onur.yilmaz, karagoz}@ceng.metu.edu.tr
Computer Engineering Department, Middle East Technical University, Turkey

Abstract

Process mining is a relatively young and developing research area with the main idea of discovering, monitoring and improving processes by extracting information from event logs. With the increase of cloud computing and shared infrastructures, event logs of multiple organizations are available for analysis, where cross-organizational process mining offers the opportunity for organizations to learn from each other. The approach proposed in this study mines the process models of organizations and calculates performance indicators; it then clusters organizations based on these indicators and finally spots mismatches between the process models to generate recommendations. The approach is implemented as an extensible and configurable plug-in set in the ProM framework and tested on synthetic and real-life logs, where successful and suitable results are achieved with respect to the defined evaluation metrics. The generated recommendations indicate that this approach can help users focus on the parts of process models with potential for performance improvement, which are difficult to spot manually and visually.

Keywords: Process Mining, Cross-organizational Process Mining, Performance Indicators, Clustering, Process Performance Improvement

1 Introduction

Process mining is a relatively young and developing research area with roots in computational intelligence, data mining, and process modeling and analysis [5]. The main idea in this research area is to discover, monitor and improve processes by extracting information from event logs. Traditional process mining approaches work on a single organization; however, with the increase of cloud computing and shared infrastructures, event logs of multiple organizations are now available for analysis, where cross-organizational process mining stands out. In the cross-organizational process mining area, recent studies focus on commonality and collaboration between organizations, especially on how similar the process models and behaviors of organizations under cross comparison are [11], and on challenges based on the partitioning of tasks and process models between organizations [2].

This study is based on an environment where processes are executed in several organizations, and cross-organizational process mining is applied with the idea of unsupervised learning, using predictor variables related to the performances of organizations. In this environment, the underlying assumption of the approach is that the correlation between performance values and mismatches hints at a causal relationship. The approach proposed in this study is a four-stage solution: it starts with mining the process models of organizations, followed by performance indicator analysis and then mismatch pattern analysis; finally, in the suggestion generation stage, learning opportunities are created for each organization. With this approach, the aim is to help business process management users focus on the potentially important parts of their business maps. The proposed methodology is implemented in the ProM framework [29] as a set of plug-ins, one for each stage, packaged under the name CrossOrgProcMin, and tested on synthetic and real-life event logs.
The performance of the methodology is assessed with a set of evaluation metrics defined for each stage, and the resulting recommendations are presented to show how this approach helps users focus on learning opportunities between organizations with performance improvement potential.

The rest of the paper is organized as follows: In Section 2, related studies in the process mining area are presented. In Section 3, background information on the relevant topics is explained. In Section 4, the methodology proposed in this study is presented in detail. In Section 5, the methodology is applied on datasets and the results are discussed. In Section 6, a summary of this study is presented with final remarks and pointers for future work.

2 Related Work

In this section, studies related to the presented work are summarized. Firstly, studies in the process mining area are explained, and then studies from cross-organizational process mining, which is the main topic of this research, are introduced. Following these, studies related to similarity in process mining are presented.

Within the process mining framework, various process mining algorithms have been proposed, all with the same aim of discovering underlying processes. Considering the underlying approaches, the algorithms can be grouped as α-algorithms [7,26], inductive approaches [22,21], hierarchical clustering [19], genetic approaches [6,17], and heuristic approaches [18]. Considering the scope of this study, process discovery operations are undertaken with inductive methods, which form a robust, repeatable and mature set of approaches.

Cross-organizational mining is based on cross-correlation of workflows and the realized activities in different organizations, compared in an objective manner. In the study of Buijs et al. [11], process models and behaviors of organizations are cross-compared with the idea of supporting each other and representing differences. In the studies of van der Aalst [1,2], configurable process models are proposed with the idea of exploiting commonality and collaboration for organizations that share the same infrastructure and do similar work. In this study, the usage of cross-organizational process mining is based on exploiting commonality so that organizations can learn from each other.

Similarity in process mining has various approaches, which focus on metrics [14], analytical comparison [12,31], ontology analysis [27], delta analysis [16,15,20] and mismatch patterns [13]. In this research, a combination of the metric and mismatch pattern approaches is used to identify variations between the process models of different organizations that execute the same tasks.

3 Background

In this section, process discovery methods and mismatch patterns are presented within the scope of this work. In the process mining field, various process discovery algorithms have been proposed to address different challenges in process discovery, using different notations. In this study, since the focus is on learning lessons from cross-organizational mining, we used Inductive Process Mining [23] for process discovery, which is simple, highly applicable and configurable. In the literature, its derivatives that handle infrequent behaviors [24], incomplete logs [25], and model optimization [30] are also available. The Inductive Miner - infrequent (IMi) extension [24] is used in this study; it is capable of filtering infrequent behavior and yields lower fitness, higher precision and equal generalization.
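The paper's implementation runs inside ProM; as an illustration only, the same IMi-style discovery step can be sketched with the open-source pm4py library. The log file name below is hypothetical, and pm4py is a swapped-in tool rather than the plug-in set described later in Section 4.6.

```python
# Illustrative sketch only: the paper's implementation runs inside ProM, but an
# IMi-style discovery can be reproduced with the pm4py library.
# "org_a.xes" is a hypothetical event log file, not one of the paper's datasets.
import pm4py

log = pm4py.read_xes("org_a.xes")

# Inductive Miner - infrequent: noise_threshold (0..1) plays the role of the
# user-provided filtering threshold described in Section 4.2.
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(
    log, noise_threshold=0.2
)

pm4py.view_petri_net(net, initial_marking, final_marking)
```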
In a cross-organizational process mining environment, there is a need to align the processes of different organizations. In the study of Dijkman [13], a collection of patterns describing frequent mismatches between similar process models is presented. Within the scope of this study, the related mismatch patterns, as defined in [13], are the following:

Skipped Activity: An activity exists in one process but no equivalent activity is found in the other process.

Refined Activity: An activity exists in one process but, as an equivalent, a collection of activities exists in the other process to achieve the same task.

Activities at Different Moments in Processes: A set of activities is undertaken in different orders in different processes.

Different Conditions for Occurrence: The set of dependencies is the same for two processes; however, the occurrence condition is different.

Different Dependencies: The dependency set of activities differs between organizations.

Additional Dependencies: This pattern is a special case of different dependencies, where one set of activities includes the other and results in additional dependencies.

As mentioned in [13], this collection is not a comprehensive list resolving all possible mismatches, but it includes the most common mismatch patterns spotted during case studies. In addition, from their definitions and examples it can easily be seen that these patterns are not orthogonal. Moreover, no algorithms for spotting these mismatches are provided in [13] or consequent studies; thus, the implementation of spotting mismatch patterns is performed within the scope of this study. A sketch of how two of these patterns can be detected is given below.
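Since [13] provides no detection algorithms, the analyzers in this study had to be implemented from scratch. The following is a minimal sketch, not the analyzers shipped with CrossOrgProcMin: it abstracts each process model to a set of activity labels, and the label-similarity heuristic and both thresholds are assumptions made purely for illustration.

```python
# Minimal illustrative sketch, not the paper's actual analyzers: it abstracts a
# process model to the set of activity labels found between A_start and A_end,
# and flags two of Dijkman's patterns on that abstraction. The label-similarity
# heuristic and all thresholds are assumptions.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude label similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def skipped_activities(model_a: set[str], model_b: set[str]) -> set[str]:
    """Skipped Activity: activities of model A with no near-identical
    counterpart in model B (the 0.9 cutoff is an assumed heuristic)."""
    return {a for a in model_a
            if all(similarity(a, b) < 0.9 for b in model_b)}

def refined_activities(model_a: set[str], model_b: set[str]) -> dict[str, list[str]]:
    """Refined Activity: a skipped activity of model A that partially matches
    several activities of model B (the 0.3 threshold is an assumed heuristic)."""
    refinements = {}
    for a in skipped_activities(model_a, model_b):
        similar = [b for b in model_b if similarity(a, b) >= 0.3]
        if len(similar) > 1:
            refinements[a] = similar
    return refinements

# Toy labels loosely inspired by the Loan Application example in Section 5:
variant_1 = {"Check Credit", "Calculate Capacity", "Accept"}
variant_3 = {"Check System", "Check Paper Archive", "Calculate Capacity", "Accept"}
print(skipped_activities(variant_1, variant_3))
print(refined_activities(variant_1, variant_3))
```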
4 Methodology

In this section, the methodology proposed in this study is presented. Firstly, an overview of the approach is described from a high-level perspective. Then, each stage in the methodology is presented together with its importance in the study, mathematical representations and definitions, and black-box diagrams. Finally, the implementation of this methodology in the ProM framework is explained in detail with a software architecture overview.

4.1 Approach Overview

The approach proposed in this study consists of the four main stages visualized in Figure 1. Firstly, in Process Model Mining, process models are extracted from the event logs of each organization with a user-specified noise threshold. Secondly, in Performance Indicator Analysis, event logs are replayed on the process models and performance indicators are calculated for each organization; using these indicators, the organizations are then clustered based on how well they are operating. Thirdly, in Mismatch Pattern Analysis, differences between the process models of organizations are extracted with well-established mismatch patterns. Finally, in Recommendation Generation, using the performance indicator clusterings and the differences between process models, a set of recommendations is generated for each organization.

Figure 1: Overview of Methodology

4.2 Process Model Mining

Process model mining in the proposed approach has the aim of creating reproducible and generalized process models from event logs. Considering that process models may not be defined beforehand, or may be outdated and not reflect the latest state of the process, they are mined from event logs. However, if there are process models that represent the event logs, this stage can be skipped. In order to mine process models, an implementation of the Inductive Miner - infrequent (IMi), proposed in [24] as an extension of the Inductive Miner to handle noise in event logs, is used in this study. In order to set a filtering threshold, a user-provided value between 0 and 1 is given as input to the method in addition to the event logs.

4.3 Performance Indicator Analysis

The performance indicator analysis stage focuses on calculating and analyzing performance values using the event logs and the mined process models. This stage consists of two main steps: a) alignment and calculation of performance indicators; and b) clustering of organizations based on their performance values. In order to evaluate the performance of an organization based on its process models and past activities, there are a number of indicators in the time dimension, cost dimension and utilization [3]. However, in this study, process-related performance values are considered, since differences in the process models are studied in the subsequent stages. To this aim, the following performance indicators are calculated:

Average Time Between Activities: This is a simple but powerful performance metric for organizations, since it yields the average time to complete one task based on a starting point. From the performance perspective, organizations want to minimize the average time between activities to increase their throughput [4]. This notion can be defined as follows:

Definition 1. Average time between activity $A$ and $B$ in organization $i$ is
$$AvgTime^i_{A \to B} = \frac{\sum_{case\ c \in EventLog_i} TimeBetween_c(A,B)}{|Occurrences_{EventLog_i}(A,B)|}$$
where
1. $TimeBetween_c(A,B) = EndTime_c(B) - StartTime_c(A)$,
2. $StartTime_c(A)$ is the start time of activity $A$ in case $c$,
3. $EndTime_c(B)$ is the end time of activity $B$ in case $c$,
4. $|Occurrences_{EventLog_i}(A,B)|$ is the number of occurrences of activity $A$ followed by $B$ in $EventLog_i$.

Standard Deviation of Time Between Activities: The time between activities in real life is not stable; it deviates due to various reasons such as the user responsible for the tasks, the size and content of the tasks, or seasonality [3]. On the other hand, organizations want to be confident about their processes, and therefore they want to minimize the deviation in the time between activities. Minimized deviation in time helps organizations to plan, act and re-organize the activities in their processes with high accuracy [4]. With the same approach as above, the following formulation can be defined:

Definition 2. Standard deviation of time between activity $A$ and $B$ in organization $i$ is
$$StdDevTime^i_{A \to B} = \sqrt{\frac{\sum_{case\ c \in EventLog_i} \left[ TimeBetween_c(A,B) - AvgTime^i_{A \to B} \right]^2}{|Occurrences_{EventLog_i}(A,B)|}}$$

Replay and Performance Indicator Calculation: Replay of event logs on process models is based on the idea of alignment, which is formalized in [4]; the basic assumption in this concept is that process models and event logs have the same activity labels. For each organization, the steps of alignment and creating transitions are performed with the corresponding event logs and process models, and the resulting process performance summaries are used for further analysis. The resulting data can be defined as follows:

Definition 3. Performance Summary data for any organization $i$ is $PerfSum_i = AvgTimeSum_i \cup StdDevTimeSum_i$ where
1. $AvgTimeSum_i = \{AvgTime^i_{A \to B} \mid A, B \in EventLog_i\}$
2. $StdDevTimeSum_i = \{StdDevTime^i_{A \to B} \mid A, B \in EventLog_i\}$
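A minimal sketch of Definitions 1 and 2 follows. It assumes the alignment/replay step has already extracted, for each occurrence of "A followed by B", the start time of A and the end time of B; the flat list-of-pairs input is a simplification of the paper's alignment-based replay, and the timestamps are made up.

```python
# Sketch of Definitions 1 and 2, assuming the replay step has already produced
# (StartTime_c(A), EndTime_c(B)) for each occurrence of A followed by B.
from datetime import datetime
from math import sqrt

def avg_and_stddev_time(occurrences: list[tuple[datetime, datetime]]):
    """occurrences: (StartTime_c(A), EndTime_c(B)) pairs, one per occurrence of
    activity A followed by activity B in the organization's event log."""
    durations = [(end - start).total_seconds() for start, end in occurrences]
    n = len(durations)
    avg = sum(durations) / n                                   # Definition 1
    stddev = sqrt(sum((d - avg) ** 2 for d in durations) / n)  # Definition 2
    return avg, stddev

# Made-up occurrences of "Calculate Capacity" followed by "Accept":
occurrences = [
    (datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 5, 11, 30)),
    (datetime(2015, 1, 6, 9, 0), datetime(2015, 1, 6, 10, 15)),
    (datetime(2015, 1, 7, 9, 0), datetime(2015, 1, 7, 12, 45)),
]
avg, stddev = avg_and_stddev_time(occurrences)
print(f"AvgTime(A->B) = {avg / 3600:.2f} h, StdDevTime(A->B) = {stddev / 3600:.2f} h")
```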
Performance Indicator Clustering: Clustering is based on the idea of collecting a set of observations into clusters so that observations within the same cluster are similar, whereas observations from different clusters are dissimilar. In this study, clustering is used to group organizations based on their performance indicator data. The randomized-seeding k-means++ approach of Arthur and Vassilvitskii [8] is used to cluster organizations. Since the number of clusters is not known a priori, k-means clustering is applied for k ranging from 1 to the number of organizations. For each clustering with a different number of clusters, the Sum of Squared Errors (SSE) values are plotted and the user is asked to select the appropriate cluster size. For the selected cluster size, the clustering-related information is used to generate recommendations in the further steps. The resulting cluster analysis data is formulated as follows:

Definition 4. Cluster Analysis Data is a tuple $(k, Assignments, ClusterCentroids)$ where
1. $k$ is the number of clusters,
2. $Assignments$ is a set of tuples $(Organization_i, Cluster_j)$ where $i$ is the identifier for the organization and $j \le k$ is the identifier for the cluster,
3. $ClusterCentroids$ is a set of tuples $(Cluster_j, Type, A_{start}, A_{end}, Value)$ where
(a) $Type$ is the performance indicator type, which is either $Average$ or $StandardDev$,
(b) $A_{start}$ and $A_{end}$ are the starting and ending points of the performance indicator,
(c) $Value$ is the actual value of the performance indicator,
(d) $ClusterCentroids_j$ is a function that returns the set of centroids, each a tuple $(Type, A_{start}, A_{end}, Value)$, for $Cluster_j$.
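As an illustration of this step, the sketch below uses scikit-learn's k-means++ seeding in place of the paper's own implementation inside ProM. Each row is one organization's performance summary vector, i.e. $PerfSum_i$ flattened over a shared set of $(A_{start}, A_{end})$ indicator keys; the numbers are made up.

```python
# Illustrative clustering sketch using scikit-learn's k-means++ seeding (the
# paper implements this step as a ProM plug-in). Rows are organizations'
# flattened performance summaries; all values are invented for the example.
import numpy as np
from sklearn.cluster import KMeans

perf_summaries = np.array([
    # AvgTime(A->B), StdDev(A->B), AvgTime(B->C), StdDev(B->C), in hours
    [2.1, 0.4, 5.0, 1.2],   # Organization 1
    [2.3, 0.5, 4.8, 1.1],   # Organization 2
    [7.9, 2.6, 9.4, 3.0],   # Organization 3
    [2.0, 0.4, 5.1, 1.3],   # Organization 4
])

# SSE (inertia) for k = 1 .. number of organizations; the user inspects this
# curve and picks the elbow as the cluster size.
for k in range(1, len(perf_summaries) + 1):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    model.fit(perf_summaries)
    print(f"k={k}: SSE={model.inertia_:.2f}")

chosen = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
print("Assignments:", chosen.fit_predict(perf_summaries))
```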
4.4 Mismatch Pattern Analysis

In order to learn from other organizations, it is necessary to spot the differences between the process models of different organizations. In this phase, differences between process models are revealed by the mismatch patterns defined by Dijkman [13]. Since performance indicators are calculated based on a starting and an ending point in the process model, the same approach is applied to locate mismatch patterns. In other words, differences between process models are located from a starting activity to an ending activity. With this aim, each mismatch pattern and its analyzers are defined by extending the following definitions. For each organization, the mismatch pattern analyzers are pipelined and the found mismatch patterns are stored for further analysis.

Definition 5. Mismatch Pattern is a tuple $MismatchPattern = (O_1, O_2, ExtensionData, A_{start}, A_{end})$ where
1. $O_1$ is the first organization and
2. $O_2$ is the second organization between which the pattern occurs,
3. $ExtensionData$ is a set of tuples where mismatch-related information is recorded,
4. $A_{start}$ and $A_{end}$ are the starting and ending points between which mismatch patterns are checked.

Definition 6. Mismatch Pattern Analyzer is a function $MismatchPatternAnalyzer(O_1, O_2, A_{start}, A_{end})$ that returns a set of Mismatch Patterns for organization $O_1$ compared to $O_2$ for the activities between $A_{start}$ and $A_{end}$.

4.5 Generating Suggestions/Recommendations for Performance Improvement

The recommendation generation stage is the final and core stage of the methodology, where all information retrieved from the event logs so far is utilized. In this study, the idea of recommendation is based on providing a set of mismatch patterns to each organization so that it can enhance its processes. These mismatch patterns are generated by comparing the process models of other organizations, particularly those that are performing better in terms of their performance indicator values. The recommendation idea and the recommendation generation function are defined as follows:

Definition 7. Recommendation is a tuple $Recommendation = (O, A_{start}, A_{end}, MismatchPatterns)$ where
1. $O$ is the identifier for the organization,
2. $A_{start}$ and $A_{end}$ are the starting and ending activities between which the recommendations are checked,
3. $MismatchPatterns$ is a collection of mismatch patterns.

Definition 8. Recommendation generation is a function $RecGen(O, C, P)$ that returns a set of Recommendations where
1. $O$ is the identifier for the organization,
2. $C$ is the Cluster Analysis Data resulting from the cluster analysis stage,
3. $P$ is the Performance Threshold, a real number larger than or equal to 0, calculated over the same type of performance indicators of different organizations in the Cluster Analysis Data.

The algorithm of the recommendation generation function is based on the idea of checking other clusters for a significant change in performance indicators, where significance is defined by the user-provided threshold. Only mismatches located between activities that cause a high level of difference in performance indicators are analyzed. This approach is formalized in Algorithm 1.

Algorithm 1: Recommendation Generation
Input: O organization, C Cluster Analysis Data, P performance difference threshold
Output: Recommendations, a set of recommendations
1  Recommendations ← {}
2  i ← C(Assignments(O))
3  for Centroid ∈ C(ClusterCentroids_i) do
4      for Centroid′ ∈ C(ClusterCentroids_j), i ≠ j do
5          if Centroid(A_start) = Centroid′(A_start) and Centroid(A_end) = Centroid′(A_end) then
6              if |Centroid(Value) − Centroid′(Value)| ÷ Centroid(Value) ≥ P then
7                  A_start ← Centroid(A_start)
8                  A_end ← Centroid(A_end)
9                  MismatchPatterns ← {}
10                 for O′ ∈ C(Assignments(j)) do
11                     MismatchPatterns ← MismatchPatterns ∪ MismatchPatternAnalyzer(O, O′, A_start, A_end)
12                 Recommendations ← Recommendations ∪ {Recommendation(O, A_start, A_end, MismatchPatterns)}
13 return Recommendations
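To make the control flow of Algorithm 1 concrete, the following is a Python transcription offered as a sketch only: the data layout follows Definition 4, mismatch_pattern_analyzer is a placeholder for the analyzers of Definition 6, and the explicit indicator-type check is an addition implied by Definition 8's "same type" requirement rather than a literal line of the algorithm.

```python
# Sketch transcription of Algorithm 1; mismatch_pattern_analyzer is a stand-in
# for the analyzers of Section 4.4 (e.g. the skipped/refined-activity sketch).
from collections import namedtuple

Centroid = namedtuple("Centroid", "type a_start a_end value")
Recommendation = namedtuple("Recommendation", "org a_start a_end mismatch_patterns")

def mismatch_pattern_analyzer(org, other_org, a_start, a_end):
    """Placeholder for the pipelined analyzers of Definition 6."""
    return set()

def rec_gen(org, assignments, centroids, threshold):
    """assignments: org -> cluster id; centroids: cluster id -> [Centroid];
    threshold: P, the relative performance-difference threshold."""
    recommendations = []
    i = assignments[org]                       # cluster of the organization
    for centroid in centroids[i]:
        for j, other_centroids in centroids.items():
            if j == i:
                continue                       # only compare against other clusters
            for other in other_centroids:
                same_indicator = (centroid.a_start == other.a_start and
                                  centroid.a_end == other.a_end and
                                  centroid.type == other.type)
                if not same_indicator:
                    continue
                # Significant relative difference between cluster centroids?
                if abs(centroid.value - other.value) / centroid.value >= threshold:
                    patterns = set()
                    for other_org, cluster in assignments.items():
                        if cluster == j:       # every organization in cluster j
                            patterns |= mismatch_pattern_analyzer(
                                org, other_org, centroid.a_start, centroid.a_end)
                    recommendations.append(Recommendation(
                        org, centroid.a_start, centroid.a_end, patterns))
    return recommendations
```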
4.6 Implementation in ProM Framework

The methodology of this study is implemented in ProM [29], an extensible framework that supports a wide variety of process mining techniques in the form of plug-ins. Each stage of the approach is implemented as a standalone plug-in, which enables extensions in further studies. The developed set of plug-ins is packaged under the name CrossOrgProcMin (http://www.promtools.org/prom6/packages/CrossOrgProcMin), published open-source (http://github.com/onuryilmaz/cross-org-proc-min), and available in the latest ProM release.

5 Experimental Analysis Results and Discussions

In this section, the methodology presented in this study is applied on several datasets and the results are presented. Firstly, evaluation metrics are defined for each stage of the methodology to assess the performance of the approach. Following this, the methodology is applied on two datasets and the results are explained and discussed.

The approach in this study is an aggregation of various methods that differ significantly in their mathematical background. Therefore, instead of a global evaluation metric for the complete methodology, each stage is evaluated with its own evaluation metrics. In process model mining, the performance of the process mining stage is measured by fitness and appropriateness, which are defined in [28]. In performance indicator analysis, alignment costs [4] are compared with the process model mining metrics for the replay phase. In the clustering phase, within-cluster SSE analysis is undertaken to decide on the number of clusters. For mismatch pattern analysis, the number of mismatch patterns found is compared with the graph-edit similarity [14] of the process models. In recommendation generation, different threshold values are tried to check how many mismatch patterns are generated for the organizations and how they could be used for focused analysis.

5.1 Loan Application Process

The Loan Application Process dataset is synthetically created and consists of four variants of a simple loan application process in a financial institute. These event logs have been used for testing different approaches to discovering a configurable process model from a collection of event logs [10]. The dataset contains a total of 475 cases and 2440 events with a fairly even distribution between variants; these variants are used as organizational logs on which the presented methodology is applied.

In the Process Model Mining stage, the process models resulted in perfect fitness and high appropriateness, as expected from a synthetically generated dataset without noise. In the Performance Indicator Analysis stage, the event logs are first replayed over the process models and performance indicators are calculated; then the organizations are clustered based on their performance indicators. In order to avoid overfitting, two clusters are used: Variants #1, #2 and #4 are grouped into one cluster, while only Variant #3 is left in the other cluster. In the Mismatch Pattern Analysis stage, the number of mismatch patterns is analyzed against the graph-edit similarity between each pair of organizations. As the similarity between process models decreases, our method spots more mismatch patterns, which confirms that the developed mismatch pattern analyzers work as expected on this dataset.

In the Recommendation Generation stage, for different threshold values, the number of performance indicators on which other clusters perform better than the selected organization and the spotted mismatch patterns are plotted in Figure 2. In order to construct the data in Figure 2, every organization is selected one by one with different threshold values. For each analysis, the number of performance indicators and the average number of mismatch patterns causing them are plotted. In addition, the total number of mismatch patterns without clustering is added as an upper bound. With the help of this upper bound, the responsiveness of the approach and the degree to which it helps the user focus on performance improvement can be analyzed. As can be seen, for each threshold value, the average number of mismatch patterns with performance indicator clustering is very low compared to the case without clustering. In other words, when a user wants to improve performance with any threshold, there is a significantly smaller number of mismatch patterns to check on average. This shows that the methodology proposed in this study can help users focus on the differences between organizations for this dataset.

Figure 2: Recommendation Generation analysis for Loan Application Process dataset

5.2 Environmental Permit Application Process

The Environmental Permit Application Process dataset originates from the "Configurable Services for Local Governments (CoSeLoG)" project [1], which investigates the similarities and dissimilarities between several processes of different municipalities in the Netherlands.
The dataset contains records of the receiving phase of the building permit application process in 5 municipalities, which are comparable since the activity labels in the different event logs refer to the same activities performed in the five municipalities. In this dataset [9], there are 1214 cases and 2142 events with a variable distribution between the event logs of the municipalities; the municipalities are used as organizational logs.

In the Process Model Mining stage, with a 10% noise threshold, high fitness values are achieved; however, some of the process models, such as those of Municipality #4 and #5, resulted in low appropriateness values. In the Performance Indicator Analysis stage, after calculating the performance indicators, the municipalities are grouped into three clusters: Municipality #1 is located in the first cluster; Municipalities #2 and #4 are located in the second cluster; and Municipalities #3 and #5 are grouped into the last cluster. In the Mismatch Pattern Analysis stage, it can be stated that as the similarity between the process models of municipalities increases, the number of mismatch patterns decreases in most of the cases. On further analysis, it can be seen that Municipalities #4 and #5, which have significantly more complex process models than the others, fail in spotting mismatch patterns under graph-edit similarity.

In the Recommendation Generation stage, for different threshold values, the number of performance indicators on which other clusters perform better than the selected organization and the spotted mismatch patterns are plotted in Figure 3 for the thresholds of 25%, 50% and 75%, since these are the breaking points. For instance, the cluster of Municipality #1 performs worse on 6 indicators at a difference of 25%, and on average 5 mismatch patterns are listed for each performance indicator. Compared to the total number of mismatch patterns of Municipality #1, which is 357, the proposed approach significantly helps the user focus on performance improvement.

Figure 3: Recommendation Generation analysis for Environmental Permit Application Process dataset (3 Clusters)

5.3 Discussions

When the evaluations of the stages for the Loan Application Process and Environmental Permit Application Process datasets are gathered together, the following results can be expressed:

– The process mining stage of the proposed methodology can mine process models with high fitness and appropriateness levels.

– For the successfully mined models with high fitness values, the replay and performance indicator calculation stage works as expected. With this step, the average and standard deviation of time between each pair of activities can be measured for each organization. The number of these metrics is quadratic in the number of activities in each organization's process model, which makes manual cross-comparison difficult.

– The internal measure of the clusters indicates that organizations can be clustered according to their performance indicators, which yields a collective view of the organizations' subprocesses. In other words, organizations are divided into clusters, which shows that they can be grouped based on how well they are executing.

– Mismatch analysis spots the differences between process models in coherence with their structural similarity. This indicates that the idea of using mismatch patterns to reveal differences between process models is a feasible approach, since its results are comparable to the similarity metrics of process models in the literature.
– Recommendation generation gathers all the information generated in this study to help focus on the potentially important mismatch patterns for performance improvement. When the numbers of mismatch patterns with and without performance clustering are compared, performance clustering lists 3 times fewer differences in the small Loan Application Process dataset. When it is practically impossible to inspect mismatch patterns manually, as in the Environmental Permit Application Process, performance clustering spots 100 times fewer differences. This reduction helps the user focus on the differences with a potential for performance improvement, which is one of the aims of this study.

– Although each step of the methodology can be counted as successful based on its evaluation metrics, the mismatch patterns recommended at the end of the methodology can yield important observations but can also be irrelevant or infeasible. Since this decision depends on the business environment of the organizations, evaluating the quality of recommendations for business usefulness requires domain expertise. However, an example recommendation can be presented to provide insight. In the analysis of the Loan Application Process, Variant #3 performs worse by 27% on average time and by 12% on standard deviation of time between the activities "Calculate Capacity" and "Accept". When the mismatch patterns for these performance indicators are checked, the following can be mentioned:

• "Check Credit" is a Refined Activity with "Check System" (50%), "Check Paper Archive" (42%), "Send Credit Check Request" (32%) and "Process Credit Check Reply" (31%), where the corresponding similarity values are provided in parentheses.

• "Calculate Capacity" matches the Activities at Different Moments in Processes pattern, as it has different preceding activities across the clusters.

When these example mismatch patterns are checked, removing the "Check Credit" activity and putting the other activities in its place might be a cause of the performance difference. With the same approach, placing "Calculate Capacity" at different positions in the processes can affect the average and variance of time between activities. These mismatch patterns are also visualized on the process model of Variant #3 and a variant from the other cluster in Figure 4. In the process models, the refined activities of "Check Credit" and the different positions of "Calculate Capacity" are indicated.

Figure 4: Visualization of example recommendation for Loan Application Process dataset

6 Conclusion and Future Work

In this study, a new approach is proposed and tested for generating recommendations using cross-organizational process mining for process performance improvement. Cross-organizational process mining is applied with the idea of unsupervised learning, where predictor variables related to the performances of organizations are used in an environment where processes are executed in several organizations. The results show that it is possible to use cross-organizational process mining and mismatch patterns for performance improvement recommendations. The proposed methodology is developed as an extensible and configurable set of plug-ins in the ProM framework [29] and published open-source. This makes the methodology open to the inclusion of new process mining methods, mismatch patterns and clustering approaches, as well as to testing with different datasets.
For the approach proposed in this study, the following issues can be listed as pointers to future work:

– In the process mining stage, instead of the Inductive Miner, new techniques can be used that can mine complex process models with higher appropriateness levels while keeping the current high fitness values.

– In the performance indicator analysis stage, new indicators can be defined based on the business environment, event log attributes and user needs. For instance, personnel and resource allocation indicators can be included, as well as the cost dimension.

– For mismatch pattern analysis, new and business-oriented mismatch patterns can be included in the analysis. In addition, the current implementations of the analyzers can fail when there are loops in the process models; therefore, more robust implementations for process models with loops can be developed in the future.

– The quality of the generated recommendations for the business environment is not assessed within the scope of this study. However, when feedback from a domain expert or BPM practitioners is provided, the learning approach can be converted from unsupervised to semi-supervised learning.

References

1. van der Aalst, W.M.P.: Business process configuration in the cloud: how to support and analyze multi-tenant processes? In: Web Services (ECOWS), 2011 Ninth IEEE European Conference on, pp. 3–10. IEEE (2011)
2. van der Aalst, W.M.P.: Intra- and inter-organizational process mining: Discovering processes within and between organizations. In: The Practice of Enterprise Modeling, pp. 1–11. Springer (2011)
3. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer Science & Business Media (2011)
4. van der Aalst, W.M.P., Adriansyah, A., van Dongen, B.: Replaying history on process models for conformance checking and performance analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(2), 182–192 (2012)
5. van der Aalst, W.M.P., Adriansyah, A., de Medeiros, A., et al.: Process mining manifesto. In: Business Process Management Workshops, pp. 169–194. Springer (2012)
6. van der Aalst, W.M.P., de Medeiros, A., Weijters, A.J.M.M.: Genetic process mining. In: Applications and Theory of Petri Nets 2005, pp. 48–69. Springer (2005)
7. van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering 16(9), 1128–1142 (2004)
8. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007)
9. Buijs, J.C.A.M.: Environmental permit application process ('WABO'), CoSeLoG project (2014), http://dx.doi.org/10.4121/uuid:26aba40d-8b2d-435b-b5af-6d4bfbd7a270
10. Buijs, J.C.A.M.: Flexible Evolutionary Algorithms for Mining Structured Process Models. Ph.D. thesis, Technische Universiteit Eindhoven, Eindhoven, The Netherlands (2014)
11. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: Towards cross-organizational process mining in collections of process models and their executions. In: Business Process Management Workshops, pp. 2–13. Springer (2012)
12. Buijs, J.C.A.M., Reijers, H.A.: Comparing business process variants using models and event logs. In: Enterprise, Business-Process and Information Systems Modeling, pp. 154–168. Springer (2014)
13. Dijkman, R.: Mismatch Patterns in Similar Business Processes. Beta, Research School for Operations Management and Logistics (2007)
14. Dijkman, R., Dumas, M., van Dongen, B., Käärik, R., Mendling, J.: Similarity of business process models: Metrics and evaluation. Information Systems 36(2), 498–516 (2011)
15. Esgin, E., Karagoz, P.: Sequence alignment adaptation for process diagnostics and delta analysis. In: Hybrid Artificial Intelligent Systems, pp. 191–201. Springer (2013)
16. Esgin, E., Senkul, P.: Delta analysis: a hybrid quantitative approach for measuring discrepancies between business process models. In: Hybrid Artificial Intelligent Systems, pp. 296–304. Springer (2011)
17. Esgin, E., Senkul, P., Cimenbicer, C.: A hybrid approach for process mining: using from-to chart arranged by genetic algorithms. In: Hybrid Artificial Intelligence Systems, pp. 178–186. Springer (2010)
18. Esgin, E., Senkul, P.: A hybrid approach to process mining: Finding immediate successors of a process by using from-to chart. In: Machine Learning and Applications, 2009. ICMLA'09. International Conference on, pp. 664–668. IEEE (2009)
19. Greco, G., Guzzo, A., Pontieri, L.: Mining hierarchies of models: From abstract views to concrete specifications. In: Business Process Management, pp. 32–47. Springer (2005)
20. Hallerbach, A., Bauer, T., Reichert, M.: Capturing variability in business process models: the Provop approach. Journal of Software Maintenance and Evolution: Research and Practice 22(6-7), 519–546 (2010), http://dx.doi.org/10.1002/smr.491
21. Herbst, J.: Dealing with concurrency in workflow induction. In: European Concurrent Engineering Conference. SCS Europe. Citeseer (2000)
22. Herbst, J., Karagiannis, D.: Integrating machine learning and workflow management to support acquisition and adaptation of workflow models. In: Database and Expert Systems Applications, 1998. Proceedings. Ninth International Workshop on, pp. 745–752. IEEE (1998)
23. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process models from event logs: a constructive approach. In: Application and Theory of Petri Nets and Concurrency, pp. 311–329. Springer (2013)
24. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process models from event logs containing infrequent behaviour. In: Business Process Management Workshops, pp. 66–78. Springer (2014)
25. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process models from incomplete event logs. In: Application and Theory of Petri Nets and Concurrency, pp. 91–110. Springer (2014)
26. de Medeiros, A.K.A., van Dongen, B.F., van der Aalst, W.M.P., Weijters, A.J.M.M.: Process mining: Extending the α-algorithm to mine short loops (2004)
27. Pascalau, E., Rath, C.: Managing business process variants at eBay. In: Mendling, J., Weidlich, M., Weske, M. (eds.) Business Process Modeling Notation, Lecture Notes in Business Information Processing, vol. 67, pp. 91–105. Springer Berlin Heidelberg (2010), http://dx.doi.org/10.1007/978-3-642-16298-5_9
28. Rozinat, A., van der Aalst, W.M.P.: Conformance checking of processes based on monitoring real behavior. Information Systems 33(1), 64–95 (2008)
29. Verbeek, H.M.W., Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: ProM 6: The process mining toolkit. Proc. of BPM Demonstration Track 615, 34–39 (2010)
30. Weidlich, M., van der Werf, J.M.: On profiles and footprints: relational semantics for Petri nets. In: Application and Theory of Petri Nets, pp. 148–167. Springer (2012)
31. Weidlich, M., Mendling, J., Weske, M.: A foundational approach for managing process variability. In: Mouratidis, H., Rolland, C. (eds.) Advanced Information Systems Engineering, Lecture Notes in Computer Science, vol. 6741, pp. 267–282. Springer Berlin Heidelberg (2011), http://dx.doi.org/10.1007/978-3-642-21640-4_21