<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Method for Assessing Parameter Impact on Control-Flow Discovery Algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joel Ribeiro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Josep Carmona</string-name>
          <email>jcarmonag@cs.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Politecnica de Catalunya</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <fpage>83</fpage>
      <lpage>96</lpage>
      <abstract>
        <p>Given an event log L, a control-flow discovery algorithm f, and a quality metric m, this paper faces the following problem: which parameters of f most influence its application to L in terms of m? This paper proposes a method to solve this problem, based on sensitivity analysis, a theory which has been successfully applied in other areas. Clearly, a satisfactory solution to this problem is crucial to bridge the gap between process discovery algorithms and final users. Additionally, recommendation techniques and meta-techniques like determining the representational bias of an algorithm may benefit from solutions to the problem considered in this paper. The method has been evaluated over a set of logs and the Flexible Heuristics Miner, and the preliminary results witness the applicability of the general framework described in this paper.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Control-flow discovery is considered one of the crucial features of Process
Mining [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Intuitively, discovering the control-flow of a process requires analyzing
its executions and extracting the causality relations between activities which, taken
together, illustrate the structure and ordering of the process under consideration.
      </p>
      <p>There are many factors that may hamper the applicability of a control-flow
discovery algorithm. On the one hand, the log characteristics may induce the use
of particular algorithms; e.g., in the presence of noise in the log it may be advisable
to consider a noise-aware algorithm. On the other hand, the representational bias
of an algorithm may hinder its applicability for eliciting the process underlying
a log.</p>
      <p>Even in the ideal case where the most suitable control-flow discovery
algorithm is used for tackling the discovery task, it may be that the algorithm's default
parameters (designed to perform well over different scenarios) are not
appropriate for the log at hand. In that case, the user is left alone in the task of
configuring the best parameter values, a task which requires knowledge of both
the algorithm and the log at hand.</p>
      <p>In this paper we present a method to automatically assess the impact of
parameters of control-flow discovery algorithms. In our approach, we use an efficient
technique from the discipline of sensitivity analysis for exploring the parameter
search space. In the next section, we characterize this sensitivity analysis technique
and relate it to other work in the literature pursuing similar purposes in other
areas.</p>
      <p>
        We consider three direct applications of the method presented in this paper:
(A) As an aid to users of control-flow discovery algorithms: given a log, an
algorithm and a particular quality metric the user is interested in, a method like
the one presented in this paper will indicate the parameters to consider. Then
the user will be able to influence the discovery experiment by assigning
meaningful values to these parameters.
(B) As an aid for recommending control-flow discovery algorithms: current
recommendation systems for control-flow process discovery (e.g., [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) do not consider
the parameters of the algorithms. Using the methodology of this paper, one
may determine classes of parameters whose impact refers to the same quality
metric, and those can be offered as modes of the same algorithm tailored to
specific metrics. Hence, the recommendation task (i.e., the selection of a
discovery algorithm) may then be guided towards a better use of a control-flow
technique.
(C) As a new form of assessing the representational bias of an algorithm: given
a log and an algorithm, it may well be the case that the impact of most of
the algorithm's parameters is negligible. In that case, if additionally the
result obtained is not satisfactory, one may conclude that this is not the right
algorithm for the log at hand.
      </p>
      <p>The rest of the paper is organized as follows: Section 2 illustrates the
contribution and discusses related work. Section 3 provides the necessary background and
main definitions. Then, Section 4 presents the main methodology of this paper,
while Section 5 provides a general discussion on its complexity. Finally, Section 6
concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work and Contribution</title>
      <p>The selection of parameters for executing control-flow algorithms is usually a
challenging issue. The uncertainty of the inputs, the lack of information about
parameters, the diversity of outputs (i.e., the different process model types), and
the difficulty of choosing a comprehensive quality measurement for assessing the
output of a control-flow algorithm make the selection of parameters a difficult
task.</p>
      <p>
        Parameter optimization is one of the most effective approaches to
parameter selection. In this approach, the parameter space is searched in order to find
the best parameter setting with respect to a specific quality measure. Besides the
aforementioned challenges, the main challenge of this approach is to select a
robust strategy to search the parameter space. Grid (or exhaustive) search, random
search [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], gradient descent based search [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and evolutionary computation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are
typical strategies, which have proven to be effective in optimization problems,
but they are usually computationally costly. [
        <xref ref-type="bibr" rid="ref16 ref3 ref6">16,6,3</xref>
        ] are examples of parameter
optimization applied to a control-flow algorithm. Besides the fact that only a
single control-flow algorithm is considered, all of these approaches rely on quality
measurements that are especially designed to work on a specific type of process
model.
      </p>
      <p>
        A different approach, which may also be used to facilitate parameter
optimization, is known as sensitivity analysis [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and consists of assessing the
influence of the inputs of a mathematical model (or system) on the model's output.
This information may help in understanding the relationship between the inputs
and the output of the model, or in identifying redundant inputs in specific contexts.
Sensitivity methods range from variance-based methods to screening techniques
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. One of the advantages of screening is that it requires a relatively low number
of evaluations when compared to other approaches. The Elementary Effect (EE)
method [
        <xref ref-type="bibr" rid="ref4 ref5 ref8">8,4,5</xref>
        ] is a screening technique for sensitivity analysis that can be applied
to identify non-influential parameters of computationally costly algorithms. In
this paper, the EE method is applied to assess the impact of the parameters of
control-flow algorithms.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Preliminaries</title>
      <p>This section contains the main definitions used in this paper.</p>
      <sec id="sec-3-1">
        <title>Event Log and Process Model</title>
        <p>Process data describe the execution of the different process events of a business
process over time. An event log organizes process data as a set of process instances,
where a process instance represents a sequence of events describing the execution
of activities (or tasks).</p>
        <p>Definition 1 (Event Log). Let T be a set of events, T* the set of all sequences
(i.e., process instances) that are composed of zero or more events of T, and σ ∈ T*
a process instance. An event log L is a set of process instances, i.e., L ∈ P(T*).1</p>
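        <p>As an illustrative sketch (not part of the formal development), Definition 1 can be mirrored in Python, assuming events are plain strings; the helper `make_log` is hypothetical and not from any process-mining library:</p>

```python
# An event log as a set of process instances (sequences of events).
# Tuples are used because sequences must be hashable to live in a set.

def make_log(traces):
    return {tuple(t) for t in traces}

log = make_log([
    ["A", "B", "D", "E", "G"],
    ["A", "C", "D", "F", "G"],
    ["A", "B", "D", "E", "G"],   # duplicate instance collapses in a set
])

print(len(log))  # 2 distinct process instances
```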
        <p>
          A process model is an activity-centric model that describes a business
process in terms of activities and their dependency relations. Petri nets, Causal nets,
BPMN, and EPCs are examples of notations for such models. For an
overview of process notations see [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. A process model can be seen as an
abstraction of how work is done in a specific business. A process model can be discovered
from process data by applying some control-flow algorithm.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Control-Flow Algorithm</title>
        <p>A control-flow algorithm is a process discovery technique that can be used for
translating the process behavior described in an event log into a process model.
These algorithms may be driven by different discovery strategies and provide
different functionalities. Also, the execution of a control-flow algorithm may be
constrained (controlled) by some parameters.
1 P(X) denotes the powerset of some set X.
Definition 2 (Algorithm). Let L be an event log, P a list of parameters, and R
a process model. A (control-flow) algorithm A is defined as a function
f_A : (L, P) → R that represents in R the process behavior described in L,
constrained by P. The execution of f_A is designated as a discovery
experiment.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Quality Measure</title>
        <p>
          A measure can be defined as a measurement that evaluates the quality of the result
of a (control-flow) algorithm. A measure can be categorized as follows [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
Simplicity measure: quantifies the result of an algorithm (i.e., a process model
mined from a specific event log) in terms of readability and comprehension.
        </p>
        <p>The number of elements in the model is an example of a simplicity measure.
Fitness measure: quantifies how much behavior described in the log complies
with the behavior represented in the process model. The fitness is 100% if the
model can describe every trace in the log.</p>
        <p>Precision measure: quantifies how much behavior represented in the process
model is described in the log. The precision is 100% if the log contains every
possible trace represented in the model.</p>
        <p>Generalization measure: quantifies the degree of abstraction beyond observed
behavior, i.e., a general model will accept not only traces in the log, but some
others that generalize these.</p>
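        <p>As a toy illustration of the fitness and precision dimensions (a hedged sketch, assuming a model is represented simply by the set of traces it accepts, which abstracts away real process-model semantics; all names are illustrative):</p>

```python
# Toy quality measures over a "model" given as the set of traces it accepts.
# This is a sketch for intuition only, not a real conformance checker.

def fitness(model_traces, log):
    """Fraction of log traces the model can replay (1.0 = perfect fitness)."""
    if not log:
        return 1.0
    replayable = sum(1 for trace in log if trace in model_traces)
    return replayable / len(log)

def precision(model_traces, log):
    """Fraction of model traces observed in the log (1.0 = perfect precision)."""
    if not model_traces:
        return 1.0
    observed = sum(1 for trace in model_traces if trace in log)
    return observed / len(model_traces)

model = {("A", "B"), ("A", "C")}
log = {("A", "B")}
print(fitness(model, log), precision(model, log))  # 1.0 0.5
```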
        <p>Definition 3 (Measure). Let R be a process model and L an event log. A
measure M is defined by
either a function g_M : (R) → ℝ that quantifies the quality of R, or
a function g_M : (R, L) → ℝ that quantifies the quality of R according to L.</p>
        <p>The execution of g_M is designated as a conformance experiment.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Problem De nition</title>
        <p>Given an event log L, a control-flow algorithm A constrained by the list of
parameters P = [p1 = v1, ..., pk = vk], and a quality measure M: assess the impact
of each parameter p ∈ P on the result of the execution of A over L, according to
M.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The Elementary Effect Method</title>
      <p>
        The Elementary Effect (EE) method [
        <xref ref-type="bibr" rid="ref4 ref5 ref8">8,4,5</xref>
        ] is a technique for sensitivity analysis
that can be applied to identify non-influential parameters of control-flow
algorithms, for which estimating other sensitivity analysis measures (e.g.,
variance-based measures) is usually computationally costly. Rather than quantifying the
exact importance of parameters, the EE method provides insight into the
contribution of parameters to the quality of the results.
      </p>
      <p>
        One of the most efficient EE methods is based on Sobol quasi-random
numbers [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and a radial OAT strategy [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].2 The main idea is to analyze the parameter
space by performing experiments and assessing the impact of changing
parameters with respect to the quality of the results. A Sobol quasi-random generator is used
to determine a uniformly distributed set of points in the parameter space. Radial
OAT experiments [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are executed over the generated points to measure the
impact of the parameters. This information can be used either (i) to guide the
parameter setup by prioritizing the parameters to be tuned, or (ii) as a first step
towards parameter optimization.
      </p>
      <sec id="sec-4-1">
        <title>Radial OAT Experiments</title>
        <p>
          In this paper, an OAT experiment consists of a benchmark of some control-flow
algorithm where the algorithm's parameters are assessed one at a time according
to some quality measure. This means that k + 1 discovery and conformance
experiments are conducted: the first to set a reference and the remaining k to compare the
impact of changing each of the k algorithm's parameters. The parameter settings
for establishing the reference and changing the parameter values are defined by
a pair of points from the parameter space. OAT experiments can use different
strategies to explore these points. Figure 1 presents the most common strategies
for performing OAT experiments. In the trajectory design, each parameter change
is compared to the point of the previous experiment. In the radial design, each
parameter change is always compared to the initial point. Of these two, the radial
design has been proven to outperform the trajectory one [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>Radial OAT experiments can be defined as follows. First, a pair of points
(β, α) is selected in the parameter space. Point β, the base point (point (1, 1, 2)
in Figure 1), is used as the reference parameter setting of the experiment. A
discovery and conformance experiment is executed with this parameter setting
to set the reference quality value. Point α, the auxiliary point (point (2, 2, 0) in
Figure 1), is used to compare the impact of changing the parameters, one at a
time, from β to α. For each parameter pi ∈ P, a discovery and conformance
experiment is executed using the parameter values defined by β for every parameter
pj ∈ P with pj ≠ pi, and the parameter value defined by α for pi (see the example in
Figure 1b). Insight into the impact of each parameter is provided by aggregating
the results of the radial OAT experiments.</p>
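        <p>The procedure above can be sketched as follows (a hedged sketch, assuming a generic quality function f standing in for f_A^M and that base and auxiliary points differ in every coordinate; all names are illustrative):</p>

```python
# One radial OAT experiment: k + 1 evaluations of the quality function,
# one at the base point beta, then one per parameter with only that
# parameter replaced by its value from the auxiliary point alpha.

def radial_oat(f, beta, alpha):
    ref = f(beta)                      # reference experiment at the base point
    effects = []
    for i in range(len(beta)):
        point = list(beta)
        point[i] = alpha[i]            # change parameter i only, always vs. beta
        effects.append((f(point) - ref) / (alpha[i] - beta[i]))
    return effects

# toy quality function: only the second parameter matters
f = lambda p: 3.0 * p[1]
print(radial_oat(f, [0.2, 0.4], [0.6, 0.8]))  # [0.0, 3.0]
```

Note that the division assumes alpha[i] differs from beta[i]; as discussed later in the paper, the elementary effect is only computable when the two points provide distinct values for the parameter.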
        <p>Let A be a control-flow algorithm, M a given measure, and L an event log. The
function f_A^M(L, P) computes the quality of the result of A over L with respect
to M, where P = [p1 = v1, ..., pk = vk] is the list of parameters of A:

f_A^M(L, P) = g_M(f_A(L, P)) if M does not depend on a log,
f_A^M(L, P) = g_M(f_A(L, P), L) otherwise. (1)

2 OAT stands for One (factor) At a Time.</p>
        <p>Fig. 1. OAT sampling designs over parameters p1, p2, p3: (a) Trajectory; (b) Radial.</p>
        <p>The elementary effect of a parameter pi ∈ P on a radial OAT experiment is
defined by

EE_i = (f_A^M(L, β_(-i)α_i) - f_A^M(L, β)) / (α_i - β_i), (2)

where β, α are parameter settings of P (the base and auxiliary points), β_i and
α_i are the ith elements of β and α, and f_A^M(L, β_(-i)α_i) is the function
f_A^M(L, β′) where β′ is β with α_i replacing β_i. The measure μ* for pi is defined
by</p>
        <p>
          μ*_i = (Σ_{j=1}^{r} |EE_i^j|) / r, (3)

where r is the number of radial OAT experiments to be executed, typically between
10 and 50 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The total number of discovery and conformance experiments is
r(k + 1), where k is the number of parameters of A.
        </p>
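        <p>The aggregation of the elementary effects into μ* can be sketched as follows (a minimal sketch; the function name and the toy values are illustrative, not taken from the paper's experiments):</p>

```python
# mu* for each parameter: the mean absolute elementary effect over
# r radial OAT experiments.

def mu_star(ee_per_experiment):
    """ee_per_experiment: r lists, each with one EE value per parameter."""
    r = len(ee_per_experiment)
    k = len(ee_per_experiment[0])
    return [sum(abs(ees[i]) for ees in ee_per_experiment) / r for i in range(k)]

# toy EEs for k = 2 parameters over r = 3 experiments:
# the second parameter clearly dominates the ranking
ees = [[0.0, 3.0], [0.1, -2.0], [-0.1, 4.0]]
print(mu_star(ees))
```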
        <p>
          The impact of a parameter pi ∈ P is given as the relative value of μ*_i compared
to that of the other parameters of P. A parameter pj ∈ P (j ≠ i) is considered
to have more impact on the results quality than pi if μ*_j &gt; μ*_i, and the parameters
are considered to have equal impact on the results quality if μ*_j = μ*_i.
The parameter pi is considered to have no impact on the results quality if μ*_i = 0.
This measure is sufficient to provide a reliable ranking of the parameters [
          <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Sobol Numbers</title>
        <p>Sobol quasi-random numbers (or sequences) are low-discrepancy sequences that
can be used to distribute a set of points uniformly over a multidimensional space.
These sequences are defined by n points with m dimensions. Table 1 presents an
example of a Sobol sequence containing ten points with ten dimensions.</p>
        <p>Each element of a point of a Sobol sequence consists of a numerical value
between zero and one (e.g., the element representing the second dimension (d2) of
point x5 is 0.8750). A collection of these values (the entire point or part of it) may
be used to identify a specific point in a parameter space. An element of a point of
a Sobol sequence can be converted into a parameter value by some normalization
process. For instance, a possible normalization process for an element e ∈ [0, 1]
to one of the n distinct values of some discrete parameter p can be defined by
⌊e · n⌋, which identifies the index of the parameter value in p corresponding to e.
Notice that the parameter space must be uniformly mapped by the normalization
process (e.g., each value of a Boolean parameter must be represented by 50% of
all possible elements).</p>
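        <p>The floor-based normalization ⌊e · n⌋ can be sketched as follows (a minimal sketch; the clamp for e = 1.0 is an added safeguard, not part of the paper's description):</p>

```python
# Map a Sobol element e in [0, 1] to one of n discrete parameter values.
# Each value receives an equal share of [0, 1], as the text requires.
import math

def normalize(e, values):
    n = len(values)
    idx = min(int(math.floor(e * n)), n - 1)   # clamp e == 1.0 to the last index
    return values[idx]

print(normalize(0.49, [False, True]))  # False  (each Boolean value covers 50%)
print(normalize(0.51, [False, True]))  # True
```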
        <p>
          Using the approach proposed in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], a matrix of quasi-random Sobol numbers
of dimensions (r + 4, 2k) can be used to analyze the elementary effects of the k
parameters of a control-flow algorithm by executing r radial OAT experiments.
The first k dimensions of the matrix's points define the base points, while the
last k dimensions define the auxiliary points. Given that the first points of a
Sobol sequence have the tendency to provide similar base and auxiliary points,
the need of discarding the first four points of the sequence
for the auxiliary points is identified in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] (i.e., the k rightmost columns should be shifted upward).
Therefore, the base and auxiliary points can be computed from a Sobol sequence
as follows. Let e_{i,j} be the element corresponding to the jth dimension (dj) of the
ith point (xi) of the sequence. The ith base (β_i) and auxiliary (α_i) points are
defined as follows.
        </p>
        <p>β_i = (e_{i,1}, e_{i,2}, ..., e_{i,k}) and α_i = (e_{i+4,k+1}, e_{i+4,k+2}, ..., e_{i+4,2k}). (4)</p>
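        <p>Equation (4) can be sketched as follows (a hedged sketch, assuming the matrix is given as a list of rows and using 0-based indexing; the toy matrix below is not a real Sobol sequence):</p>

```python
# Base and auxiliary points from an (r + 4) x 2k matrix: row i gives the
# base point from the first k columns; row i + 4 gives the auxiliary
# point from the last k columns (the 4-row shift discards the first
# four points for the auxiliary side).

def base_and_aux(matrix, i, k):
    beta = tuple(matrix[i][:k])            # base point, dimensions 1..k
    alpha = tuple(matrix[i + 4][k:2 * k])  # auxiliary point, shifted 4 rows down
    return beta, alpha

# toy 6 x 4 matrix for r = 2 experiments with k = 2 parameters
m = [[row * 0.1 + col * 0.01 for col in range(4)] for row in range(6)]
beta, alpha = base_and_aux(m, 0, 2)
print(beta, alpha)
```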
      </sec>
      <sec id="sec-4-3">
        <title>Example: The FHM</title>
        <p>
          The following example is used to illustrate the analysis of the parameter space of
an algorithm in order to assess the impact of the algorithm's parameters on the
results quality. Let us consider an event log that is characterized by two distinct
traces: ABDEG and ACDFG. The frequency of each of these traces is high enough
for them not to be considered as noise. The behavior described by these traces does not
contain any kind of loop or parallelism, but it does contain two long-distance
dependencies: B ⇒ E and C ⇒ F. Let us also consider the Flexible Heuristics
Miner (FHM) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] as the control-flow algorithm with which to explore the parameter space
in order to assess the impact of the FHM's parameters on the results quality. The
parameters of the FHM are summarized in Table 2. Notice that every parameter
of the FHM is continuous, with a range between zero and one. The relative-to-best
and the long-distance thresholds are optional. The former is only considered with
the all-tasks-connected heuristic. The latter is only taken into account when the
long-distance dependencies option is activated.
        </p>
        <sec id="sec-4-3-3">
          <title>Table 2: The Parameters of the FHM</title>
          <p>Relative-to-best threshold, domain [0, 1], optional;
dependency threshold, domain [0, 1];
length-one-loops threshold, domain [0, 1];
length-two-loops threshold, domain [0, 1];
long-distance threshold, domain [0, 1], optional.</p>
          <p>Figure 2 presents the two possible process models that can be mined with
the FHM on the aforementioned event log, using all combinations of parameter
values. Figure 2a shows the resulting Causal net where long-distance dependencies
are not taken into account. Figure 2b shows the resulting Causal net with the
long-distance dependencies. Notice that, depending on the quality measure, the
quality of these process models may differ (e.g., the precision of the model with
long-distance dependencies is higher than that of the other one). One may be interested
in exploring the FHM's parameter space to get the process model that
best fulfills some quality requirements.</p>
          <p>The analysis of the parameter space of the FHM starts with the generation of
the Sobol numbers. Let us consider that, for this analysis, one wants to execute
r = 30 radial OAT experiments for assessing the elementary effects of the k = 5
FHM parameters. So, a matrix of Sobol numbers of dimensions (30 + 4, 2 · 5)
has to be generated (cf. Section 4.2). Table 1 shows the first ten points of this
matrix. Table 3 presents the first five base and auxiliary points as well as the
parameter values corresponding to these points. Notice that the parameters are
represented in the points according to the same ordering as in Table 2 (i.e., the first
element of a point represents the first parameter and so on).
Fig. 2. (a) The Causal net without long-distance dependency relations. (b) The Causal net with long-distance dependency relations.
The normalization
process in this example is defined as follows. For the non-optional parameters (cf.
Table 2), an element e ∈ [0, 1] of a point of a Sobol sequence can be directly used
to represent the value of the parameter. For the optional parameters, an element
e ∈ [0, 1] of a point of a Sobol sequence is normalized to a value e′ ∈ [0, 2], which
maps the parameter space uniformly (i.e., covering both the value of the parameter and
whether or not the parameter is enabled). If e′ ≤ 1 then e′ is assigned as the value of the
parameter; the parameter is disabled otherwise.</p>
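        <p>The normalization of optional parameters can be sketched as follows (a minimal sketch; the ('enabled', value) encoding is an illustrative choice, not the paper's notation):</p>

```python
# Optional-parameter normalization: scale e in [0, 1] to e' in [0, 2];
# values up to 1 become the threshold itself, larger values disable
# the parameter, so enabled/disabled each cover half the range.

def normalize_optional(e):
    e2 = 2.0 * e
    if e2 <= 1.0:
        return ("enabled", e2)     # the threshold value itself
    return ("disabled", None)      # parameter switched off

print(normalize_optional(0.3))   # ('enabled', 0.6)
print(normalize_optional(0.9))   # ('disabled', None)
```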
        </sec>
        <sec id="sec-4-3-4">
          <title>Computing the Elementary Effects</title>
          <p>Let A be the
FHM, M the Node Arc Degree measure3, and L the aforementioned event log.
The elementary effects are computed as described in Section 4.1.4 Notice that the
elementary effect of a parameter can only be computed when the base and
auxiliary points provide distinct parameter values (e.g., in Table 4, the first parameter
is not assessed because it is disabled in both the base and auxiliary points).</p>
        </sec>
        <sec id="sec-4-3-5">
          <title>Results</title>
          <p>Table 5 presents the results of the analysis of the FHM's parameter space.
The results identify the long-distance threshold as the only parameter to take
into account for the parameter exploration. As expected, all other parameters
have no impact on the results quality. This is explained by the fact that the log
does not contain any kind of loop or noise. Notice that the absolute value of μ* does
not provide any insight into how much a parameter influences the results quality.
Instead, the μ* measurement provides insight into the impact of a parameter on
the results quality compared to the others.
3 The Node Arc Degree measure consists of the average number of incoming and outgoing arcs
over every node of the process model.
4 For computing EE_i, α_i - β_i is considered to be 1 when the parameter is changed from
a disabled to an enabled state, or the other way around (e.g., the last parameter in
Table 4).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Application</title>
      <p>
        The EE method presented in the previous section can be applied to any
control-flow algorithm constrained by many parameters, using some event log and a
measure capable of quantifying the quality of the result of the algorithm. The
presented method can easily be implemented on some framework capable of
executing discovery and conformance experiments (e.g., ProM [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or CoBeFra [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]).
Several open-source generators of Sobol numbers are available on the web.
      </p>
      <p>The computational cost of our approach can be defined as follows. Let L be
an event log, A a control-flow algorithm constrained by the list of parameters
P = [p1 = v1, ..., pk = vk], and M a quality measure. The computational cost
of a discovery experiment using A (with some parameter setting) over L is given
by C_D. Considering R as the result of a discovery experiment, the computational
cost of a conformance experiment over R and L (or just R) with regard to M is
given by C_C. Therefore, the computational cost of a radial OAT experiment is
given by C_E = (k + 1)(C_D + C_C), where k is the number of parameters of A.
The computational cost of the EE method based on r radial OAT experiments is
given by C = r(k + 1)(C_D + C_C).</p>
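      <p>The cost formula C = r(k + 1)(C_D + C_C) can be checked with a one-line sketch (the unit costs below are made up for illustration):</p>

```python
# Total cost of the EE method: r radial experiments, each running
# k + 1 discovery (C_D) plus conformance (C_C) experiments.

def ee_cost(r, k, c_discovery, c_conformance):
    return r * (k + 1) * (c_discovery + c_conformance)

# e.g., r = 30 experiments over the k = 5 FHM parameters,
# with hypothetical unit costs of 2 s (discovery) and 3 s (conformance)
print(ee_cost(30, 5, 2.0, 3.0))  # 900.0 seconds
```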
      <sec id="sec-5-1">
        <title>Performance Optimization</title>
        <p>
          Considering that both discovery and conformance experiments may be
computationally costly, performance may become a critical issue for the application of this
method. This issue can be partially addressed by identifying a set of potentially
irrelevant parameters, and considering those parameters as a group. Then, by
adjusting the μ* measurement to work with groups of two or more parameters [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ],
the group of parameters can be analyzed together using radial experiments that
iterate over all elements of the same group simultaneously.
        </p>
        <p>Suppose, for instance, that it is known that a given log does not have loops. So,
for the FHM's parameters, the length-one-loops and length-two-loops thresholds
may be grouped in order to avoid the execution of discovery and conformance
experiments that are not relevant for the analysis. Recalling the example presented
in Section 4.3, the radial experiments will iterate over one group of two parameters
and three independent parameters (i.e., the dependency, the relative-to-best, and
the long-distance thresholds). This means that, for the group of parameters, all
elements of the same group are replaced simultaneously by the corresponding
elements from the auxiliary point. Table 6 presents the adjusted radial sampling
of Table 4. The first line corresponds to the base point, while the others
consist of the base point in which the element(s) regarding a specific parameter (or
group of parameters) is replaced by that from the auxiliary point; the underlined
values identify the replaced element(s) and the parameter (or group of parameters)
being assessed.</p>
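        <p>The grouped radial sampling can be sketched as follows (a hedged sketch; the indices and values are illustrative, not the actual entries of Table 6):</p>

```python
# Grouped radial sampling: each group's elements are replaced from the
# auxiliary point simultaneously, so a group costs one experiment
# instead of one per member.

def radial_points(beta, alpha, groups):
    """groups: list of index lists; singletons are independent parameters."""
    rows = [list(beta)]                  # first row: the base point
    for group in groups:
        row = list(beta)
        for i in group:
            row[i] = alpha[i]            # replace the whole group at once
        rows.append(row)
    return rows

beta, alpha = [0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6]
# parameters 1 and 2 grouped (e.g., the two loop thresholds), others independent
rows = radial_points(beta, alpha, [[0], [1, 2], [3]])
print(len(rows))  # 4 experiments instead of 5
```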
        <sec id="sec-5-1-1">
          <title>Grouped Elementary Effects</title>
          <p>The elementary effect of a group of parameters G on a radial experiment can be
defined by

EE_G = (f_A^M(L, β_(-G)α_G) - f_A^M(L, β)) / dist(α_G, β_G), (5)

where β, α are parameter settings of P (the base and auxiliary points), β_G and
α_G are the elements of G in β and α, and f_A^M(L, β_(-G)α_G) is the function
f_A^M(L, β′) where β′ is β with α_G replacing β_G. The function dist(A, B) computes
the distance between A and B (e.g., the Euclidean distance). The measure μ* for
G is defined by</p>
          <p>μ*_G = (Σ_{j=1}^{r} |EE_G^j|) / r, (6)

where r is the number of radial experiments to be executed. The total number
of discovery and conformance experiments depends on the number of groups and
independent parameters being assessed.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>To the best of our knowledge, this work is the first to present a methodology to
assess the impact of parameters of control-flow discovery algorithms. The method
relies on a modern sensitivity analysis technique that requires considerably less
exploration than traditional ones such as genetic algorithms or variance-based
methods.</p>
      <p>In this work, we have applied the methodology to the Flexible Heuristics Miner
algorithm using 13 event logs. The results suggest the effectiveness of the method.
We have noticed that simple (and thus less computationally costly) conformance
measures are as good as more complex measures for assessing the parameters'
influence. Nevertheless, we acknowledge that more experiments are necessary to
get better insight.</p>
      <p>
        Future work is mainly oriented towards three aspects, mostly
aimed at applying the method of this paper to other control-flow
algorithms. First, we are interested in the algorithmic perspective, in order to study the
most efficient form of assessing the impact of a parameter, with the method
presented in this paper as a baseline. Second, we will try to incorporate the
methodology described in this paper into the RS4PD, a recommender system for process
discovery [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Finally, the application of the presented method with other goals,
e.g., estimating the representational bias of control-flow discovery algorithms, may
be explored.
      </p>
      </p>
      <p>Acknowledgments. This work has been partially supported by funds from the
Spanish Ministry for Economy and Competitiveness (MINECO) and the
European Union (FEDER funds) under grant COMMAS (ref. TIN2013-46181-C2-1-R).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <source>Gradient-Based Optimization of Hyperparameters. Neural computation</source>
          ,
          <volume>12</volume>
          (
          <issue>8</issue>
          ):
          <year>1889</year>
          –
          <year>1900</year>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergstra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Random Search for Hyper-Parameter Optimization</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>281</fpage>
          –
          <lpage>305</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sperduti</surname>
          </string-name>
          .
          <article-title>Automatic Determination of Parameters' Values for Heuristics Miner++</article-title>
          .
          <source>In 2010 IEEE Congress on Evolutionary Computation (CEC)</source>
          , pages 1–8, July
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>F.</given-names>
            <surname>Campolongo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cariboni</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Saltelli</surname>
          </string-name>
          .
          <article-title>An Effective Screening Design for Sensitivity Analysis of Large Models</article-title>
          .
          <source>Environmental Modelling &amp; Software</source>
          ,
          <volume>22</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1509</fpage>
          –
          <lpage>1518</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>F.</given-names>
            <surname>Campolongo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saltelli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Cariboni</surname>
          </string-name>
          .
          <article-title>From Screening to Quantitative Sensitivity Analysis. A Unified Approach</article-title>
          .
          <source>Computer Physics Communications</source>
          ,
          <volume>182</volume>
          (
          <issue>4</issue>
          ):
          <fpage>978</fpage>
          –
          <lpage>988</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <article-title>How to Evaluate the Performance of Process Discovery Algorithms: A Benchmark Experiment to Assess the Performance of Flexible Heuristics Miner</article-title>
          .
          <source>Master's thesis</source>
          , Eindhoven University of Technology, Eindhoven,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Michalewicz</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoenauer</surname>
          </string-name>
          .
          <article-title>Evolutionary Algorithms for Constrained Parameter Optimization Problems</article-title>
          .
          <source>Evolutionary Computation</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>32</lpage>
          , March
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.D.</given-names>
            <surname>Morris</surname>
          </string-name>
          .
          <article-title>Factorial Sampling Plans for Preliminary Computational Experiments</article-title>
          .
          <source>Technometrics</source>
          ,
          <volume>33</volume>
          (
          <issue>2</issue>
          ):
          <fpage>161</fpage>
          –
          <lpage>174</lpage>
          , April
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Misir</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebag</surname>
          </string-name>
          .
          <article-title>A Recommender System for Process Discovery</article-title>
          . In S. Sadiq, P. Soffer, and H. Völzer, editors,
          <source>Business Process Management</source>
          , volume
          <volume>8659</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>67</fpage>
          –
          <lpage>83</lpage>
          . Springer International Publishing,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>A.</given-names>
            <surname>Saltelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Annoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Azzini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Campolongo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ratto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Tarantola</surname>
          </string-name>
          .
          <article-title>Variance Based Sensitivity Analysis of Model Output. Design and Estimator for the Total Sensitivity Index</article-title>
          .
          <source>Computer Physics Communications</source>
          ,
          <volume>181</volume>
          (
          <issue>2</issue>
          ):
          <fpage>259</fpage>
          –
          <lpage>270</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Saltelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ratto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Andres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Campolongo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cariboni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gatelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saisana</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Tarantola</surname>
          </string-name>
          .
          <source>Global Sensitivity Analysis: The Primer</source>
          . Wiley,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>I.M.</given-names>
            <surname>Sobol</surname>
          </string-name>
          .
          <article-title>Uniformly Distributed Sequences With an Additional Uniform Property</article-title>
          .
          <source>USSR Computational Mathematics and Mathematical Physics</source>
          ,
          <volume>16</volume>
          (
          <issue>5</issue>
          ):
          <fpage>236</fpage>
          –
          <lpage>242</lpage>
          ,
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>W.M.P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          .
          <source>Process Mining: Discovery, Conformance and Enhancement of Business Processes</source>
          . Springer, Berlin,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>S.</given-names>
            <surname>vanden Broucke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Weerdt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baesens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanthienen</surname>
          </string-name>
          .
          <article-title>A Comprehensive Benchmarking Framework (CoBeFra) for conformance analysis between procedural process models and event logs in ProM</article-title>
          .
          <source>In IEEE Symposium on Computational Intelligence and Data Mining</source>
          , Grand Copthorne Hotel, Singapore,
          <year>2013</year>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>H.M.W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.C.A.M.</given-names>
            <surname>Buijs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.F.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.M.P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          .
          <article-title>ProM 6: The Process Mining Toolkit</article-title>
          .
          <source>In Demo at the 8th International Conference on Business Process Management</source>
          , volume
          <volume>615</volume>
          of
          <source>CEUR-WS</source>
          , pages
          <fpage>34</fpage>
          –
          <lpage>39</lpage>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>A.J.M.M.</given-names>
            <surname>Weijters</surname>
          </string-name>
          .
          <article-title>An Optimization Framework for Process Discovery Algorithms</article-title>
          .
          <source>In Proceedings of the International Conference on Data Mining</source>
          , Las Vegas, Nevada, USA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>A.J.M.M.</given-names>
            <surname>Weijters</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.T.S.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          .
          <article-title>Flexible Heuristics Miner (FHM)</article-title>
          .
          <source>In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011</source>
          , Paris, France. IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>