<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Method for Assessing Parameter Impact on Control-Flow Discovery Algorithms</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Joel</forename><surname>Ribeiro</surname></persName>
							<email>jribeiro@cs.upc.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">Universitat Politècnica de Catalunya</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Josep</forename><surname>Carmona</surname></persName>
							<email>jcarmona@cs.upc.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">Universitat Politècnica de Catalunya</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Method for Assessing Parameter Impact on Control-Flow Discovery Algorithms</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">9AD2648A8B3A876C31F5EB03236F7E6D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Given an event log L, a control-flow discovery algorithm f , and a quality metric m, this paper faces the following problem: what are the parameters in f that mostly influence its application in terms of m when applied to L? This paper proposes a method to solve this problem, based on sensitivity analysis, a theory which has been successfully applied in other areas. Clearly, a satisfactory solution to this problem will be crucial to bridge the gap between process discovery algorithms and final users. Additionally, recommendation techniques and meta-techniques like determining the representational bias of an algorithm may benefit from solutions to the problem considered in this paper. The method has been evaluated over a set of logs and the flexible heuristic miner, and the preliminary results witness the applicability of the general framework described in this paper.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Control-flow discovery is considered as one of the crucial features of Process Mining <ref type="bibr" target="#b12">[13]</ref>. Intuitively, discovering the control-flow of a process requires to analyze its executions and extract the causality relations between activities which, taken together, illustrate the structure and ordering of the process under consideration.</p><p>There are many factors that may hamper the applicability of a control-flow discovery algorithm. On the one hand, the log characteristics may induce the use of particular algorithms, e.g., in the presence of noise in the log it may be advisable to consider a noise-aware algorithm. On the other hand, the representational bias of an algorithm may hinder its applicability for elicitating the process underlying in a log.</p><p>Even in the ideal case where the more suitable control-flow discovery algorithm is used for tackling the discovery task, it may be the case that the default algorithm's parameters (designed to perform well over different scenarios) are not appropriate for the log at hand. In that case, the user is left alone in the task of configuring the best parameter values, a task which requires a knowledge of both the algorithm and the log at hand.</p><p>In this paper we present a method to automatically assess the impact of parameters of control-flow discovery algorithms. In our approach, we use an efficient technique from the discipline of sensitivity analysis for exploring the parameter search space. In the next section, we charaterize this sensitivity analysis technique and relate it with other work in the literature for similar purposes done in other areas.</p><p>We consider three direct applications of the method presented in this paper:</p><p>(A) As an aid to users of control-flow discovery algorithms: given a log, an algorithm and a particular quality metric the user is interested in, a method like the one presented in this paper will indicate the parameters to consider. Then the user will be able to influence (by assigning meaningful values to these parameters) the discovery experiment. (B) As an aid for recommending control-flow discovery algorithms: current recommendation systems for control-flow process discovery (e.g., <ref type="bibr" target="#b8">[9]</ref>) do not consider the parameters of the algorithms. Using the methodology of this paper, one may determine classes of parameters whose impact refer to the same quality metric, and those can be offered as modes of the same algorithm tailored to specific metrics. Hence, the recommendation task (i.e., the selection of a discovery algorithm) may then be guided towards a better use of a control-flow technique. (C) As a new form of assessing the representational bias of an algorithm: given a log and an algorithm, it may well be the case that the impact of most of the algorithm's parameters is negligible. In that case, then if additionally the result obtained is not satisfactory, one may conclude that this is not the right algorithm for the log at hand.</p><p>The rest of the paper is organized as follows: Section 2 illustrates the contribution and provides related work. Section 3 provides the necessary background and main definitions. Then, Section 4 presents the main methodology of this paper, while Section 5 provides a general discussion on its complexity. Finally, Section 6 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work and Contribution</head><p>The selection of parameters for executing control-flow algorithms is usually a challenging issue. The uncertainty of the inputs, the lack of information about parameters, the diversity of outputs (i.e., the different process model types), and the difficulty of choosing a comprehensive quality measurement for assessing the output of a control-flow algorithm make the selection of parameters a difficult task.</p><p>The parameter optimization is one of the most effective approaches for parameter selection. In this approach, the parameter space is searched in order to find the best parameters setting with respect to a specific quality measure. Besides the aforementioned challenges, the main challenge of this approach is to select a robust strategy to search the parameter space. Grid (or exhaustive) search, random search <ref type="bibr" target="#b1">[2]</ref>, gradient descent based search <ref type="bibr" target="#b0">[1]</ref> and evolutionary computation <ref type="bibr" target="#b6">[7]</ref> are typical strategies, which have proven to be effective in optimization problems, but they are usually computationally costly. <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b2">3]</ref> are examples of parameter optimization applications on a control-flow algorithm. Besides the fact that only a single control-flow algorithm is considered, all of these approaches rely on quality measurements that are especially designed to work on a specific type of process model.</p><p>A different approach, which may also be used to facilitate the parameter optimization, is known as sensibility analysis <ref type="bibr" target="#b10">[11]</ref> and consists of assessing the influence of the inputs of a mathematical model (or system) on the model's output. This information may help on understanding the relationship between the inputs and the output of the model, or identifying redundant inputs in specific contexts. Sensibility methods range from variance-based methods to screening techniques <ref type="bibr" target="#b10">[11]</ref>. One of the advantages of screening is that it requires a relatively low number of evaluations when compared to other approaches. The Elementary Effect (EE) method <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref> is a screening technique for sensibility analysis that can be applied to identify non-influential parameters of computationally costly algorithms. In this paper, the EE method is applied to assess the impact of the parameters of control-flow algorithms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Preliminaries</head><p>This section contains the main definitions used in this paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Event Log and Process Model</head><p>Process data describe the execution of the different process events of a business process over time. An event log organizes process data as a set of process instances, where a process instance represents a sequence of events describing the execution of activities (or tasks).</p><p>Definition 1 (Event Log). Let T be a set of events, T * the set of all sequences (i.e., process instances) that are composed of zero or more events of T , and δ ∈ T * a process instance. An event log L is a set of process instances, i.e., L ∈ P(T * ). <ref type="foot" target="#foot_0">1</ref>A process model is an activity-centric model that describes the business process in terms of activities and their dependency relations. Petri nets, Causal nets, BPMN, and EPCs are examples of notations for modeling these models. For an overview of process notations see <ref type="bibr" target="#b12">[13]</ref>. A process model can be seen as an abstraction of how work is done in a specific business. A process model can be discovered from process data by applying some control-flow algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Control-Flow Algorithm</head><p>A control-flow algorithm is a process discovery technique that can be used for translating the process behavior described in an event log into a process model. These algorithms may be driven by different discovery strategies and provide different functionalities. Also, the execution of a control-flow algorithm may be constrained (controlled) by some parameters.</p><p>Definition 2 (Algorithm). Let L be an event log, P a list of parameters, and R a process model. An (control-flow) algorithm A is defined as a function f A : (L, P ) → R that represents in R the process behavior described in L, and it is constrained by P . The execution of f A is designated as a discovery experiment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Quality Measure</head><p>A measure can be defined as a measurement that evaluates the quality of the result of an (control-flow) algorithm. A measure can be categorized as follows <ref type="bibr" target="#b12">[13]</ref>.</p><p>Simplicity measure: quantifies the results of an algorithm (i.e., a process model mined from a specific event log) in terms of readability and comprehension.</p><p>The number of elements in the model is an example of a simplicity measure. Fitness measure: quantifies how much behavior described in the log complies with the behavior represented in the process model. The fitness is 100% if the model can describe every trace in the log. Precision measure: quantifies how much behavior represented in the process model is described in the log. The precision is 100% if the log contains every possible trace represented in the model. Generalization measure: quantifies the degree of abstraction beyond observed behavior, i.e., a general model will accept not only traces in the log, but some others that generalize these.</p><p>Definition 3 (Measure). Let R be a process model and L an event log. A measure M is defined by a function g M : (R) → R that quantifies the quality of R, or a function g M : (R, L) → R that quantifies the quality of R according to L.</p><p>The execution of g M is designated as a conformance experiment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Problem Definition</head><p>Given an event log L, a control-flow algorithm A constrained by the list of pa-</p><formula xml:id="formula_0">rameters P = [p 1 = v 1 , ..., p k = v k ]</formula><p>, and a quality measure M : Assess the impact of each parameter p ∈ P on the result of the execution of A over L, according to M .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">The Elementary Effect Method</head><p>The Elementary Effect (EE) method <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref> is a technique for sensibility analysis that can be applied to identify non-influential parameters of control-flow algorithms, which usually are computationally costly for estimating other sensitivity analysis measures (e.g., variance-based measures). Rather than quantifying the exact importance of parameters, the EE method provides insight into the contribution of parameters to the results quality.</p><p>One of the most efficient EE methods is based on Sobol quasi-random numbers <ref type="bibr" target="#b11">[12]</ref> and a radial OAT strategy <ref type="bibr" target="#b4">[5]</ref>. 2 The main idea is to analyze the parameter space by performing experiments and assessing the impact of changing parameters with respect to the results quality. A Sobol quasi-random generator is used to determine a uniformly distributed set of points in the parameter space. Radial OAT experiments <ref type="bibr" target="#b4">[5]</ref> are executed over the generated points to measure the impact of the parameters. This information can be used either (i) to guide on the parameters setup by prioritizing the parameters to be tuned, or (ii) as a first step towards parameter optimization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Radial OAT Experiments</head><p>In this paper, an OAT experiment consists of a benchmark of some control-flow algorithm where the algorithm's parameters are assessed one at a time according to some quality measure. This means that k + 1 discovery and conformance experiments are conducted, the first to set a reference and the last k to compare the impact of changing one of the k algorithm's parameters. The parameter settings for establishing the reference and changing the parameter's values are defined by a pair of points from the parameter space. OAT experiments can use different strategies to explore these points. Figure <ref type="figure" target="#fig_2">1</ref> presents the most common strategies for performing OAT experiments. In the trajectory design, the parameter change compares to the point of the previous experiment. In the radial design, the parameter change compares always to the initial point. From these two, the radial design has been proven to outperform the trajectory one <ref type="bibr" target="#b9">[10]</ref>.</p><p>Radial OAT experiments can be defined as follows. First, a pair of points (α, β) is selected in the parameter space. Point α, the base point (point (1, 1, 2) in Figure <ref type="figure" target="#fig_2">1</ref>), is used as the reference parameter setting of the experiment. A discovery and conformance experiment is executed with this parameters setting to set the reference quality value. Point β, the auxiliary point (point (2, 2, 0) in Figure <ref type="figure" target="#fig_2">1</ref>), is used to compare the impact of changing the parameters, one at a time, from α to β. For each parameter p i ∈ P , a discovery and conformance experiment is executed using the parameter values defined by α for a parameter p j ∈ P ∧ p j = p i and the parameter value defined by β for p i (see the example in Figure <ref type="figure" target="#fig_2">1b</ref>). Insight into the impact of each parameter is provided by aggregating the results of the radial OAT experiments.</p><p>Let A be a control-flow algorithm, M a given measure, and L an event log. The function f A•M (L, P ) computes the quality of the result of A over L with respect to M , where</p><formula xml:id="formula_1">P = [p 1 = v 1 , ..., p k = v k ] is the list of parameters of A. f A•M (L, P ) =    g M (f A (L, P )) if M does not depend on a log g M (f A (L, P ), L) otherwise<label>(1)</label></formula><p>2 OAT stands for One (factor) At a Time.   The elementary effect of a parameter p i ∈ P on a radial OAT experiment is defined by</p><formula xml:id="formula_2">EE i = f A•M (L, α) − f A•M (L, α ← α i • β i ) α i − β i ,<label>(2)</label></formula><p>where α, β are parameter settings of P (the base and auxiliary points), α i and β i are the i th elements of α and β, and</p><formula xml:id="formula_3">f A•M (L, α ← α i • β i ) is the function f A•M (L, α )</formula><p>where α is α with β i replacing α i . The measure µ for p i is defined by</p><formula xml:id="formula_4">µ i = r j=1 |EE i | r ,<label>(3)</label></formula><p>where r is the number of radial OAT experiments to be executed, typically between 10 and 50 <ref type="bibr" target="#b3">[4]</ref>. The total number of discovery and conformance experiments is r(k + 1), where k is the number of parameters of A.</p><p>The impact of a parameter p i ∈ P is given as the relative value of µ i compared to that for the other parameters of P . A parameter p j ∈ P (j = i) is considered to have more impact on the results quality than p i if µ j &gt; µ i . The parameters p j and p i are considered to have equal impact on the results quality if µ j = µ i . The parameter p i is considered to have no impact on the results quality if µ i = 0. This measure is sufficient to provide a reliable ranking of the parameters <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Sobol Numbers</head><p>Sobol quasi-random numbers (or sequences) are low-discrepancy sequences that can be used to distribute uniformly a set of points over a multidimensional space. These sequences are defined by n points with m dimensions. Table <ref type="table">1</ref>   <ref type="table">1</ref>: The first ten points of a ten-dimensional Sobol quasi-random sequence.</p><p>Each element of a point of a Sobol sequence consists of a numerical value between zero and one (e.g., the element representing the second dimension (d 2 ) of point x 5 is 0.8750). A collection of these values (the entire point or part of it) may be used to identify a specific point in a parameter space. An element of a point of a Sobol sequence can be converted into a parameter value by some normalization process. For instance, a possible normalization process for an element e ∈ [0, 1] to one of the n distinct values of some discrete parameter p can be defined by e × n , which identifies the index of the parameter value in p corresponding to e. Notice that the parameter space must be uniformly mapped by the normalization process (e.g., each value of a Boolean parameter must be represented by 50% of all possible elements).</p><p>Using the approach proposed in <ref type="bibr" target="#b4">[5]</ref>, a matrix of quasi-random Sobol numbers of dimensions (r + 4,2k) can be used to analyze the elementary effects of the k parameters of a control-flow algorithm by executing r radial OAT experiments. The first k dimensions of the matrix's points define the base points, while the last k dimensions define the auxiliary points. Given that the first points of a Sobol sequence have the tendency to provide similar base and auxiliary points, it is identified in <ref type="bibr" target="#b4">[5]</ref> the need of discarding the first four points of the sequence for the auxiliary points (i.e., the k rightmost columns should be shifted upward). Therefore, the base and auxiliary points can be computed from a Sobol sequence as follows. Let e j i be the element corresponding to the j th dimension (d j ) of the i th point (x i ) of the sequence. The i th base (α i ) and auxiliary (β i ) points are defined as following.</p><formula xml:id="formula_5">α i = (e 1</formula><p>i , e 2 i , ..., e j i ) and β i = (e j+1 i+4 , e j+2 i+4 , ..., e 2j i+4 ).</p><p>(4)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Example: The FHM</head><p>The following example is used to illustrate the analysis of the parameter space of an algorithm in order to assess the impact of the algorithm's parameters on the results quality. Let us consider an event log that is characterized by two distinct traces: ABDEG and ACDF G. The frequency of any of these traces is high enough to not be considered as noise. The behavior described by these traces does not contain any kind of loop or parallelism, but it does contain two long-distance dependencies: B ⇒ E and C ⇒ F . Let us also consider the Flexible Heuristics Miner (FHM) <ref type="bibr" target="#b16">[17]</ref> as the control-flow algorithm to explore the parameter space in order to assess the impact of the FHM's parameters on the results quality. The parameters of the FHM are summarized in Table <ref type="table">2</ref>. Notice that every parameter of the FHM is continuous, with a range between zero and one. The relative-to-best and the long-distance thresholds are optional. The former is only considered with the all-tasks-connected heuristic. The latter is only taken into account when the long-distance dependencies option is activated.</p><formula xml:id="formula_6">Parameter Domain Optional? Relative-to-best Threshold [0, 1] Yes Dependency Threshold [0, 1] No Length-one-loops Threshold [0, 1] No Length-two-loops Threshold [0, 1] No Long-distance Threshold [0, 1] Table 2:</formula><p>The parameters of the Flexible Heuristics Miner <ref type="bibr" target="#b16">[17]</ref>.</p><p>Figure <ref type="figure">2</ref> presents the two possible process models that can be mined with the FHM on the aforementioned event log, using all combinations of parameter values. Figure <ref type="figure">2a</ref> shows the resulting Causal net where long-distance dependencies are not taken into account. Figure <ref type="figure">2b</ref> shows the resulting Causal net with the long-distance dependencies. Notice that, depending on the quality measure, the quality of these process models may differ (e.g., the precision of the model with long-distance dependencies is higher than the other one). One may be interested on the exploration of the FHM's parameter space to get the process model that fulfills best some quality requirements.</p><p>The analysis of the parameter space of the FHM starts with the generation of the Sobol numbers. Let us consider that, for this analysis, one wants to execute r = 30 radial OAT experiments for assessing the elementary effects of the k = 5 FHM's parameters. So, a matrix of Sobol numbers of dimensions (30 + 4,2 × 5) has to be generated (cf. Section 4.2). Table <ref type="table">1</ref> shows the first ten points of this matrix. Table <ref type="table" target="#tab_1">3</ref> presents the first five base and auxiliary points as well as the parameter values corresponding to these points. Notice that the parameters are represented in the points according to the same ordering in Table <ref type="table">2</ref> (i.e., the first element of a point represents the first parameter and so on). The normalization Fig. <ref type="figure">2</ref>: The process models that can be mined with the FHM.</p><p>process in this example is defined as follows. For the non-optional parameters (cf. Table <ref type="table">2</ref>), an element e ∈ [0, 1] of a point of a Sobol sequence can be directly used to represent the value of the parameter. For the optional parameters, an element e ∈ [0, 1] of a point of a Sobol sequence is normalized to a value e ∈ [0, 2], which maps the parameter space uniformly (i.e., the value of the parameter and whether or not the parameter is enabled). If e ≤ 1 then e is assigned as the value of the parameter; the parameter is disabled otherwise.   <ref type="table" target="#tab_1">3</ref>) as well as the result of the execution of f A•M (L, P ) and the elementary effect EE for each parameter. For executing f A•M (L, P ), A is the FHM, M the Node Arc Degree measure 3 , and L the aforementioned event log. The elementary effects are computed as described in Section 4. 1. 4 Notice that the elementary effect of a parameter can only be computed when the base and auxiliary points provide distinct parameter values (e.g., in Table <ref type="table" target="#tab_2">4</ref>, the first parameter is not assessed because it is disabled in both base and auxiliary points).  <ref type="table" target="#tab_2">4</ref>: Radial sampling for the first radial OAT experiment. The first line corresponds to the base point, while the others consist of the base point in which the element regarding a specific parameter is replaced by that from the auxiliary point; the underlined values identify the replaced element and the parameter being assessed.</p><p>Table <ref type="table">5</ref> presents the results of the analysis of the FHM's parameter space. The results identify the long-distance threshold as the only parameter to take into account for the parameter exploration. As expected, all other parameters have no impact on the results quality. This is explained by the fact that the log does not contain any kind of loop or noise. Notice that the µ absolute value does not provide any insight into how much a parameter influences the results quality. Instead, the µ measurement provides insight into the impact of a parameter on the results quality, compared to others.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Parameter µ</head><p>Dependency Threshold 0.0 Relative-to-best Threshold 0.0 Length-one-loops Threshold 0.0 Length-two-loops Threshold 0.0 Long-distance Threshold 0.113 Table <ref type="table">5</ref>: The µ values of the FHM's parameters. 3 The Node Arc Degree measure consists of the average of incoming and outgoing arcs of every node of the process model. 4 For computing EEi, αi − βi is considered to be 1 when the parameter is changed from a disabled to an enabled state, or the other way around (e.g., the last parameter in Table <ref type="table" target="#tab_2">4</ref>).</p><p>most efficient form of assessing the impact of a parameter, with the method presented in this paper as a baseline. Second, we will try to incorporate the methodology described in this paper in the RS4PD, a recommender system for process discovery <ref type="bibr" target="#b8">[9]</ref>. Finally, the application of the presented method with other goals, e.g., estimating the representational bias of control-flow discovery algorithms may be explored.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>1 ,p 2 ,p 3 )</head><label>123</label><figDesc>(a) Trajectory.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>1 ,p 2 ,p 3 )</head><label>123</label><figDesc>(b) Radial.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: Comparison between radial and trajectory samplings for OAT experiments over 3 parameters, using the points (1, 1, 2) and (2, 2, 0). The underlined values identify the parameter being assessed</figDesc><graphic coords="6,307.90,238.31,63.14,63.14" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>The Causal net with long-distance dependency relations.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>presents an example of a Sobol sequence containing ten points with ten dimensions.</figDesc><table><row><cell></cell><cell>d1</cell><cell>d2</cell><cell>d3</cell><cell>d4</cell><cell>d5</cell><cell>d6</cell><cell>d7</cell><cell>d8</cell><cell>d9</cell><cell>d10</cell></row><row><cell>x1</cell><cell cols="10">0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000</cell></row><row><cell>x2</cell><cell cols="10">0.7500 0.2500 0.2500 0.2500 0.7500 0.7500 0.2500 0.7500 0.7500 0.7500</cell></row><row><cell>x3</cell><cell cols="10">0.2500 0.7500 0.7500 0.7500 0.2500 0.2500 0.7500 0.2500 0.2500 0.2500</cell></row><row><cell>x4</cell><cell cols="10">0.3750 0.3750 0.6250 0.8750 0.3750 0.1250 0.3750 0.8750 0.8750 0.6250</cell></row><row><cell>x5</cell><cell cols="10">0.8750 0.8750 0.1250 0.3750 0.8750 0.6250 0.8750 0.3750 0.3750 0.1250</cell></row><row><cell>x6</cell><cell cols="10">0.6250 0.1250 0.8750 0.6250 0.6250 0.8750 0.1250 0.1250 0.1250 0.3750</cell></row><row><cell>x7</cell><cell cols="10">0.1250 0.6250 0.3750 0.1250 0.1250 0.3750 0.6250 0.6250 0.6250 0.8750</cell></row><row><cell>x8</cell><cell cols="10">0.1875 0.3125 0.9375 0.4375 0.5625 0.3125 0.4375 0.9375 0.9375 0.3125</cell></row><row><cell>x9</cell><cell cols="10">0.6875 0.8125 0.4375 0.9375 0.0625 0.8125 0.9375 0.4375 0.4375 0.8125</cell></row><row><cell cols="11">x10 0.9375 0.0625 0.6875 0.1875 0.3125 0.5625 0.1875 0.1875 0.1875 0.5625</cell></row><row><cell>Table</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3 :</head><label>3</label><figDesc>The parameter values for the first five base and auxiliary points. The wildcard value '-' identifies that the parameter is disabled. The first five points of the Sobol numbers.</figDesc><table><row><cell>Point</cell><cell>Base</cell><cell>Auxiliary</cell></row><row><cell>1</cell><cell>(.5000, .5000, .5000, .5000, .5000)</cell><cell>(.6250, .8750, .3750, .3750, .1250)</cell></row><row><cell>2</cell><cell>(.7500, .2500, .2500, .2500, .7500)</cell><cell>(.8750, .1250, .1250, .1250, .3750)</cell></row><row><cell>3</cell><cell>(.2500, .7500, .7500, .7500, .2500)</cell><cell>(.3750, .6250, .6250, .6250, .8750)</cell></row><row><cell>4</cell><cell>(.3750, .3750, .6250, .8750, .3750)</cell><cell>(.3125, .4375, .9375, .9375, .3125)</cell></row><row><cell>5</cell><cell>(.8750, .8750, .1250, .3750, .8750)</cell><cell>(.8125, .9375, .4375, .4375, .8125)</cell></row><row><cell>...</cell><cell>...</cell><cell>...</cell></row><row><cell></cell><cell cols="2">(a) The first five base and auxiliary points.</cell></row><row><cell>Point</cell><cell>Base</cell><cell>Auxiliary</cell></row><row><cell>1</cell><cell>(-, 0.50, 0.50, 0.50, -)</cell><cell>(-, 0.88, 0.38, 0.38, 0.25)</cell></row><row><cell>2</cell><cell>(-, 0.25, 0.25, 0.25, -)</cell><cell>(-, 0.13, 0.13, 0.13, 0.75)</cell></row><row><cell>3</cell><cell>(0.50, 0.75, 0.75, 0.75, 0.50)</cell><cell>(0.75, 0.63, 0.63, 0.63, -)</cell></row><row><cell>4</cell><cell>(0.75, 0.38, 0.63, 0.88, 0.75)</cell><cell>(0.63, 0.44, 0.94, 0.94, 0.63)</cell></row><row><cell>5</cell><cell>(-, 0.88, 0.13, 0.38, -)</cell><cell>(-, 0.94, 0.44, 0.44, -)</cell></row><row><cell>...</cell><cell>...</cell><cell>...</cell></row><row><cell>(b)</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>presents the radial sampling for the first radial OAT experiment (first point in Table</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">P(X) denotes the powerset of some set X.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgments. This work as been partially supported by funds from the Spanish Ministry for Economy and Competitiveness (MINECO) and the European Union (FEDER funds) under grant COMMAS (ref. TIN2013-46181-C2-1-R).</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Application</head><p>The EE method presented in the previous section can be applied to any controlflow algorithm constrained by many parameters, using some event log and a measure capable of quantifying the quality of the result of the algorithm. The presented method can be easily implemented on some framework capable of executing discovery and conformance experiments (e.g., ProM <ref type="bibr" target="#b14">[15]</ref> or CoBeFra <ref type="bibr" target="#b13">[14]</ref>). Several open-source generators of Sobol numbers are available on the web.</p><p>The computational cost of our approach can be defined as follows. Let L be an event log, A a control-flow algorithm constrained by the list of parameters</p><p>and M a quality measure. The computational cost of a discovery experiment using A (with some parameter setting) over L is given by C D . Considering R as the result of a discovery experiment, the computational cost of a conformance experiment over R and L (or just R) with regard to M is given by C C . Therefore, the computational cost of a radial OAT experiment is given by</p><p>where k is the number of parameters of A. The computational cost of the EE method based on r radial OAT experiments is given by C = r(k + 1)(C D + C C ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Perfomance Optimization</head><p>Considering that both discovery and conformance experiments may be computationally costly, performance may become a critical issue for the application of this method. This issue can be partially addressed by identifying a set of potentially irrelevant parameters, and considering those parameters as a group. Then, by adjusting the µ measurement to work with groups of two or more parameters <ref type="bibr" target="#b3">[4]</ref>, the group of parameters can be analyzed together using radial experiments that iterate over all elements of the same group simultaneously.</p><p>Suppose, for instance, that it is known that a given log does not have loops. So, for the FHM's parameters, the length-one-loops and length-two-loops thresholds may be grouped in order to avoid the execution of discovery and conformance experiments that are not relevant for the analysis. Recalling the example presented in Section 4.3, the radial experiments will iterate over one group of two parameters and three indepedent parameters (i.e., the dependency, the relative-to-best, and the long-distance thresholds). This means that, for the group of parameters, all elements of the same group are replaced simultaneously by the corresponding elements from the auxiliary point. Table <ref type="table">6</ref> presents the adjusted radial sampling presented in Table <ref type="table">4</ref>. The first line corresponds to the base point, while the others consist of the base point in which the element(s) regarding a specific parameter (or group of parameters) is replaced by that from the auxiliary point; the underlined values identify the replaced element(s) and the parameter (or group of parameters) being assessed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Parameter Values</head><p>(-, 0.50, 0.50, 0.50, -) (-, 0.50, 0.50, 0.50, -) (-, 0.88, 0.50, 0.50, -) (-, 0.50, 0.38, 0.38, -) (-, 0.50, 0.50, 0.50, 0.25) Table <ref type="table">6</ref>: Radial sampling for the first radial experiment considering a group of parameters.</p><p>The elementary effect of a group of parameters G ⊆ P on a radial experiment is defined by</p><p>where α, β are parameter settings of P (the base and auxiliary points), α G and β G are the elements of G in α and β, and</p><p>The function dist(A, B) computes the distance between A and B (e.g., the Euclidean distance). The measure µ for G is defined by</p><p>where r is the number of radial experiments to be executed. The total number of discovery and conformance experiments depends on the number of groups and independent parameters being assessed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions and Future Work</head><p>To the best of our knowledge, this work is the first in presenting a methodology to assess the impact of parameters in control-flow discovery algorithms. The method relies on a modern sensitivity analysis technique that requires considerably less exploration than traditional ones such as genetic algorithms or variance-based methods.</p><p>In this work, we have applied the methodology on the Flexible Heuristics Miner algorithm using 13 event logs. The results suggest the effectiveness of the method. We have noticed that simple conformance measures (and, thus, less computationally costly) are as good as any other complex measure for assessing the parameters influence. Nevertheless, we acknowledge that more experiments are necessary to get a better insight.</p><p>Future work is mainly oriented towards addressing three aspects, which are mainly addressed to apply the method of this paper to other control-flow algorithms. First, we are interested in the algorithmic perspective in order to study the</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Gradient-Based Optimization of Hyperparameters</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1889" to="1900" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Random Search for Hyper-Parameter Optimization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bergstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="281" to="305" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Automatic Determination of Parameters&apos; Values for Heuristics Miner++</title>
		<author>
			<persName><forename type="first">A</forename><surname>Burattin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sperduti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Evolutionary Computation (CEC), 2010 IEEE Congress on</title>
				<imprint>
			<date type="published" when="2010-07">July 2010</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An Effective Screening Design for Sensitivity Analysis of Large Models</title>
		<author>
			<persName><forename type="first">F</forename><surname>Campolongo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cariboni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saltelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Environmental Modelling &amp; Software</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="1509" to="1518" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">From Screening to Quantitative Sensitivity Analysis. A Unified Approach</title>
		<author>
			<persName><forename type="first">F</forename><surname>Campolongo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saltelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cariboni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Physics Communications</title>
		<imprint>
			<biblScope unit="volume">182</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="978" to="988" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">How to Evaluate the Performance of Process Discovery Algorithms: A Benchmark Experiment to Assess the Performance of Flexible Heuristics Miner</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ma</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<pubPlace>Eindhoven</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Eindhoven University of Technology</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master&apos;s thesis</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Evolutionary Algorithms for Constrained Parameter Optimization Problems</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Michalewicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schoenauer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Evolutionary Computation</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="32" />
			<date type="published" when="1996-03">March 1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Factorial Sampling Plans for Preliminary Computational Experiments</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Morris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Technometrics</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="161" to="174" />
			<date type="published" when="1991-04">April 1991</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A Recommender System for Process Discovery</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carmona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Misir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sebag</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Business Process Management</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">S</forename><surname>Sadiq</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Soffer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Vlzer</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">8659</biblScope>
			<biblScope unit="page" from="67" to="83" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Variance Based Sensitivity Analysis of Model Output. Design and Estimator for the Total Sensitivity Index</title>
		<author>
			<persName><forename type="first">A</forename><surname>Saltelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Annoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Azzini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Campolongo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ratto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tarantola</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Physics Communications</title>
		<imprint>
			<biblScope unit="volume">181</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="259" to="270" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Global Sensitivity Analysis: The Primer</title>
		<author>
			<persName><forename type="first">A</forename><surname>Saltelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ratto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Andres</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Campolongo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cariboni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gatelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Saisana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tarantola</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>Wiley</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Uniformly Distributed Sequences With an Additional Uniform Property</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Sobol</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">USSR Computational Mathematics and Mathematical Physics</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="236" to="242" />
			<date type="published" when="1976">1976</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Process Mining: Discovery, Conformance and Enhancement of Business Processes</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>Springer</publisher>
			<pubPlace>Berlin</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A Comprehensive Benchmarking Framework (CoBeFra) for conformance analysis between procedural process models and event logs in ProM</title>
		<author>
			<persName><forename type="first">S</forename><surname>Broucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Weerdt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Baesens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanthienen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Computational Intelligence and Data Mining</title>
				<meeting><address><addrLine>Grand Copthorne Hotel, Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">ProM 6: The Process Mining Toolkit</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M W</forename><surname>Verbeek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C A M</forename><surname>Buijs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">F</forename><surname>Van Dongen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Demo at the 8th International Conference on Business Process Management</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">615</biblScope>
			<biblScope unit="page" from="34" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">An Optimization Framework for Process Discovery Algorithms</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J M M</forename><surname>Weijters</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Data Mining</title>
				<meeting>the International Conference on Data Mining<address><addrLine>Las Vegas, Nevada, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Flexible Heuristics Miner (FHM)</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J M M</forename><surname>Weijters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T S</forename><surname>Ribeiro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011</title>
				<meeting>the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011<address><addrLine>Paris, France</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
