
Studies on the Discovery of Declarative Control Flows from Error-prone Data

Claudio Di Ciccio

claudio.di.ciccio@wu.ac.at

Massimo Mecella

mecella@dis.uniroma1.it

(0) Università di Roma, Rome, Italy
(1) Wirtschaftsuniversität Wien, Vienna, Austria


The declarative modeling of workflows has been introduced to cope with flexibility in processes. Its rationale is based on the idea of stating some basic rules (named constraints), tying the execution of some activities to the enabling, requiring or disabling of other activities. What is not explicitly prohibited by such constraints is implicitly considered legal, w.r.t. the specification of the process. Declarative models for workflows are based on a taxonomy of constraint templates. Constraints are thus instances of constraint templates, applied to specific activities. Many algorithms for the automated discovery of declarative workflows associate a support to each constraint. The support is a statistical measure assessing to what extent a constraint was respected during the enactment(s) of the process. In the current state-of-the-art literature, constraints having a support below a user-defined threshold are considered not valid for the process. Thresholds are useful for filtering out guesses based on possibly misleading events, reported in logs either because of errors in the execution, unlikely process deviations, or wrong recordings in logs. The latter circumstance is extremely relevant when logs are not written directly by machines reporting their work, but extracted from other sources of information. Here, we present an insight into the actual capacity of filtering constraints by modifying the threshold for support, on the basis of real data. Then, taking a cue from the results of such analysis, we consider the trend of support when controlled errors are injected into the log, w.r.t. individual constraint templates. Through these tests, we show by experiment that each constraint template proves more or less robust to different kinds of error, according to its nature.

Keywords: process mining, artful process, declarative workflow, noisy event log

Processes are typically represented as graphs, delineating their possible executions altogether, from the beginning up to the end. Most of the notations in use are indeed derived from Petri Nets [2], such as Workflow Nets [1], BPMN [8], YAWL [4]. The classical approach is called "imperative" because it explicitly represents every step allowed by the process model at hand. This leads to a likely increase of graphical objects as the process allows more alternative executions. The size of the model, though, has undesirable effects on the understandability and also on the likelihood of errors (see [18] for an insight into the Seven Process Modeling Guidelines): larger models tend to be more difficult to understand [19], not to mention the higher error probability which they suffer from, with respect to small models [17].

Declarative workflow models [22] have been introduced to cope with flexibility in processes. Their rationale is based on the idea of stating some basic rules (named constraints), tying the execution of some activities to either the enabling, requiring or disabling of other activities. What is not explicitly prohibited by such constraints is implicitly considered legal, w.r.t. the specification of the process. Declarative models for workflows are based on a taxonomy of constraint templates. Constraints are thus instances of constraint templates, applied to specific activities. A collection of constraints constitutes altogether a declarative workflow. ConDec [20], now renamed Declare, is the most used language for modeling declarative workflows in the community of Business Process Management. It provides an extendible list of constraint templates, which we will consider in the remainder of this paper. Declarative models are particularly effective with some non-conventional kinds of process. For instance, professors, researchers, information engineers and all those professionals contributing to the production of valuable but intangible products, such as knowledge, are commonly defined "knowledge workers" [23]. They are used to dealing with rapid decisions among multiple choices, based on their expertise, competence and intuition. There is an art in the management of their work. This is the reason for the name assigned to their processes: artful processes [14], which belong to the larger category of knowledge-intensive processes [9]. Artful processes are thus very flexible, dynamic and subject to change. Due to these characteristics, the declarative approach suits their modeling [5]. Mining their workflow would be of extreme interest for understanding the best practices and winning strategies adopted by expert knowledge workers.

Process Mining [3], a.k.a. Workflow Mining [2], is the set of techniques that allow the extraction of process descriptions, stemming from a set of recorded real executions. Such executions are intended to be stored in so-called event logs, i.e., textual representations of a temporally ordered linear sequence of tasks. Many techniques have been proposed for mining Declare workflows ([16,15,10,11,7,6]). Most of them associate to each discovered constraint a support, i.e., a statistical measure assessing to what extent a constraint was respected during the enactment(s) of the process. Those discovered constraints having a support below a user-defined threshold are considered not valid for the process. Thresholds are useful for filtering out guesses based on possibly misleading events, reported in event logs either because of errors in the execution, or due to very unlikely process deviations, or caused by wrong registrations of events in logs. The latter circumstance is extremely relevant when event logs are not written directly by machines reporting their work, but extracted from other sources of information. Artful processes, e.g., are known to be scarcely automated [9]. Therefore, there are few possibilities to rely on classical system logs keeping track of their executions. As a matter of fact, despite the advent of structured case management tools, many enterprise processes are still "run" over email messages. Artful processes, for instance, often require the collaboration of many actors, who usually share their information by means of email messages. Thus, email messages are a valuable source of information and event logs can be extracted out of them, relying on their content and meta-data (e.g., the delivery timestamp). [12] presents a novel approach and a tool, named MailOfMine, designed to mine declarative workflows for artful processes out of email collections.
First, MailOfMine inspects subjects, bodies and headers of given archives of email messages: assuming that reading about the execution of an activity can be interpreted as the reporting of its actual enactment, it searches for the email messages where one among a list of user-defined expressions is found. Each match is considered an event. Then, considering the temporal ordering of email messages in every archive, a trace in the log is built accordingly. Such a log is passed to the MailOfMine control flow discovery algorithm (MINERful), which returns the declarative model for the artful process lying behind the analyzed email communications. Extracting logs out of email messages leads to possible errors though, due to the automated interpretation of semi-structured texts. Hence, such extracted logs are intrinsically prone to errors. Thereby, mistakes in the discovered workflow are likely to increase.

This is the question we seek to answer in this paper: what happens to unknown models when they are discovered on the basis of logs which are affected by errors? [13] investigates an approach for repairing process models based on event data. Conversely, we consider the possible unreliability of the data which process models are discovered from, supposing that process models were not previously known at all. In this paper, we first report the analysis of the results obtained by applying MailOfMine to real data, focused on the precision of the inferred model with respect to the support threshold. Then, we present an insight into the trend of the support in the presence of errors, injected into synthetic logs. We focus on different types of errors (insertion or deletion of events) and spreading policies (a given percentage per each trace or all over the log). We repeat our experiments for each of the constraint templates that the MINERful algorithm is able to discover. Thus, we aim at understanding the different levels of robustness that constraint templates show w.r.t. the different types of errors.

The remainder of the paper is as follows. Section 2 describes the constraint templates of Declare and their usage for describing a declarative process model. Section 3 reports the results of tests on real data (Section 3.1) and experiments conducted on the basis of tunable injection of errors into synthetic logs (Section 3.2). Section 4 concludes this paper and outlines the future paths of investigation that it sheds light on.

2 The declarative process model

Here we abstract activities as symbols (e.g., ρ, σ) of an alphabet Σ, appearing in finite strings, which, in turn, represent process traces. We will therefore use the terms "activity", "character" and "symbol" interchangeably, as well as "trace" and "string". We adopt the subset of the Declare taxonomy of constraints for modeling processes, as in [16]. For a comprehensive analysis of all the constraint templates in Declare, the reader can refer to [20,21].

Constraints are temporal rules constraining the execution of activities. E.g., Response(ρ, σ) is a constraint on the activities ρ and σ, forcing σ to be executed if the activity ρ was completed before. Such rules are meant to adhere to specific constraint templates. RespondedExistence is the template of RespondedExistence(ρ, σ). We further categorize constraint templates into constraint types. For instance, RespondedExistence belongs to the RelationConstraint type. Figure 1 depicts the subsumption hierarchy of Declare constraints.

Declare constraints always refer to at least one activity, which we call "implying": if it is executed, the constraint is triggered; vice-versa, if it does not appear in the trace, the constraint has no effect on the trace itself. The Existence(M, ρ) constraint requires ρ to appear at least M times in the trace. We rename Existence(1, ρ) as Participation(ρ). The Absence(N, ρ) constraint holds if ρ occurs at most N − 1 times in the trace. We refer to Absence(2, ρ) as Uniqueness(ρ). Init(ρ) makes each trace start with ρ.
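The existence constraints just described can be sketched as simple predicates over a trace, here represented as a Python string with one character per event. This is only an illustrative reading of the definitions in the text; the function names are hypothetical and are not taken from MINERful.

```python
def existence(trace: str, activity: str, m: int) -> bool:
    """Existence(M, activity): the activity occurs at least M times."""
    return trace.count(activity) >= m


def absence(trace: str, activity: str, n: int) -> bool:
    """Absence(N, activity): the activity occurs at most N - 1 times."""
    return trace.count(activity) <= n - 1


def participation(trace: str, activity: str) -> bool:
    """Participation(activity) is Existence(1, activity)."""
    return existence(trace, activity, 1)


def uniqueness(trace: str, activity: str) -> bool:
    """Uniqueness(activity) is Absence(2, activity)."""
    return absence(trace, activity, 2)


def init(trace: str, activity: str) -> bool:
    """Init(activity): the trace starts with the activity."""
    return trace.startswith(activity)
```

For instance, `uniqueness("rpcpn", "p")` is false because p occurs twice, whereas `participation("rpcpn", "n")` holds.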

The aforementioned constraints fall under the type of ExistenceConstraints, as they relate to an "implying" activity only. The following are named RelationConstraints, since the execution of the implying activity imposes some conditions on another activity, namely the "implied".

RespondedExistence(ρ, σ) holds if, whenever ρ is read, σ was either already read or is going to occur (i.e., no matter if before or afterwards). Instead, Response(ρ, σ) enforces it by requiring a σ to appear after ρ, if ρ was read. Precedence(ρ, σ) forces σ to occur after ρ as well, but the condition to be verified is that σ was read: namely, you can not have any σ if you did not read a ρ before. AlternateResponse(ρ, σ) and AlternatePrecedence(ρ, σ) strengthen respectively Response(ρ, σ) and Precedence(ρ, σ) by stating that each ρ (σ) must be followed (preceded) by at least one occurrence of σ (ρ). The "alternation" is in that you can not have two ρ's (σ's) in a row before σ (after ρ). ChainResponse(ρ, σ) and ChainPrecedence(ρ, σ), in turn, specialize AlternateResponse(ρ, σ) and AlternatePrecedence(ρ, σ), both declaring that no other symbol can occur between ρ and σ. The difference between the two is in that the former is verified for each occurrence of ρ, the latter for each occurrence of σ. The reader should note that the hierarchy under the Precedence constraint template does not inherit the base and implied symbols from the RespondedExistence parent; instead, it overrides them both by inverting the two. This is due to the semantics of the constraints themselves.
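As a minimal sketch of the relation constraints above, the following predicates check a single trace (a string of one-character events) against each template. The function names are hypothetical, the two activities `x` and `y` are assumed to be distinct, and the code follows the informal definitions in the text rather than any specific tool.

```python
def responded_existence(t: str, x: str, y: str) -> bool:
    """If x occurs, y occurs too (before or after)."""
    return x not in t or y in t


def response(t: str, x: str, y: str) -> bool:
    """Every x is eventually followed by a y."""
    return x not in t or t.rfind(x) < t.rfind(y)


def precedence(t: str, x: str, y: str) -> bool:
    """No y can occur unless an x occurred before."""
    return y not in t or (x in t and t.find(x) < t.find(y))


def alternate_response(t: str, x: str, y: str) -> bool:
    """Every x is followed by a y, with no other x in between."""
    pending = False
    for event in t:
        if event == x:
            if pending:
                return False
            pending = True
        elif event == y:
            pending = False
    return not pending


def alternate_precedence(t: str, x: str, y: str) -> bool:
    """Every y is preceded by an x, with no other y in between."""
    available = False
    for event in t:
        if event == x:
            available = True
        elif event == y:
            if not available:
                return False
            available = False
    return True


def chain_response(t: str, x: str, y: str) -> bool:
    """Every x is immediately followed by a y."""
    return all(i + 1 < len(t) and t[i + 1] == y
               for i, event in enumerate(t) if event == x)


def chain_precedence(t: str, x: str, y: str) -> bool:
    """Every y is immediately preceded by an x."""
    return all(i > 0 and t[i - 1] == x
               for i, event in enumerate(t) if event == y)
```

The subsumption along each branch is visible in small cases: the trace `rrp` satisfies `response(t, "r", "p")` but not `alternate_response(t, "r", "p")`, and `rcp` satisfies the alternate variant but not `chain_response`.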

The MutualRelation constraints follow: they are verified iff two RespondedExistence (or descendant) constraints (resp., forward and backward, in Figure 1) are satisfied. CoExistence(ρ, σ) holds if both RespondedExistence(ρ, σ) and RespondedExistence(σ, ρ) hold. Succession(ρ, σ) is valid if Response(ρ, σ) and Precedence(ρ, σ) are verified. The same holds for AlternateSuccession(ρ, σ), equivalent to the conjunction of AlternateResponse(ρ, σ) and AlternatePrecedence(ρ, σ), and for ChainSuccession(ρ, σ), with respect to ChainResponse(ρ, σ) and ChainPrecedence(ρ, σ).

Finally, we consider NegativeRelation constraints: they are satisfied iff the related MutualRelations (negated, in Figure 1) are not. NotChainSuccession(ρ, σ) expresses the impossibility for σ to occur immediately after ρ (the opposite of ChainSuccession(ρ, σ)). NotSuccession(ρ, σ) generalizes the previous one by imposing that, if ρ is read, no other σ can be read until the end of the trace (Succession(ρ, σ) is the negated constraint). NotCoExistence(ρ, σ) is even more restrictive: if ρ appears, no σ can be in the same trace (the contrary of CoExistence(ρ, σ)).
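The mutual and negative relation constraints can be sketched in the same style. The snippet below is self-contained and only illustrative: function names are hypothetical, the two activities are assumed to be distinct, and the negative constraints implement the per-template descriptions given in the text.

```python
def responded_existence(t: str, x: str, y: str) -> bool:
    return x not in t or y in t


def response(t: str, x: str, y: str) -> bool:
    return x not in t or t.rfind(x) < t.rfind(y)


def precedence(t: str, x: str, y: str) -> bool:
    return y not in t or (x in t and t.find(x) < t.find(y))


def co_existence(t: str, x: str, y: str) -> bool:
    """Conjunction of the forward and backward RespondedExistence."""
    return responded_existence(t, x, y) and responded_existence(t, y, x)


def succession(t: str, x: str, y: str) -> bool:
    """Conjunction of Response and Precedence."""
    return response(t, x, y) and precedence(t, x, y)


def not_chain_succession(t: str, x: str, y: str) -> bool:
    """y never occurs immediately after x."""
    return (x + y) not in t


def not_succession(t: str, x: str, y: str) -> bool:
    """Once x is read, no y can be read until the end of the trace."""
    return x not in t or y not in t[t.find(x):]


def not_co_existence(t: str, x: str, y: str) -> bool:
    """x and y never appear in the same trace."""
    return not (x in t and y in t)
```

As an example of the increasing restrictiveness, the trace `acb` satisfies `not_chain_succession(t, "a", "b")` but violates `not_succession(t, "a", "b")`, while `ba` satisfies both and violates only `not_co_existence`.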

As a brief example, we may want to model the process of defining an agenda for a research project meeting. The schedule is discussed by email among the participants. We suppose that a final agenda will be committed ("confirm", n) after requests for a new proposal ("request", r), proposals themselves ("propose", p) and comments ("comment", c) have been circulated.

The aforementioned activities are thus bound to the following constraints. If a request is sent, then a proposal is expected to be prepared afterwards (cf. Response(r, p)). Comments can be given in order to review a proposed agenda, or to solicit the formulation of a new proposal. Thus, the presence of c in the trace is constrained to the presence of p (cf. RespondedExistence(c, p)). A confirmation must be given after the proposal, and vice-versa any proposal is expected to precede a confirmation (cf. Succession(p, n)). We suppose the confirmation to be the final activity (cf. End(n)). This mandatory task (cf. Participation(n)) is not expected to be executed more than once (cf. Uniqueness(n)).

Hence, the example process consists of the six aforementioned constraints: Response(r, p), RespondedExistence(c, p), Succession(p, n), Participation(n), Uniqueness(n) and End(n). As an example, the following traces would be compliant with the workflow: pn, pcn, rpcn, rpcpn, rrpcrpcrcpcn, rpprpcccrpcn.
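The compliance of the listed traces can be checked mechanically. The sketch below encodes the six constraints of the example directly as boolean conditions on a trace string; the function name `complies` and the non-compliant traces in the usage note are hypothetical additions for illustration.

```python
def complies(trace: str) -> bool:
    """Check a trace against the six constraints of the agenda example."""
    checks = [
        # Response(r, p): every request is eventually followed by a proposal.
        "r" not in trace or trace.rfind("r") < trace.rfind("p"),
        # RespondedExistence(c, p): a comment requires a proposal in the trace.
        "c" not in trace or "p" in trace,
        # Succession(p, n): Response(p, n) and Precedence(p, n).
        ("p" not in trace or trace.rfind("p") < trace.rfind("n"))
        and ("n" not in trace
             or ("p" in trace and trace.find("p") < trace.find("n"))),
        # Participation(n): the confirmation occurs at least once.
        trace.count("n") >= 1,
        # Uniqueness(n): the confirmation occurs at most once.
        trace.count("n") <= 1,
        # End(n): the confirmation is the last event.
        trace.endswith("n"),
    ]
    return all(checks)


compliant = ["pn", "pcn", "rpcn", "rpcpn", "rrpcrpcrcpcn", "rpprpcccrpcn"]
print(all(complies(t) for t in compliant))  # True
```

By contrast, a trace such as `rn` (a request never followed by a proposal) or `pnn` (two confirmations) is rejected.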

3 Experiments and evaluation

In order to inspect the quality of the control flow discovery in the presence of error-prone logs, we first verified the whole MailOfMine system on real data (Section 3.1). There, data were extracted from the mailbox of a colleague of the authors, known to be an expert in the area of the process to discover. As usual for artful processes, the process behind the analyzed email messages was not known a priori. Therefore, we could not apply an automated comparison between the resulting workflow model and the originating process, since no definition of the originating process was available at all. Thus, the expert was asked to analyze and assess the discovered workflow model by categorizing the mined constraints. Since the data were real, the presence of errors in the extraction of event logs out of email messages was not tunable.

Thereafter, we created synthetic logs, where errors of different kinds were injected into event logs. Every event log was created as adhering to the specification of a declarative process comprising a single constraint at a time. That is, a different constraint template was considered for each log. Since the only constraint to be considered valid, when mined out of the synthetic log, was known a priori, we focused on the trend of its support, in order to monitor the robustness of the template w.r.t. given types of errors. We outline the results of that analysis in Section 3.2.

3.1 A real case study

As real data to conduct the experiments on, we took 6 mailbox IMAP folders containing email messages which concerned the management of 5 different European research projects (Figure 1a). Such folders belonged to a domain expert. Our aim was to use MailOfMine in order to discover the artful process of managing European research projects and validate the result together with him.

In order to ease the revision of the gathered results, we restricted the number of activities for the process to discover to 13. Of the total amount of email messages, 8.998% were considered related to the execution of an activity. The setup and the results of the inspection of email messages for extracting a log are quantitatively summarized in Table 1b. The log was passed to the control flow discovery algorithm, which returned a process model comprising ca. 200 constraints. Each was verified to hold true within the log and was associated to a support exceeding the user-defined threshold of 80%.

(a) The input

Activities                               13
Traces                                   6
Events                                   139
Discovered constraints                   218
Noticeably right discovered constraints  14 (6.422%)
Right discovered constraints             173 (69.725%)
Wrong discovered constraints             45 (20.642%)
Utterly wrong discovered constraints     7 (3.211%)

(b) Retrieved information and mined constraints

In order to assess the validity of the mined process, we checked every constraint with the expert. This allowed for a quantitative evaluation. For each constraint in the list, we asked him whether it was either: (i) right, i.e., it made sense with respect to his experience; (ii) noticeably right, i.e., it not only made sense but also suggested some surprising mechanisms in the workflow; (iii) wrong, i.e., not necessarily corresponding to reality; (iv) utterly wrong, i.e., not corresponding to reality, unreasonable. The last level was assigned to quite few constraints (7 out of 218), half as many as those considered noticeably right (14). The model is not known a priori, but the expert could classify a guessed constraint as right or wrong. Hence, the analysis helped us find only true positives (TP, i.e., right or noticeably right) and false positives (FP, i.e., wrong or utterly wrong). As a matter of fact, such a situation of partial knowledge of the workflow reproduces a real case, where the artful process had never been formalized before.

Recalling that $\mathit{Precision} = \frac{TP}{TP + FP}$, the algorithm obtained a Precision of 0.794 over the real case study. Table 1b summarizes the encouraging results of this real case study evaluation. More than 75% of the constraints inferred were compliant with a realistic model of the process. Figure 2 shows the trend of true positives, false positives and overall (i.e., the sum of the preceding) constraints found, scaled in percentage by their total amount, with respect to their support. The quantities on the ordinates are cumulative, i.e., they represent the sum of the values which are gained up to the current abscissa. The curves show how, as the support increases, the distance between the cumulated false positives and the true positives grows. A line highlights where the relative percentage of confirmed constraints overtakes that of the wrong ones, i.e., a "breakpoint" after which the rate of hits, in terms of accepted guesses, is higher than the rate of misses, in terms of wrong guesses. Such a breakpoint corresponds to a support value of 0.85 (i.e., 5% higher than the threshold established a priori), which is low enough to keep the share of true positives below that threshold under 10%. The same graph, however, shows that more than 90% of the wrong constraints are given a support exceeding that threshold as well. Thus, shifting the threshold altogether would not lead to significant improvements in the quality of the returned process. Hence, we studied the trend of support for error-injected logs, taking into account and isolating the behavior of every constraint template w.r.t. different types of errors.
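The reported Precision can be reproduced with a short computation. The sketch below assumes that the 173 "right" and 45 "wrong" constraints of Table 1b are the totals of true and false positives respectively, which is the reading under which they add up to the 218 discovered constraints; the function name is hypothetical.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# Assumed reading of Table 1b: TP = 173, FP = 45, TP + FP = 218.
tp, fp = 173, 45
print(round(precision(tp, fp), 3))  # 0.794
```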

[Figure 2: Cumulative percentage of discovered constraints (total, false positives, true positives), with respect to their support.]

3.2 Experiments over artificial error-injected logs

In order to test the robustness of MINERful with respect to the presence of errors in logs, we built an additional testing module, which injected a controlled amount of noise into the sequences of traces.

We identified three possible types of error injection:
1. insertion of spurious events in the log;
2. deletion of events from the log;
3. random insertion/deletion of events.

The errors were spread according to a given percentage (footnote 3). The tester could also specify whether errors had to refer to a given activity, or not. In the latter case, every insertion or deletion was applied to an event picked at random each time.

In order to define how many errors had to be injected, and where, a spreading policy was required too. It could be either:
1. to calculate the number of errors to inject w.r.t. the whole log, and distribute the error injections accordingly, or
2. to calculate the number of errors to inject w.r.t. every single trace, case by case.

In the latter case, every trace was affected by a number of errors computed on the basis of the number of target events in that trace. This reproduces a systematic error, taking place in every registered enactment of the process. In the former, some traces could remain untouched.

Footnote 3. In case the calculated number of errors to inject was not an integer, the actual amount of errors was rounded up to the next integer (e.g., 0.2 was rounded up to 1 error to inject).
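The error types and spreading policies above can be sketched as follows. This is not the testing module used for the experiments: all names are hypothetical, traces are strings of one-character events, errors always target a given activity, and a deletion drawn for a trace without target events is simply skipped.

```python
import math
import random


def errors_to_inject(target_events: int, percentage: float) -> int:
    """Number of errors, rounded up to the next integer (e.g., 0.2 -> 1)."""
    return math.ceil(target_events * percentage / 100)


def inject_into_trace(trace, target, n_errors, error_type, rng):
    """Apply n_errors insertions and/or deletions of `target` to one trace."""
    events = list(trace)
    for _ in range(n_errors):
        kind = error_type
        if kind == "random":
            kind = rng.choice(["insertion", "deletion"])
        if kind == "insertion":
            # Insert a spurious target event at a random position.
            events.insert(rng.randint(0, len(events)), target)
        else:
            # Delete one occurrence of the target, if any is left.
            positions = [i for i, e in enumerate(events) if e == target]
            if positions:
                del events[rng.choice(positions)]
    return "".join(events)


def inject_over_strings(log, target, percentage, error_type, rng):
    """Per-trace policy: the error count is computed for every trace."""
    return [
        inject_into_trace(
            t, target, errors_to_inject(t.count(target), percentage),
            error_type, rng)
        for t in log
    ]


def inject_over_collection(log, target, percentage, error_type, rng):
    """Whole-log policy: errors are computed once and spread at random."""
    total_targets = sum(t.count(target) for t in log)
    altered = list(log)
    for _ in range(errors_to_inject(total_targets, percentage)):
        i = rng.randrange(len(altered))
        altered[i] = inject_into_trace(altered[i], target, 1, error_type, rng)
    return altered


rng = random.Random(0)  # fixed seed, only for reproducibility of the sketch
print(inject_over_strings(["abab", "cab"], "a", 50, "deletion", rng))
print(inject_over_collection(["abab", "cab"], "a", 50, "insertion", rng))
```

With the per-trace policy every trace containing the target receives at least one error as soon as the percentage is positive, because of the rounding up, whereas the whole-log policy may leave some traces untouched.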

Thereupon, we conducted an extensive analysis of the reaction of MINERful, the control flow discovery algorithm of MailOfMine, through an experiment set up as summarized in Table 2.

Activities (target)         8 (1)
Generating constraints      18
Trace length                [0, 30]
Log size                    1 000
Spreading policies          3
Error types                 3
Runs per combination        50
Error injection percentage  [0, 30]
Total runs                  167 400

We created 18 groups of 9 300 synthetic logs each. Every group was generated so as to comply with one constraint at a time, among the 18 templates involving a, as the implying activity, and (optionally) b, as the implied (i.e., Participation(a), Uniqueness(a), . . . , RespondedExistence(a, b), Response(a, b), . . . ). The alphabet comprised 6 more non-constrained activities (c, d, . . . , h), totalling 8. We chose a as the target activity for the injection of errors. Then, we injected errors into the synthetic logs, with all of the possible combinations of the aforementioned parameters ((i) insertion, deletion or random error type, (ii) over-string or over-collection spreading policy, (iii) error injection percentage ranging between 0 and 30%) and ran the control flow discovery algorithm of MailOfMine on the resulting altered logs. We collected the results and, for each of the 18 groups of logs, analyzed the trend of the support for the generating constraint. I.e., we looked at how the support for the only constraint which had to be verified all over the log lowered, w.r.t. the increasing percentage of errors injected. We also highlighted those other constraints whose topmost computed support exceeded the value of 0.75 (footnote 4), since they are the most likely candidates to be false positives in the discovery.
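The figures of 9 300 logs per group and 167 400 total runs can be recovered from the parameter combinations listed in the text. The sketch below assumes integer error percentages from 0 to 30 (31 values) and the two spreading policies named above; the variable names are hypothetical.

```python
error_types = ["insertion", "deletion", "random"]
spreading_policies = ["over-string", "over-collection"]
percentages = range(0, 31)  # assumed: integer steps from 0% to 30%
runs_per_combination = 50
generating_constraints = 18

# One synthetic log per parameter combination and run.
logs_per_group = (len(error_types) * len(spreading_policies)
                  * len(percentages) * runs_per_combination)
total_runs = generating_constraints * logs_per_group

print(logs_per_group, total_runs)  # 9300 167400
```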

The analysis of within-trace error-injected logs proved more effective in stressing the resilience of constraints with respect to certain types of errors. In other words, it showed the structural weaknesses of constraint templates w.r.t. some types of error even for small percentages of injected errors. For instance, the support of End(a) (Figure 3) is not affected by the insertion of spurious a's in the traces (see Figure 3a), whereas it suffers from deletions of a's (Figure 3b).
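The behavior of End(a) can be illustrated on a toy log. In the sketch below the support is simply the fraction of traces satisfying the constraint, which is an assumption made for illustration and not necessarily the estimate computed by MINERful; the log and all names are hypothetical.

```python
import random


def end(trace: str, activity: str) -> bool:
    """End(activity): the trace ends with the activity."""
    return trace.endswith(activity)


def support(log, constraint) -> float:
    """Assumed support: fraction of traces in which the constraint holds."""
    return sum(constraint(t) for t in log) / len(log)


def insert_spurious(trace: str, target: str, rng) -> str:
    """Insert one spurious target event at a random position."""
    pos = rng.randint(0, len(trace))
    return trace[:pos] + target + trace[pos:]


def delete_one(trace: str, target: str, rng) -> str:
    """Delete one randomly chosen occurrence of the target, if present."""
    positions = [i for i, e in enumerate(trace) if e == target]
    if not positions:
        return trace
    pos = rng.choice(positions)
    return trace[:pos] + trace[pos + 1:]


rng = random.Random(42)
log = ["bca", "aba", "cdba", "a"]  # every trace complies with End(a)

with_insertions = [insert_spurious(t, "a", rng) for t in log]
with_deletions = [delete_one(t, "a", rng) for t in log]

print(support(with_insertions, lambda t: end(t, "a")))  # 1.0
print(support(with_deletions, lambda t: end(t, "a")))
```

An inserted a either becomes the new last event or leaves the last event untouched, so End(a) keeps holding; a deletion, instead, may remove exactly the closing a.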

In Section 2 we described the mechanism tying MutualRelation constraints to forward- and backward-related constraints, as in the case of AlternateSuccession w.r.t. AlternateResponse and AlternatePrecedence. Here we remark that (i) the support for AlternateResponse(a, b) remains unchanged in case of spurious inserted a's (Figure 4a), but not in case of deleted a's (Figure 4b), whilst (ii) conversely, the support for AlternatePrecedence(a, b) remains unchanged in case of deleted a's (Figure 4c), but not in case of inserted spurious a's (Figure 4d). As a consequence, AlternateSuccession inherits the sensitivity towards errors of both, resulting in a decreasing support for both faulty insertions and deletions of a's (Figure 5).

Footnote 4. We recall that assigning a constraint the support of 0.5 would be equivalent to asserting that such a constraint holds on the basis of the toss of a coin. Thereby, 0.75 is the least value of the topmost half of the "reliable" range.

The analysis of over-collection error-injected logs showed smoother changes in the curves, since errors are spread over a wider area of appearances of the targeted activity. Therefore, it reveals a more realistic trend for the assessment of discovered constraints in the presence of errors. As a matter of fact, we reasonably expect to have sparse errors in logs, rather than a fixed percentage of faults for every trace.

Along a branch in the constraints hierarchy (see Figure 1), we expect that the more restrictive a constraint is, the more its support decreases as deviations from the expected behavior occur. This is supported by evidence in, e.g., Figure 6, where the slope of the curve gets steeper as we analyze the subsumed constraints along the MutualRelation constraints (i.e., CoExistence, Succession, AlternateSuccession, ChainSuccession).

The interested reader can download the whole collection of graphs depicting the gathered results at the following address: http://www.dis.uniroma1.it/~cdc/code/minerful/latest/errorinjectiontestresults.zip

[Figure 3: The trend of the support for End(a), w.r.t. the percentage of (a) spurious events inserted into every string and (b) events deleted from every string. Axes: error percentage [%] (0 to 30) vs. mean support [%].]

[Fig. 4: The trend of the support for AlternateResponse and AlternatePrecedence, w.r.t. the errors injected in the log. The error injection policies under exam are the insertion and deletion of a events, within each trace. Panels (a) and (b): AlternateResponse(a, b); panels (c) and (d): AlternatePrecedence(a, b). Axes: error percentage [%] (0 to 30) vs. mean support [%].]

[Fig. 5: The trend of the support for AlternateSuccession, w.r.t. the errors injected in the log, within each trace. Panel (a): AlternateSuccession(a, b), w.r.t. the percentage of spurious events inserted into every string; panel (b): AlternateSuccession(a, b), w.r.t. the percentage of events deleted from every string. Axes: error percentage [%] (0 to 30) vs. mean support [%].]

[Fig. 6: The trend of the support for the MutualRelation constraints, w.r.t. the errors injected in the log. The error injection policy under exam is the random insertion/deletion of a events, over the whole log. Panels: (a) CoExistence(a, b), (b) Succession(a, b), (c) AlternateSuccession(a, b), (d) ChainSuccession(a, b), each w.r.t. the percentage of both event deletions and insertions, spread over the whole log. Axes: error percentage [%] (0 to 30) vs. mean support [%].]

4 Conclusions

Throughout this paper, we have analyzed the problem of discovering declarative workflows out of event logs which are affected by errors. To this aim, we first assessed the quality of a model mined out of real data. We used a single threshold level for the estimated support of discovered constraints, in order to determine whether they could be considered valid for the mined process or not. The obtained results suggested that adjusting the level of such a threshold did not considerably enhance the quality of the mined process altogether. Therefore, for each constraint in the set of Declare templates, we investigated the trend of its own estimated support with respect to the amount of errors injected into logs. By means of experiments carried out on synthetic data, we showed that the semantics of constraint templates actually affect their degree of robustness w.r.t. the presence of spurious events or the absence of expected ones in the log.

Starting from these results, we will investigate the problem of defining an automated approach for the self-adjustment of user-defined thresholds, on the basis of the nature of each discovered constraint. Intuitively, indeed, a more "robust" constraint should be considered valid in the log (and therefore for the process) if its support exceeds a higher threshold. On the contrary, the threshold should be lowered for more "sensitive" ones. We also aim at combining such an approach with the analysis of different metrics, pertaining to the number of times an event occurred in the log. The intuition is that the more frequent an event is in the log, the less it can be considered subject to errors. Such metrics have already been considered in the literature ([15]) for assessing the relevance of discovered constraints. We want to exploit them for estimating the reliability of constraints in mined processes as well.

1. van der Aalst, W.M.P.: Verification of workflow nets. In: Azéma, P., Balbo, G. (eds.) ICATPN. Lecture Notes in Computer Science, vol. 1248, pp. 407–426. Springer (1997).

2. van der Aalst, W.M.P.: The application of Petri nets to workflow management. Journal of Circuits, Systems, and Computers 8(1), 21–66 (1998).

3. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011).

4. van der Aalst, W.M.P., ter Hofstede, A.H.M.: YAWL: yet another workflow language. Inf. Syst. 30(4), 245–275 (2005).

5. van der Aalst, W.M.P., Pesic, M., Schonenberg, H.: Declarative workflows: Balancing between flexibility and support. Computer Science - R&D 23(2), 99–113 (2009).

6. Alberti, M., Chesani, F., Gavanelli, M., Lamma, E., Mello, P., Torroni, P.: Verifiable agent interaction in abductive logic programming: The SCIFF framework. ACM Trans. Comput. Log. 9(4), 29:1–29:43 (August 2008).

7. Chesani, F., Lamma, E., Mello, P., Montali, M., Riguzzi, F., Storari, S.: Exploiting inductive logic programming techniques for declarative process mining. T. Petri Nets and Other Models of Concurrency 2, 278–295 (2009).

8. Decker, G., Dijkman, R.M., Dumas, M., García-Bañuelos, L.: The business process modeling notation. In: ter Hofstede, A.M., van der Aalst, W.M.P., Adams, M., Russell, N. (eds.) Modern Business Process Automation, pp. 347–368. Springer (2010).

9. Di Ciccio, C., Marrella, A., Russo, A.: Knowledge-intensive processes: An overview of contemporary approaches. In: ter Hofstede, A.H., Mecella, M., Sardina, S., Marrella, A. (eds.) KiBP. vol. 861, pp. 33–47. CEUR Workshop Proceedings (06 2012).

10. Di Ciccio, C., Mecella, M.: Mining constraints for artful processes. In: Abramowicz, W., Kriksciuniene, D., Sakalauskas, V. (eds.) BIS. Lecture Notes in Business Information Processing, vol. 117, pp. 11–23. Springer (05 2012).

11. Di Ciccio, C., Mecella, M.: A two-step fast algorithm for the automated discovery of declarative workflows. In: CIDM. IEEE (04 2013).

12. Di Ciccio, C., Mecella, M., Scannapieco, M., Zardetto, D., Catarci, T.: MailOfMine – analyzing mail messages for mining artful collaborative processes. In: Aberer, K., Damiani, E., Dillon, T. (eds.) Data-Driven Process Discovery and Analysis, Lecture Notes in Business Information Processing, vol. 116, pp. 55–81. Springer (10 2012).

13. Fahland, D., van der Aalst, W.M.P.: Repairing process models to reflect reality. In: Barros, A.P., Gal, A., Kindler, E. (eds.) BPM. Lecture Notes in Computer Science, vol. 7481, pp. 229–245. Springer (2012).

14. Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006).

15. Maggi, F.M., Bose, R.P.J.C., van der Aalst, W.M.P.: Efficient discovery of understandable declarative process models from event logs. In: Ralyté, J., Franch, X., Brinkkemper, S., Wrycza, S. (eds.) CAiSE. Lecture Notes in Computer Science, vol. 7328, pp. 270–285. Springer (2012).

16. Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declarative process models. In: CIDM. pp. 192–199. IEEE (2011).

17. Mendling, J., Neumann, G., van der Aalst, W.M.P.: Understanding the occurrence of errors in process models based on metrics. In: Meersman, R., Tari, Z. (eds.) CoopIS. Lecture Notes in Computer Science, vol. 4803, pp. 113–130. Springer (2007).

18. Mendling, J., Reijers, H.A., van der Aalst, W.M.P.: Seven process modeling guidelines (7PMG). Information & Software Technology 52(2), 127–136 (2010).

19. Mendling, J., Reijers, H.A., Cardoso, J.: What makes process models understandable? In: Alonso, G., Dadam, P., Rosemann, M. (eds.) BPM. Lecture Notes in Computer Science, vol. 4714, pp. 48–63. Springer (2007).

20. Pesic, M.: Constraint-based Workflow Management Systems: Shifting Control to Users. Ph.D. thesis, Technische Universiteit Eindhoven (10 2008), http://repository.tue.nl/638413

21. Pesic, M., Schonenberg, H., van der Aalst, W.M.P.: Declare: Full support for loosely-structured processes. In: EDOC. pp. 287–300. IEEE Computer Society (2007).

22. Pesic, M., Schonenberg, M.H., Sidorova, N., van der Aalst, W.M.P.: Constraint-based workflow models: Change made easy. In: Meersman, R., Tari, Z. (eds.) CoopIS. Lecture Notes in Computer Science, vol. 4803, pp. 77–94. Springer (2007).

23. Warren, P., Kings, N., Thurlow, I., Davies, J., Buerger, T., Simperl, E., Ruiz, C., Gomez-Perez, J.M., Ermolayev, V., Ghani, R., Tilly, M., Bosser, T., Imtiaz, A.: Improving knowledge worker productivity - the Active integrated approach. BT Technology Journal 26(2), 165–176 (2009).