=Paper=
{{Paper
|id=None
|storemode=property
|title=Volvo IT Belgium VINST
|pdfUrl=https://ceur-ws.org/Vol-1052/paper3.pdf
|volume=Vol-1052
|dblpUrl=https://dblp.org/rec/conf/bpm/BrouckeVB13
}}
==Volvo IT Belgium VINST==
Third International Business Process Intelligence Challenge (BPIC'13): Volvo IT Belgium VINST

Seppe K.L.M. vanden Broucke¹, Jan Vanthienen¹ and Bart Baesens¹·²

¹ Department of Decision Sciences and Information Management, KU Leuven, Naamsestraat 69, B-3000 Leuven, Belgium
² School of Management, University of Southampton, Highfield, Southampton, SO17 1BJ, United Kingdom

seppe.vandenbroucke@kuleuven.be

Abstract. For the Third International Business Process Intelligence Challenge (BPIC'13), a collection of real-life event logs from Volvo IT Belgium is analyzed. The provided logs contain events from an incident and problem handling system called VINST. This report presents results related to the following investigations: first, an open-minded exploratory analysis of the given event logs, and second, answering the four specific questions posed by the process owner. To do so, we utilize both existing and purpose-built tools, and heavily combine traditional data analysis tools with process-oriented techniques; throughout the report, we indicate the existence of a gap between these two categories of tools and thus emphasize the importance of a hybrid approach in a process intelligence context.

Key words: process mining, process analysis, data analysis, process intelligence, challenge

1 Introduction

1.1 Overview

This report presents the results we have uncovered related to the analysis of the Volvo IT data set in the context of the Third International Business Process Intelligence Challenge (BPIC'13). The data set consists of three real-life event logs from an incident and problem handling system called VINST. The report is structured as follows. First, a high level overview of the data set and the VINST system is provided in Section 2, to familiarize readers with the provided material.
Next, in Section 3, we perform an open-minded exploratory analysis using ProM¹ [1] and Disco² in order to get an initial idea about the event logs' composition and the VINST processes, before moving on to answering the questions posed by the process owner in Section 4. The report is concluded in Section 5.

¹ ProM is an extensible process mining framework for the discovery and analysis of process models and event logs. See http://www.processmining.org for more information.
² Disco is a commercial process discovery tool developed by Fluxicon which provides an easy-to-use interface together with extensive filtering capabilities. See http://fluxicon.com/disco/ for more information.

1.2 Methodology

We will make heavy use of both traditional data analysis tools and process-oriented techniques. We believe there to be a gap between these two categories of tools, causing difficulties when filtering, exploring and analyzing event based data sets. Reports submitted to previous BPI challenges, for instance, illustrate that practitioners and academics alike frequently perform initial filtering steps using spreadsheet or OLAP-based tools. Oftentimes, the same tools are used to derive descriptive statistics, before moving on to a more process-oriented view (e.g. process discovery or performance analysis), which these traditional data analysis tools lack. Since event logs are often handled as a large flat table (i.e. a listing of events), it is difficult to derive statistical information at the trace level (i.e. a grouping of events) rather than at the event level. Filtering out events based on complex control-flow based criteria is even more challenging. Consider for example the user trying to filter out all events belonging to a trace where two activities occur in parallel. Vice versa, tools which are specifically geared towards the analysis of process based data are often lacking in terms of filtering or descriptive statistics.
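As a small illustration of the trace-versus-event distinction discussed above, a flat event table can be regrouped into one activity sequence per case before computing statistics. This is only a sketch; the case identifiers, timestamps and activities below are made up, with column meanings following the VINST log:

```python
from itertools import groupby

# Hypothetical flat event table: (case id, timestamp, activity), as it would
# appear in a spreadsheet export. Values are illustrative, not from the log.
events = [
    ("1-001", "2011-08-29T07:12:35", "Queued/Awaiting Assignment"),
    ("1-001", "2011-08-29T07:13:48", "Accepted/In Progress"),
    ("1-001", "2011-08-30T09:00:00", "Completed/Closed"),
    ("1-002", "2011-08-29T08:00:00", "Accepted/In Progress"),
    ("1-002", "2011-08-29T08:30:00", "Completed/Closed"),
]

# Trace-level view: regroup the flat table into one activity sequence per case.
# groupby requires the input to be sorted on the grouping key.
events.sort(key=lambda e: (e[0], e[1]))
traces = {cid: [act for _, _, act in grp]
          for cid, grp in groupby(events, key=lambda e: e[0])}

# Statistics at the trace level rather than the event level.
lengths = [len(t) for t in traces.values()]
print(len(traces), min(lengths), max(lengths))  # number of cases, min/max case length
```

Spreadsheet tools make the event-level view easy; the regrouping step is exactly what they lack out of the box.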
Consider ProM, for example, which contains a plethora of plugins for the discovery of process models alone, but does not provide strong methods to filter or explore the data set at hand. In ProM 6, only two helpful filtering plugins are provided by default: Filter Log using Simple Heuristics, which allows discarding events based on their classification and traces based on their start/ending/contained events; and Filter Log by Attributes, which is able to filter on traces containing events with a specific attribute-value pair. Advanced filtering options (e.g. value greater than x) are not provided. The same can be said for plugins providing descriptive statistics. Information about events per case, the number of cases and events, and other simple counts is easily retrieved, but no robust options are provided to derive other, more complex event data.

Due to the mismatch described above, practitioners and researchers currently often find themselves using an abundance of tools to perform their analysis tasks. While this is not an insurmountable problem, of course, the fact remains that the available tooling for the analysis of event logs can be improved. In fact, this is one of the challenges also listed in the process mining manifesto [2]. Disco stands out as a great contender, however, which has enabled practitioners and researchers to quickly filter and inspect event logs, but as we will indicate later on, it is not completely up to par with traditional data analysis tools when it comes to inspecting particular attribute distributions or correlations. Therefore, we will apply a hybrid approach in this report, switching back-and-forth between (mainly) R and Disco to perform the desired tasks and combine statistical data analysis with process mining insights.

As a summary, Table 1 provides an overview of the differences between traditional data analysis tools and process oriented techniques.
The main differences arise due to a different starting viewpoint on the data at hand. Traditional tools work with flat data tables containing record lines, whereas process oriented techniques are aware of the hierarchical nature between cases and events.

Other than the plethora of tools already available for (statistical) data analysis and process mining, it should be mentioned that some work has also been undertaken to add some process awareness or, more precisely, sequence awareness to data analysis tools. TraMineR, for instance, is an R package [3] for mining, describing and visualizing sequences of states, but contains support for event (i.e. transition) based data sets as well, such as those found in business process event logs. The package offers a robust toolkit, but requires a relatively large amount of user expertise to use correctly. Furthermore, filtering data sets is somewhat difficult and the provided visualization options do not aim to discover a summarized process map from the given sequences. Many authors have also emphasized the need for sequential data support in traditional database management systems [4-6]. One particular example of interest is SQL-TS [7-9], a simple extension for SQL to work with sequential patterns, using proposed CLUSTER BY and SEQUENCE BY clauses. This would provide a good starting point towards analyzing event based data sets: the CLUSTER BY clause indicates the field specifying the case identifier (e.g. SR Number), while the SEQUENCE BY clause indicates the ordering of events. Sadly, two drawbacks exist which make this approach not yet fully suited for analyzing business processes. First, higher level structural behavior, such as loops and parallelism, is not discovered and can thus not be taken into account when constructing queries, or must be specified manually by the user (i.e. explicitly taking into account all possible parallel orderings). Second, the language is not available for public use. Finally, we are also reminded of regular expressions [10] (or regular languages) and other expressive grammar languages such as linear temporal logic [11] (LTL), which are converted to automatons to support the querying of sequence based data. Again, these tools offer theoretically sound solutions to search sequence based data sets, but they also require more advanced user expertise and are not able to deal with higher level control-flow abstractions, so that they remain quite far removed from the ease of use that traditional and common tools offer.

Table 1: Overview of the differences between traditional data analysis tools and process oriented techniques.

Characteristic   | Traditional Data Analysis                 | Process Oriented Analysis
Example products | Excel, R, SAS, other statistical packages | Disco, ProM, other process mining tools
Data structure   | Flat (table of records)                   | Hierarchical (cases with events)
Central object   | Events                                    | Cases
Process discovery (extracting visual process maps) | Hard | Easy
Statistical analysis (extracting distributions, correlations, relations) | Easy | Hard
Filtering        | Easy at event level, hard at case level   | Easy for some filters/tools (e.g. Disco), harder for others
Querying         | Easy (e.g. SQL) at event level            | Hard

2 Data Set Overview

The BPIC'13 data set is obtained from Volvo IT Belgium and contains events pertaining to an internally developed Siebel-based incident and problem management system called VINST. Three log files are provided: one for incidents, one for open problems and one for closed problems.³
The incident management process aims to restore normal service operations after the occurrence of a specific incident, within SLA defined boundaries. Incident cases are first handled by a first line desk (the service desk and help desks) and escalated to second line and third line teams when the first line workers are not able to resolve the incident. Incident cases are assigned a priority level, which is calculated based on the impact (major, high, medium or low) and urgency (high, medium, low) of the issue; see Table 2.

Table 2: Priority levels for incident cases are calculated based on the impact and urgency of the incident. When incidents are not resolved within a specified time frame, urgency is increased automatically. Incidents cannot automatically migrate from one impact level to another.

               | Major Impact | High Impact | Medium Impact | Low Impact
High Urgency   |      1       |      4      |       7       |     10
Medium Urgency |      2       |      5      |       8       |     11
Low Urgency    |      3       |      6      |       9       |     12

³ DOI identifiers for the data sets are: doi:10.4121/500573e6-accc-4b0c-9576-aa5468b10cee, doi:10.4121/3537c19d-6c64-4b1d-815d-915ab0e479da and doi:10.4121/c2c3b154-ab26-4b31-a0e8-8f2350ddac11.

The problem management process tries to uncover the root causes behind incidents and implement fixes to prevent the occurrence of further incidents in IT-services operated by Volvo IT, and includes activities to update internal knowledge bases with discovered findings. This process is primarily handled by second and third line teams. Contrary to the incident management process, there is no push to front system implemented for the problem management process which escalates cases among different service lines. The problem management and incident management processes thus work in parallel, where incidents are resolved as quickly as possible in a reactive manner while the underlying root cause is fixed by a problem management case. These two workflows are supported by the Volvo IT VINST tool.
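Returning to Table 2, the priority computation reduces to a lookup over the impact and urgency axes. The sketch below encodes the table arithmetically (priority 1 is Major impact with high urgency, 12 is Low impact with low urgency); the function name is ours, not part of VINST:

```python
# Priority lookup reconstructed from Table 2 (impact x urgency -> priority 1..12).
# Column order follows the table: Major, High, Medium, Low impact.
IMPACTS = ["Major", "High", "Medium", "Low"]
URGENCIES = ["High", "Medium", "Low"]

def priority(impact: str, urgency: str) -> int:
    col = IMPACTS.index(impact)      # each impact column adds 3
    row = URGENCIES.index(urgency)   # each urgency row adds 1
    return row + 1 + 3 * col

print(priority("Major", "High"), priority("Medium", "Medium"), priority("Low", "Low"))
```

Note that the automatic urgency increase described in the caption would move a case down one row (raising its priority number's row component), but never across impact columns.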
The tool also supports other workflow processes such as handle questions and major incident procedure, but these are not incorporated in the BPI challenge data set. The data set lists the following attributes for each logged event line:

SR Number (or Problem Number for the problem management process): the service request case identifier. Example values: 1-364285768, 1-109135791, 1-147898401.
Change Date+Time: the time stamp of the logged event line. Example values: 2011-08-29T07:12:35+01:00, 2011-08-29T07:13:48+01:00.
Status and Sub Status: the current status of the case as changed by the logged event line. Example values: Queued/Awaiting Assignment, Accepted/In Progress, Accepted/Assigned.
Impact: the impact of the case. Example values: Medium, Low, High.
Product: the product involved in the case. Example values: PROD821, PROD236, PROD793.
Involved ST: the support team trying to solve the problem. Example values: V5 3rd, V30, V13 2nd 3rd.
Involved ST Functional Division: the support team's functional division. Example values: V3_2, C_6, E_10.
Involved Organization: the involved organisation line. Example values: Org line A2, Org line C, Org line V7n.
Organization Country: the location that takes ownership of the support team. Example values: fr, se, nl.
Owner Country and Owner First Name: the person in the support team working on the case. Example values: France/Frederic, Sweden/Adam, Belgium/Bert.

The next section describes an exploratory analysis of the three event logs.

3 Exploratory Analysis

3.1 Descriptive Statistics

Using ProM and Disco, we perform some initial analysis on the given three event logs. Some rudimentary, process based descriptive statistics are listed in Table 3. We use the default classifier as stored in the event logs to classify events into a set of activities, namely the combination of the Status and Sub Status attribute fields.
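The default classifier mentioned above is a simple concatenation of two attribute fields. A minimal sketch, with events as attribute dictionaries mirroring the columns listed above (attribute values illustrative):

```python
# Events as attribute dictionaries; the default classifier joins Status and
# Sub Status into one activity name, as stored in the provided event logs.
events = [
    {"SR Number": "1-364285768", "Status": "Queued", "Sub Status": "Awaiting Assignment"},
    {"SR Number": "1-364285768", "Status": "Accepted", "Sub Status": "In Progress"},
]

def activity(event: dict) -> str:
    # "Queued" + "Awaiting Assignment" -> "Queued/Awaiting Assignment"
    return f"{event['Status']}/{event['Sub Status']}"

print([activity(e) for e in events])
```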
For each event log, the total number of cases and events is listed, together with the number of distinct cases (the different variants). Next, the total number of event classes found in the event log is shown, together with how many of these appear as the first or last activity in the traces contained in the event log. Finally, Table 3 also lists the minimum, average and maximum case length (i.e. number of events).

Table 3: Process based descriptive statistics for the three event logs. For the three logs, the default event classifier (Status plus Sub Status) is left as is.

Event Log                                  | incidents | open problems | closed problems
Nr. of Cases                               | 7554      | 819           | 1487
Nr. of Events                              | 65533     | 2351          | 6660
Nr. of Distinct Cases                      | 2278      | 182           | 327
Total/Starting/Ending Nr. of Event Classes | 13/9/8    | 5/5/5         | 7/6/1
Min./Avg./Max. Case Length                 | 1/8/123   | 1/2/22        | 1/4/35

Especially for the incidents event log, the numbers indicate a large amount of variability. As shown by Fig. 1, a large number of trace variants exists which occur only once in the event log, leading to a skewed distribution in the number of cases per variant. Next, Table 4 shows an overview of the different attributes (described above) with their number of values and the mean of the frequencies of the values for the three event logs.

Let us now take a look at some performance related characteristics for the given event logs. When we plot the mean duration for the cases represented by a variant against the number of cases itself, a relation can be observed between the rarity of a trace variant and its running time, as depicted by Fig. 2a. One is free to remove these peculiar traces from the data set before continuing with further analysis, but we choose to leave these exceptional cases alone for now. A similar observation can be made when plotting the case duration for all cases, for example as shown in Fig. 2b for the incidents log. Observe that a small number of cases exists which run for a long time, with the longest trace taking 2.11 years.
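The variant counts of Table 3 and the long-tail observation boil down to counting distinct activity sequences. A sketch on made-up traces (a variant is a distinct sequence; statuses abbreviated for brevity):

```python
from collections import Counter

# Hypothetical traces keyed by case id; tuples are hashable, so Counter can
# count how often each distinct activity sequence (variant) occurs.
traces = {
    "c1": ("Queued", "Accepted", "Completed"),
    "c2": ("Queued", "Accepted", "Completed"),
    "c3": ("Accepted", "Completed"),
    "c4": ("Queued", "Completed"),
}
variants = Counter(traces.values())

singletons = sum(1 for n in variants.values() if n == 1)
print(len(variants))   # number of distinct variants
print(singletons)      # long-tail: variants occurring only once
```

On the real incidents log, the same computation would yield 2278 distinct variants, of which the report counts 1934 singletons.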
Fig. 1: Number of cases in the incidents log per variant (log scale). A long tail of 1934 variants exists which occur only once in the event log.

Next, Fig. 3 inspects the relationship between a case's duration and the involved product or impact level of the case. We observe that there exists a strong correlation between the product a case relates to and the required running time for that case. Indeed, this is to be expected, as it seems reasonable to assume that some products (such as PROD507, PROD34 and PROD681, highlighted in Fig. 3) lead to incidents which are harder to resolve, due for example to the complexity of the product.

Table 4: Descriptive statistics for the attributes contained in the event logs. For each attribute, the absolute number of possible values is shown, together with the mean of the frequencies of the values (between parentheses).

Attribute                       | incidents      | open problems | closed problems
Status                          | 4 (16383.25)   | 3 (783.67)    | 4 (1665.00)
Sub Status                      | 13 (5041.00)   | 5 (470.20)    | 7 (951.43)
Impact                          | 4 (16383.25)   | 4 (587.75)    | 4 (1665.00)
Product                         | 704 (93.09)    | 139 (16.91)   | 337 (19.76)
Involved ST                     | 649 (100.98)   | 1 (2351.00)   | 324 (20.56)
Involved ST Functional Division | 24 (2730.54)   | 26 (90.42)    | 29 (229.66)
Involved Organization           | 25 (2621.32)   | 1 (2351.00)   | 15 (444)
Organization Country            | 23 (2849.26)   | 1 (2351.00)   | 17 (391.76)
Owner First Name                | 1440 (45.51)   | 240 (9.80)    | 585 (11.38)
Owner Country                   | 32 (2047.91)   | 14 (167.93)   | 21 (317.14)

Fig. 2: Performance related characteristics for the incidents event log. (a) Scatter plot showing the mean duration in days for each variant against its number of occurrences; many exceptional variants also exhibit a long duration. (b) Case durations (in days) for all cases included in the event log.
Concerning impact levels, the box plots shown in Fig. 3 do not immediately reveal a great difference in duration medians, although there does exist a significant difference between the mean durations over the impact levels (which are 9.7, 13.7, 16.2 and 8.3 days for Low, Medium, High and Major impact respectively).

We can also derive performance related statistics for the support team members (Owner First Name) and countries (Owner Country). However, contrary to products, support teams and members are event attributes, and can thus vary within a case. If we were to group traces as done above (e.g. a trace belongs to the group associated with Sam when one of its events was executed by Sam), this would lead to a skewed distribution, as perhaps only one event in a long trace was executed by a certain support team member. We are thus interested in knowing the duration of certain tasks executed by the various support team members, rather than the duration of the whole case. Sadly, the provided event logs do not contain starting and stopping times for each activity. Instead, activities are logged as atomic events, i.e. only with a single time stamp, and are assumed to have a duration of zero (instant completion). Still, when relating task durations to support team members, we nevertheless would like to have some idea about how long the case was being handled by a certain person before moving on to the next activity (and, perhaps, some other support team member). Therefore, we apply a simple pre-processing step where we assign a completion time to each event equal to the (starting) time stamp of the event following directly after it in the same trace. If there is no such event, we just set the completion time equal to the starting time and assume this final event was instantly completed.⁴ This allows us to derive how long a particular case stayed with a particular activity, and hence, support team member. Once done, we can easily extract various sorts of information for the support team members and countries.

⁴ Note that we assume the time stamp present in the logs denotes the starting time of a particular activity. Only when this activity is completed does the next event start.

Fig. 3: Case duration as depending on the product involved in the case, or the case's impact level (incidents event log). (a) Mean case duration grouped per product; observe that there exists a strong correlation between the products involved in a case and the required running time for that case. (b) Box plots depicting case duration statistics per impact level; although no strong difference among the medians (shown by the box plots) becomes apparent, the variance decreases for higher impact levels. The means of the different impact levels also significantly differ at the 95% confidence level.

Table 5 provides an overview of mean activity durations for activities, support team members and support team countries, as well as a task-originator matrix for the support team members. As always, these initial exploratory results must be carefully interpreted to avoid reaching erroneous conclusions. The common warning given when dealing with performance-originator oriented statistics indeed states that the fact that someone is taking longer to do their job might simply be because this worker commonly deals with harder jobs which take longer to resolve. Additionally, using the mean duration may lead to misleading results under skewed distributions; for example, the median duration for Anne Claire's activities is just a few seconds. The standard deviation for the workers with high mean durations is high, caused by a few (or even just one) activities taking a long time to be passed forward. This is evidenced by Fig. 4, which shows a small number of activities with a very high duration (i.e. more than one year). The process owner can inspect these activities and the cases they belong to in order to verify whether further action should be undertaken. Furthermore, keep in mind that we have derived activity durations based on a single time stamp present in the log files, which might differ from the actual work time.

Table 5: Performance related overview metrics for support team members and activities for the incidents event log.

(a) Mean activity durations for all activities based on derived completion times. Again, Status plus Sub Status is used as a classifier to construct activities.

Activity                       | Mean Duration (Hours)
Accepted/Wait - Vendor         | 5.48
Accepted/Wait - Implementation | 5.40
Accepted/Wait - Customer       | 4.81
Completed/Resolved             | 4.69
Accepted/Wait - User           | 4.58
Accepted/Wait                  | 4.08
Accepted/Assigned              | 1.51
Queued/Awaiting Assignment     | 1.09
Accepted/In Progress           | 0.44
Completed/In Call              | 0.22
Completed/Closed               | 0.15
Unmatched/Unmatched            | 0.01
Completed/Cancelled            | 0.00

(b) Mean activity durations for the support team members (Owner First Name) based on derived completion times. Only the highest ten members are shown.

Owner First Name | Mean Duration (Days)
Anne Claire      | 103.44
Earl             | 60.16
Mica             | 48.91
Elaine           | 44.79
Ray              | 42.75
Cincellia        | 42.27
Bozidar          | 36.87
Lucas            | 27.47
Brice            | 23.29
Minkyu           | 22.93

(c) Mean activity durations for the support team countries (Owner Country) based on derived starting times. Only the highest ten countries are shown.

Owner Country      | Mean Duration (Days)
Denmark            | 19.26
Austria            | 14.76
Netherlands        | 12.46
Spain              | 4.65
Turkey             | 3.85
Germany            | 3.76
United Kingdom     | 3.11
France             | 2.94
Russian Federation | 2.86
Korea              | 2.60

(d) Task-originator matrix for the support team members. Only the first ten team members are shown, sorted by the number of activities executed (summed last row).

Activity                       | Siebel | Krzysztof | Pawel | Marcin | Marika | Michael | Fredrik | Piotr | Andreas | Brecht
Queued/Awaiting Assignment     | 538    | 206       | 125   | 93     | 90     | 117     | 104     | 92    | 53      | 91
Accepted/Assigned              | 23     | 189       | 31    | 73     | 193    | 10      | 31      | 68    | 0       | 44
Accepted/In Progress           | 12     | 616       | 479   | 332    | 297    | 320     | 267     | 304   | 286     | 222
Accepted/Wait                  | 27     | 5         | 32    | 7      | 4      | 12      | 18      | 18    | 6       | 0
Accepted/Wait - Implementation | 8      | 1         | 1     | 0      | 0      | 0       | 0       | 5     | 0       | 0
Accepted/Wait - User           | 173    | 56        | 123   | 71     | 12     | 32      | 69      | 33    | 63      | 75
Accepted/Wait - Vendor         | 7      | 0         | 0     | 5      | 0      | 0       | 10      | 0     | 33      | 0
Accepted/Wait - Customer       | 1      | 1         | 0     | 0      | 0      | 1       | 0       | 0     | 0       | 0
Completed/Cancelled            | 0      | 0         | 0     | 0      | 0      | 0       | 0       | 0     | 0       | 0
Completed/Resolved             | 0      | 87        | 88    | 80     | 9      | 56      | 83      | 16    | 72      | 38
Completed/In Call              | 0      | 5         | 24    | 10     | 0      | 31      | 3       | 16    | 29      | 7
Completed/Closed               | 5373   | 7         | 22    | 17     | 0      | 8       | 0       | 2     | 0       | 0
Unmatched/Unmatched            | 0      | 0         | 0     | 0      | 0      | 0       | 0       | 0     | 0       | 0
Sum                            | 6162   | 1173      | 925   | 688    | 605    | 587     | 585     | 554   | 542     | 477

Fig. 4: Durations for all activities in the incidents event log. Notice the small number of events consuming an unrealistic amount of time.

We limit the statistical description to the overview provided above, so as not to consume too much space. Nevertheless, this section illustrates the power of using simple, flat data oriented tools already widely available to get a first, solid introspection into the performance and characteristics of running processes.
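The pre-processing step used for Table 5 (an event completes when the next event in its trace starts, the final event completing instantly) can be sketched as follows. The trace, timestamps and owner names are illustrative, not taken from the actual log:

```python
from datetime import datetime

# One incident trace of (timestamp, Status/Sub Status activity, Owner First Name).
trace = [
    ("2012-01-01T09:00:00", "Queued/Awaiting Assignment", "Siebel"),
    ("2012-01-01T10:30:00", "Accepted/In Progress", "Fredrik"),
    ("2012-01-02T10:30:00", "Accepted/Wait - User", "Fredrik"),
    ("2012-01-02T11:00:00", "Completed/Closed", "Siebel"),
]

# Derived completion time: the start of the next event in the trace;
# the final event is assumed to complete instantly (duration zero).
per_owner = {}
for i, (ts, act, owner) in enumerate(trace):
    start = datetime.fromisoformat(ts)
    end = datetime.fromisoformat(trace[i + 1][0]) if i + 1 < len(trace) else start
    hours = (end - start).total_seconds() / 3600
    per_owner.setdefault(owner, []).append(hours)

# Mean activity duration per owner, as in Table 5(b) (here in hours).
mean_per_owner = {o: sum(h) / len(h) for o, h in per_owner.items()}
print(mean_per_owner)
```

The same caveat from the text applies: a single derived timestamp difference may be far longer than the actual work time.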
However, since these tools lack a process oriented perspective, it is difficult to derive statistical information at the trace level (a grouping of events) rather than at the event level, as was already encountered in this section. This again emphasizes the need for a hybrid approach. Therefore, we also briefly explore the data set using process specific tools, namely ProM and Disco.

3.2 Process Mining

Mining the incidents event log using HeuristicsMiner [12] in ProM yields a result as depicted in Fig. 5, i.e. a spaghetti like model where all activities appear to be connected with one another. Using Disco with all activities and paths visible gives a similar (albeit prettier) outcome.

Fig. 5: Mining the incidents log as is yields a so called spaghetti model which is hard to read, typical for non-structured ticketing and issue handling processes. (a) Model mined with HeuristicsMiner in ProM (we use a custom created plugin which allows providing the graph annotations). (b) Model mined with Disco (all activities and paths shown). (c) The same model mined with a custom built process discovery tool, which runs on top of Pandas (a statistical data analysis tool; see http://pandas.pydata.org) and extracts process flows together with summarizing metrics from a flat event table, which are then exported to a process graph (using Graphviz). This illustrates that the integration of process-oriented perspectives within traditional statistical tools is indeed possible.

Let us see whether we can derive a straight-through process: a simple process map explaining the multitude of behavior found in the event log. Filtering on the frequent variants provides a solid means to do so. Fig. 6 shows the process map derived from the five most frequent variants, accounting for 41% of all traces in the incidents log.

Fig. 6: Process map (all activities and paths shown) derived from the five most frequent variants in the incidents log.

Fig. 7: Process map showing all transitions between the statuses for the incidents log.

When we just use the Status attribute as a classifier, the resulting process model becomes even simpler, and just shows all transitions between the Accepted, Queued, Completed and Unmatched statuses; see Fig. 7. Since the process describes a ticket based issue handling system, and since the event logs do not contain start and end times by themselves (unless approximated with a pre-processing step as done above), we can assume that the process does not contain any parallel behavior, i.e. a process case is passed on from one activity to the next, without an activity creating two or more concurrent flows of work. That said, it is thus possible to view the process as a Markov chain, as shown in Fig. 8 (once more using both the Status and Sub Status fields), similar
to the process maps obtained by Disco; the depicted graph is also able to quickly highlight the most important flows from one activity to the next.

Fig. 8: Markov chain for the incidents log (we use a custom created ProM plugin to extract the Markov chain).

We can also easily show some initial performance focused process maps. Fig. 9 depicts the maximum and mean waiting times between activity occurrences. Note the discrepancy between the mean duration and the maximum duration, also indicating the presence of low-frequency exceptional cases which run far longer.

Fig. 9: Two process maps showing performance related metrics: (a) mean durations and (b) maximum durations. Note the discrepancy between the mean duration between activity flows and the maximum duration. The mean durations show the bottlenecks for average cases, whereas the maximum duration highlights a different part of the process model which acts as a bottleneck, albeit only in some exceptional cases.

One can continue to apply various other filters and views in order to extract additional process maps. Fig. 10, for instance, shows the process map obtained for the incidents log after filtering on performance (keeping only the quickest 95% of cases, i.e. those taking less than about one month), endpoints (cases should start with either an Accepted or Queued activity and end with Completed), time frame (first of January 2012 onwards), and variation (using only cases whose sequence of activities is shared by at least two cases).
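The Markov chain view of Fig. 8 amounts to counting directly-follows transitions between activities and normalising per source activity. A sketch on made-up status sequences:

```python
from collections import Counter

# Illustrative traces over the Status classifier only.
traces = [
    ["Queued", "Accepted", "Completed"],
    ["Accepted", "Accepted", "Completed"],
    ["Queued", "Accepted", "Queued", "Accepted", "Completed"],
]

# Count directly-follows pairs (a, b): activity b immediately follows a.
counts = Counter((a, b) for t in traces for a, b in zip(t, t[1:]))

# Normalise per source activity to obtain Markov transition probabilities.
totals = Counter()
for (a, _), n in counts.items():
    totals[a] += n
probs = {(a, b): n / totals[a] for (a, b), n in counts.items()}

print(counts[("Queued", "Accepted")], round(probs[("Accepted", "Completed")], 2))
```

Edge thickness in a process map like Fig. 7 corresponds to `counts`; the Markov chain annotates edges with `probs` instead.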
The process map is quite readable and easy to interpret, but does not help further towards gaining more insights, 16 Seppe vanden Broucke et al. which requires rst stating the goals and questions one wants to investigate. We will do so in the following section. 4 Answering the Process Owner's Questions This section explores the questions posed by the process owner in the context of the BPI challenge. Four broad questions were outlined, which will be dealt with accordingly in the following subsections. 4.1 Question 1: Push to Front This question only applies to the incidents event log. In this process, a strategy is put in place which states that most of the incidents should be resolved by the rst line support teams. The process owner is interested in knowing how many (and which) cases escalate to a second or third team. The provided case description is somewhat ambiguous concerning which cri- teria exactly determine whether an event was executed by a rst line team. It is stated that The second line could be a support team within Org line C or a support team working with Org line A2. This might lead one to believe that all teams not belonging to either Org line C or Org line A2 are the rst line teams. However, inspecting the Involved ST eld in the data set includes names such as V5 3rd and V13 2nd 3rd, which leads us to believe that the level assignment is included in this moniker. Therefore, we will add an extra eld to the data set. Support team names including the strings 2nd or 3rd will be regarded as an above rst level team (second, third or both). Names not including such string shall be regarded as rst level teams. Next, we import the data set in Disco, but specify the Involved ST as the activity and derive the following case counts: 7554 cases in total, creating an unstructured model. 4782 cases exist (63%) which did not require intervention from a second or third line support team. This is obtained by applying an attribute lter. 
2772 cases (the others) did pass through a higher level team. 704 of these cases even immediately start in a higher level. 556 cases start and stay within higher level teams during their complete life cycle. This is a peculiar result. Fig. 11 shows the three most involved support teams (all paths shown) where this happens.

For what products is the push to front mechanism most used? We export the cases found with Disco and turn to R to answer the posed sub-questions. After some straightforward operations, we derive for each product the number of times the product got involved in a case passing through a higher level support team. Fig. 12 shows a scatter plot depicting the number of cases each product was involved in for which the case did not go through second or third level support teams (blue) or did escalate to second or third level support teams (red). We can indeed notice some products being escalated frequently to higher levels. Table 6 shows the ten products with the highest number of cases with second or third level support team involvement, together with the number of cases without escalation.

Fig. 10: Process map for the incidents log after filtering on endpoints, time frame, performance and variants. Activities with an absolute frequency less than a thousand are filtered out as well. All paths are retained.

Fig. 11: 556 cases stay within second or third level support teams during their complete life cycle.

What is the impact of incident escalation on performance?
We did not find a clear relationship between the mean duration of all cases relating to a certain product and the number of times the product was involved in a case escalated to second or third level support teams (or not involved). However, there is a significant difference in mean case durations between products which were never involved in an escalated case and products which were (8.6 versus 16.5 days). There also exists a significant difference in mean case durations between cases pushed to second/third level teams and cases that were not, i.e. an average case duration of 20.5 days versus 7.2 days. Avoiding higher level escalation is thus indeed an important goal to be maintained. We have also inspected the impact of escalated incident cases on the duration of problem handling cases, as the case states that cases ending up in second or third line support teams disturb them from doing their core activity (which is generally not support) and that handling problem cases is primarily the responsibility of 2nd and 3rd line support teams. It might thus be interesting to see if we can find a correlation between the performance of second and third line teams in the problem handling process (one of their main tasks) and the number of active escalated incident cases (which disturb them). Fig. 13 shows the resulting outcome. The thick lines represent the number of cases over time (all incident cases, escalated incident cases, and problem cases both open and closed), whereas the dotted lines show the mean and median duration of problem cases active in the same time window. No significant correlation can be derived. This might be due to the data set (incident cases arrive neatly over time but almost always keep running until the middle of 2012). Also, in our analysis, cases are seen as escalated even when they are not yet escalated at a specific point in time (note however that this does not impact the median duration of problem cases, which stays stable over time anyway). More data is needed to perform a more rigorous analysis, or it might simply be that there exists no correlation between these two specific properties.

Fig. 12: Scatter plot over the products showing the number of cases the product was involved in for which the case (i) did not go through second or third level support teams (blue points) or (ii) did escalate to second or third level support teams (red crosses).

Table 6: Ten products with the highest number of cases with second or third level support team involvement.

Product     With 2nd/3rd Level Involvement    With 1st Level Involvement Only
PROD424     196                               686
PROD698     93                                47
PROD542     75                                0
PROD253     53                                173
PROD243     47                                2
PROD494     44                                142
PROD802     43                                35
PROD660     42                                442
PROD267     39                                80
PROD264     39                                1

Fig. 13: Impact of the number of escalated incident cases on the duration of problem cases over time.

Where in the organization is the push to front process most implemented, specifically if we compare the Org line A2 with the Org line C? What functions are most in line with the push to front process? Finding out which organizational lines (or support teams or functional divisions) were best able to deal with incidents without escalating them is non-trivial, as these are event-related attributes and do not necessarily remain constant over a complete case, so that deriving event based counts (e.g.
with Disco) leads to skewed distributions when cases contain varying numbers of events. For the organization lines A2 and C, we can apply an attribute filter to retrieve the number of cases. Of the 2772 cases which were escalated to higher level teams, 1298 of them were handled by Org line A2 at some point, whereas 1950 cases were handled by Org line C. For the involved support teams (and functional divisions), it is bothersome to re-apply a filter for each different support team. Hence, we switch once more to R. We aggregate all escalated and non-escalated cases by support team and use the length of the unique list of Case IDs (SR Number) as the aggregation function to get, for each support team, the number of cases the team was involved in (for at least one activity), only taking into account cases which did or did not escalate to a higher level team. We then order the teams based on their relative frequency difference between escalated and non-escalated cases to account for the different numbers of cases. Table 7 shows the resulting overviews. An overview for organization lines was included once more. Note the absence of Org line C in this listing: although this line handled 1950 cases which were escalated, the organization also handled 4176 cases which were not escalated, so that this organization is dropped after deriving the relative frequency difference.

4.2 Question 2: Ping Pong Behavior

The case description states: "It occurs that support teams start to send incidents to each other again and again (ping pong) which of course is an unwanted situation." The business owner is interested in finding out about such cases. First, we have to be clear about what exactly is meant by ping pong behavior: the attribute of interest is the involved support team; ping pong is defined as a case starting with support team A, moving then to another support team B (ping) and finally coming back to A later on (pong).
For example, we will consider the following support team trajectory as being acceptable: ⟨A, B, C, D⟩, although many support teams are involved, while the following is not: ⟨A, B, C, A, D⟩. No information about status requirements is given. Note that it is however possible to perform an additional check here, e.g. also requiring that the Queued/Awaiting Assignment status is revisited when switching support teams. Let us start with investigating the incidents log. We first filter out all cases which stay with the same activity (here: the involved support team), by using a Follower filter in Disco which allows any activity succession but requires different values for each matched pair. This leaves us with 50% of the cases (3820 ping cases). Next, we want to get the pong behavior (returning to a previous support team). This is less easy to retrieve with Disco, as the follower filter cannot be applied here. As such, we search for tandem repeats using R and the following aggregation: we aggregate by Case ID (SR Number) and apply a custom function to signpost cases containing tandem repeats. The method we apply is a bit involved and is further detailed in Listing 1. We find 1258 cases with ping pong behavior. We can then export these cases to Disco again for visual validation and analysis. Fig. 14 shows a simplified process map containing the ping pong cases with some support teams hidden but all paths shown. Back-and-forth ping pong behavior is indeed clearly visible.

Table 7: Ten highest organization lines, support teams and functional divisions, ordered by the relative difference between escalated and non-escalated cases.

(a) Organization lines.

Organization Line   With 2nd/3rd Level Involvement   With 1st Level Involvement Only
Org line A2         1298 (46.83%)                    501 (10.48%)
Org line B          408 (14.72%)                     219 (4.58%)
Other               318 (11.47%)                     242 (5.06%)
Org line V11        85 (3.07%)                       41 (0.86%)
Org line G1         50 (1.80%)                       4 (0.08%)
Org line V2         37 (1.33%)                       32 (0.67%)
Org line G2         23 (0.83%)                       14 (0.29%)
Org line F          13 (0.47%)                       4 (0.08%)
Org line E          15 (0.54%)                       16 (0.33%)
Org line V3         5 (0.18%)                        4 (0.08%)

(b) Involved support teams.

Support Team   With 2nd/3rd Level Involvement   With 1st Level Involvement Only
G97            486 (17.53%)                     490 (10.25%)
G92            123 (4.44%)                      45 (0.94%)
D1             58 (2.09%)                       11 (0.23%)
D2             94 (3.39%)                       78 (1.63%)
D4             106 (3.82%)                      104 (2.17%)
V17 3rd        33 (1.19%)                       6 (0.13%)
D5             86 (3.10%)                       101 (2.11%)
G34 3rd        27 (0.97%)                       2 (0.04%)
G49            29 (1.05%)                       7 (0.15%)
D6             36 (1.30%)                       28 (0.59%)

(c) Functional divisions.

Functional Division   With 2nd/3rd Level Involvement   With 1st Level Involvement Only
(blank)               1008 (36.36%)                    378 (7.90%)
E_10                  687 (24.78%)                     3 (0.06%)
A2_2                  385 (13.89%)                     71 (1.48%)
A2_4                  302 (10.89%)                     27 (0.56%)
A2_3                  244 (8.80%)                      18 (0.38%)
A2_1                  515 (18.58%)                     567 (11.86%)
E_6                   154 (5.56%)                      3 (0.06%)
E_5                   328 (11.83%)                     346 (7.24%)
E_1                   77 (2.78%)                       10 (0.21%)
A2_5                  75 (2.71%)                       41 (0.86%)

Listing 1: Finding ping pong behavior.
# Assume the following trace with ping pong behavior (a return to 'b'):
> trace <- c('a', 'a', 'b', 'c', 'b', 'd')
# "Match" all elements equaling 'b':
> match(trace, 'b', nomatch = 0)
[1] 0 0 1 0 1 0
# Take the difference between consecutive elements:
> diff(match(trace, 'b', nomatch = 0))
[1] 0 1 -1 1 -1
# 0: stays at 'b', -1: leaves 'b', 1: enters 'b'.
# We thus check whether 'b' is entered multiple times by finding
# elements equaling 1:
> match(diff(match(trace, 'b', nomatch = 0)), 1, nomatch = 0)
[1] 0 1 0 1 0
# And summing:
> sum(match(diff(match(trace, 'b', nomatch = 0)), 1, nomatch = 0)) > 1
[1] TRUE
# However, the trace could start with the element we are testing for a
# tandem repeat, so we prepend an artificial start event:
> sum(match(diff(match(c('__START__', trace), 'b', nomatch = 0)),
+           1, nomatch = 0)) > 1
[1] TRUE
# We have only tested 'b', but need to do so for each element:
> lapply(unique(trace), function(el) {
+   sum(match(diff(match(c('__START__', trace), el, nomatch = 0)),
+             1, nomatch = 0)) > 1
+ })
[[1]]
[1] FALSE
[[2]]
[1] TRUE
[[3]]
[1] FALSE
[[4]]
[1] FALSE
# Summing: is there any element which tandem repeats?
> sum(unlist(lapply(unique(trace), function(el) {
+   sum(match(diff(match(c('__START__', trace), el, nomatch = 0)),
+             1, nomatch = 0)) > 1
+ }))) > 0
[1] TRUE

Fig. 14: Simplified process map (some support teams hidden) clearly showing back-and-forth ping pong behavior.

Is there a correlation between the ping pong behavior and the total life time of an incident? Fig. 15 shows a scatter plot with case durations separated by cases with (colored red) and without ping pong behavior (colored blue). Contrary to the statement made in the case description, no significant correlation becomes apparent. A Welch Two Sample t-test confirms that the means of 1.47 days (ping pong) and 1.35 days (no ping pong) are not significantly different at the 95% confidence level.

Fig. 15: Scatter plot showing the durations for cases with ping pong behavior (red crosses) and without (blue points) for the incident event log.

Table 8 summarizes the findings of ping pong behavior for the three event logs. Since the problem processes do not contain separate fields for the involved support team and support team functional division, we have only used the latter for these logs. We note that the relation between duration and ping pong behavior is not as clear as it may appear at first sight. For instance, we find that closed problems which do exhibit ping pong behavior are closed faster than those which do not. One might suggest to inspect the correlation between case duration and the number of ping pongs performed, but we did not find any particular pattern when doing so for the incidents log (using the number of support teams which ping ponged out of the total support teams involved in the case).

What are the functions, organizations, support teams responsible for most of the ping pong? What products are most affected by it? Due to space limitations, we only include the results for the incidents log. The same approach can be applied to the other logs. Again, we aggregate at the trace level, rather than the event level, to account for skewed results. Table 9 lists the ten highest products, support teams, functional divisions and organization lines for the cases containing ping pong behavior.

Table 8: Summarized findings of ping pong behavior for the three event logs.
Ping behavior denotes cases which jump from one support team (or functional division) to another; ping pong behavior denotes cases which also return to an earlier visited team or division.

Event Log                                          Nr. of Cases   Ping   Ping Pong   Significant Mean Duration Difference
incidents (involved support team)                  7554           3820   1258        No (p-value: 0.10)
incidents (support team functional division)       7554           2574   1159        No (p-value: 0.31)
open problems (support team functional division)   819            71     21          Yes (p-value: 0.00), 22 versus 3 days
closed problems (support team functional division) 1487           210    92          Yes, inverse (p-value: 0.00), 22 versus 43 days

Table 9: Products and teams exhibiting the most ping pong behavior (shown as number of cases). Only the ten highest items are shown.

Product       Support Team    Functional Division   Involved Organization Line
PROD424 126   G97 286         V3_2 606              Org line C 972
PROD542 74    G96 242         A2_1 567              Org line A2 617
PROD236 41    3D4 192         (blank) 470           Org line B 206
PROD455 37    D5 126          E_10 247              Other 115
PROD776 31    D8 121          A2_2 149              Org line V11 109
PROD258 29    D2 115          E_5 122               Org line V7n 76
PROD697 29    G230 2nd 78     A2_3 82               Org line G4 38
PROD305 27    V37 2nd 76      E_6 71                Org line G1 34
PROD789 25    G179 71         A2_5 63               Org line E 30
PROD215 24    V32 2nd 64      A2_4 59               Org line V7 17

4.3 Question 3: Wait User Abuse

The challenge case states: "People try to find workarounds that stop the clock from ticking. One way of doing this is manually giving an incident the sub status 'wait user'. Although there are guidelines not to use this sub status, some people are breaking this guideline." We again use the incidents log to inspect this behavior. We are interested in the Sub Status Wait - User, which only occurs together with the Accepted main status. We can thus filter on this activity, which results in 33% of the cases containing the Accepted/Wait - User activity. Who is making most use of this sub status (action owner)?
What is the behavior per support team, function, organization etc.? To delve deeper into this behavior, we inspect how long cases stay within the Accepted/Wait - User state before another state becomes active, and correlate this with other properties. Table 10 provides an overview of the ten support team members having spent most time in the Wait - User state, ranked by the median to account for exceptional outliers. Again, one should take care not to derive conclusions too quickly. We could, for example, filter out owners such as Susanne whose minimum duration does not exceed a certain threshold. Alternatively, and perhaps a better approach since we do not possess starting and completion times of activities, we could also check the absolute number of times someone activated the Wait - User state, instead of inspecting the duration. This provides the following list: Siebel, Pawel, Muthu, Brecht, Marcin, Fredrik, Andreas, Katia, Krzysztof, Emil. Just as for the previous questions, this count can also be aggregated at the case level. Follow-up questions by the process owner can thus quickly result in more fine-tuned analysis. We close this question with Table 11, which provides an overview of the ten highest support teams, functional divisions, organization lines, countries and owners making most use of the Wait - User status, ordered by the number of times the associated actor executed a Wait - User status transition.

Table 10: Ten highest support team members spending most time in the Accepted/Wait - User state, ranked by median to account for exceptional outliers.
Owner First Name   Median   Min     Mean     Max      Total (All Durations in Days)
Gregor             29.38    28.87   29.38    29.89    58.76
Gabriel            27.52    0.01    27.52    55.03    55.05
Brice              21.10    2.98    21.10    39.23    42.21
Reinier            20.12    3.92    112.61   313.78   337.82
Susanne            18.07    0.00    29.20    78.02    175.20
Maurice            17.45    0.00    17.45    34.90    34.90
Earl               12.02    0.00    44.66    329.01   5761.26
Suliman            11.17    0.00    11.17    22.33    22.34
Bengt              9.57     0.00    9.57     19.14    19.14
Shery              8.46     0.03    8.46     16.89    16.92

Table 11: Ten highest support teams, functional divisions, organization lines, countries and owners making most use of the Wait - User status (shown as number of events).

Support Team   Functional Division   Organization Line   Country        Owner First Name
G97 704        V3_2 1930             Org line C 2783     Sweden 1227    Siebel 173
G96 226        A2_1 803              Org line A2 847     Poland 1018    Pawel 123
D5 212         E_10 383              Org line B 355      India 787      Muthu 80
D8 194         (blank) 258           Other 138           Belgium 305    Brecht 75
G92 191        E_5 248               Org line V2 50      Brazil 225     Marcin 71
G230 2nd 190   A2_2 180              Org line G4 12      USA 225        Fredrik 69
S42 178        D_1 161               Org line G2 10      0 174          Andreas 63
S43 136        A2_4 94               Org line V5 7       China 87       Katia 62
S56 123        A2_3 54               Org line V1 4       France 59      Krysztof 56
D7 97          E_4 31                Org line V11 3      Japan 43       Emil 55

4.4 Question 4: Process Conformity per Organization

This question pertains to Org line A2 and Org line C: "In general the Volvo IT organization is spread in two organizations: Org line A2 and Org line C. It would be interesting to see how conform or how much in line every organization is with the incident and problem management processes." This is an open-ended question for which more domain knowledge must be gathered before it can be answered in a satisfactory way. The main question is what is meant by "in line". One particular perspective was already focused upon when dealing with the first question: which of these two organization lines performs most case escalations to higher level support teams?
When being in line is related to case duration, we can derive the mean case durations for Org line A2 and Org line C respectively: 21 days versus 10 days (for the incidents log). When SLAs are set on the total running time of a case, the outcome might, however, be different. Consider for example an SLA determining that incident handling cases may take at most 7 days (one week). Org line A2 violates this SLA in 1494 cases, while Org line C performs worse with 3529 violations. Finally, it is also possible to view "in line" conformance in terms of a prescriptive process already in place. Ticketing based systems such as this one, however, are not often subjected to such designed processes, as is also evidenced by the unstructured nature of the full process map with no filtering applied. As a proof of concept, however, let us assume a designed process as shown in Fig. 16. We will only make use of Status transitions to allow for enough freedom for the sake of this example. We assume that cases must start with an Accepted or Queued status. Both these statuses can repeat, and it is possible to go from one state to another as well. Finally, cases must end with a Completed state following Accepted.

Fig. 16: An example designed prescriptive to-be process model (states: Accepted, Queued, Completed).

We load this model into ProM in order to perform control-flow based conformance analysis. We use the alignment based conformance checking plugin by Adriansyah et al. [13-15]. Fig. 17 shows the results of the conformance analysis. Inspecting the trace fitness distributions (a measure of being in line with the prescribed process), we can derive that Org line A2 is relatively less conformant than Org line C. Keep in mind, of course, that we have used an example model. By comparing the as-is situation with the to-be model, it becomes possible to implement changes in order to try to fix problem areas.
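The SLA check above amounts to a simple aggregation over case-level durations. The following minimal R sketch illustrates the idea; the column names (org, duration) and the toy values are assumptions for illustration, not the challenge data or our actual script:

```r
# Case-level table: one row per case, with its handling organization line
# and total case duration in days (toy values for illustration only).
cases <- data.frame(
  org      = c("Org line A2", "Org line A2", "Org line C", "Org line C"),
  duration = c(21, 3, 10, 9)
)
# Number of cases per organization line violating a 7-day SLA:
violations <- aggregate(duration ~ org, data = cases,
                        FUN = function(d) sum(d > 7))
names(violations)[2] <- "violations"
violations
```

Applied to the incidents log with a 7-day threshold, this kind of count yields the 1494 versus 3529 violations reported above.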
5 Conclusions

This report presented the results we uncovered related to the analysis of the Volvo IT data set for the Third International Business Process Intelligence Challenge (BPIC'13). An open-minded exploratory survey of the data set was performed, followed by a deeper analysis towards answering the specific questions posed by the process owner. To do so, we have applied both traditional data analysis tools and process-oriented techniques. As evidenced by this report, there exists a gap between these two categories of tools, causing difficulties when filtering, querying, exploring and analyzing event based data sets, as is also put forward as one of the challenges listed in the process mining manifesto [2]. Therefore, we have utilized a hybrid approach, combining and switching back and forth between (mainly) R and Disco.

Fig. 17: Conformance analysis results on the example process model: (a) trace fitness distribution for Org line A2, (b) trace fitness distribution for Org line C, (c) aggregated overview highlighting conformance issues.

Acknowledgment

We would like to thank the KU Leuven research council for financial support under grant OT/10/010 and the Flemish Research Council for financial support under Odysseus grant B.0915.09. We thank Véronique Van Vlasselaer for her insights regarding R.

References

[1] van der Aalst, W.M.P., van Dongen, B.F., Günther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: ProM: the process mining toolkit. In de Medeiros, A.K.A., Weber, B., eds.: BPM (Demos). Volume 489 of CEUR Workshop Proceedings., CEUR-WS.org (2009)
[2] van der Aalst, W.M.P. et al.: Process mining manifesto. In Daniel, F., Barkaoui, K., Dustdar, S., eds.: Business Process Management Workshops (1). Volume 99 of Lecture Notes in Business Information Processing., Springer (2011) 169-194
[3] Gabadinho, A., Müller, N.S., Ritschard, G., Studer, M.: TraMineR: a toolbox for exploring sequence data
[4] Wang, H., Zaniolo, C.: ATLaS: a native extension of SQL for data mining and stream computations (2003)
[5] Law, Y.N., Wang, H., Zaniolo, C.: Query languages and data models for database sequences and data streams. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30. VLDB '04, VLDB Endowment (2004) 492-503
[6] Law, Y.N., Wang, H., Zaniolo, C.: Relational languages and data models for continuous queries on sequences and data streams. ACM Trans. Database Syst. 36(2) (June 2011) 8:1-8:32
[7] Sadri, M.R.: Optimization of Sequence Queries in Database Systems. PhD thesis, University of California, Los Angeles (2001)
[8] Sadri, R., Zaniolo, C., Zarkesh, A., Adibi, J.: Optimization of sequence queries in database systems. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. PODS '01, New York, NY, USA, ACM (2001) 71-81
[9] Sadri, R., Zaniolo, C., Zarkesh, A.M., Adibi, J.: A sequential pattern query language for supporting instant data mining for e-services. In: Proceedings of the 27th International Conference on Very Large Data Bases. VLDB '01, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc. (2001) 653-656
[10] Kleene, S.C.: Representation of events in nerve nets and finite automata. In Shannon, C.E., McCarthy, J., eds.: Automata Studies. Princeton University Press (1956) 3-40
[11] van der Aalst, W.M.P., de Beer, H.T., van Dongen, B.F.: Process mining and verification of properties: an approach based on temporal logic. In: OTM Conferences (1). (2005) 130-147
[12] Weijters, A.J.M.M., Ribeiro, J.T.S.: Flexible heuristics miner (FHM). In: CIDM, IEEE (2011) 310-317
[13] Adriansyah, A., Sidorova, N., van Dongen, B.F.: Cost-based fitness in conformance checking.
In Caillaud, B., Carmona, J., Hiraishi, K., eds.: ACSD, IEEE (2011) 57-66
[14] Adriansyah, A., Munoz-Gama, J., Carmona, J., van Dongen, B.F., van der Aalst, W.M.P.: Alignment based precision checking. In Rosa, M.L., Soffer, P., eds.: Business Process Management Workshops. Volume 132 of Lecture Notes in Business Information Processing., Springer (2012) 137-149
[15] van der Aalst, W.M.P., Adriansyah, A., van Dongen, B.F.: Replaying history on process models for conformance checking and performance analysis. Wiley Interdisc. Rev.: Data Mining and Knowledge Discovery 2(2) (2012) 182-192