=Paper=
{{Paper
|id=Vol-2703/paperDC8
|storemode=property
|title=Event Log Extraction: How to Minimize the Effort of the Human-in-the-Loop? (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-2703/paperDC8.pdf
|volume=Vol-2703
|authors=Vinicius Stein Dani
|dblpUrl=https://dblp.org/rec/conf/icpm/Dani20
}}
==Event Log Extraction: How to Minimize the Effort of the Human-in-the-Loop? (Extended Abstract)==
Vinicius Stein Dani, Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands, v.steindani@uu.nl

'''Abstract:''' To conduct process mining, an event log is required. However, extracting event logs is often time-consuming, especially when the respective IT systems do not store events in a process-oriented way. The main reason is that tasks have to be performed manually, such as identifying entries in a transactional database that relate to the activities of the considered process. In this PhD project, we address this problem of the human-in-the-loop during event log extraction. Our main goal is to minimize, through automation, the manual effort involved in the extraction of event logs.

'''Index Terms:''' process mining, event log extraction, automation, human-in-the-loop

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===I. Introduction===
Process mining is widely used in a plethora of fields including healthcare, banking, and production [1], [2]. The general applicability and the value of process mining in these fields have been demonstrated in the context of various case studies [3]. Process mining requires an event log [4], which is not always readily available in practice [5]. An event log is composed of at least three different attributes related to the execution of the considered process: (i) a case identifier, to identify the instances of the process; (ii) an activity identifier, to represent the activities pertaining to the process; and (iii) a timestamp, to represent when the activity was executed [4]. Different approaches exist to extract event logs and partially reduce the manual effort associated with this extraction [6]–[8]. However, to the best of our knowledge, there is no event log extraction approach available that fully automates this task. Besides, in current process mining methods the data extraction stage is described minimally [9], [10]. Therefore, the goal of this PhD project is to minimize the manual efforts during the event log extraction through automation.

In the remainder of this work, Section II presents the motivation for our research. Section III elaborates on our research goal and questions. Section IV discusses how we intend to address our research goal. Finally, in Section V, we report on the current stage of this PhD project.

===II. Motivation===
Event log extraction is time-consuming and requires a significant amount of domain knowledge [8]. It is a costly part of any process mining project, draining resources that could have been used during other stages. Several approaches address the problem of automating event log extraction, each with its own assumptions and requirements. Some assume the existence of redo-logs [6], while others require the interaction of a domain expert [7]. A recent approach [8] proposes to link additional information to the event log and, in this way, integrate the process and data perspective. However, all these approaches are far from being fully automated.

Based on the findings from an exploratory literature review we conducted, we identified that one recurring time-consuming stage of a process mining project is related to tasks that require domain knowledge and interaction with a domain expert [1]–[3], [5]–[8], [11]. Often, a domain expert is required to point out the relations between, for example, the process model that is to be analyzed and the data model (or data schema) that represents the data that is to be used as a source for the event log extraction. This so-called correspondence definition is, among others, a manual preparation effort that may be feasible to address through automation.

===III. Research Goal and Research Questions===
Considering this, we address the problem of the manual preparation effort for event logs. Our main goal is the following:
* To minimize, through automation, the manual preparation steps that are associated with the event log extraction phase from an operational database.
With this purpose in mind, we consider answering the following research questions as the basis of our project:
* What are the techniques currently being used to extract event logs? The objective here is to identify existing techniques, their inputs, outputs, assumptions, the extent to which these are automated, and the necessary steps for extracting an event log;
* What are the manual tasks performed during the event log extraction? By answering this, we aim to acquire a clear view of which manual tasks are required during the event log extraction;
* Can such tasks be automated? With this, we expect to build a list of tasks that are feasible to automate, which can be used to prioritize and focus our efforts.

===IV. Research Method===
To achieve our goal, we will conduct this project based on an iterative, eight-step approach, where we will: (i) gather from the literature, and from practitioners' experience, a set of currently used techniques to extract event logs; (ii) identify the tasks which still demand a human-in-the-loop; (iii) map the findings from the literature and practitioners' perspectives to see where they overlap or differ; (iv) build a list of feasible tasks to automate; (v) validate this list with experts; (vi) prioritize, also with experts, the tasks to be tackled first; (vii) build a script to automate the task with the highest priority in the list; (viii) validate the automation with experts.

To address our research questions, as well as steps (i) and (ii) specifically, we will perform a systematic literature review [12] and a survey [13]. We want to gain a two-sided perspective on the event log extraction task: both from the literature and from practitioners. Hence, we will bring these findings together to see if the tasks that the practitioners mention they perform during the event log extraction are related to the ones found in the literature. Thereafter, we will advance to the next steps of this approach towards our main goal.

===V. Current Stage of this Research Project===
This project is still in an early phase. So far, we manually extracted an event log to better understand, from a practitioner's perspective, how such a task is performed. We used sample data from a SAP P2P process provided by an industry partner.

Next, we conducted an exploratory literature review to build up initial knowledge regarding existing approaches for the event log extraction and its preparation efforts. For instance, current tools such as ProMimport [14] and XESame [15] help users to integrate event log data from heterogeneous sources, although they are not able to automatically identify the relevant database tables. On the other hand, in [16], the relevant database tables may be identified and related to the process model; however, new adapters need to be implemented for every different data source format and structure. Approaching this problem from another perspective may be only one of the building blocks for minimizing the human-in-the-loop effort in the event log extraction task. To achieve this, some scientific challenges need to be addressed:
* Partial alignment: the relationship between the tables of a database and a process model is partial, i.e., only a small number of tables relate to process model activities. Mechanisms to differentiate between relevant and irrelevant tables are required;
* Naming differences: process models and operational databases often use completely different, or even cryptic, names to refer to similar entities. This means that identifying relevant tables cannot exclusively rely on textual similarity measures.

Considering these challenges, one possible direction is to work on a technique to automate the identification of correspondences between a process model and a database schema to identify the database tables that contain information relevant to the process execution. Such identification represents an alignment problem. Acknowledging this, we intend to characterize the relations between a process model, a database schema, and its textual descriptions; and, afterward, to compute the most likely alignment. Based on this, we may use a Markov Logic formalization to specify how an optimal alignment can be obtained. Based on the alignment generated by our technique, the extraction of the event log could be achieved.

Finally, we will prepare a research paper to report on the exploratory literature review and the manual event log extraction we performed. We will discuss what could be done differently and automatically so as to minimize the manual efforts during the event log extraction task.

===Acknowledgment===
Part of this research was funded by NWO (Netherlands Organisation for Scientific Research) project number 16672.

===References===
[1] C. d. S. Garcia, A. Meincheim, E. R. Faria Junior, M. R. Dallagassa, D. M. V. Sato, D. R. Carvalho, E. A. P. Santos, and E. E. Scalabrin, “Process mining techniques and applications – A systematic mapping study,” Expert Systems with Applications, vol. 133, pp. 260–295, 2019.

[2] A. Corallo, M. Lazoi, and F. Striani, “Process mining and industrial applications: A systematic literature review,” Knowledge and Process Management, no. January, pp. 1–9, 2020.

[3] G. B. Pereira, E. A. P. Santos, and M. M. C. Maceno, “Process mining project methodology in healthcare: a case study in a tertiary hospital,” Network Modeling Analysis in Health Informatics and Bioinformatics, vol. 9, no. 1, pp. 1–14, 2020.

[4] W. M. P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, 2011.

[5] K. Diba, K. Batoulis, M. Weidlich, and M. Weske, “Extraction, correlation, and abstraction of event data for process mining,” in Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 10, no. 3, 2020.

[6] W. M. P. van der Aalst, “Extracting Event Data from Databases to Unleash Process Mining,” pp. 105–128, 2015.

[7] M. Jans, “From relational database to valuable event logs for process mining purposes: a procedure,” Tech. Rep. January, 2017.

[8] E. González López de Murillas, H. A. Reijers, and W. M. van der Aalst, “Connecting databases with process mining: a meta model and toolset,” Software and Systems Modeling, vol. 18, no. 2, pp. 1209–1247, 2019.

[9] M. Bozkaya, J. Gabriels, and J. M. Van Der Werf, “Process diagnostics: A method based on process mining,” Proceedings - International Conference on Information, Process, and Knowledge Management, eKNOW 2009, no. February 2009, pp. 22–27, 2009.

[10] M. L. Van Eck, X. Lu, S. J. J. Leemans, and W. M. P. van der Aalst, “PM2: A Process Mining Project Methodology,” 2015.

[11] C. Rodríguez, R. Engel, G. Kostoska, F. Daniel, F. Casati, and M. Aimar, “Eventifier: Extracting process execution logs from operational databases,” in CEUR Workshop Proceedings, vol. 936, 2012, pp. 17–22.

[12] B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software engineering,” Keele University and Durham University Joint Report, Tech. Rep. EBSE 2007-001, 2007.

[13] A. Bryman, “Social research methods,” Journal of Chemical Information and Modeling, vol. 53, no. 9, pp. 1689–1699, 2013.

[14] C. W. Günther and W. M. Van Der Aalst, “A generic import framework for process event logs,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4103 LNCS, 2006, pp. 81–92.

[15] H. M. Verbeek, J. C. Buijs, B. F. Van Dongen, and W. M. Van Der Aalst, “XES, XESame, and ProM 6,” in Lecture Notes in Business Information Processing, vol. 72 LNBIP, 2011, pp. 60–75.

[16] E. González López De Murillas, “Extracting Event Data from Real-Life Data Sources Process Mining on Databases,” Tech. Rep., 2019.
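
To make the notions of a minimal event log (case identifier, activity identifier, timestamp) and of a manually supplied correspondence definition more concrete, the following is a minimal sketch and not the technique pursued in this project. It assumes a toy in-memory SQLite database; every table name, column name, activity label, and the helper names <code>CORRESPONDENCE</code>, <code>build_demo_db</code>, and <code>extract_event_log</code> are hypothetical and chosen only for illustration. The dictionary plays the role of the domain expert: it states by hand which table and timestamp column correspond to which activity. The <code>app_settings</code> table stands in for the many tables that relate to no activity at all (partial alignment).

```python
import sqlite3

# Hypothetical correspondence definition, i.e. the part a domain expert
# would normally supply by hand:
# activity label -> (table, case-identifier column, timestamp column)
CORRESPONDENCE = {
    "Create Purchase Order": ("purchase_order", "po_id", "created_at"),
    "Approve Purchase Order": ("purchase_order", "po_id", "approved_at"),
    "Receive Goods": ("goods_receipt", "po_id", "received_at"),
}


def build_demo_db():
    """Create a toy operational database (illustrative schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE purchase_order (po_id TEXT, created_at TEXT, approved_at TEXT);
        CREATE TABLE goods_receipt (po_id TEXT, received_at TEXT);
        -- A table that is irrelevant to the process (partial alignment).
        CREATE TABLE app_settings (name TEXT, value TEXT);
        INSERT INTO purchase_order VALUES ('PO1', '2020-01-02T09:00:00', '2020-01-03T10:30:00');
        INSERT INTO purchase_order VALUES ('PO2', '2020-01-04T08:15:00', NULL);
        INSERT INTO goods_receipt VALUES ('PO1', '2020-01-10T14:00:00');
        INSERT INTO app_settings VALUES ('theme', 'dark');
        """
    )
    return conn


def extract_event_log(conn, correspondence):
    """Return a list of (case_id, activity, timestamp) events.

    Table and column names are interpolated into the SQL text, so the
    correspondence must come from a trusted source, never from user input.
    """
    events = []
    for activity, (table, case_col, ts_col) in correspondence.items():
        query = (
            f"SELECT {case_col}, {ts_col} FROM {table} "
            f"WHERE {ts_col} IS NOT NULL"
        )
        for case_id, timestamp in conn.execute(query):
            events.append((case_id, activity, timestamp))
    # ISO 8601 strings of equal precision sort chronologically as text.
    events.sort(key=lambda event: (event[0], event[2]))
    return events


event_log = extract_event_log(build_demo_db(), CORRESPONDENCE)
for event in event_log:
    print(event)
```

In this sketch the extraction itself is mechanical once the correspondence exists; the manual effort lies entirely in writing <code>CORRESPONDENCE</code>, which is the step the alignment idea discussed in Section V would aim to derive automatically.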