=Paper= {{Paper |id=Vol-2703/paperDC8 |storemode=property |title=Event Log Extraction: How to Minimize the Effort of the Human-in-the-Loop? (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-2703/paperDC8.pdf |volume=Vol-2703 |authors=Vinicius Stein Dani |dblpUrl=https://dblp.org/rec/conf/icpm/Dani20 }} ==Event Log Extraction: How to Minimize the Effort of the Human-in-the-Loop? (Extended Abstract)== https://ceur-ws.org/Vol-2703/paperDC8.pdf
Event Log Extraction: How to Minimize the Effort of the Human-in-the-Loop? (Extended Abstract)

Vinicius Stein Dani
Department of Information and Computing Sciences
Utrecht University
Utrecht, The Netherlands
v.steindani@uu.nl


Abstract: To conduct process mining, an event log is required. However, extracting event logs is often time-consuming, especially when the respective IT systems do not store events in a process-oriented way. The main reason is that tasks have to be performed manually, such as identifying entries in a transactional database that relate to the activities of the considered process. In this PhD project, we address this problem of the human-in-the-loop during event log extraction. Our main goal is to minimize, through automation, the manual effort involved in the extraction of event logs.

Index Terms: process mining, event log extraction, automation, human-in-the-loop

I. INTRODUCTION

Process mining is widely used in a plethora of fields including healthcare, banking, and production [1], [2]. The general applicability and the value of process mining in these fields have been demonstrated in various case studies [3]. Process mining requires an event log [4], which is not always readily available in practice [5]. An event log is composed of at least three attributes related to the execution of the considered process: (i) a case identifier, to identify the instances of the process; (ii) an activity identifier, to represent the activities pertaining to the process; and (iii) a timestamp, to represent when the activity was executed [4].

Different approaches exist to extract event logs and partially reduce the manual effort associated with this extraction [6]–[8]. However, to the best of our knowledge, there is no event log extraction approach available that fully automates this task. Besides, in current process mining methods the data extraction stage is described minimally [9], [10]. Therefore, the goal of this PhD project is to minimize the manual effort during event log extraction through automation.

In the remainder of this work, Section II presents the motivation for our research. Section III elaborates on our research goal and questions. Section IV discusses how we intend to address our research goal. Finally, in Section V, we report on the current stage of this PhD project.

II. MOTIVATION

Event log extraction is time-consuming and requires a significant amount of domain knowledge [8]. It is a costly part of any process mining project, draining resources that could have been used during other stages. Several approaches address the problem of automating event log extraction, each with its own assumptions and requirements. Some assume the existence of redo-logs [6], while others require the interaction of a domain expert [7]. A recent approach [8] proposes to link additional information to the event log and, in this way, integrate the process and data perspectives. However, all these approaches are far from being fully automated.

Based on the findings from an exploratory literature review we conducted, we identified that one recurring time-consuming stage of a process mining project is related to tasks that require domain knowledge and interaction with a domain expert [1]–[3], [5]–[8], [11]. Often, a domain expert is required to point out relations between, for example, the process model that is to be analyzed and the data model (or data schema) that represents the data that is to be used as a source for the event log extraction. This so-called correspondence definition is, among others, a manual preparation effort that may be feasible to address through automation.

III. RESEARCH GOAL AND RESEARCH QUESTIONS

Considering this, we address the problem of the manual preparation effort for event logs. Our main goal is the following:

• To minimize, through automation, the manual preparation steps that are associated with the event log extraction phase from an operational database.

With this purpose in mind, we consider answering the following research questions as the basis of our project:

• What are the techniques currently being used to extract event logs? The objective here is to identify existing techniques, their inputs, outputs, assumptions, the extent to which these are automated, and the necessary steps for extracting an event log;
• What are the manual tasks performed during the event log extraction? By answering this, we aim to acquire a clear view of which manual tasks are required during the event log extraction;
• Can such tasks be automated? With this, we expect to build a list of tasks that are feasible to automate, which can be used to prioritize and focus our efforts.
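To make the manual step targeted by the research goal more tangible, the following is a minimal sketch of an event log extraction from an operational database, assuming a hand-written correspondence definition. It is an illustration only, not the approach proposed in this project: the tables `purchase_orders` and `goods_receipts`, their columns, the activity names, and the helpers `Correspondence`, `Event`, `extract_event_log`, and `build_demo_database` are all hypothetical. Each extracted event carries the three minimal attributes of an event log: case identifier, activity, and timestamp.

```python
import sqlite3
from typing import NamedTuple


class Correspondence(NamedTuple):
    """One manually defined link between a process activity and a table (hypothetical)."""
    activity: str
    table: str
    case_column: str
    timestamp_column: str


class Event(NamedTuple):
    """Minimal event: case identifier, activity, and timestamp."""
    case_id: str
    activity: str
    timestamp: str


def extract_event_log(conn, correspondences):
    """Turn table rows into events using the given correspondence definition."""
    events = []
    for c in correspondences:
        # Identifiers come from the trusted, hand-written mapping, not from user input.
        query = f'SELECT "{c.case_column}", "{c.timestamp_column}" FROM "{c.table}"'
        for case_id, timestamp in conn.execute(query):
            events.append(Event(str(case_id), c.activity, timestamp))
    # ISO 8601 timestamps of equal format sort chronologically as strings.
    events.sort(key=lambda e: (e.case_id, e.timestamp))
    return events


def build_demo_database():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE purchase_orders (po_id TEXT, created_at TEXT);
        CREATE TABLE goods_receipts (po_id TEXT, received_at TEXT);
        INSERT INTO purchase_orders VALUES ('PO-1', '2020-01-02T09:00:00');
        INSERT INTO purchase_orders VALUES ('PO-2', '2020-01-03T10:30:00');
        INSERT INTO goods_receipts VALUES ('PO-1', '2020-01-05T14:00:00');
    """)
    return conn


# The correspondence definition: this is the part a domain expert writes by hand.
CORRESPONDENCES = [
    Correspondence("Create Purchase Order", "purchase_orders", "po_id", "created_at"),
    Correspondence("Receive Goods", "goods_receipts", "po_id", "received_at"),
]

if __name__ == "__main__":
    log = extract_event_log(build_demo_database(), CORRESPONDENCES)
    for event in log:
        print(event.case_id, event.activity, event.timestamp)
```

In this sketch the extraction loop itself is mechanical; the effort lies in writing `CORRESPONDENCES`, which requires knowing which tables and columns relate to which activities. That list stands in for the correspondence definition that currently depends on a domain expert.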




Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
IV. RESEARCH METHOD

To achieve our goal, we will conduct this project based on an iterative, eight-step approach, where we will: (i) gather from the literature and from practitioners’ experience a set of currently used techniques to extract event logs; (ii) identify the tasks which still demand a human-in-the-loop; (iii) map the findings from the literature and practitioners’ perspectives to see where they overlap or differ; (iv) build a list of feasible tasks to automate; (v) validate this list with experts; (vi) prioritize, also with experts, the tasks to be tackled first; (vii) build a script to automate the task with the highest priority in the list; (viii) validate the automation with experts.

To address our research questions, as well as steps (i) and (ii) specifically, we will perform a systematic literature review [12] and a survey [13]. We want to gain a two-sided perspective on the event log extraction task: both from the literature and from practitioners. Hence, we will bring these findings together to see if the tasks that the practitioners mention they perform during the event log extraction are related to the ones found in the literature. Thereafter, we will advance to the next steps of this approach towards our main goal.

V. CURRENT STAGE OF THIS RESEARCH PROJECT

This project is still in an early phase. So far, we manually extracted an event log to better understand, from a practitioner’s perspective, how such a task is performed. We used sample data from a SAP P2P process provided by an industry partner.

Next, we conducted an exploratory literature review to build up initial knowledge regarding existing approaches for event log extraction and its preparation efforts. For instance, current tools such as ProMimport [14] and XESame [15] help users to integrate event log data from heterogeneous sources, although they are not able to automatically identify the relevant database tables. In contrast, in [16], the relevant database tables may be identified and related to the process model; however, new adapters need to be implemented for every different data source format and structure. Approaching this problem from another perspective may be only one of the building blocks for minimizing the human-in-the-loop effort in the event log extraction task. To achieve this, some scientific challenges need to be addressed:

• Partial alignment: the relationship between the tables of a database and a process model is partial, i.e., only a small number of tables relate to process model activities. Mechanisms to differentiate between relevant and irrelevant tables are required;
• Naming differences: process models and operational databases often use completely different, or even cryptic, names to refer to similar entities. This means that identifying relevant tables cannot exclusively rely on textual similarity measures.

Considering these challenges, one possible direction is to work on a technique to automate the identification of correspondences between a process model and a database schema to identify the database tables that contain information relevant to the process execution. Such identification represents an alignment problem. Acknowledging this, we intend to characterize the relations between a process model, a database schema, and its textual descriptions; and, afterward, to compute the most likely alignment. Based on this, we may use a Markov Logic formalization to specify how an optimal alignment can be obtained. Based on the alignment generated by our technique, the extraction of the event log could be achieved.

Finally, we will prepare a research paper to report on the exploratory literature review and the manual event log extraction we performed. We will discuss what could be done differently and automatically so as to minimize the manual effort during the event log extraction task.

ACKNOWLEDGMENT

Part of this research was funded by NWO (Netherlands Organisation for Scientific Research) project number 16672.

REFERENCES

[1] C. d. S. Garcia, A. Meincheim, E. R. Faria Junior, M. R. Dallagassa, D. M. V. Sato, D. R. Carvalho, E. A. P. Santos, and E. E. Scalabrin, “Process mining techniques and applications – A systematic mapping study,” Expert Systems with Applications, vol. 133, pp. 260–295, 2019.
[2] A. Corallo, M. Lazoi, and F. Striani, “Process mining and industrial applications: A systematic literature review,” Knowledge and Process Management, no. January, pp. 1–9, 2020.
[3] G. B. Pereira, E. A. P. Santos, and M. M. C. Maceno, “Process mining project methodology in healthcare: a case study in a tertiary hospital,” Network Modeling Analysis in Health Informatics and Bioinformatics, vol. 9, no. 1, pp. 1–14, 2020.
[4] W. M. P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, 2011.
[5] K. Diba, K. Batoulis, M. Weidlich, and M. Weske, “Extraction, correlation, and abstraction of event data for process mining,” in Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 10, no. 3, 2020.
[6] W. M. P. van der Aalst, “Extracting Event Data from Databases to Unleash Process Mining,” pp. 105–128, 2015.
[7] M. Jans, “From relational database to valuable event logs for process mining purposes: a procedure,” Tech. Rep. January, 2017.
[8] E. González López de Murillas, H. A. Reijers, and W. M. van der Aalst, “Connecting databases with process mining: a meta model and toolset,” Software and Systems Modeling, vol. 18, no. 2, pp. 1209–1247, 2019.
[9] M. Bozkaya, J. Gabriels, and J. M. van der Werf, “Process diagnostics: A method based on process mining,” Proceedings - International Conference on Information, Process, and Knowledge Management, eKNOW 2009, no. February 2009, pp. 22–27, 2009.
[10] M. L. van Eck, X. Lu, S. J. J. Leemans, and W. M. P. van der Aalst, “PM2: A Process Mining Project Methodology,” 2015.
[11] C. Rodríguez, R. Engel, G. Kostoska, F. Daniel, F. Casati, and M. Aimar, “Eventifier: Extracting process execution logs from operational databases,” in CEUR Workshop Proceedings, vol. 936, 2012, pp. 17–22.
[12] B. Kitchenham and S. Charters, “Guidelines for performing systematic literature reviews in software engineering,” Keele University and Durham University Joint Report, Tech. Rep. EBSE 2007-001, 2007.
[13] A. Bryman, “Social research methods,” Journal of Chemical Information and Modeling, vol. 53, no. 9, pp. 1689–1699, 2013.
[14] C. W. Günther and W. M. van der Aalst, “A generic import framework for process event logs,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4103 LNCS, 2006, pp. 81–92.
[15] H. M. Verbeek, J. C. Buijs, B. F. van Dongen, and W. M. van der Aalst, “XES, XESame, and ProM 6,” in Lecture Notes in Business Information Processing, vol. 72 LNBIP, 2011, pp. 60–75.
[16] E. González López de Murillas, “Extracting Event Data from Real-Life Data Sources Process Mining on Databases,” Tech. Rep., 2019.