=Paper=
{{Paper
|id=Vol-2973/paper_273
|storemode=property
|title=An XES Extension for Uncertain Event Data
|pdfUrl=https://ceur-ws.org/Vol-2973/paper_273.pdf
|volume=Vol-2973
|authors=Marco Pegoraro,Merih Seran Uysal,Wil M.P. van der Aalst
|dblpUrl=https://dblp.org/rec/conf/bpm/PegoraroUA21
}}
==An XES Extension for Uncertain Event Data==
Marco Pegoraro<sup>1,2</sup>, Merih Seran Uysal<sup>1</sup> and Wil M.P. van der Aalst<sup>1</sup>

<sup>1</sup> Chair of Process and Data Science (PADS), Department of Computer Science, RWTH Aachen University, Aachen, Germany

<sup>2</sup> Corresponding author.

===Abstract===
Event data, often stored in the form of event logs, serve as the starting point for process mining and other evidence-based process improvements. However, event data in logs are often tainted by noise, errors, and missing data. Recently, a novel body of research has emerged with the aim of addressing and analyzing a class of anomalies known as uncertainty: imprecisions quantified with meta-information in the event log. This paper illustrates an extension of the XES data standard capable of representing uncertain event data. Such an extension enables input, output, and manipulation of uncertain data, as well as analysis through the process discovery and conformance checking approaches available in the literature.

Keywords: Event Data, Uncertainty, XES Standard, Process Mining, Business Process Management

===1. Introduction===
Over the last decades, the increase in the availability of data generated by the execution of processes has enabled the development of the set of disciplines known as process sciences. These fields of science aim to analyze data while accounting for the process perspective, that is, the flow of events belonging to a process case.

Uncertain event data is a newly emerging class of anomalous event data. Uncertain data consists of events that have been logged with a quantified measure of uncertainty affecting the recorded information. Sources of uncertainty include noise, human error, or limitations of the information system supporting the process. Such imprecisions affecting the event data are either recorded in an information system with the data itself or reconstructed in a subsequent processing step, often with the aid of domain knowledge provided by process experts.
Recently, the possible types of uncertain data have been classified in a taxonomy, and effective process mining algorithms for uncertain event data have been introduced [1, 2]. However, the data standards currently in use within the process science community do not support uncertain event logs. A very popular event data standard is XES (eXtensible Event Stream) [3, 4]. As the name suggests, this standard has been designed to flexibly allow for extensions; in the recent past, many such extensions have been proposed, to support communications, messages and signals [5], usage and performance of hardware resources [6], and privacy-preserving data transmission [7].

Table 1: The uncertain trace of an instance of a healthcare process used as a running example. For the sake of clarity, we have further simplified the notation in the timestamps column by showing only the day of the month.

{|
! Case ID !! Event ID !! Timestamp !! Activity !! Indeterminacy
|-
| ID192 || e<sub>1</sub> || 5 || NightSweats || ?
|-
| ID192 || e<sub>2</sub> || 8 || PrTP, SecTP ||
|-
| ID192 || e<sub>3</sub> || 4–10 || Splenomeg ||
|}

Published in: Proceedings of the Demonstration & Resources Track, Best BPM Dissertation Award, and Doctoral Consortium at BPM 2021, co-located with the 19th International Conference on Business Process Management, BPM 2021, Rome, Italy, September 6-10, 2021.

Email: pegoraro@pads.rwth-aachen.de (M. Pegoraro); uysal@pads.rwth-aachen.de (M. S. Uysal); wvdaalst@pads.rwth-aachen.de (W. M.P. v. d. Aalst). Web: http://mpegoraro.net/ (M. Pegoraro); http://www.vdaalst.com (W. M.P. v. d. Aalst). ORCID: 0000-0002-8997-7517 (M. Pegoraro); 0000-0003-1115-6601 (M. S. Uysal); 0000-0002-0955-6940 (W. M.P. v. d. Aalst).

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

This paper contributes to the field of process science by describing an XES extension which allows the representation of uncertain data, enabling XES-compatible tools to manipulate uncertain logs. Furthermore, our extension is implemented through the meta-attribute structure already supported by XES, making uncertain data retroactively readable by existing tools.

The remainder of the paper is structured as follows. Section 2 formally describes uncertain event data. Section 3 introduces an extension to the XES standard capable of representing uncertain event data. Lastly, Section 4 concludes the paper.

===2. Uncertain Event Data===
To visualize the structure of the attributes in uncertain events more clearly, let us consider the following process instance, a simplified version of anomalies that actually occur, e.g., in healthcare processes. An elderly patient enrolls in a clinical trial for an experimental treatment against myeloproliferative neoplasms, a class of blood cancers. This enrollment includes a lab exam and a visit with a specialist; then, the treatment can begin.

The lab exam, performed on the 8th of July, finds a low level of platelets in the blood of the patient (event e<sub>2</sub>), a condition known as thrombocytopenia (TP). During the visit on the 10th of July, the patient reports an episode of night sweats on the night of the 5th of July, prior to the lab exam (event e<sub>1</sub>). The medic notes this but also hypothesizes that it might not be a symptom, since it can be caused either by the condition or by external factors (such as very warm weather).

The medic also reads the medical records of the patient and sees that, shortly prior to the lab exam, the patient was undergoing a heparin treatment (a blood-thinning medication) to prevent blood clots. The thrombocytopenia detected by the lab exam can then be either primary (caused by the blood cancer) or secondary (caused by other factors, such as a concomitant condition).
Finally, the medic finds an enlargement of the spleen (splenomegaly) in the patient (event e<sub>3</sub>). It is unclear when this condition developed: it might have appeared at any moment prior to that point.

These events are collected and recorded in the trace shown in Table 1 within the hospital's information system. In this trace, the rightmost column refers to event indeterminacy: in this case, e<sub>1</sub> has been recorded, but it might not have occurred in reality, and is marked with a "?" symbol. Event e<sub>2</sub> has more than one possible activity label, either PrTP or SecTP (primary or secondary thrombocytopenia, respectively). Lastly, event e<sub>3</sub> has an uncertain timestamp, and might have happened at any point in time between the 4th and 10th of July. These uncertain attributes do not describe the probability of the possible outcomes, and we refer to such a situation as strong uncertainty.

In some cases, uncertain events have probability values associated with them. In the example described above, suppose the medic estimates that there is a high chance (90%) that the thrombocytopenia is primary (caused by the cancer). Furthermore, if the splenomegaly is suspected to have developed three days prior to the visit, which takes place on the 10th of July, the timestamp of event e<sub>3</sub> may be described through a Gaussian curve with μ = 7. When probability is available, such attributes are affected by weak uncertainty.

Let us now describe a data standard extension able to represent strong and weak uncertainty, enabling the analysis of uncertain data with process science techniques.

===3. An XES Standard Extension Proposal===
The XES standard is designed to effectively represent and transfer event data, thanks to the descriptors extended from the XML language. Additionally, XES has been designed for flexibility: its descriptors, containers, and datatypes can be extended to define new types of information.
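
To make the distinction between strong and weak uncertainty from Section 2 concrete, the following minimal Python sketch models the running example of Table 1 in memory. It is only an illustration under stated assumptions, not the extension itself: all class and field names are hypothetical, the 10% assigned to SecTP is inferred from the 90% estimate for PrTP, and the standard deviation of the Gaussian is an arbitrary placeholder because the text only gives μ = 7.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Union


@dataclass(frozen=True)
class DiscreteStrong:
    """Set of possible values, no probabilities (e.g. PrTP or SecTP)."""
    values: FrozenSet[str]


@dataclass(frozen=True)
class DiscreteWeak:
    """Probability distribution over possible values."""
    distribution: Dict[str, float]

    def __post_init__(self) -> None:
        total = sum(self.distribution.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"probabilities sum to {total}, expected 1")


@dataclass(frozen=True)
class ContinuousStrong:
    """Interval of possible values, no density (e.g. days 4 to 10)."""
    low: float
    high: float


@dataclass(frozen=True)
class ContinuousWeak:
    """Probability density function given by an identifier and parameters."""
    function_id: str
    parameters: Dict[str, float]


@dataclass(frozen=True)
class Event:
    event_id: str
    activity: Union[str, DiscreteStrong, DiscreteWeak]
    timestamp: Union[float, ContinuousStrong, ContinuousWeak]  # day of month
    indeterminate: bool = False  # the "?" marker in Table 1


def uncertainty_kinds(event: Event) -> List[str]:
    """List the kinds of uncertainty that affect an event."""
    kinds = []
    if event.indeterminate:
        kinds.append("indeterminate event")
    for name in ("activity", "timestamp"):
        value = getattr(event, name)
        if isinstance(value, (DiscreteStrong, ContinuousStrong)):
            kinds.append(f"strong uncertainty on {name}")
        elif isinstance(value, (DiscreteWeak, ContinuousWeak)):
            kinds.append(f"weak uncertainty on {name}")
    return kinds


# The trace of Table 1 (strong uncertainty only).
strong_trace = [
    Event("e1", "NightSweats", 5, indeterminate=True),
    Event("e2", DiscreteStrong(frozenset({"PrTP", "SecTP"})), 8),
    Event("e3", "Splenomeg", ContinuousStrong(4, 10)),
]

# The same trace with the probabilities estimated by the medic.
weak_trace = [
    strong_trace[0],
    # 0.1 for SecTP is inferred: only two labels are possible.
    Event("e2", DiscreteWeak({"PrTP": 0.9, "SecTP": 0.1}), 8),
    # sigma is an assumed placeholder; the text only states mu = 7.
    Event("e3", "Splenomeg", ContinuousWeak("gaussian", {"mu": 7.0, "sigma": 1.0})),
]

if __name__ == "__main__":
    for event in strong_trace + weak_trace[1:]:
        print(event.event_id, uncertainty_kinds(event))
```

In an XES log, the same information would be attached to events as meta-attributes rather than held in Python objects; the sketch only shows which pieces of information each kind of uncertain attribute needs to carry.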
Figure 1 describes an extension of the XES standard able to represent uncertain data as described in the previous section and illustrated in the running example of Table 1.

Figure 1: UML diagram illustrating an extension of the XES standard capable of representing uncertain data. (Diagram elements: Log, Trace, Event, and Attribute; the attribute kinds Continuous Weak, Continuous Strong, Discrete Weak, and Discrete Strong; and the value types Probability Density Function, Interval, Probability Distribution, and Set of Attribute Values.)

This proposed extension can represent all scenarios of uncertain data shown in Section 2. As a consequence, it enables XES-compliant software to import and export uncertain event data, and it allows uncertainty-aware process mining tools to implement process discovery and conformance checking approaches on uncertain data, as described in the literature. An example of a tool able to exploit this extended XES representation to manage and analyze uncertain event data is the PROVED project<sup>1</sup>, which offers process mining and data visualization techniques capable of handling uncertain event data [8].

It is important to emphasize, however, that the use of the extension described here is not limited to the PROVED tool. Multiple tools support the XES standard, such as ProM [9], bupaR [10], and PM4Py [11]. Each of these tools is able to edit attributes, meta-attributes and values in an XES event log, and is therefore capable of recording uncertain attributes on process traces. In summary, while uncertainty-aware analysis techniques are only available on a narrow selection of tools (such as PROVED), this extension benefits any tool that supports XES as one of its input/output data standards.

A set of synthetic uncertain event logs is publicly available for download<sup>2</sup>.
In the same folder, it is possible to find the additional document (part of the BPM Resource track submission) explaining in more detail how our extension proposal models uncertain event data<sup>3</sup>.

===4. Conclusion===
Recent literature in the rapidly growing field of process mining shows how descriptions of specific data anomalies can be extracted from information systems or obtained through domain knowledge. Anomalies labeled by such descriptions characterize uncertain event data, and there exist process mining algorithms able to exploit this meta-information to gain insights about the process with a precisely bounded reliability. However, a fundamental component of these data analysis approaches is still needed: formats for data representation and transmission. In this paper, we described an extension of the XES data standard which enables the representation of such uncertain data, and which allows uncertain events to be read and written by existing XES-compliant software. This, in turn, empowers process mining researchers and practitioners to build analysis techniques that account for data uncertainty, and that can thus be more trustworthy and reliable.

===Acknowledgments===
We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research interactions. We thank and acknowledge Fabian Rempfer for his valuable input on writing style, and Majid Rafiei for his contribution to the graphics.

<sup>1</sup> https://github.com/proved-py/

<sup>2</sup> https://github.com/proved-py/proved-core/tree/An_XES_Extension_for_Uncertain_Event_Data/data

<sup>3</sup> https://github.com/proved-py/proved-core/blob/An_XES_Extension_for_Uncertain_Event_Data/data/uncertainty_XES_standard.pdf

===References===
[1] M. Pegoraro, W. M. P. van der Aalst, Mining Uncertain Event Data in Process Mining, in: International Conference on Process Mining, ICPM 2019, Aachen, Germany, June 24-26, 2019, IEEE, 2019, pp. 89–96. doi:10.1109/ICPM.2019.00023.

[2] M. Pegoraro, M. S. Uysal, W. M. P.
van der Aalst, Conformance checking over uncertain event data, Information Systems 102 (2021) 101810. URL: https://www.sciencedirect.com/science/article/pii/S0306437921000582. doi:10.1016/j.is.2021.101810.

[3] H. M. W. Verbeek, J. C. A. M. Buijs, B. F. van Dongen, W. M. P. van der Aalst, XES, XESame, and ProM 6, in: P. Soffer, E. Proper (Eds.), Information Systems Evolution - CAiSE Forum 2010, Hammamet, Tunisia, June 7-9, 2010, Selected Extended Papers, volume 72 of Lecture Notes in Business Information Processing, Springer, 2010, pp. 60–75. doi:10.1007/978-3-642-17722-4_5.

[4] W. M. P. van der Aalst, C. Günther, J. Bose, J. Carmona, M. Dumas, F. van Geffen, S. Goel, A. Guzzo, R. Khalaf, R. Kuhn, et al., 1849-2016 - IEEE Standard for eXtensible Event Stream (XES) for achieving interoperability in event logs and event streams, IEEE Std 1849™-2016, 2016. URL: http://hdl.handle.net/2117/341493. doi:10.1109/IEEESTD.2016.7740858.

[5] M. Leemans, C. Liu, XES Software Communication Extension, XES Working Group (2017) 1–5.

[6] M. Leemans, C. Liu, XES Software Telemetry Extension, XES Working Group (2017) 1–7.

[7] M. Rafiei, W. M. P. van der Aalst, Privacy-preserving data publishing in process mining, in: D. Fahland, C. Ghidini, J. Becker, M. Dumas (Eds.), Business Process Management Forum - BPM Forum 2020, Seville, Spain, September 13-18, 2020, Proceedings, volume 392 of Lecture Notes in Business Information Processing, Springer, 2020, pp. 122–138. doi:10.1007/978-3-030-58638-6_8.

[8] M. Pegoraro, M. S. Uysal, W. M. P. van der Aalst, PROVED: A Tool for Graph Representation and Analysis of Uncertain Event Data, in: D. Buchs, J. Carmona (Eds.), Application and Theory of Petri Nets and Concurrency, Springer International Publishing, Cham, 2021, pp. 476–486.

[9] B. F. van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, W. M. P.
van der Aalst, The ProM framework: A new era in process mining tool support, in: G. Ciardo, P. Darondeau (Eds.), Applications and Theory of Petri Nets 2005, 26th International Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings, volume 3536 of Lecture Notes in Computer Science, Springer, 2005, pp. 444–454. doi:10.1007/11494744_25.

[10] G. Janssenswillen, B. Depaire, bupaR: Business process analysis in R, in: R. Clarisó, H. Leopold, J. Mendling, W. M. P. van der Aalst, A. Kumar, B. T. Pentland, M. Weske (Eds.), Proceedings of the 15th International Conference on Business Process Management (BPM 2017), Barcelona, Spain, September 13, 2017, volume 1920 of CEUR Workshop Proceedings, CEUR-WS.org, 2017. URL: http://ceur-ws.org/Vol-1920/BPM_2017_paper_193.pdf.

[11] A. Berti, S. J. van Zelst, W. M. P. van der Aalst, Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science, in: ICPM Demo Track (CEUR 2374), 2019, pp. 13–16.