=Paper= {{Paper |id=Vol-2973/paper_273 |storemode=property |title=An XES Extension for Uncertain Event Data |pdfUrl=https://ceur-ws.org/Vol-2973/paper_273.pdf |volume=Vol-2973 |authors=Marco Pegoraro,Merih Seran Uysal,Wil M.P. van der Aalst |dblpUrl=https://dblp.org/rec/conf/bpm/PegoraroUA21 }} ==An XES Extension for Uncertain Event Data== https://ceur-ws.org/Vol-2973/paper_273.pdf
An XES Extension for Uncertain Event Data
Marco Pegoraro¹,², Merih Seran Uysal¹ and Wil M.P. van der Aalst¹
¹ Chair of Process and Data Science (PADS), Department of Computer Science, RWTH Aachen University, Aachen, Germany
² Corresponding author.


Abstract
Event data, often stored in the form of event logs, serve as the starting point for process mining and other
evidence-based process improvements. However, event data in logs are often tainted by noise, errors,
and missing data. Recently, a novel body of research has emerged, with the aim to address and analyze a
class of anomalies known as uncertainty: imprecisions quantified with meta-information in the event
log. This paper illustrates an extension of the XES data standard capable of representing uncertain event
data. Such an extension enables input, output, and manipulation of uncertain data, as well as analysis
through the process discovery and conformance checking approaches available in the literature.

Keywords
Event Data, Uncertainty, XES Standard, Process Mining, Business Process Management




1. Introduction
Over the last decades, the growing availability of data generated by the execution
of processes has enabled the development of the set of disciplines known as process sciences.
These fields of science aim to analyze data while accounting for the process perspective, that is, the flow of
events belonging to a process case.
   Uncertain event data is a newly emerging class of anomalous event data. Uncertain data
consists of events that have been logged with a quantified measure of uncertainty affecting
the recorded information. Sources of uncertainty include noise, human error, or limitations of
the information system supporting the process. Such imprecisions affecting the event data are
either recorded in an information system with the data itself or reconstructed in a subsequent
processing step, often with the aid of domain knowledge provided by process experts. Recently,
the possible types of uncertain data have been classified in a taxonomy, and effective process
mining algorithms for uncertain event data have been introduced [1, 2]. However, the data
standards currently in use within the process science community do not support uncertain
event logs. A very popular event data standard is XES (eXtensible Event Stream) [3, 4]. As the
name suggests, this standard has been designed to flexibly allow for extensions; in the recent
Proceedings of the Demonstration & Resources Track, Best BPM Dissertation Award, and Doctoral Consortium at BPM
2021 co-located with the 19th International Conference on Business Process Management, BPM 2021, Rome, Italy,
September 6-10, 2021
Email: pegoraro@pads.rwth-aachen.de (M. Pegoraro); uysal@pads.rwth-aachen.de (M. S. Uysal);
wvdaalst@pads.rwth-aachen.de (W. M.P. v. d. Aalst)
URL: http://mpegoraro.net/ (M. Pegoraro); http://www.vdaalst.com (W. M.P. v. d. Aalst)
ORCID: 0000-0002-8997-7517 (M. Pegoraro); 0000-0003-1115-6601 (M. S. Uysal); 0000-0002-0955-6940 (W. M.P. v. d. Aalst)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Table 1
The uncertain trace of an instance of a healthcare process, used as a running example. For the sake of
clarity, we have further simplified the notation in the timestamps column by showing only the day of
the month.
Case ID   Event ID   Timestamp   Activity       Indeterminacy
ID192     𝑒1         5           NightSweats    ?
ID192     𝑒2         8           PrTP, SecTP
ID192     𝑒3         4–10        Splenomeg


past, many such extensions have been proposed, to support communications, messages and
signals [5], usage and performance of hardware resources [6], and privacy-preserving data
transmission [7]. This paper contributes to the field of process science by describing an XES
extension which allows the representation of uncertain data, enabling XES-compatible tools
to manipulate uncertain logs. Furthermore, our extension is implemented through the meta-
attribute structure already supported by XES, making uncertain data retroactively readable by
existing tools.
   The remainder of the paper is structured as follows. Section 2 formally describes uncertain
event data. Section 3 introduces an extension to the XES standard capable of representing
uncertain event data. Lastly, Section 4 concludes the paper.


2. Uncertain Event Data
To visualize the structure of the attributes in uncertain events more clearly, let us
consider the following process instance, a simplified version of anomalies that actually
occur, e.g., in healthcare processes. An elderly patient enrolls in a
clinical trial for an experimental treatment against myeloproliferative neoplasms, a class of
blood cancers. This enrollment includes a lab exam and a visit with a specialist; then, the
treatment can begin. The lab exam, performed on the 8th of July, finds a low level of platelets
in the blood of the patient (event 𝑒2 ), a condition known as thrombocytopenia (TP). During the
visit on the 10th of July, the patient reports an episode of night sweats on the night of the 5th of
July, prior to the lab exam (event 𝑒1 ). The medic notes this but also hypothesizes that it might
not be a symptom, since it can be caused either by the condition or by external factors (such
as very warm weather). The medic also reads the medical records of the patient and sees that,
shortly prior to the lab exam, the patient was undergoing a heparin treatment (a blood-thinning
medication) to prevent blood clots. The thrombocytopenia, detected by the lab exam, can then
be either primary (caused by the blood cancer) or secondary (caused by other factors, such as a
concomitant condition). Finally, the medic finds an enlargement of the spleen (splenomegaly) in
the patient (event 𝑒3 ). It is unclear when this condition has developed: it might have appeared
at any moment prior to that point. These events are collected and recorded in the trace shown
in Table 1 within the hospital’s information system.
   In this trace, the rightmost column refers to event indeterminacy: in this case, 𝑒1 has been
recorded, but it might not have occurred in reality, and is marked with a “?” symbol. Event
𝑒2 has more than one possible activity label, either PrTP or SecTP (primary or secondary
thrombocytopenia, respectively). Lastly, event 𝑒3 has an uncertain timestamp, and might have
happened at any point in time between the 4th and 10th of July. These uncertain attributes do
not describe the probability of the possible outcomes, and we refer to such a situation as strong
uncertainty.
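
As an illustration of strong uncertainty, the trace of Table 1 can be held in a small in-memory structure. The sketch below is not part of the proposed extension: the class `UncertainEvent` and the helper `discrete_scenarios` are hypothetical names introduced here, and the enumeration only resolves the discrete choices (activity label and whether the event occurred), leaving timestamps as intervals.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class UncertainEvent:
    """Hypothetical container for one strongly uncertain event."""
    event_id: str
    activities: FrozenSet[str]    # set of possible activity labels
    timestamp: Tuple[int, int]    # (earliest, latest) day of July, inclusive
    indeterminate: bool = False   # True if the event may not have occurred


# The trace of Table 1 (case ID192).
trace = [
    UncertainEvent("e1", frozenset({"NightSweats"}), (5, 5), indeterminate=True),
    UncertainEvent("e2", frozenset({"PrTP", "SecTP"}), (8, 8)),
    UncertainEvent("e3", frozenset({"Splenomeg"}), (4, 10)),
]


def discrete_scenarios(events):
    """Enumerate every combination of activity label and event occurrence.

    None as a label stands for "the event did not occur".
    """
    options = []
    for ev in events:
        choices = [(ev.event_id, label) for label in sorted(ev.activities)]
        if ev.indeterminate:
            choices.append((ev.event_id, None))
        options.append(choices)
    return list(product(*options))


scenarios = discrete_scenarios(trace)
print(len(scenarios))  # 2 choices for e1, 2 for e2, 1 for e3 -> 4
```

Since no probabilities are attached, all four scenarios are simply possible; nothing in the data ranks one above another.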
   In some cases, uncertain events have probability values associated with them. In the ex-
ample described above, suppose the medic estimates that there is a high chance (90%) that
the thrombocytopenia is primary (caused by the cancer). Furthermore, if the splenomegaly
is suspected to have developed three days prior to the visit, which takes place on the 10th of
July, the timestamp of event 𝑒3 may be described through a Gaussian curve with 𝜇 = 7. When
probability is available, such attributes are affected by weak uncertainty.
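
The weak uncertainty of the example can be sketched in the same spirit. The 90% estimate and the mean of 7 come from the text; the 10% complement assumes that PrTP and SecTP are the only two labels, and the standard deviation `SIGMA` is purely an assumption, since the example does not state one.

```python
import math

# Discrete weak uncertainty on the activity label of e2.
# 0.9 is from the example; 0.1 is its complement (assuming only two labels).
e2_activity = {"PrTP": 0.9, "SecTP": 0.1}

# Continuous weak uncertainty on the timestamp of e3 (day of July).
MU = 7.0     # mean, from the example
SIGMA = 1.0  # ASSUMPTION: the example gives no standard deviation


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Probability that the splenomegaly developed before the lab exam on day 8.
p_before_lab = normal_cdf(8.0, MU, SIGMA)
print(round(p_before_lab, 2))  # about 0.84 under the assumed SIGMA
```

A different choice of `SIGMA` changes this probability, which is why a weakly uncertain timestamp needs the full description of its distribution and not only its mean.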
   Let us now describe a data standard extension able to represent strong and weak uncertainty,
enabling the analysis of uncertain data with process science techniques.


3. An XES Standard Extension Proposal
The XES standard is designed to effectively represent and transfer event data, thanks to the
descriptors extended from the XML language. Additionally, XES has been designed for flexibility:
its descriptors, containers, and datatypes can be extended to define new types of information.
   Figure 1 describes an extension of the XES standard able to represent uncertain data as
described in the previous section and illustrated in the running example of Table 1.



[Figure 1, diagram not reproduced. Elements shown: Log, Trace, Event, and Attribute, linked by "contains" relations with multiplicity 0..n; the uncertain attribute types Continuous Weak, Continuous Strong, Discrete Weak, and Discrete Strong; and the value containers Probability Density Function (key: Function ID; values: xs:double, 0..n), Value Interval (ordered list of 2 xs:double values), Probability Distribution (xs:double values), and Set of Attribute Values (0..n entries of xs:any_datatype).]

Figure 1: UML diagram illustrating an extension of the XES standard capable of representing uncertain
data.

  This proposed extension can represent all scenarios of uncertain data shown in Section 2.
As a consequence, it enables XES-compliant software to import and export uncertain event
data, and it allows uncertainty-aware process mining tools to implement process discovery and
conformance checking approaches on uncertain data, as described in the literature.
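
To make the meta-attribute idea concrete, the following sketch builds an XES-style event whose activity label carries a nested attribute listing all possible labels. It is an illustration under stated assumptions, not the extension's actual serialization: the keys `uncertainty:discrete_strong` and `uncertainty:entry` and the exact element layout are placeholders chosen here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder key names, not the keys defined by the extension.
DISCRETE_STRONG_KEY = "uncertainty:discrete_strong"
ENTRY_KEY = "uncertainty:entry"


def event_with_uncertain_activity(labels):
    """Build an <event> whose activity label has a set of possible values."""
    event = ET.Element("event")
    # Standard XES attribute: a tool unaware of uncertainty reads this value.
    name = ET.SubElement(
        event, "string", {"key": "concept:name", "value": labels[0]}
    )
    # Meta-attribute (attribute nested inside an attribute) holding every
    # possible label; the layout of this container is an assumption.
    container = ET.SubElement(name, "list", {"key": DISCRETE_STRONG_KEY})
    for label in labels:
        ET.SubElement(container, "string", {"key": ENTRY_KEY, "value": label})
    return event


e2 = event_with_uncertain_activity(["PrTP", "SecTP"])
xml_text = ET.tostring(e2, encoding="unicode")
print(xml_text)
```

The design choice illustrated here is that the uncertain information sits inside an ordinary attribute, so the event remains a well-formed XES event for software that ignores the nested part.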
   An example of a tool able to exploit this extended XES representation to manage and analyze
uncertain event data is the PROVED project¹, which offers process mining and data visualization
techniques capable of handling uncertain event data [8].
   It is important to emphasize, however, that the use of the extension described here is
not limited to the PROVED tool. Multiple tools support the XES standard,
such as ProM [9], bupaR [10], and PM4Py [11]. Each of these tools is able to edit attributes,
meta-attributes and values in an XES event log, and is thus capable of recording uncertain attributes
on process traces. In summary, while uncertainty-aware analysis techniques are only available
in a narrow selection of tools (such as PROVED), this extension benefits any tool that supports
XES as one of its input/output data standards.
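
The compatibility argument can be sketched with the Python standard library alone. The reader below collects only the direct attributes of each event and ignores anything nested inside them, which is how a tool unaware of uncertainty would see the log. The snippet, its `uncertainty:*` keys, and the function `plain_attributes` are hypothetical; they mimic the structure discussed above and are not taken from the extension or from any of the tools named.

```python
import xml.etree.ElementTree as ET

# Hypothetical log fragment; the uncertainty:* keys are placeholders.
XES_SNIPPET = """
<log>
  <trace>
    <string key="concept:name" value="ID192"/>
    <event>
      <string key="concept:name" value="PrTP">
        <list key="uncertainty:discrete_strong">
          <string key="uncertainty:entry" value="PrTP"/>
          <string key="uncertainty:entry" value="SecTP"/>
        </list>
      </string>
    </event>
  </trace>
</log>
"""


def plain_attributes(element):
    """Return {key: value} for direct child attributes only.

    Nested meta-attributes are deliberately not visited.
    """
    return {child.get("key"): child.get("value") for child in element}


log = ET.fromstring(XES_SNIPPET)
events = [plain_attributes(ev) for ev in log.iter("event")]
print(events)  # [{'concept:name': 'PrTP'}]
```

An uncertainty-aware tool would instead descend into the nested elements to recover the full set of possible labels.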
   A set of synthetic uncertain event logs is publicly available for download². The same folder
contains an additional document (part of the BPM Resource track submission)
explaining in more detail how our extension proposal models uncertain event data³.


4. Conclusion
Recent literature in the rapidly-growing field of process mining shows how descriptions of
specific data anomalies can be extracted from information systems or obtained through domain
knowledge. Anomalies labeled by such descriptions characterize uncertain event data, and
there exist process mining algorithms able to exploit this meta-information to gain insights
about the process with a precisely bounded reliability. However, a fundamental part of these data analysis
approaches is still needed: formats for data representation and transmission. In this
paper, we described an extension of the XES data standard which enables the representation of such
uncertain data, and which allows uncertain events to be read and written by existing XES-compliant
software. This, in turn, empowers process mining researchers and practitioners to build analysis
techniques that account for data uncertainty, and that can thus be more trustworthy and reliable.


Acknowledgments
We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research interactions.
We thank and acknowledge Fabian Rempfer for his valuable input on writing style, and Majid
Rafiei for his contribution to the graphics.




¹ https://github.com/proved-py/
² https://github.com/proved-py/proved-core/tree/An_XES_Extension_for_Uncertain_Event_Data/data
³ https://github.com/proved-py/proved-core/blob/An_XES_Extension_for_Uncertain_Event_Data/data/uncertainty_XES_standard.pdf

References
 [1] M. Pegoraro, W. M. P. van der Aalst, Mining Uncertain Event Data in Process Mining, in:
     International Conference on Process Mining, ICPM 2019, Aachen, Germany, June 24-26,
     2019, IEEE, 2019, pp. 89–96. doi:10.1109/ICPM.2019.00023.
 [2] M. Pegoraro, M. S. Uysal, W. M. P. van der Aalst, Conformance checking over uncertain
     event data, Information Systems 102 (2021) 101810. URL: https://www.sciencedirect.com/
     science/article/pii/S0306437921000582. doi:10.1016/j.is.2021.101810.
 [3] H. M. W. Verbeek, J. C. A. M. Buijs, B. F. van Dongen, W. M. P. van der Aalst, XES,
     XESame, and ProM 6, in: P. Soffer, E. Proper (Eds.), Information Systems Evolution
     - CAiSE Forum 2010, Hammamet, Tunisia, June 7-9, 2010, Selected Extended Papers,
     volume 72 of Lecture Notes in Business Information Processing, Springer, 2010, pp. 60–75.
     doi:10.1007/978-3-642-17722-4_5.
 [4] W. M. P. van der Aalst, C. Günther, J. Bose, J. Carmona, M. Dumas, F. van Geffen, S. Goel,
     A. Guzzo, R. Khalaf, R. Kuhn, et al., 1849-2016: IEEE Standard for eXtensible Event Stream
     (XES) for achieving interoperability in event logs and event streams, IEEE Std 1849-2016,
     2016. URL: http://hdl.handle.net/2117/341493. doi:10.1109/IEEESTD.2016.7740858.
 [5] M. Leemans, C. Liu, XES Software Communication Extension, XES Working Group (2017)
     1–5.
 [6] M. Leemans, C. Liu, XES Software Telemetry Extension, XES Working Group (2017) 1–7.
 [7] M. Rafiei, W. M. P. van der Aalst, Privacy-preserving data publishing in process mining,
     in: D. Fahland, C. Ghidini, J. Becker, M. Dumas (Eds.), Business Process Management
     Forum - BPM Forum 2020, Seville, Spain, September 13-18, 2020, Proceedings, volume
     392 of Lecture Notes in Business Information Processing, Springer, 2020, pp. 122–138.
     doi:10.1007/978-3-030-58638-6_8.
 [8] M. Pegoraro, M. S. Uysal, W. M. P. van der Aalst, PROVED: A Tool for Graph Representation
     and Analysis of Uncertain Event Data, in: D. Buchs, J. Carmona (Eds.), Application and
     Theory of Petri Nets and Concurrency, Springer International Publishing, Cham, 2021, pp.
     476–486.
 [9] B. F. van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, W. M. P.
     van der Aalst, The ProM framework: A new era in process mining tool support, in: G. Ciardo,
     P. Darondeau (Eds.), Applications and Theory of Petri Nets 2005, 26th International
     Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings, volume 3536 of
     Lecture Notes in Computer Science, Springer, 2005, pp. 444–454. doi:10.1007/11494744_25.
[10] G. Janssenswillen, B. Depaire, bupaR: Business process analysis in R, in: R. Clarisó,
     H. Leopold, J. Mendling, W. M. P. van der Aalst, A. Kumar, B. T. Pentland, M. Weske (Eds.),
     Proceedings of the 15th International Conference on Business Process Management (BPM
     2017), Barcelona, Spain, September 13, 2017, volume 1920 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2017. URL: http://ceur-ws.org/Vol-1920/BPM_2017_paper_193.pdf.
[11] A. Berti, S. J. van Zelst, W. M. P. van der Aalst, Process Mining for Python (PM4Py):
     Bridging the Gap Between Process- and Data Science, in: ICPM Demo Track (CEUR 2374),
     2019, pp. 13–16.