<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>XES Extension for Uncertain Event Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Pegoraro</string-name>
          <email>pegoraro@pads.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Merih Seran Uysal</string-name>
          <email>uysal@pads.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wil M.P. van der Aalst</string-name>
          <email>wvdaalst@pads.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Process and Data Science (PADS), Department of Computer Science, RWTH Aachen University</institution>
          ,
          <addr-line>Aachen</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Event data, often stored in the form of event logs, serve as the starting point for process mining and other evidence-based process improvements. However, event data in logs are often tainted by noise, errors, and missing data. Recently, a novel body of research has emerged with the aim of addressing and analyzing a class of anomalies known as uncertainty: imprecisions quantified with meta-information in the event log. This paper illustrates an extension of the XES data standard capable of representing uncertain event data. Such an extension enables input, output, and manipulation of uncertain data, as well as analysis through the process discovery and conformance checking approaches available in the literature.</p>
      </abstract>
      <kwd-group>
        <kwd>Event Data</kwd>
        <kwd>Uncertainty</kwd>
        <kwd>XES Standard</kwd>
        <kwd>Process Mining</kwd>
        <kwd>Business Process Management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Over the last decades, the increasing availability of data generated by the execution
of processes has enabled the development of the set of disciplines known as process sciences.
These fields of science aim to analyze data while accounting for the process perspective, that is, the flow of
events belonging to a process case.</p>
      <p>
        Uncertain event data is a newly emerging class of anomalous event data. Uncertain data
consists of events that have been logged with a quantified measure of uncertainty affecting
the recorded information. Sources of uncertainty include noise, human error, and limitations of
the information system supporting the process. Such imprecisions affecting the event data are
either recorded in an information system with the data itself or reconstructed in a subsequent
processing step, often with the aid of domain knowledge provided by process experts. Recently,
the possible types of uncertain data have been classified in a taxonomy, and effective process
mining algorithms for uncertain event data have been introduced [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. However, the data
standards currently in use within the process science community do not support uncertain
event logs. A very popular event data standard is XES (eXtensible Event Stream) [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. As the
name suggests, this standard has been designed to flexibly allow for extensions; in the recent
past, many such extensions have been proposed to support communications, messages, and
signals [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], usage and performance of hardware resources [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and privacy-preserving data
transmission [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This paper contributes to the field of process science by describing an XES
extension which allows the representation of uncertain data, enabling XES-compatible tools
to manipulate uncertain logs. Furthermore, our extension is implemented through the
meta-attribute structure already supported by XES, so that uncertain logs remain readable by
existing tools.
      </p>
      <p>September 6-10, 2021
http://mpegoraro.net/ (M. Pegoraro); http://www.vdaalst.com (W. M.P. v. d. Aalst)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>The remainder of the paper is structured as follows. Section 2 formally describes uncertain
event data. Section 3 introduces an extension to the XES standard capable of representing
uncertain event data. Lastly, Section 4 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Uncertain Event Data</title>
      <p>In order to more clearly visualize the structure of the attributes in uncertain events, let us
consider the following process instance, which is a simplified version of actually occurring
anomalies, e.g., in the processes of the healthcare domain. An elderly patient enrolls in a
clinical trial for an experimental treatment against myeloproliferative neoplasms, a class of
blood cancers. This enrollment includes a lab exam and a visit with a specialist; then, the
treatment can begin. The lab exam, performed on the 8th of July, finds a low level of platelets
in the blood of the patient (event e2), a condition known as thrombocytopenia (TP). During the
visit on the 10th of July, the patient reports an episode of night sweats on the night of the 5th of
July, prior to the lab exam (event e1). The medic notes this but also hypothesizes that it might
not be a symptom, since it can be caused either by the condition or by external factors (such
as very warm weather). The medic also reads the medical records of the patient and sees that,
shortly prior to the lab exam, the patient was undergoing a heparin treatment (a blood-thinning
medication) to prevent blood clots. The thrombocytopenia, detected by the lab exam, can then
be either primary (caused by the blood cancer) or secondary (caused by other factors, such as a
concomitant condition). Finally, the medic finds an enlargement of the spleen (splenomegaly) in
the patient (event e3). It is unclear when this condition has developed: it might have appeared
at any moment prior to that point. These events are collected and recorded in the trace shown
in Table 1 within the hospital’s information system.</p>
      <p>In this trace, the rightmost column refers to event indeterminacy: in this case, e1 has been
recorded, but it might not have occurred in reality, and is marked with a “?” symbol. Event
e2 has more than one possible activity label, either PrTP or SecTP (primary or secondary
thrombocytopenia, respectively). Lastly, event e3 has an uncertain timestamp, and might have
happened at any point in time between the 4th and 10th of July. These uncertain attributes do
not describe the probability of the possible outcomes, and we refer to this situation as strong
uncertainty.</p>
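      <p>The trace described above can be sketched as a small Python data structure. This is a minimal illustration under stated assumptions, not the representation used by the extension or by any tool: the class name, the field names, the identifiers e1 to e3, and the labels NightSweats and Splenomeg are hypothetical, and timestamps are reduced to days of July.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


# Hypothetical, illustrative model of a strongly uncertain event.
# Class and field names are assumptions, not part of the XES extension.
@dataclass(frozen=True)
class UncertainEvent:
    event_id: str
    activities: FrozenSet[str]   # more than one label means an uncertain activity
    day_range: Tuple[int, int]   # (earliest, latest) day of July; equal bounds mean a certain timestamp
    indeterminate: bool = False  # True if the event may not have occurred ("?")

    def uncertain_attributes(self) -> List[str]:
        """Return the names of the attributes affected by uncertainty."""
        found = []
        if len(self.activities) > 1:
            found.append("activity")
        if self.day_range[0] != self.day_range[1]:
            found.append("timestamp")
        if self.indeterminate:
            found.append("indeterminacy")
        return found


# The trace of the running example (days refer to July).
# "NightSweats" and "Splenomeg" are assumed labels; PrTP and SecTP come from the text.
trace = [
    UncertainEvent("e1", frozenset({"NightSweats"}), (5, 5), indeterminate=True),
    UncertainEvent("e2", frozenset({"PrTP", "SecTP"}), (8, 8)),
    UncertainEvent("e3", frozenset({"Splenomeg"}), (4, 10)),
]

for event in trace:
    print(event.event_id, event.uncertain_attributes())
```
]]></preformat>
      <p>In this sketch, none of the fields carries a probability, which is what characterizes strong uncertainty: only the set of possible values is known.</p>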
      <p>In some cases, uncertain events have probability values associated with them. In the
example described above, suppose the medic estimates that there is a high chance (90%) that
the thrombocytopenia is primary (caused by the cancer). Furthermore, if the splenomegaly
is suspected to have developed three days prior to the visit, which takes place on the 10th of
July, the timestamp of event e3 may be described through a Gaussian curve with mean μ = 7
(the 7th of July). When probability is available, such attributes are affected by weak uncertainty.</p>
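      <p>As a rough numerical illustration of weak uncertainty, the sketch below encodes the two probabilistic attributes of the example. The 10% assigned to SecTP is simply the complement of the 90% stated above. The standard deviation of the Gaussian is not given in the text, so the value of one day used here is purely an assumption, as are all variable and function names.</p>
      <preformat><![CDATA[
```python
import math

# Weak uncertainty on the activity label: a discrete probability distribution.
# The 0.1 for SecTP is the complement of the 90% estimate in the example.
activity_distribution = {"PrTP": 0.9, "SecTP": 0.1}
assert math.isclose(sum(activity_distribution.values()), 1.0)

# Weak uncertainty on the timestamp: a Gaussian centred on the 7th of July.
MU = 7.0     # mean, in days of July (from the running example)
SIGMA = 1.0  # standard deviation in days: an ASSUMPTION, not given in the text


def gaussian_cdf(x, mu=MU, sigma=SIGMA):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def probability_between(lo, hi):
    """Probability that the timestamp falls between day lo and day hi."""
    return gaussian_cdf(hi) - gaussian_cdf(lo)


# Probability mass inside the interval from the 4th to the 10th of July,
# i.e. the interval used for the strongly uncertain version of the timestamp.
p_inside = probability_between(4.0, 10.0)
print(f"P(4 <= t <= 10) = {p_inside:.4f}")
```
]]></preformat>
      <p>With the assumed standard deviation, the interval spans three standard deviations on either side of the mean, so it covers about 99.7% of the probability mass; a different standard deviation would change this figure.</p>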
      <p>Let us now describe a data standard extension able to represent strong and weak uncertainty,
enabling the analysis of uncertain data with process science techniques.</p>
    </sec>
    <sec id="sec-3">
      <title>3. An XES Standard Extension Proposal</title>
      <p>The XES standard is designed to effectively represent and transfer event data, thanks to the
descriptors extended from the XML language. Additionally, XES has been designed for flexibility:
its descriptors, containers, and datatypes can be extended to define new types of information.</p>
      <p>Figure 1 describes an extension of the XES standard able to represent uncertain data as
described in the previous section and illustrated in the running example of Table 1.</p>
      <p>This proposed extension can represent all scenarios of uncertain data shown in Section 2.
As a consequence, it enables XES-compliant software to import and export uncertain event
data, and it allows uncertainty-aware process mining tools to implement process discovery and
conformance checking approaches on uncertain data, as described in the literature.</p>
      <p>[Figure 1: diagram of the proposed XES extension. A Log contains 0..n Traces, a Trace contains 0..n Events, and an Event contains 0..n Attributes. An Attribute contains 0..n uncertain meta-attributes of four kinds: Discrete Strong (Set of Attribute Values, with values of any datatype), Discrete Weak (Probability Distribution, with value entries), Continuous Strong (Value Interval), and Continuous Weak (Probability Density Function, with a Function ID); the continuous kinds use xs:double values.]</p>
      <sec id="sec-3-14">
        <p>
          An example of a tool able to exploit this extended XES representation to manage and analyze
uncertain event data is the PROVED project<sup>1</sup>, which offers process mining and data visualization
techniques capable of handling uncertain event data [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>
          It is important to emphasize, however, that the use of the extension described here is
not limited to the PROVED tool. Multiple tools support the XES standard,
such as ProM [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], bupaR [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and PM4Py [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Each of these tools is able to edit attributes,
meta-attributes, and values in an XES event log, and is therefore capable of recording uncertain attributes
on process traces. In summary, while uncertainty-aware analysis techniques are only available
in a narrow selection of tools (such as PROVED), this extension benefits any tool that supports
XES as one of its input/output data standards.
        </p>
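        <p>To illustrate why any tool able to edit attributes and meta-attributes could record uncertain data, the following sketch writes a weakly uncertain activity label as a nested attribute using only the Python standard library. The keys prefixed with "uncertainty:" and the exact nesting layout are hypothetical placeholders chosen for this example; the actual keys are those defined in the extension's documentation.</p>
        <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# NOTE: the "uncertainty:..." key and the nesting layout below are hypothetical
# placeholders for this sketch, not the keys defined by the extension.
log = ET.Element("log")
trace = ET.SubElement(log, "trace")
event = ET.SubElement(trace, "event")

# Standard top-level attribute: any XES reader can see this value.
name = ET.SubElement(event, "string", {"key": "concept:name", "value": "PrTP"})

# Nested meta-attribute listing the possible labels with their probabilities.
alternatives = ET.SubElement(name, "list", {"key": "uncertainty:discrete_weak"})
values = ET.SubElement(alternatives, "values")
ET.SubElement(values, "float", {"key": "PrTP", "value": "0.9"})
ET.SubElement(values, "float", {"key": "SecTP", "value": "0.1"})

xes_text = ET.tostring(log, encoding="unicode")


def read_top_level(event_element):
    """Mimic a reader unaware of the extension: ignore nested meta-attributes."""
    return {child.get("key"): child.get("value") for child in event_element}


def read_distribution(attribute_element, key="uncertainty:discrete_weak"):
    """Mimic an uncertainty-aware reader: collect the nested label probabilities."""
    meta = attribute_element.find(f"list[@key='{key}']")
    if meta is None:
        return {}
    return {v.get("key"): float(v.get("value")) for v in meta.iter("float")}


parsed_event = ET.fromstring(xes_text).find("trace/event")
print(read_top_level(parsed_event))
print(read_distribution(parsed_event.find("string")))
```
]]></preformat>
        <p>A reader that only inspects top-level attributes still obtains a regular activity label, while an uncertainty-aware reader can recover the full distribution from the nested values.</p>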
        <p>A set of synthetic uncertain event logs is publicly available for download<sup>2</sup>. In the same folder,
an additional document (part of the BPM Resource track submission)
explains in more detail how our extension proposal models uncertain event data<sup>3</sup>.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>Recent literature in the rapidly-growing field of process mining shows how descriptions of
specific data anomalies can be extracted from information systems or obtained through domain
knowledge. Anomalies labeled by such descriptions characterize uncertain event data, and
there exist process mining algorithms able to exploit this meta-information to gain insights
about the process with a precisely bounded reliability. These data analysis
approaches, however, require a fundamental component: formats for data representation and transmission. In this
paper, we described an extension of the XES data standard which enables representation of such
uncertain data, and that allows uncertain events to be read and written by existing XES-compliant
software. This, in turn, empowers process mining researchers and practitioners to build analysis
techniques that account for data uncertainty, and that can thus be more trustworthy and reliable.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research interactions.
We thank and acknowledge Fabian Rempfer for his valuable input on writing style, and Majid
Rafiei for his contribution to the graphics.</p>
      <p>1 https://github.com/proved-py/</p>
      <p>2 https://github.com/proved-py/proved-core/tree/An_XES_Extension_for_Uncertain_Event_Data/data</p>
      <p>3 https://github.com/proved-py/proved-core/blob/An_XES_Extension_for_Uncertain_Event_Data/data/uncertainty_XES_standard.pdf</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pegoraro</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Mining Uncertain Event Data in Process Mining</article-title>
          , in: International Conference on Process Mining, ICPM
          <year>2019</year>
          , Aachen, Germany, June 24-26, 2019, IEEE, 2019, pp.
          <fpage>89</fpage>
          -
          <lpage>96</lpage>
          . doi:10.1109/ICPM.2019.00023.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pegoraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Uysal</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Conformance checking over uncertain event data</article-title>
          ,
          <source>Information Systems</source>
          <volume>102</volume>
          (
          <year>2021</year>
          )
          101810. URL: https://www.sciencedirect.com/science/article/pii/S0306437921000582. doi:10.1016/j.is.2021.101810.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H. M. W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C. A. M.</given-names>
            <surname>Buijs</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. F. van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>XES, XESame, and ProM 6</article-title>
          , in: P. Soffer, E. Proper (Eds.),
          <source>Information Systems Evolution - CAiSE Forum 2010</source>
          , Hammamet, Tunisia, June 7-9, 2010, Selected Extended Papers, volume
          <volume>72</volume>
          of Lecture Notes in Business Information Processing, Springer,
          <year>2010</year>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>75</lpage>
          . doi:10.1007/978-3-642-17722-4_5.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>
          ,
          <string-name><given-names>C.</given-names> <surname>Günther</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>Bose</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>Carmona</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Dumas</surname></string-name>
          ,
          <string-name><given-names>F.</given-names> <surname>van Geffen</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Goel</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Guzzo</surname></string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Khalaf</surname></string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Kuhn</surname></string-name>
          , et al.,
          <article-title>1849-2016 - IEEE Standard for eXtensible Event Stream (XES) for achieving interoperability in event logs and event streams</article-title>
          ,
          <source>IEEE Std 1849™-2016</source>
          ,
          <year>2016</year>
          . URL: http://hdl.handle.net/2117/341493. doi:10.1109/IEEESTD.2016.7740858.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Leemans</surname>
          </string-name>
          , C. Liu, XES Software Communication Extension, XES Working Group (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Leemans</surname>
          </string-name>
          , C. Liu, XES Software Telemetry Extension, XES Working Group (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Privacy-preserving data publishing in process mining</article-title>
          , in: D. Fahland, C. Ghidini, J. Becker, M. Dumas (Eds.),
          <source>Business Process Management Forum - BPM Forum 2020</source>
          , Seville, Spain, September 13-18, 2020, Proceedings, volume
          <volume>392</volume>
          of Lecture Notes in Business Information Processing, Springer,
          <year>2020</year>
          , pp.
          <fpage>122</fpage>
          -
          <lpage>138</lpage>
          . doi:10.1007/978-3-030-58638-6_8.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pegoraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Uysal</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>PROVED: A Tool for Graph Representation and Analysis of Uncertain Event Data</article-title>
          , in: D. Buchs, J. Carmona (Eds.),
          <source>Application and Theory of Petri Nets and Concurrency</source>
          , Springer International Publishing, Cham,
          <year>2021</year>
          , pp.
          <fpage>476</fpage>
          -
          <lpage>486</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>B. F.</given-names> <surname>van Dongen</surname></string-name>
          ,
          <string-name><given-names>A. K. A.</given-names> <surname>de Medeiros</surname></string-name>
          ,
          <string-name><given-names>H. M. W.</given-names> <surname>Verbeek</surname></string-name>
          ,
          <string-name><given-names>A. J. M. M.</given-names> <surname>Weijters</surname></string-name>
          ,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>
          ,
          <article-title>The ProM framework: A new era in process mining tool support</article-title>
          , in: G. Ciardo, P. Darondeau (Eds.),
          <source>Applications and Theory of Petri Nets 2005</source>
          , 26th International Conference, ICATPN 2005, Miami, USA, June 20-25, 2005, Proceedings, volume
          <volume>3536</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2005</year>
          , pp.
          <fpage>444</fpage>
          -
          <lpage>454</lpage>
          . doi:10.1007/11494744_25.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Janssenswillen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Depaire</surname>
          </string-name>
          ,
          <article-title>bupaR: Business process analysis in R</article-title>
          , in: R. Clarisó, H. Leopold, J. Mendling, W. M. P. van der Aalst, A. Kumar, B. T. Pentland, M. Weske (Eds.),
          <source>Proceedings of the 15th International Conference on Business Process Management (BPM 2017)</source>
          , Barcelona, Spain, September 13, 2017, volume
          <volume>1920</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2017</year>
          . URL: http://ceur-ws.org/Vol-1920/BPM_2017_paper_193.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          ,
          <string-name><given-names>S. J.</given-names> <surname>van Zelst</surname></string-name>
          ,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>
          ,
          <article-title>Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science</article-title>
          , in:
          <source>ICPM Demo Track (CEUR 2374)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>