=Paper= {{Paper |id=Vol-3098/dc_176 |storemode=property |title=Process Mining on Uncertain Event Data (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-3098/dc_176.pdf |volume=Vol-3098 |authors=Marco Pegoraro |dblpUrl=https://dblp.org/rec/conf/icpm/000121 }} ==Process Mining on Uncertain Event Data (Extended Abstract)== https://ceur-ws.org/Vol-3098/dc_176.pdf
              Process Mining on Uncertain Event Data
                       (Extended Abstract)
                                                          Marco Pegoraro*
                                       Chair of Process and Data Science (PADS)
             Department of Computer Science, RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany
                              *Corresponding author. Email: pegoraro@pads.rwth-aachen.de


Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract: With the widespread adoption of process mining in organizations, the field of process science is seeing an increase in the demand for ad-hoc analysis techniques for non-standard event data. An example of such data is uncertain event data: events characterized by a described and quantified attribute imprecision. This paper outlines a research project aimed at developing process mining techniques able to extract insights from uncertain data. We set the basis for this research topic, recapitulate the available literature, and define a future outlook.

I. INTRODUCTION

Since its inception, process mining has proved its value in commercial applications. An ever-increasing number of success stories has led to a vast demand for the most diverse process analysis techniques, often customized to meet the needs of specific domains. Among these, novel techniques have been introduced to mine non-standard types of data.

This paper presents a research direction aimed at mining one such anomalous (i.e., uncommon) type of event data: uncertain data. Such data is associated with a degree of imprecision that affects event attributes, which is described and quantified through sets of possible attribute labels, intervals of possible values, or probability distributions.

The remainder of the paper is structured as follows. Section II illustrates with examples the structure of uncertain event data. Section III shows the research principles with regard to process mining on uncertain data, and reports recent results on the topic. Finally, Section IV outlines open challenges, outlook, and future perspectives of this line of research.

II. UNCERTAIN DATA

In order to more clearly visualize the structure of the attributes in uncertain events, let us consider the following process instance, which is a simplified version of actually occurring anomalies, e.g., in the processes of the healthcare domain. An elderly patient enrolls in a clinical trial for an experimental treatment against myeloproliferative neoplasms, a class of blood cancers. This enrollment includes a lab exam and a visit with a specialist; then, the treatment can begin. The lab exam, performed on the 8th of July, finds a low level of platelets in the blood of the patient, a condition known as thrombocytopenia (TP). During the visit on the 10th of July, the patient reports an episode of night sweats on the night of the 5th of July, prior to the lab exam. The medic notes this but also hypothesizes that it might not be a symptom, since it can be caused either by the condition or by external factors (such as very warm weather). The medic also reads the medical records of the patient and sees that, shortly prior to the lab exam, the patient was undergoing a heparin treatment (a blood-thinning medication) to prevent blood clots. The thrombocytopenia, detected by the lab exam, can then be either primary (caused by the blood cancer) or secondary (caused by other factors, such as a concomitant condition). Finally, the medic finds an enlargement of the spleen in the patient (splenomegaly). It is unclear when this condition has developed: it might have appeared at any moment prior to that point. These events are recorded in the trace ID192-1 (shown in Table I) within the hospital's information system.

Such a scenario, with no known probability, is known as strong uncertainty. In this trace, the rightmost column refers to event indeterminacy: in this case, e1 has been recorded, but it might not have occurred in reality, and is marked with a "?" symbol. Event e2 has more than one possible activity label, either PrTP or SecTP. Lastly, event e3 has an uncertain timestamp, and might have happened at any point in time between the 4th and 10th of July.

Uncertain events may also have probability values associated with them, a scenario defined as weak uncertainty (trace ID192-2 in Table I). In the example described above, suppose the medic estimates that there is a high chance (90%) that the thrombocytopenia is primary (caused by the cancer). Furthermore, if the splenomegaly is suspected to have developed three days prior to the visit, which takes place on the 10th of July, the timestamp of event e3 may be described through a Gaussian curve with µ = 7. Lastly, the probability that the event e1 has been recorded but did not occur in reality may be known (for example, it may be 25%).

TABLE I: Two uncertain traces related to an example of a healthcare process. The timestamp column shows only the day of the month.

Case ID   Event ID   Timestamp   Activity                 Indeterminacy
ID192-1   e1         5           NightSweats              ?
ID192-1   e2         8           PrTP, SecTP
ID192-1   e3         4–10        Splenomeg
ID192-2   e4         5           NightSweats              ?: 25%
ID192-2   e5         8           PrTP: 90%, SecTP: 10%
ID192-2   e6         N(7, 1)     Splenomeg

Table II summarizes the types of uncertain data that are the subject of our research.
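As an illustration, the strongly uncertain trace ID192-1 of Table I can be written down as a small data structure, from which the control-flow scenarios it admits can be enumerated. The following Python fragment is a minimal sketch under stated assumptions: the names UncertainEvent, order_is_feasible, and realizations are hypothetical and introduced only for this example, timestamps are reduced to integer days of the month, and the brute-force enumeration is not the method of the cited works nor part of the PROVED toolset.

```python
from dataclasses import dataclass
from itertools import permutations, product


@dataclass(frozen=True)
class UncertainEvent:
    """One strongly uncertain event (hypothetical structure for this sketch)."""
    event_id: str
    t_min: int                   # earliest possible day of the month
    t_max: int                   # latest possible day of the month
    activities: tuple            # set of possible activity labels
    indeterminate: bool = False  # True if the event is marked with "?"


# Trace ID192-1 from Table I.
TRACE_ID192_1 = [
    UncertainEvent("e1", 5, 5, ("NightSweats",), indeterminate=True),
    UncertainEvent("e2", 8, 8, ("PrTP", "SecTP")),
    UncertainEvent("e3", 4, 10, ("Splenomeg",)),
]


def order_is_feasible(events):
    """Check whether the events can occur in the given order.

    The order is feasible if each event can be assigned a timestamp inside
    its own interval such that timestamps never decrease along the order.
    """
    current = float("-inf")
    for event in events:
        current = max(current, event.t_min)
        if current > event.t_max:
            return False
    return True


def realizations(trace):
    """Return the set of activity sequences compatible with the trace."""
    optional = [e for e in trace if e.indeterminate]
    certain = [e for e in trace if not e.indeterminate]
    result = set()
    # Every indeterminate event may or may not have occurred.
    for mask in product([True, False], repeat=len(optional)):
        present = certain + [e for e, keep in zip(optional, mask) if keep]
        # Try every ordering and keep those allowed by the time intervals.
        for order in permutations(present):
            if not order_is_feasible(order):
                continue
            # Every event takes one of its possible activity labels.
            for labels in product(*(e.activities for e in order)):
                result.add(labels)
    return result


if __name__ == "__main__":
    for sequence in sorted(realizations(TRACE_ID192_1)):
        print(sequence)
```

Under these assumptions, trace ID192-1 admits ten distinct activity sequences: six that contain NightSweats and four that omit it. The enumeration grows factorially with the number of events, so it is only meant to make the notion of possible scenarios concrete on a three-event trace.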


[Figure: schema with the elements Process, Records, Agents, Abstracts, Raw data, Domain knowledge, Uncertain event log, Graph representation, Process discovery, Conformance checking, Process model.]
Fig. 1: The overall schema for process mining over uncertainty.

                  Weak uncertainty (stochastic)       Strong uncertainty (non-deterministic)
Discrete data     Discrete probability distribution   Set of possible values: {x1, x2, x3, ...} ⊆ X
Continuous data   Probability density function        Interval: {x ∈ R | a ≤ x ≤ b}
TABLE II: The four different types of uncertainty that are the subject of this research project.

III. RESEARCH APPROACH

We will now illustrate the guiding principles of our research plans through a series of assertions.

Assertion 1 (Uncertainty is not noise). Uncertain data contain information and value. We do not aim to analyze the data beyond the uncertainty, but the data within the uncertainty.

Assertion 2 (Uncertainty should not be filtered or repaired). To extract information from uncertainty itself, existing approaches to filter or repair data are not applicable: information from uncertainty must be accounted for, and not altered.

Assertion 3 (Uncertainty is behavior). The many possible values for event attributes entail numerous possible scenarios for the control-flow perspective of an uncertain trace, which can be represented as behavior. To fully analyze uncertain process instances, it is necessary to account for such behavior.

The fundamental technique that enables the analysis of uncertain traces is their representation as dynamic objects that incorporate the intrinsic behavior of uncertain traces, such as graphs or Petri nets (behavior graphs or behavior nets [1], respectively). This leads to the schematic visible in Figure 1.

A number of mining techniques for uncertain event data are now present in the literature. A taxonomy of uncertain event data is available [1], as well as a method to reliably compute the probability associated with each real-life scenario in an uncertain trace [2]. There exist approaches for conformance checking [3] and process discovery [4] over strongly uncertain event data. The key phase in uncertain data analysis, building the graph representation, has been optimized through efficient algorithms [5], [6]. Such techniques are available in the PROVED toolset [7], which employs an ad-hoc extension of the XES standard to represent uncertain data [8]. A real-life source of uncertain data, convolutional neural network sensing in video feeds of processes, has been described, as well as an additional taxonomy also involving process models [9].

IV. OPEN CHALLENGES AND CONCLUSION

The field of process mining over uncertain data is still in its infancy. While some techniques to perform discovery and conformance checking over uncertainty do exist, the weakly uncertain case is still unexplored. The principle of the four quality metrics of logs and processes (fitness, precision, generalization, simplicity), a cornerstone of process mining, needs to be (re)developed in the context of uncertain data.

By analyzing uncertain event data without discarding any of the attributes in an uncertain event log, this research direction unlocks the extraction of process information formerly inaccessible. Insights from process mining analyses can, as a consequence, maintain quantified guarantees of reliability and accuracy even in the presence of data affected by uncertainty.

ACKNOWLEDGMENTS

I am very grateful to Prof. Wil van der Aalst, who advises my doctoral studies, and to Merih Seran Uysal, who supervises me in researching this topic. I thank the Alexander von Humboldt (AvH) Stiftung for supporting my research interactions.

REFERENCES

[1] M. Pegoraro and W. M. P. van der Aalst, "Mining uncertain event data in process mining," in International Conference on Process Mining (ICPM). IEEE, 2019, pp. 89–96.
[2] M. Pegoraro, B. Bakullari, M. S. Uysal, and W. M. P. van der Aalst, "Probability estimation of uncertain process trace realizations," in International Workshop on Event Data and Behavioral Analytics (EdbA). Springer, 2021.
[3] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "Conformance checking over uncertain event data," Information Systems, 2021.
[4] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "Discovering process models from uncertain event data," in International Conference on Business Process Management (BPM). Springer, 2019, pp. 238–249.
[5] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "Efficient construction of behavior graphs for uncertain event data," in International Conference on Business Information Systems. Springer, 2020, pp. 76–88.
[6] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "Efficient time and space representation of uncertain event data," Algorithms, vol. 13, no. 11, p. 285, 2020.
[7] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "PROVED: A tool for graph representation and analysis of uncertain event data," in International Conference on Applications and Theory of Petri Nets and Concurrency. Springer, 2021, pp. 476–486.
[8] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "An XES extension for uncertain event data," in International Conference on Business Process Management (BPM). Springer, 2021.
[9] I. Cohen and A. Gal, "Uncertain process data with probabilistic knowledge: Problem characterization and challenges," CoRR/abs, 2021. [Online]. Available: https://arxiv.org/abs/2106.03324

