=Paper=
{{Paper
|id=Vol-3098/dc_176
|storemode=property
|title=Process Mining on Uncertain Event Data (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-3098/dc_176.pdf
|volume=Vol-3098
|authors=Marco Pegoraro
|dblpUrl=https://dblp.org/rec/conf/icpm/000121
}}
==Process Mining on Uncertain Event Data (Extended Abstract)==
Marco Pegoraro*

Chair of Process and Data Science (PADS), Department of Computer Science, RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany

*Corresponding author. Email: pegoraro@pads.rwth-aachen.de

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===Abstract===
With the widespread adoption of process mining in organizations, the field of process science is seeing an increase in the demand for ad-hoc analysis techniques of non-standard event data. An example of such data are uncertain event data: events characterized by a described and quantified attribute imprecision. This paper outlines a research project aimed at developing process mining techniques able to extract insights from uncertain data. We set the basis for this research topic, recapitulate the available literature, and define a future outlook.

===I. Introduction===
Since its inception, process mining has proved its value in commercial applications. An ever-increasing number of success stories has led to a vast demand for the most diverse process analysis techniques, often customized to meet the needs of specific domains. Among these, novel techniques have been introduced to mine non-standard types of data.

This paper presents a research direction aimed at mining one such anomalous (i.e., uncommon) type of event data: uncertain data. Such data is associated with a degree of imprecision that affects event attributes, which is described and quantified through sets of possible attribute labels, intervals of possible values, or probability distributions.

The remainder of the paper is structured as follows. Section II illustrates with examples the structure of uncertain event data. Section III shows the research principles regarding process mining on uncertain data, and reports recent results on the topic. Finally, Section IV outlines open challenges, outlook, and future perspectives of this line of research.

===II. Uncertain Data===
In order to more clearly visualize the structure of the attributes in uncertain events, let us consider the following process instance, which is a simplified version of actually occurring anomalies, e.g., in the processes of the healthcare domain. An elderly patient enrolls in a clinical trial for an experimental treatment against myeloproliferative neoplasms, a class of blood cancers. This enrollment includes a lab exam and a visit with a specialist; then, the treatment can begin. The lab exam, performed on the 8th of July, finds a low level of platelets in the blood of the patient, a condition known as thrombocytopenia (TP). During the visit on the 10th of July, the patient reports an episode of night sweats on the night of the 5th of July, prior to the lab exam. The medic notes this but also hypothesizes that it might not be a symptom, since it can be caused either by the condition or by external factors (such as very warm weather). The medic also reads the medical records of the patient and sees that, shortly prior to the lab exam, the patient was undergoing a heparin treatment (a blood-thinning medication) to prevent blood clots. The thrombocytopenia, detected by the lab exam, can then be either primary (caused by the blood cancer) or secondary (caused by other factors, such as a concomitant condition). Finally, the medic finds an enlargement of the spleen in the patient (splenomegaly). It is unclear when this condition has developed: it might have appeared at any moment prior to that point. These events are recorded in the trace ID192-1 (shown in Table I) within the hospital's information system.

Such a scenario, with no known probabilities, is known as strong uncertainty. In this trace, the rightmost column refers to event indeterminacy: in this case, e1 has been recorded, but it might not have occurred in reality, and is marked with a "?" symbol. Event e2 has more than one possible activity label, either PrTP or SecTP. Lastly, event e3 has an uncertain timestamp, and might have happened at any point in time between the 4th and 10th of July.

Uncertain events may also have probability values associated with them, a scenario defined as weak uncertainty (trace ID192-2 in Table I). In the example described above, suppose the medic estimates that there is a high chance (90%) that the thrombocytopenia is primary (caused by the cancer). Furthermore, if the splenomegaly is suspected to have developed three days prior to the visit, which takes place on the 10th of July, the timestamp of event e3 may be described through a Gaussian curve with µ = 7. Lastly, the probability that the event e1 has been recorded but did not occur in reality may be known (for example, it may be 25%).

{| class="wikitable"
|+ Table I: Two uncertain traces related to an example of healthcare process. The timestamp column shows only the day of the month.
! Case ID !! Event ID !! Timestamp !! Activity !! Indeterminacy
|-
| ID192-1 || e1 || 5 || NightSweats || ?
|-
| ID192-1 || e2 || 8 || PrTP, SecTP ||
|-
| ID192-1 || e3 || 4–10 || Splenomeg ||
|-
| ID192-2 || e4 || 5 || NightSweats || ?: 25%
|-
| ID192-2 || e5 || 8 || PrTP: 90%, SecTP: 10% ||
|-
| ID192-2 || e6 || N(7, 1) || Splenomeg ||
|}

Table II summarizes the types of uncertain data subject of our research.

{| class="wikitable"
|+ Table II: The four different types of uncertainty subject of this research project.
! !! Weak uncertainty (stochastic) !! Strong uncertainty (non-deterministic)
|-
! Discrete data
| Discrete probability distribution || Set of possible values {x1, x2, x3, ...} ⊆ X
|-
! Continuous data
| Probability density function || Interval {x ∈ ℝ | a ≤ x ≤ b}
|}

===III. Research Approach===
We will now illustrate the guiding principles of our research plans through a series of assertions.

'''Assertion 1 (Uncertainty is not noise).''' Uncertain data contain information and value. We do not aim to analyze the data beyond the uncertainty, but the data within the uncertainty.

'''Assertion 2 (Uncertainty should not be filtered or repaired).''' To extract information from uncertainty itself, existing approaches to filter or repair data are not applicable: information from uncertainty must be accounted for, and not altered.

'''Assertion 3 (Uncertainty is behavior).''' The many possible values for event attributes entail numerous possible scenarios for the control-flow perspective of an uncertain trace, which can be represented as behavior. To fully analyze uncertain process instances, it is necessary to account for such behavior.

The fundamental technique that enables the analysis of uncertain traces is their representation as dynamic objects that incorporate the intrinsic behavior of uncertain traces, such as graphs or Petri nets (behavior graphs or behavior nets [1], respectively). This leads to the schematic visible in Figure 1.

Fig. 1: The overall schema for process mining over uncertainty. (Diagram labels: process records, process agents, abstracts, domain knowledge, raw data, uncertain event log, graph representation, discovery, conformance checking, process model.)

A number of mining techniques for uncertain event data are now present in the literature. A taxonomy of uncertain event data is available [1], as well as a method to reliably compute the probability associated with each real-life scenario in an uncertain trace [2]. There exist approaches for conformance checking [3] and process discovery [4] over strongly uncertain event data. Building the graph representation, the key phase in uncertain data analysis, has been optimized through efficient algorithms [5], [6]. Such techniques are available in the PROVED toolset [7], which employs an ad-hoc extension of the XES standard to represent uncertain data [8]. A real-life source of uncertain data, convolutional neural network sensing in video feeds of processes, has been described, as well as an additional taxonomy also involving process models [9].

===IV. Open Challenges and Conclusion===
The field of process mining over uncertain data is still in its infancy. While some techniques to perform discovery and conformance checking over uncertainty do exist, the weakly uncertain case is still unexplored. The principle of the four quality metrics of logs and processes (fitness, precision, generalization, simplicity), a cornerstone of process mining, needs to be (re)developed in the context of uncertain data.

Through analyzing uncertain event data without discarding any of the attributes in an uncertain event log, this research direction unlocks the extraction of process information formerly inaccessible. Insights from process mining analyses can, as a consequence, maintain quantified guarantees of reliability and accuracy even in the presence of data affected by uncertainty.

===Acknowledgments===
I am very grateful to Prof. Wil van der Aalst, who advises my doctoral studies, and to Merih Seran Uysal, who supervises me in researching this topic. I thank the Alexander von Humboldt (AvH) Stiftung for supporting my research interactions.

===References===
[1] M. Pegoraro and W. M. P. van der Aalst, "Mining uncertain event data in process mining," in International Conference on Process Mining (ICPM). IEEE, 2019, pp. 89–96.

[2] M. Pegoraro, B. Bakullari, M. S. Uysal, and W. M. P. van der Aalst, "Probability estimation of uncertain process trace realizations," in International Workshop on Event Data and Behavioral Analytics (EdbA). Springer, 2021.

[3] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "Conformance checking over uncertain event data," Information Systems, 2021.

[4] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "Discovering process models from uncertain event data," in International Conference on Business Process Management (BPM). Springer, 2019, pp. 238–249.

[5] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "Efficient construction of behavior graphs for uncertain event data," in International Conference on Business Information Systems. Springer, 2020, pp. 76–88.

[6] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "Efficient time and space representation of uncertain event data," Algorithms, vol. 13, no. 11, p. 285, 2020.

[7] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "PROVED: A tool for graph representation and analysis of uncertain event data," in International Conference on Applications and Theory of Petri Nets and Concurrency. Springer, 2021, pp. 476–486.

[8] M. Pegoraro, M. S. Uysal, and W. M. P. van der Aalst, "An XES extension for uncertain event data," in International Conference on Business Process Management (BPM). Springer, 2021.

[9] I. Cohen and A. Gal, "Uncertain process data with probabilistic knowledge: Problem characterization and challenges," CoRR, 2021. [Online]. Available: https://arxiv.org/abs/2106.03324
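As an illustration of the strongly uncertain trace ID192-1 from Table I, the following minimal Python sketch enumerates the possible activity sequences ("realizations") that the trace may correspond to. It is not the PROVED implementation or the behavior-graph algorithm of [1]: the names `UncertainEvent`, `must_precede`, and `realizations` are hypothetical, and the sketch assumes that two events can occur in either order whenever their timestamp intervals overlap.

```python
from dataclasses import dataclass
from itertools import permutations, product


@dataclass(frozen=True)
class UncertainEvent:
    """Hypothetical container for one strongly uncertain event."""
    event_id: str
    t_min: int                   # earliest possible day of the month
    t_max: int                   # latest possible day of the month
    activities: tuple            # set of possible activity labels
    indeterminate: bool = False  # True if marked "?" (may not have occurred)


def must_precede(a, b):
    """a certainly happens before b when their time intervals do not overlap."""
    return a.t_max < b.t_min


def realizations(trace):
    """Return the set of activity sequences compatible with the uncertain trace."""
    results = set()
    # Indeterminate events may be kept or dropped; all others are always kept.
    keep_options = [(True, False) if e.indeterminate else (True,) for e in trace]
    for keep in product(*keep_options):
        present = [e for e, k in zip(trace, keep) if k]
        for order in permutations(present):
            # Discard orderings in which a later event must precede an earlier one.
            violates = any(
                must_precede(order[j], order[i])
                for i in range(len(order))
                for j in range(i + 1, len(order))
            )
            if violates:
                continue
            # Pick one label for every event that has several possible labels.
            for labels in product(*(e.activities for e in order)):
                results.add(labels)
    return results


# Trace ID192-1 from Table I (days of July).
trace_id192_1 = [
    UncertainEvent("e1", 5, 5, ("NightSweats",), indeterminate=True),
    UncertainEvent("e2", 8, 8, ("PrTP", "SecTP")),
    UncertainEvent("e3", 4, 10, ("Splenomeg",)),
]

for sequence in sorted(realizations(trace_id192_1)):
    print(sequence)
```

Brute-force enumeration of permutations is only practical for short traces; it is used here to make explicit the behavior contained in a single uncertain trace, which is what graph-based representations are meant to capture compactly.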