<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3-642-28108-2_19</article-id>
      <title-group>
        <article-title>Uncertainty for Explainable Process Mining (PhD Proposal)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arvid Lepsien</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Stockholm, Sweden</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kiel University, Group Process Analytics</institution>
          ,
          <addr-line>Hermann-Rodewald-Str. 3, 24118 Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <volume>99</volume>
      <fpage>37</fpage>
      <lpage>46</lpage>
      <abstract>
        <p>Process mining has proven its usefulness in a wide range of practical applications. However, it generally requires event logs to be certain in order to guarantee trustworthy results. Typical approaches to uncertainty in event logs, like frequency-based filtering or other heuristics, restrict insight into processes because they may discard relevant information. Several alternative approaches to uncertainty in process mining have been proposed to provide more detailed insights, but each of these approaches addresses only a single type of uncertainty. Many domains could benefit from process mining techniques that can handle the simultaneous occurrence of multiple types of uncertainty, e.g., probabilistic processes where event logs are extracted from unstructured data. The proposed PhD project aims to develop a holistic approach to uncertainty in process mining, adding a comprehensive perspective of uncertainty to the insights generated by process mining analyses. To achieve this, methods concerned with different types of uncertainty in process mining, namely data, correlation, and process uncertainty, will be investigated and then combined into a harmonized framework, providing a foundation for improved decision-making.</p>
      </abstract>
      <kwd-group>
        <kwd>Process mining</kwd>
        <kwd>Uncertainty</kwd>
        <kwd>Root cause analysis</kwd>
        <kwd>Unstructured data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>
        Process mining has proven its usefulness in a wide range of practical applications, where it is used to
uncover bottlenecks and inefficiencies in processes or to identify tasks for automation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. One
future avenue for process mining should be to increase the trustworthiness of its results, i.e.,
to develop methods to provide confidence and trust in its automatically generated insights to
end users [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In general, process mining requires event logs that are accurate in the sense that
the recorded events actually happened and the attributes of events were recorded correctly
[3]. While this assumption might be appropriate for some settings (e.g., when event logs are
extracted from process-aware information systems), it is harder to satisfy in other settings,
which reduces the trustworthiness of results. For instance, event logs
sourced from unstructured data and unstructured processes can only indicate likelihoods when
mapping low-level events onto activities and cases [4]. Therefore, to increase trustworthiness
in process mining applications, the challenge to address is uncertainty.
      </p>
      <p>Generally, three different types of uncertainty exist for processes discovered from
unstructured data, namely data uncertainty, correlation uncertainty, and process uncertainty. Data
uncertainty refers to the degree of noise in the data, such as inaccurate, imprecise, untrustworthy,
and unknown data. Correlation uncertainty refers to the likelihood of event-activity mappings,
since there is often more than one possible solution. Finally, process uncertainty refers to
the uncertainty inherent in the analyzed process (i.e., probabilistic dependencies and contextual
influences affected by randomness).</p>
      <p>The reduction and quantification of uncertainty might improve process discovery results,
improve conformance checking and predictive monitoring and even elevate process discovery
techniques on unstructured data. The purpose of this PhD project is to develop a holistic
framework to address uncertainty in process mining, especially process mining applied to
unstructured data. This includes quantifying the impact of uncertainty, uncovering the sources
of uncertainty in event log extraction, quantifying the random factors of uncertainty when
mapping events onto activities, and providing a user-friendly communication of the
uncertainty-aware process mining methods.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>In most process mining approaches, uncertainty is treated as an issue of event log quality, and
addressed with frequency-based filtering or other heuristics. While this is a practical approach
to reduce noise (i.e., erroneous recordings), it may also suppress outliers (i.e., correctly recorded,
but unexpected behavior), which are highly relevant for analyzing process deviations [5].</p>
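      <p>The trade-off of frequency-based filtering can be illustrated with a minimal sketch. The toy log, the activity names, the threshold, and the function <monospace>filter_by_variant_frequency</monospace> are hypothetical and chosen only for illustration; they are not taken from any of the cited approaches.</p>
      <preformat>
```python
from collections import Counter


def filter_by_variant_frequency(log, min_share):
    """Keep only traces whose variant accounts for at least min_share of the log."""
    variants = Counter(tuple(trace) for trace in log)
    total = len(log)
    kept = {variant for variant, count in variants.items() if count / total >= min_share}
    return [trace for trace in log if tuple(trace) in kept]


# Hypothetical toy log: 8 regular traces, 1 noisy trace, 1 rare but valid trace.
log = (
    [["register", "check", "approve"]] * 8
    + [["register", "approve", "check"]]  # noise: erroneous recording order
    + [["register", "check", "escalate", "approve"]]  # outlier: correctly recorded
)

filtered = filter_by_variant_frequency(log, min_share=0.2)
print(len(filtered))  # 8: both rare variants are removed
```
      </preformat>
      <p>In this sketch the filter cannot tell the noisy trace from the correctly recorded outlier, because both variants are equally rare, so the escalation behavior is lost together with the noise.</p>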
      <p>Recently, some approaches have been suggested that integrate uncertainty into process
mining, instead of disregarding uncertain event data. Pegoraro [6] proposed a framework that
adds a perspective on data uncertainty to process mining. An event log is annotated with
(meta-)information related to the uncertainty of the events contained in an event log, which is
used as additional input to uncertainty-aware algorithms for process discovery and conformance
checking. Also, process models are annotated to represent the uncertainty of the event log they
were discovered from. Qafari et al. [7] proposed an approach to identify the causes of process
performance and compliance problems from event logs. Structural causal models are used to
discover causal relations between distinct features (e.g., event or case attributes, the occurrence
of an activity) and problematic process outcomes. Leemans et al. [8] developed a method to
identify long-term dependencies between control-flow decisions in a process. Control-flow
decisions are identified from a process model and event log of this process, then probabilistic
causalities between control-flow decisions at the decision points are discovered, and finally,
the size of each causal effect is estimated. Alman et al. [9] proposed a framework to extend
declarative process mining methods with a process uncertainty perspective.</p>
      <p>To sum up, no approach exists to quantify different types of uncertainty in process mining.
Mostly, existing approaches are limited to either data or process uncertainty and focus on settings
where structured event logs are available. The integration of multiple types of uncertainty
into process mining in a combined approach is still an open challenge. Additionally, current
event log extraction techniques are unable to provide explicit uncertainty information, leaving
a blind spot with respect to correlation uncertainty. Thus, handling uncertainty is particularly
challenging for process mining on unstructured data, where event logs need to be extracted
first.</p>
      <p>A large body of research is available on the quantification of uncertainty in domains other
than process mining (e.g., deep learning [10], mechanical engineering [11], or climate modeling
[11]), which can be built on to provide uncertainty quantification techniques for process mining,
especially in order to address data and correlation uncertainty. For instance, Zhang et al. [12]
provide a general framework guiding the application of existing uncertainty quantification
methods. Abdar et al. [10] discuss applications of uncertainty quantification in deep learning.
Similarly, to address process uncertainty, the PhD project can rely on a large body of causal
inference techniques [13] to quantify probabilistic dependencies in processes.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Research Design</title>
      <p>[Figure 1: General structure of approaches to process mining on unstructured data, annotated with the project phases (I)-(V).]</p>
      <p>The goal of the PhD project is to address the various challenges outlined above. Particularly, a
structured approach making uncertainty explicit in the process discovery phase and quantifying
the degree of multiple types of uncertainty will be developed. The focus of this approach lies
on the analysis of unstructured data since the level of uncertainty in the data is very high and
all three types of uncertainty are significant. Fig. 1 shows the general structure of approaches
to process mining on unstructured data (e.g., [14]), which is used to divide the PhD project
into five phases. Phases (I)-(III) serve to enrich event logs with information on uncertainty
and its impact on the trustworthiness of the insights, and phases (IV) and (V) are concerned
with developing uncertainty-aware process mining methods and communicating the impact of
uncertainty related to the gained insights. To support the generalizability of our approach, large
and heterogeneous evaluation datasets with known ground-truth processes and characteristics
(e.g., the amount of uncertainty) will be created by generating synthetic event data [15].</p>
      <p>The first step addresses data uncertainty in event logs (I). For this, a taxonomy of uncertainty
in event logs will be developed and used to generate synthetic event logs of certain processes
with varying levels of data uncertainty. Then, uncertainty quantification methods [10, 12] will
be adapted to assess the impact of data uncertainty related to the quality of insights gained
from the process mining results. Next, the scope is widened to include correlation uncertainty
by integrating uncertainty awareness into event log extraction (II). The applicability of the
methods developed in (I) will be extended to unstructured data sources by developing extraction
techniques that produce explicit uncertainty information. The quantification of data uncertainty
can be extended to event log extraction techniques to enable the automatic identification of
sources of data uncertainty. In the third step, process uncertainty is addressed (III). Process
uncertainty can be isolated by generating high-quality event logs of uncertain, probabilistic
processes. To provide a solution, (1) existing approaches for discovering (probabilistic) causalities
in process mining will be reviewed, (2) causal inference techniques [13] will be adapted to
address the gaps identified in this review, and (3) means to enrich event logs with quantitative
information of the discovered causalities will be developed. The goal of the fourth step is to
integrate the different views on uncertainty explicitly into common process mining tasks (IV). To
do this, uncertainty-aware process mining techniques need to be developed, which can explicitly
encode uncertainty to improve the quality compared to non-uncertainty-aware techniques
and improve the quantification of uncertainty through measures. Finally, explainability will
be addressed to improve how the technical design and the process mining results are
communicated to non-experts (V). The explainability of uncertainty provides a basis to gain
additional insights for informed process management decisions. For this, different means to
communicate process mining results (e.g., process models, reports) need to be offered for the
uncertainty-enriched outputs developed in the previous phases.</p>
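      <p>As a rough illustration of phase (I), synthetic event logs with a controlled level of data uncertainty could be produced along the following lines. This is only a sketch under stated assumptions: it models a single kind of data uncertainty (wrongly recorded activity labels), and the function <monospace>inject_activity_noise</monospace>, its parameters, and the toy process are hypothetical rather than part of the proposed method.</p>
      <preformat>
```python
import random


def inject_activity_noise(log, noise_level, alphabet, seed=0):
    """Replace each activity label by a different one with probability noise_level.

    Returns the noisy log and the number of altered labels (the known ground truth).
    """
    rng = random.Random(seed)  # fixed seed keeps the synthetic data reproducible
    noisy_log, changed = [], 0
    for trace in log:
        noisy_trace = []
        for activity in trace:
            if rng.random() >= noise_level:
                noisy_trace.append(activity)  # label recorded correctly
            else:
                others = [a for a in alphabet if a != activity]
                noisy_trace.append(rng.choice(others))  # label recorded wrongly
                changed += 1
        noisy_log.append(noisy_trace)
    return noisy_log, changed


# Hypothetical certain process: every case follows the same three activities.
alphabet = ["register", "check", "approve"]
clean_log = [list(alphabet) for _ in range(100)]

for level in (0.0, 0.1, 0.3):
    noisy_log, changed = inject_activity_noise(clean_log, level, alphabet)
    share = changed / sum(len(trace) for trace in clean_log)
    print(f"noise level {level}: {share:.2%} of labels altered")
```
      </preformat>
      <p>Because the clean log and the number of altered labels are known, logs generated this way could serve as a ground truth against which uncertainty quantification methods are compared.</p>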
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this PhD proposal, the challenges related to different types of uncertainty in process mining
were described. The goal of the proposed PhD project is to address these challenges by suggesting
a holistic framework to manage uncertainty in process mining. In order to achieve this, methods
to enrich event logs with information on data, correlation, and process uncertainty will be
developed, and this information will then be made explicit in the analysis results by integrating
uncertainty into process mining methods and the communication of process mining results.
The PhD project contributes to trustworthy process mining, laying the foundation for improved
process mining-based decision-making.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>I would like to thank my thesis advisor, Agnes Koschmider, for her invaluable guidance and
support in writing this proposal, and Milda Aleknonytė-Resch for the helpful discussions about
the research design and presentation of the proposal. This project has received funding from
the German Federal Ministry for Economic Affairs and Climate Action under the Marispace-X
project grant no. 68GX21002E.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Reinkemeyer</surname>
          </string-name>
          (Ed.),
          <source>Process Mining in Action: Principles, Use Cases and Outlook</source>
          , Springer, Cham,
          <year>2020</year>
          . doi:10.1007/978-3-030-40172-6.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Koschmider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Oppelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hundsdörfer</surname>
          </string-name>
          ,
          <article-title>Confidence-driven communication of process mining on time series</article-title>
          ,
          <source>Informatik Spektrum</source>
          <volume>45</volume>
          (
          <year>2022</year>
          )
          <fpage>223</fpage>
          -
          <lpage>228</lpage>
          . doi:10.1007/s00287-022-01470-3.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>