<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Advancing Decomposed Conformance Checking in Process Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wai Lam Jonathan Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile</institution>
          ,
          <addr-line>Vicuña Mackenna 4860, Macul, Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>As process mining gains traction as a tool for analyzing and improving processes in the industry as exemplified by companies such as Disco, Celonis, and Minit, many commercial process mining tools are extending beyond process discovery to include conformance checking features. Performing conformance checking in industrial settings means that techniques have to be functional with industrial sized processes and real-life settings. In this thesis, we propose three approaches: a divide-and-conquer approach to alignment, conformance checking algorithm selection and a HMM-based approach to online conformance checking.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Conformance checking</kwd>
        <kwd>Process mining</kwd>
        <kwd>Alignment-based conformance checking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>other is the approximation of the overall conformance between the process and the log, such as
pseudo-alignments or approximate alignments. As such, it is clear that current decomposition
techniques do not address the problem of computing the exact fitness between a given log and
model.</p>
      <p>Therefore, we investigate and show the required condition for merging optimal decomposed
pseudo-alignments into optimal valid alignments. The merged alignments are then shown to
have the same costs as the alignments computed under the monolithic approach. Taking this
merging condition, we propose an iterative approach that computes optimal alignments and
thereby fitness in a divide and conquer manner. In a nutshell, this approach checks for the
total border agreement condition for decomposed sub-alignments and resolves those which
do not meet the condition using a coarser valid decomposition in the following iteration. This
approach creates a full circle approach for decomposed alignment (‘there and back again’),
and makes it possible to solve alignment-based conformance problems that current techniques
cannot handle and can balance between quality and computation time.</p>
      <p>Our experimental results found that for the proposed recomposing conformance checking
framework, the bottleneck is in that some log traces require many iterations to reach the needed
merging condition. To tackle this bottleneck, we present various heuristics that can be applied at
the recomposition step so that the resulting coarser valid decomposition is more likely to yield
valid optimal alignments. In total, we propose three net recomposition strategies and one log
recomposition strategy. The net recomposition strategies focus on identifying and resolving sets
of border activities that are causing the alignment to not meet the required merging conditions.
Border activities are activities that are in multiple sub-nets of the valid decomposition. Similarly,
the log recomposition strategy increases the requirement for a to-be-aligned log trace to be
selected for the next iteration. This means that problematic log traces that are unlikely to result
in valid alignments are not included so that we avoid wasting computation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Conformance checking algorithm selection</title>
      <p>There can be many conformance checking algorithms for the same task. For example, for
alignment-based techniques, there are many techniques that focus on being great at specific
aspects, e.g., computation time, specific data type and conformance issues. This means that the
end user needs to be aware of their advantages and disadvantages to choose the appropriate
technique for their data and objective. However, the end user may not have the expert knowledge
to always select the most appropriate algorithm. One solution is to have an oracle that have
been configured to help users with the decision given their objectives. We apply machine
learning techniques to learn a classifier that would predict the best alignment algorithm in
terms of whether to use decomposition or not to minimize computation time.</p>
      <p>We tackle the algorithm selection problem by encoding it as a classification task. By applying
well known classifiers, the trained predictive model can select the algorithm that is most likely
to achieve best performance among the set of available algorithms. As a first instance, we
tackle the problem of deciding when to apply decomposition-based algorithms to compute exact
optimal alignments using A* techniques so that computation time is minimized. We present
the analysis of the trained models and identifies features that can have a significant impact on
the performance of diferent algorithms. While this tackles a specific problem in the space of
alignment computation, it is easy to see that the proposed framework can have an impact on
other scopes, i.e., tackle more general alignment algorithm selection problems, as well as in
the considered algorithms to include other alignment-based algorithms such as planner-based
approaches, and approaches based on diferent theories.</p>
      <p>We trained several classification algorithms on a set of synthetic data for four selected
alignment algorithms. The model features used as input for the classification algorithms are
extracted from the log trace, the model, and the synchronous net product between the trace net
and model respectively. To ensure that the predictive models can practically perform predictions
prior to alignment, the features are relatively simple and can all be extracted in linear time with
respect to the size of the synchronous net product. The classifiers are then evaluated in terms of
their classification performance (precision, recall and F1-score) as well as the penalized average
runtime with a penalty factor of 10, i.e., a timeout counts as 10 times the timeout (PAR10).</p>
      <p>Unfortunately, our results show that none of the classifiers were able to beat the single best
(SB) solver, i.e., the algorithm with the best overall performance. Yet we did find that the trained
classifiers have comparable performance to the SB solver and are better performing than the
baseline random classifier.</p>
    </sec>
    <sec id="sec-3">
      <title>3. A HMM-based approach to online conformance checking</title>
      <p>Lastly, we turn to the challenge of conformance checking under an online context. Given the
volume and velocity at which event data comes in, organizations may not store these data for
ofline analysis and have to resort to online techniques. Moreover, performing analysis in real
time allows process stakeholders to react to conformance issues. Most existing conformance
checking techniques require the trace of events to correspond to a completed case. This means
that these techniques target ofline scenarios and do not typically cater for online contexts
where it is desirable to raise alerts as soon as a significant deviation is observed for cases that
have not reached completion. Moreover, due to the continuous increase in recorded data, it
can be infeasible for organizations to store data for ofline processing. For example, Walmart is
estimated to collect more than 2.5 petabytes of data every hour from its customer transactions.
As such, in recent years, a new set of algorithms has been proposed for online scenarios in
which we assume to have an event stream as input so that each item relates to an observed
event for a case.</p>
      <p>
        Here, we propose a novel online approach which performs conformance checking on an event
stream with constraints on memory and time. There are several works on online conformance
checking, but there still exists areas for improvement. For example, prefix alignments [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
and a similar approach based on enriching a transition system using alignment concepts [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
have dificulties handling warm start scenarios. Another approach that performs conformance
checking on behavioral patterns can lose information due to its abstraction [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>One fundamental challenge of explaining the conformance of a running case is in balancing
between making sense at the process level as the case reaches completion and putting emphasis
on the current information at the same time. Given the trace of a complete case,
alignmentbased techniques excels at giving a globally optimal conformance solution. However, as the
case unfolds in an online scenario, alignment techniques can be slow to realise such eventual
explanation since they are always seeking a globally optimal explanation. Moreover, there is
no flexibility in allowing warm start scenarios. At the other end of the spectrum, focusing
only on the current information given by an incoming event of a case can be insuficient in
providing a conformance explanation coherent at the process level. However, if we only check
directly following behavioral patterns as new events of the case come in, the conformance issue
would not be detected since new events always form a modeled directly following behavioral
pattern with the previous event. As such, it is desirable to have an online framework that yields
balanced conformance explanations.</p>
      <p>To tackle this problem, we present such framework based on Hidden Markov Models (HMM).
As new events come in for running cases, the model alternates between localizing the running
case within the reference model using the observed event and computing conformance from
such estimated position. Diferent to the assumption of the standard HMM, both the previous
state and observation can influence the next state due to non-conformance. This is modeled
by conditioning state transition and observation probabilities by both the previous state and
observation. Furthermore, rather than deciding beforehand the efects of non-conformance, an
Expectation-Maximization (EM) algorithm is applied to compute the parameters from past data.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>
        To evaluate the proposed techniques, both synthetic and real-life datasets are used. For the
recomposition framework, synthetic data was generated using the PTandLogGenerator [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
the PLG2 tool. For the experiments, 15 synthetic nets were generated with various settings so
that they include large processes containing combinations of all the most common workflow
patterns, such as XOR, AND, loops, or invisible transitions. Logs were generated from the
models using simulation, and diferent operations were applied to emulate diferent plausible
noise scenarios: no noise, noise by removing events and noise by swapping. These generated
data are all publicly available. Two real life datasets are used to evaluate the recomposition
framework - the BPIC 2012 dataset and the BPIC 2018 dataset. Since no explicit process model
is provided with the dataset, a model is constructed with consideration to the task semantics.
      </p>
      <p>As explained previously, for the algorithm selection task, two publicly available synthetic
datasets were used. After verifying, cleaning and formatting the generated data, the final dataset
records performance data for over 800,000 alignment computations on 140,000 model trace
pairs that have been generated from 20 models.</p>
      <p>Lastly, for the HMM-based online conformance checking framework, a publicly available
synthetic dataset was used to perform a stress test and a correlation analysis with existing
online techniques. Then the framework is evaluated using a real-life dataset of a hospital billing
process.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Tools</title>
      <p>All of the proposed techniques have working implementations available. The recomposing
conformance checking framework has been implemented as the Replay with Recomposition
plugin in the DecomposedReplayer package of ProM6.9. ProM is an extensible framework that
supports a wide variety of Process Mining techniques in the form of plug-ins. It is platform
independent as it is implemented in Java, and can be downloaded free of charge. In the plugin,
users can configure diferent values for the parameters of the framework. Both the algorithm
selection and HMM-based online framework are implemented in Python and can be found on
Github so that results can be reproduced.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by CONICYT-PCHA / Doctorado Nacional / 2017-21170612.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Decomposing petri nets for process mining: Ageneric approach</article-title>
          ,
          <source>Distributed and Parallel Databases</source>
          <volume>31</volume>
          (
          <year>2013</year>
          )
          <fpage>471</fpage>
          -
          <lpage>507</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>S. J. van Zelst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bolt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F. van Dongen</given-names>
            ,
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Online conformance checking: relating event streams to process models using prefix alignments</article-title>
          ,
          <source>International Journal of Data Science and Analytics</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <article-title>A framework for online conformance checking, Business Process Management Workshops - BPM 2017 International Workshops (</article-title>
          <year>2017</year>
          )
          <fpage>165</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. J. van Zelst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Armas-Cervantes</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. F. van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <article-title>Online conformance checking using behavioural patterns</article-title>
          ,
          <source>Business Process Management - 16th International Conference</source>
          (
          <year>2018</year>
          )
          <fpage>250</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jouck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Depaire</surname>
          </string-name>
          ,
          <article-title>Ptandloggenerator: A generator for artificial eventdata</article-title>
          ,
          <source>BPM (Demos) 1789</source>
          (
          <year>2016</year>
          )
          <fpage>23</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>