<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Tool for Aligning Event Logs and Prescriptive Process Models through Automated Planning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Massimiliano de Leoni</string-name>
          <email>m.d.leoni@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giacomo Lanciano</string-name>
          <email>lanciano.1487019@studenti.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Marrella</string-name>
          <email>marrella@dis.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sapienza - University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In Conformance Checking, alignment is the problem of detecting and repairing nonconformity between the actual execution of a business process, as recorded in an event log, and the model of the same process. Literature proposes solutions for the alignment problem that are implementations of planning algorithms built ad-hoc for the specific problem. Unfortunately, in the era of big data, these ad-hoc implementations do not scale sufficiently compared with wellestablished planning systems. In this paper, we tackle the above issue by presenting a tool, also available in ProM, to represent instances of the alignment problem as automated planning problems in PDDL (Planning Domain Definition Language) for which state-of-the-art planners can find a correct solution in a finite amount of time. If alignment problems are converted into planning problems, one can seamlessly update to the recent versions of the best performing automated planners, with advantages in term of versatility and customization. Furthermore, by employing several processes and event logs of different sizes, we show how our tool outperforms existing approaches of several order of magnitude and, in certain cases, carries out the task while existing approaches run out of memory.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        In general, a large number of possible alignments exist between a process model
and a log trace, since there may exist manifold explanations why a trace is not
conforming. It is clear that one is interested in finding the most probable explanation, i.e.,
one of the alignments with the least expensive deviations (i.e., optimal alignments [
        <xref ref-type="bibr" rid="ref1 ref2">1,
2</xref>
        ]), according to some function assigning costs to deviations. The work by Adriansyah
et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides a technique to compute these optimal alignments, based on the A
algorithm and ad-hoc heuristics. However, experiments (also reported in this paper)
show that their technique does not scale sufficiently. In the era of big data, scalable
approaches are desperately necessary, as also advocated by the IEEE Task Force in
Process Mining [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: “In some domains, mind-boggling quantities of events are recorded.
[...] Therefore, additional efforts are needed to improve performance and scalability.”.
      </p>
      <p>We believe that re-implementing standard planning algorithms, such as A*, for
solving specific planning problems in an ad-hoc way is not ideal. First, ad-hoc
implementations cannot compare in scalability with well-established planning systems. Second,
ad-hoc implementations prevent one from seamlessly plugging in new algorithms or
evaluate alternative heuristics. As a consequence, the possibility of evaluating
different alternatives is hampered; also, if the literature proposes new algorithms that clearly
overtake the existing ones for solving instances of the conformance checking problem,
a massive amount of work is needed to modify those ad-hoc implementations.</p>
      <p>
        In order to facilitate the integration of different planning algorithms, in this paper
we present a tool that (i) allows to formulate the problem of computing optimal
alignments as a planning problem in the standard Planning Domain Definition Language [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
(aka PDDL), and (ii) employs state-of-the-art automated planners [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to solve such a
problem. While the theoretical approach underlying the tool is discussed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], here we
discuss its architecture and report on the most interesting results from the experiments,
which show that our tool is versatile, allowing multiple planners to be integrated, and
scales better than existing approaches with larger models and event logs.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Tool</title>
      <p>
        The proposed tool is able to constructs an optimal alignment of an event log and a
process model [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. To model business processes, we opted for Petri nets. The tool has
been developed both as a plugin of ProM, a framework to develop process-mining
techniques, and as a stand-alone standard Java application.3 The ProM version is available
in package PlanningBasedAlignment under a plugin named “Planning-based Alignment
of Event Logs and Petri Nets”. In the remainder, we mostly focus on the ProM version.
The alignment engine relies on two main architectural layers as shown in Fig. 1(a).
      </p>
      <p>The Design Layer provides a wizard-based GUI that assists the process designer in:
(i) the import of existing event logs and Petri nets stored in external repositories; (ii) the
customization of a cost function on legal alignment moves to define the severity of a
deviation4; (iii) the specification of the initial/final marking of the Petri net under
analysis, and (iv) the selection of a search heuristic within an available repertoire. The tool
supports the standards XES (eXtensible Event Stream) and PNML (Petri Net Markup
Language) for respectively storing and analyzing event logs and Petri nets.
3 The stand-alone version of the tool can be downloaded at https://goo.gl/ZK9Frb
4 The computation of the optimal alignment is based on this cost function: we want to minimize
the cost of the alignment’s moves.</p>
      <p>(a) Architecture of the tool.</p>
      <p>(b) Overview of the trace-model alignments.</p>
      <p>(c) Projections of alignment moves on the Petri net.</p>
      <p>The Encoding &amp; Planning Layer is in charge of translating the input specifications
in PDDL format, which is readable by any state-of-the-art planner, and performing
the alignment task. Specifically, starting from a Petri net N and an event log L to be
aligned, the PDDL Encoder builds, for each log trace L 2 L, (i) a PDDL domain file
PD, which encodes the domain propositions needed to capture the structure of N and to
monitor the evolution of its marking, and three classes of planning actions that represent
“alignment” moves: synchronous moves (associated with no cost), model moves and
log moves; and (ii) a PDDL problem file PR, which includes a number of constants
required to properly ground all the domain propositions in PD; in our case, constants
will correspond to the place and transition instances involved in N . Secondly, we define
the initial state of PR to capture the exact structure of the specific log trace L of interest
and the initial marking of N . Thirdly, we encode the goal condition of PR to represent
the fact that N is in the final marking and L has been completely analyzed. At this
point, for any trace of the event log (i.e., for any of the generated planning problems) a
state-of-the-art planner is invoked. With such inputs, the planner synthesizes a plan to
reach the final goal from the initial state, i.e., a sequence of alignment moves (each of
which is a planning action) that establish an alignment between L and N .</p>
      <p>
        The ProM version is integrated with the FAST-DOWNWARD planning framework [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
to find optimal alignments of event logs and process models. FAST-DOWNWARD is a
progression planner that uses hierarchical decompositions of planning tasks for
computing its heuristic function, called the causal graph heuristic, which approximates
goal distances by solving a hierarchy of “local” planning problems. To produce
optimal alignments, FAST-DOWNWARD uses a best-first search in first iteration to find a
plan and a weighted A* search to iteratively decreasing weights of plans.
      </p>
      <p>
        When a plan (i.e., an optimal alignment) satisfying the goal is found, the Translator
component makes it available through the GUI. Fig. 1(b) shows the optimal alignments
for the road-traffic management process and respective event log (see, e.g.,[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).5 Here,
we leverage on the visualization developed for [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: Each sequence of triangles refers to
an optimal alignment of a trace with respect to the model. Each triangle is a different
alignment move; the color of the move depends on its type: moves in the log, moves
in the model or moves in both are colored in yellow, purple and green respectively.
Each sequence is also associated with a number that identifies the fitness score of the
specific trace. In addition, together with the alignment moves, when all the traces of a
log have been analyzed, the tool produces several statistics to evaluate the quality of
the alignments (e.g., the average cost of the alignment, the percentage of dirty traces in
the log, etc.). Fig. 1(c) shows a screenshot of the tool where alignments are projected
onto the process model. Each activity is associated with a color that reflects its degree
of conformance, namely the fraction of log and model moves wrt. all moves for that
activity. Red and white transitions indicate a degree 0 or 1 of conformance, respectively.
Intermediate shades of red and yellow represent in-between values.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Maturity</title>
      <p>
        Our tool was validated with several combinations of real-life and synthetic event logs
and processes. We refer to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for a thorough discussions of the experimental test bed
and of the results. Here, for the sake of space, we focus on the results on the synthetic
process models and event logs. Specifically, we artificially generated eight process
models of increasing sizes in form of Petri nets to evaluate the scalability of the approach
and, for each Petri net, we automatically generated event logs (containing 1000 traces
each) with increasing amount of noise. In the remainder, an event log with X% of noise
indicates that, starting from an event log where each trace is conforming, an event is
swapped with the next event in the trace with a probability of X%. This swapping
generally causes the trace to be no more conforming.
5 The log is available at https://doi.org/10.1007/s00607-015-0441-1
      </p>
      <p>
        The experiments with the synthetic process models are summarized in Table 1. Each
row refers to a different Petri net. For each row, the first column indicates the size of the
Petri net in term of number of transitions; the subsequent 12 columns refer to the
average time to alignment and event log trace with the respective process model when the
corresponding event log contain 10%, 20% and 30% noise. For each amount of noise,
we compared the result of the existing ad-hoc approach by Adriansyah et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with
the results obtained by our tool. We ran our tool using its built-in planner, i.e.,
FASTDOWNWARD, and two further external well-known state-of-the-art planners –
SymBA2* and SPM&amp;S6 – that provide searching algorithms that guarantee the optimality of
the returned alignments. The results show that the existing ad-hoc approach is faster
with models of small sizes and/or when using event logs with smaller amount of noise.
Conversely, our approach is faster with larger models. In particular, it seems that,
generally speaking, FAST-DOWNWARD performs better with middle-size models whereas
SymBA-2* works better with very large models. The existing approach by Adriansyah
et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is confirmed not to sufficiently scale when event logs and process models
become very large. As a matter of fact, when testing with the two largest models, the
existing approach was not able to align every log trace with the corresponding model
and was running out of memory, even though we were using a i7 CPU machine with
16 GBs of RAM. Conversely, tracking the memory consumption, all of three employed
planners were never using more than 4 GBs of RAM. Even when the existing approach
was able to carry out the alignment tasks, it could do it significantly slower. As an
example, compare the results for the model with 115 activities and an event log with
30%: On average, the existing approach requires 1987 ms per trace, versus 716 ms for
FAST-DOWNWARD. A saving of 1.2 seconds per trace means that, to align all traces of
an event log with, e.g., 100000 traces (not uncommon in real-world case studies), the
use of FAST-DOWNWARD would allow us to save 120000 seconds: 83.3 hours.
Acknowledgments. The work of Andrea Marrella has been partly supported by the
Sapienza project “Data-aware Adaptation of Knowledge-intensive Processes in
CyberPhysical Domains through Action-based Languages”.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          : Process Mining: Data Science in Action. Springer Berlin (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Adriansyah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorova</surname>
          </string-name>
          , N.,
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>B.F.</given-names>
          </string-name>
          :
          <article-title>Cost-Based Fitness in Conformance Checking</article-title>
          .
          <source>In: 11th Int. Conf. on Application of Concurrency to System Design, IEEE</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Van Der Aalst</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , et al.:
          <article-title>Process Mining Manifesto</article-title>
          .
          <source>In: 9th Int. Conf. on Business Process Management (BPM</source>
          <year>2011</year>
          ), Springer Berlin (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>McDermott</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>PDDL-The Planning Domain Definition Language</article-title>
          .
          <source>Technical Report DCS TR-1165</source>
          ,
          <article-title>Yale Center for Computational Vision</article-title>
          and Control (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Geffner</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonet</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Concise Introduction to Models and Methods for Automated Planning</article-title>
          .
          <source>Synthesis Lectures on Artificial Intelligence and Machine Learning</source>
          <volume>8</volume>
          (
          <issue>1</issue>
          ) (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. de Leoni,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Marrella</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <source>Aligning Real Process Executions and Prescriptive Process Models through Automated Planning. Expert System with Applications</source>
          <volume>82</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Helmert</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The Fast Downward Planning System</article-title>
          .
          <source>JAIR 26</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mannhardt</surname>
          </string-name>
          , F.,
          <string-name>
            <surname>de Leoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reijers</surname>
          </string-name>
          , H.A.,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Balanced multi-perspective checking of process conformance</article-title>
          .
          <source>Computing</source>
          <volume>98</volume>
          (
          <issue>4</issue>
          ) (
          <year>2015</year>
          )
          <article-title>6 Planners are</article-title>
          downloadable at: https://helios.hud.ac.uk/scommv/IPC-14/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>