<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A ProM Operational Support Provider for Predictive Monitoring of Business Processes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Federici</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Williams Rizzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Di Francescomarino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marlon Dumas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Ghidini</string-name>
          <email>ghidinig@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Maria Maggi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irene Teinemaa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FBK-IRST</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Tartu</institution>
          ,
          <country country="EE">Estonia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Predictive process monitoring is concerned with exploiting event logs to predict how running (uncompleted) cases will unfold up to their completion. In this paper, we propose an implementation in the ProM toolset of a predictive process monitoring framework for estimating the probability that an ongoing case will lead to a certain outcome among a set of possible outcomes. An outcome refers to a label associated with completed cases, for example, a label indicating that a given case completed "on time" (with respect to a given desired duration) or "late", or a label indicating whether a given case led to a customer complaint. The framework takes into account both the sequence of events observed in the current trace and the data attributes associated with these events. The prediction problem is approached in two phases. First, prefixes of previous traces are clustered according to control flow information. Second, a classifier is built for each cluster to discriminate among a set of possible outcomes. At runtime, a prediction is made on a running case by mapping it to a cluster and applying the corresponding classifier.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Questions and predictive challenges often arise during the execution of
business processes. For example, during a medical process, a doctor may
ponder whether surgery, a pharmacological therapy, or a manipulation is the
best choice to guarantee the patient's recovery. Predictive
business process monitoring [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a family of techniques that apply this kind of everyday reasoning
about likely outcomes to the field of business processes. In particular, predictive process
monitoring exploits event logs, which are increasingly widespread in modern
information systems, to predict how current (uncompleted) cases will unfold up
to their completion. A predictive process monitor allows users to predict the
most likely outcome of the ongoing case. In this context, an outcome could be,
for example, the timely completion of the case with respect to a deadline (versus
late completion), or the fulfillment of a desired business goal (e.g., a sales process
leading to an order, or an issue handling process leading to successful resolution).
Based on the analysis of execution traces, the monitor provides the user with
estimations of the likelihood of achieving a given outcome for a running case.
      </p>
      <p>Copyright © 2015 for this paper by its authors. Copying permitted for private and
academic purposes.</p>
      <p>
        In this paper, we describe an implementation in the ProM process mining
toolset of a general customizable predictive process monitoring framework [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
that allows users to assign a "label" (outcome) to an ongoing case based on: (i)
a prefix thereof; and (ii) a set of labeled completed sequences (the "history").
ProM provides a generic Operational Support (OS) environment [
        <xref ref-type="bibr" rid="ref2 ref7">2,7</xref>
        ] that allows
the tool to interact with external workflow management systems at runtime. A
stream of events coming from a workflow management system is received by an
OS service. The OS service is connected to a set of OS providers implementing
different types of analysis that can be performed online on the stream. The
predictive process monitoring framework has been implemented as an OS provider.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Framework and Tool</title>
      <p>The framework requires as input a set of past executions of the process. Based
on the information extracted from such execution traces (sequences of events
with their associated payload, i.e., attribute-value pairs), it tries to predict how
currently evolving executions will develop in the future. To this end, an automated
pre-processing phase is carried out before the process execution. In this
phase, state-of-the-art approaches for clustering and classification are applied
to the historical data in order to (i) identify and group historical trace prefixes
with a similar control flow, i.e., to delimit the search space on the basis of
control flow (clustering from a control flow perspective) and, hence, avoid noise; and
(ii) obtain a precise, data-based classification of traces with similar control
flow. At runtime, the classification of the historical
trace prefixes is used to classify new traces during their execution and predict
how they will behave in the future. In particular, the new trace is matched to
a cluster, and the corresponding classifier is used to estimate the probability that
the trace will achieve a certain outcome. The overall picture of the framework is
illustrated in Fig. 1.</p>
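      <p>The two phases described above can be illustrated with a minimal Python sketch. It is a toy stand-in and not the implementation of the tool: clusters are keyed simply by the set of activities observed in a prefix (instead of the clustering algorithms used by the framework), the per-cluster "classifier" is a table of outcome frequencies that ignores the data payload, and the names <monospace>prefixes</monospace>, <monospace>cluster_key</monospace>, <monospace>train</monospace>, <monospace>predict</monospace>, as well as the example history, are assumptions introduced for illustration.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def prefixes(trace, max_length):
    """Yield every prefix of a trace up to max_length events."""
    for n in range(1, min(len(trace), max_length) + 1):
        yield tuple(trace[:n])


def cluster_key(prefix):
    """Toy control-flow clustering: group prefixes by the set of activities seen."""
    return frozenset(prefix)


def train(history, max_length=3):
    """Pre-processing phase: cluster historical prefixes, then fit one model per cluster.

    history is a list of (trace, outcome) pairs. The per-cluster "classifier"
    here is just a table of outcome frequencies (an assumption of this sketch).
    """
    outcome_counts = defaultdict(Counter)
    for trace, outcome in history:
        for prefix in prefixes(trace, max_length):
            outcome_counts[cluster_key(prefix)][outcome] += 1
    return outcome_counts


def predict(model, running_prefix, outcome):
    """Runtime phase: map the running case to a cluster and estimate P(outcome)."""
    counts = model.get(cluster_key(tuple(running_prefix)))
    if not counts:
        return None  # no historical prefix falls in this cluster
    return counts[outcome] / sum(counts.values())


# Hypothetical labeled history, for illustration only.
history = [
    (["register", "triage", "surgery"], "late"),
    (["register", "triage", "therapy"], "on time"),
    (["register", "therapy"], "on time"),
]
model = train(history)
probability = predict(model, ["register", "triage"], "on time")
```
      </preformat>
      <p>In this sketch, the running prefix is matched to the cluster of historical prefixes containing the same activities, and the estimate is the fraction of those prefixes that ended with the requested outcome. A real instance would replace both stand-ins with the clustering and supervised learning techniques of the framework.</p>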
      <p>
        The modules of the framework have been implemented using different
techniques for experimentation purposes. The clustering module has been
implemented using two types of trace encoding and different
clustering algorithms. In particular, a frequency-based and a sequence-based trace
encoding have been implemented. The former encodes
each execution trace as a vector of event occurrences (over the alphabet of the
events), while the latter encodes the trace as a sequence of events. These
encodings can then be passed to the clustering techniques. For instance, the
frequency-based encoding has been used with model-based clustering [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
the sequence-based encoding with DBSCAN clustering [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, for
model-based clustering, we use the Euclidean distance to identify the clusters
while, for DBSCAN, we use the edit distance. Finally, the supervised learning
module has been implemented using decision tree and random forest
learning. The possible "instances" of our framework can be obtained through different
combinations of these techniques.
      </p>
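      <p>As an illustration of the two encodings and their associated distances, the following Python sketch encodes a trace prefix as a vector of event occurrence counts and compares sequences with the Levenshtein (edit) distance. The function names, the event alphabet, and the example prefixes are hypothetical; the tool itself relies on Weka, so this is only a sketch of the underlying ideas.</p>
      <preformat>
```python
from collections import Counter
from math import sqrt


def frequency_encoding(trace, alphabet):
    """Encode a trace as a vector of event occurrence counts over the alphabet."""
    counts = Counter(trace)
    return [counts[event] for event in alphabet]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def edit_distance(s, t):
    """Levenshtein distance between two event sequences (sequence-based encoding)."""
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            substitution = previous[j - 1] + (a != b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


# Hypothetical event alphabet and trace prefixes, for illustration only.
alphabet = ["register", "triage", "test", "surgery"]
prefix_a = ["register", "triage", "test", "test"]
prefix_b = ["register", "test", "surgery"]

vector_a = frequency_encoding(prefix_a, alphabet)  # [1, 1, 2, 0]
vector_b = frequency_encoding(prefix_b, alphabet)  # [1, 0, 1, 1]
```
      </preformat>
      <p>The frequency-based vectors discard the order of events, which is why they pair naturally with the Euclidean distance, whereas the edit distance operates directly on the ordered sequences.</p>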
      <p>The framework has been implemented as an OS provider.<fn><p>A screencast of this demo can be found at <ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/yrqszjmvv07okj1/PredictiveMonitoringTool.zip?dl=0">https://www.dropbox.com/s/yrqszjmvv07okj1/PredictiveMonitoringTool.zip?dl=0</ext-link>.</p></fn> Fig. 2 shows the
architecture based on the OS. The OS service receives a stream of events (the
current execution trace) from a client and forwards it to the OS provider
(Predictive Monitor) that, based on a repository of historical traces, returns
predictions. The OS service sends these results back to the client. For the
implementation of the Predictive Monitor, we rely on (i) the Weka implementation of
the clustering methods, and (ii) the Weka J48 implementation of the C4.5
algorithm and the Weka implementation of random forest for the supervised
learning.</p>
      <p>As an additional utility, we have implemented a client application providing
users with (i) a simple interface for choosing the (set of) configuration(s), i.e.,
the selection and combination of the techniques available in the framework
and of the corresponding parameters, to be used for making predictions about
the current trace; and (ii) a functionality that allows for an extended and fast
evaluation of different instances (configurations) of the framework when a set
of testing execution traces (gold standard) is available for evaluation purposes.
The client utility thus offers two execution modes: prediction, for the
prediction over one or more online execution traces (coming from a workflow
engine), and prediction for evaluation, which returns a set of metrics related to the
quality of the results of different configurations (if a testing log is available).</p>
      <p>Fig. 3: (a) Connection interface; (b) Configuration interface.</p>
      <p>Fig. 3a shows the starting interface of the client application. Through the
GUI, the user can select the IP address and the port of the OS hosting the Predictive
Monitor. Once connected to the server, the user is asked to choose the (set of)
configuration(s), i.e., the (combination of) framework clustering and classification
techniques and the corresponding parameters, that she wants to use (Fig. 3b).
By clicking on the run button, the configurations are sent to the server.</p>
      <p>Fig. 2 shows the logical architecture of the client application and its
interactions with the OS Service. As mentioned above, the user can choose whether to
use the client just as a "replayer" of a stream of events coming from a workflow
engine for prediction purposes or as an evaluation utility for different
configurations of the predictive monitoring framework. The Unfolding Module combines
all the parameters provided by the user into a set of different configuration runs.
From here on, each configuration run is associated with an ID (Run ID), which is
used to refer to that configuration. The Configuration Sender sequentially sends
each Run ID to the server, which uses it to build the clusters for that specific
configuration. As soon as the server is done with the pre-processing, the
Configuration Sender starts sending the traces to the Replayer Scheduler, which is in charge
of optimizing the distribution of the traces among different replayers on different
threads. Each replayer sends the trace (and the reference to the specific
configuration Run ID) to the server and waits for the results. As soon as the results are
provided by the OS Service, they are visualized in the result interface (Fig. 4).
Each tab of the result interface refers to a specific configuration run, while the
summary tab reports a summary of all the runs.</p>
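      <p>The role of the Unfolding Module can be sketched in Python as a Cartesian product over the candidate values chosen by the user, with each resulting combination receiving a sequential Run ID. This is an assumption about how such unfolding could be done, not a description of the actual module; the function <monospace>unfold</monospace> and the parameter names and values below are hypothetical.</p>
      <preformat>
```python
from itertools import product


def unfold(parameter_options):
    """Expand a mapping from parameter names to candidate values into runs.

    Returns a dict mapping a run ID (assigned sequentially from 1) to one
    concrete configuration.
    """
    names = list(parameter_options)
    combinations = product(*(parameter_options[name] for name in names))
    return {
        run_id: dict(zip(names, values))
        for run_id, values in enumerate(combinations, start=1)
    }


# Hypothetical parameter names and values, for illustration only.
options = {
    "clustering": ["model-based", "dbscan"],
    "classifier": ["decision-tree", "random-forest"],
    "prefix_length": [5, 10],
}
runs = unfold(options)
for run_id, configuration in runs.items():
    print(run_id, configuration)
```
      </preformat>
      <p>With two candidate values for each of the three hypothetical parameters, the sketch yields eight configuration runs, each of which could then be sent to the server and reported in its own tab.</p>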
    </sec>
    <sec id="sec-3">
      <title>Maturity and Inherence</title>
      <p>
        Predictive Monitoring [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is an emerging paradigm based on the continuous
generation of predictions and recommendations on what activities to perform and
what input data values to provide, so as to maximize the likelihood of achieving a certain
outcome. Based on an analysis of execution traces, the idea of
predictive monitoring is to continuously provide the user with estimations of
the likelihood of achieving a certain outcome for a given case. Such predictions
generally depend on both (i) the sequence of activities executed in a given case
and (ii) the values of data attributes after each activity execution in a case.
      </p>
      <p>
        We have conducted a set of experiments using the BPI Challenge 2011 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
event log. This log pertains to a healthcare process and, in particular, contains
the executions of a process related to the treatment of patients diagnosed with
cancer in a large Dutch academic hospital. The experiments allowed
us to answer positively the following two research questions: (1) "is the
framework effective in providing accurate results as early as possible?", and (2) "is
the framework efficient in providing results?". In addition, we could conclude
that the solutions provided by the different instances of the framework offer the
possibility to meet different types of needs by suitably setting the available
configuration parameters. For more information about our experimentation with
the tool, the reader is referred to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. 3TU Data Center:
          <article-title>BPI Challenge 2011 Event Log</article-title>
          (
          <year>2011</year>
          ), doi:
          <pub-id pub-id-type="doi">10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pesic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Beyond process mining: From the past to present and future</article-title>
          .
          <source>In: Proc. of CAiSE</source>
          . pp.
          <fpage>38</fpage>
          –
          <lpage>52</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Di Francescomarino</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maggi</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teinemaa</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Clustering-Based Predictive Process Monitoring</article-title>
          .
          <source>ArXiv e-prints</source>
          (
          <year>Jun 2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          .
          <source>In: Proc. of 2nd International Conference on Knowledge Discovery and</source>
          . pp.
          <fpage>226</fpage>
          –
          <lpage>231</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fraley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raftery</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          :
          <article-title>Enhanced model-based clustering, density estimation, and discriminant analysis software: MCLUST</article-title>
          .
          <source>Journal of Classification</source>
          <volume>20</volume>
          ,
          <fpage>263</fpage>
          –
          <lpage>286</lpage>
          (
          <year>September 2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Maggi</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Francescomarino</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghidini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Predictive monitoring of business processes</article-title>
          .
          <source>In: Proceedings of CAiSE 2014</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Westergaard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maggi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Modelling and Verification of a Protocol for Operational Support using Coloured Petri Nets</article-title>
          .
          <source>In: Proc. of ATPN</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>