=Paper= {{Paper |id=Vol-2270/short1 |storemode=property |title=Prediction of Business Process Instances with Dynamic Bayesian Networks |pdfUrl=https://ceur-ws.org/Vol-2270/short1.pdf |volume=Vol-2270 |authors=Jens Brunk,Kate Revoredo,Matthias Stierle,Martin Matzner,Patrick Delfmann,Jörg Becker |dblpUrl=https://dblp.org/rec/conf/simpda/BrunkRSMD018 }} ==Prediction of Business Process Instances with Dynamic Bayesian Networks== https://ceur-ws.org/Vol-2270/short1.pdf
    Prediction of Business Process Instances with
             Dynamic Bayesian Networks

    Jens Brunk 1, Kate Revoredo 2, Matthias Stierle 3, Martin Matzner 3,
    Patrick Delfmann 4, and Jörg Becker 1

    1 European Research Center for Information Systems (ERCIS), University of Münster,
      {jens.brunk,becker}@ercis.uni-muenster.de
    2 Federal University of Rio de Janeiro, katerevoredo@ppgi.ufrj.br
    3 Chair of Digital Industrial Service Systems, Friedrich-Alexander Universität
      Nürnberg-Erlangen, {matthias.stierle,martin.matzner}@fau.de
    4 Institute for Information Systems Research, University of Koblenz-Landau,
      delfmann@uni-koblenz.de



        Abstract. Predicting undesirable events during a process instance
        execution provides the process participants with an opportunity to
        intervene and keep the process aligned with its goals. Few approaches
        for tackling this challenge consider a multi-perspective view, where the
        flow perspective of the process is combined with its surrounding context.
        In particular, the dynamism of this context over time has been ignored in
        most prediction approaches. In this paper we tackle this issue by
        leveraging previous work on probabilistic finite automata to develop a
        Dynamic Bayesian Network (DBN). The DBN includes different kinds of
        context information in customizable process models and then predicts the
        next event of a process instance. The initial results reveal two major
        challenges: choosing the optimal DBN structure and using meaningful
        contextual information.


1     Introduction
Monitoring a business process instance, or rather anticipating an unexpected event,
provides an opportunity for decision-makers to evaluate the best way to overcome
obstacles and to continuously achieve the process goals. The task of learning a
process model that is also able to predict the future behavior of a process instance
is gaining increased attention [4, 11]. In [2], an approach, henceforth called RegPFA,
was proposed in which a probabilistic predictive model is learned from process log
data and used to predict future events given an observed sequence of events. This
probabilistic model also represents the process model; thus, RegPFA is a
process-aware approach. The evaluation of the method showed that an improvement in
sensitivity is necessary, i.e., an improvement in the rate of correct predictions of
a next event. The hypothesis we investigate in this paper is that solely looking
at the control-flow data is not enough for learning an accurate model. For
example, when predicting whether the next event in a procurement process after
sending out the order is the goods receipt or the cancellation of the order,
we can expect the prediction to be highly influenced by whether the ordered
goods were cheap or expensive and/or which vendor we ordered from. The data
attributes related to an activity during process execution are called contextual
elements [18]. They define the context in which the ongoing instance is running.
Thus, a certain event may occur conditioned on the current state of the process
and a set of contextual elements. For example, if in a procurement process an
order is currently in the "goods and invoice received" state and the order value is
below 500, then the next event will be the payment of the invoice (whereas if the
goods had not yet been received or the invoice had a high value, another event
would occur). Furthermore, the contextual elements in this set may have their values
determined by the current state (e.g., if the state is "goods received", the order is
not blocked for payment) and, together with the current event, they can
determine the next state of a process (e.g., if the number of goods received is lower
than expected, the next state could be Quality Check). A model that represents
these dependencies, as well as uncertainty in terms of changing context, may be
more adequate for prediction.
    In this work, we investigate the benefits of considering dynamic contextual
information for process-aware prediction. More specifically, we extend the RegPFA
approach to consider contextual information that can dynamically change
over time, i.e., throughout process execution. To this end, we propose a technique
for learning a Dynamic Bayesian Network (DBN). We investigate the hypothesis
that the combination of historical events and dynamically changing contextual
information improves the prediction of unknown future events.
    Following the Design Science Research (DSR) paradigm defined by Hevner [8],
we build two artifacts: a theoretical model of the DBN and its instantiation.
Our method is guided by the DSR process proposed by Peffers et al. [15], of which
we completed the first iteration of build, demonstrate, and evaluate. This
iteration led us to the challenges of how to choose the optimal network
structure and how to identify meaningful contextual information.


2    Related Work

In this section, we describe works that perform prediction in the presence of
contextual data and thus are related to our proposal. These works were evaluated
from two perspectives: process-aware [11] and attribute-dynamic-aware. In the
former, an approach is considered process-aware if the process model is used as
input for the prediction. In the latter, an approach is attribute-dynamic-aware if
the change over time of the attribute is considered for the prediction.
   Fifteen related works were found, and none of them considered the process
model during prediction. Regarding attribute-dynamic-awareness, the works
[5, 9, 10, 12, 20, 22] considered the dynamism of the attributes for the
prediction, while the works [1, 3, 6, 7, 14, 16, 17, 19, 21] did not.
   The limitation that we address in this paper is that only a minority of works
consider the dynamism of the attributes and that they generally neglect the
process model for prediction.




3     Proposal

Our proposal follows the general idea behind RegPFA [2], which is based on
probabilistic finite automata (PFA). A PFA represents the model through hidden
and observed nodes, whose parametrization is key to modeling a process and
executing predictions for it. Figure 1 (left part) depicts a PFA, where z0 and
z1 represent the hidden state variables and x0 ... xn the events of the activity
sequence. Since, apart from the events (observed nodes), our approach also models
the contextual information, we make use of Dynamic Bayesian Networks (DBN)
[13]. In our case, we consider two-time-slice DBNs. This means that the visual
representation consists of an initial time slice and a second slice, which is
repeated indefinitely to model the steps of the time dimension. Although there
are only two time slices, the values of the variables change over time, and in
this way the dynamism of the context is modeled.




                      Fig. 1. PFA and DBN Model Structure
    In Figure 1 (right side), nodes cA0 ... cN1 represent contextual elements, which
depend on the hidden state and influence both the event and the next
state. Similarly, more context variables and many different DBN structures can
be imagined.
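    As a hedged illustration (not taken from the paper's implementation), the
dependencies described for the DBN in Figure 1 can be written as a factorization
of the joint distribution. The sketch assumes a single context variable c_t per
time slice instead of cA ... cN, and it assumes, as in a PFA, that the emitted
event x_t is also a parent of the next hidden state; both are simplifying
assumptions.

```latex
P(z_{0:T}, c_{0:T}, x_{0:T})
  = P(z_0) \prod_{t=0}^{T} P(c_t \mid z_t)\, P(x_t \mid z_t, c_t)
    \prod_{t=0}^{T-1} P(z_{t+1} \mid z_t, c_t, x_t)
```

    Under these assumptions, dropping c_t from every factor leaves a model with
hidden states and events only, which corresponds to the PFA-style structure.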
    The major challenge of log-data-based approaches is learning and running
inference on the created models. As the idea behind using DBNs is to generalize
and to be able to construct many different structures, it is important that the
available learning and inference algorithms support this. To date,
there exist various frameworks for probabilistic graphical models (see footnote 5),
and each of them implements a different set of inference algorithms. These
algorithms can generally be distinguished by whether they perform exact or
approximate inference, by the node types they support (e.g., discrete and/or
continuous), and by the DBN topologies they support. To our knowledge, only the
Bayes Net Toolbox includes an inference algorithm that supports any topology and
different node types and that can be used for inference as well as prediction.
For this reason, we chose it for implementing our solution. Our implementation
supports the creation of HMM, PFA and freely customizable DBNs and the prediction
of the next event of process instances.

5 AMIDST Toolbox: https://github.com/amidst/toolbox; Graphical Models Toolkit:
  https://melodi.ee.washington.edu/gmtk/; Bayes Net Toolbox:
  https://github.com/bayesnet/bnt
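
    A minimal Python sketch of next-event prediction in such a model is given
below. It is not the authors' MATLAB/Bayes Net Toolbox implementation: all
probabilities are hand-set toy values rather than learned parameters, and the
single context variable as well as the names INIT, CTX, EMIT, trans,
predict_next_event and update are hypothetical. The assumed dependencies are that
the context depends on the hidden state, the event depends on state and context,
and the next state depends on state, context and event. Because the context of
the step to be predicted is not yet observed, the prediction marginalizes over it.

```python
# Toy two-time-slice DBN with hand-set (NOT learned) parameters.
# All names and numbers are illustrative assumptions.
STATES = range(2)
CONTEXTS = ["cheap", "expensive"]
EVENTS = ["goods_receipt", "cancellation"]

INIT = [0.7, 0.3]                # INIT[z] = P(z_0 = z)
CTX = [[0.8, 0.2], [0.3, 0.7]]   # CTX[z][c] = P(c_t = c | z_t = z)
EMIT = [                         # EMIT[z][c][x] = P(x_t = x | z_t = z, c_t = c)
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.5, 0.5], [0.2, 0.8]],
]


def trans(z, c, x):
    """Return P(z_{t+1} | z_t=z, c_t=c, x_t=x) as a list over next states."""
    stay = 0.8 if x == 0 else 0.3   # illustrative rule, not from the paper
    if c == 1:
        stay -= 0.1
    dist = [0.0, 0.0]
    dist[z] = stay
    dist[1 - z] = 1.0 - stay
    return dist


def predict_next_event(belief):
    """Distribution over the next event; the unobserved context is summed out."""
    probs = [0.0] * len(EVENTS)
    for z in STATES:
        for c in range(len(CONTEXTS)):
            for x in range(len(EVENTS)):
                probs[x] += belief[z] * CTX[z][c] * EMIT[z][c][x]
    return probs


def update(belief, c, x):
    """Forward step: condition on observed context c and event x, then move on."""
    weights = [belief[z] * CTX[z][c] * EMIT[z][c][x] for z in STATES]
    total = sum(weights)
    new_belief = [0.0] * len(STATES)
    for z in STATES:
        nxt = trans(z, c, x)
        for z2 in STATES:
            new_belief[z2] += weights[z] / total * nxt[z2]
    return new_belief


belief = INIT
print(dict(zip(EVENTS, predict_next_event(belief))))
belief = update(belief, c=1, x=0)   # observed: expensive order, goods receipt
print(dict(zip(EVENTS, predict_next_event(belief))))
```

    The belief over hidden states is the only quantity carried between steps,
which mirrors the idea that the second time slice is repeated for every step.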


4     Evaluation and Discussion

To assure the correctness of the implementation, we benchmarked our PFA
implementation against RegPFA with the dataset of the BPI Challenge 2012 (see
footnote 6) and obtained similar results. The slightly better performance of
RegPFA is expected, as we implemented only a basic version, especially in terms
of optimizing input parameters. As an initial evaluation of our context-aware
prediction approach, we considered the resource column of the BPI2012 dataset as
a single context attribute. The results displayed in Table 1 show the impact of
considering contextual information and its dynamics over time. The preliminary
results indicate that our approach underperforms on all three performance
measures and is far from the results yielded by PFA/RegPFA.
                      Table 1. Predictor Performance Measures

              Event log    Predictor   Accuracy   Sensitivity   Specificity
              BPI2012 A    RegPFA      0.801      0.723         0.980
                           PFA         0.6981     0.6997        0.6986
                           DBN         0.2083     0.1594        0.2074
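
    The three measures in Table 1 can be computed from predicted and actual next
events. The paper does not state how the measures were aggregated over event
classes, so the following Python sketch assumes one-vs-rest counts per event
label with macro-averaging; the function name and the example labels are
hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged sensitivity and specificity.

    Assumption: each event label is scored one-vs-rest and the per-label
    values are averaged with equal weight (macro-average).
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    sensitivities, specificities = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        if tp + fn > 0:                       # label occurs in the ground truth
            sensitivities.append(tp / (tp + fn))
        if tn + fp > 0:                       # some instance has another label
            specificities.append(tn / (tn + fp))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    accuracy = sum(t == p for t, p in pairs) / n
    return accuracy, mean(sensitivities), mean(specificities)


# Hypothetical next-event labels for four prediction points.
actual = ["A_ACCEPTED", "A_ACCEPTED", "A_DECLINED", "A_CANCELLED"]
predicted = ["A_ACCEPTED", "A_DECLINED", "A_DECLINED", "A_CANCELLED"]
print(one_vs_rest_metrics(actual, predicted))
```

    With many event classes, specificity is dominated by true negatives, so it
can stay high even when sensitivity is low.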

    We can identify two major reasons for the poor results. The first is the
quality, or rather suitability, of the context data. Looking at the content of
the resource column that was used, we notice that it is hardly correlated with
the process flow. Thus, choosing the context is a challenge. The second is the
network structure that we chose (see Figure 1). Currently, the contextual
elements influence both the observation and the future state, and they
themselves are influenced by the current state. Especially the latter seems to
introduce a high degree of uncertainty into the model, as the context in t + 1
is unknown when predicting the observation in t + 1.
    Given these observations, we plan to continue our research by running further
evaluations with various datasets and different combinations of context attributes
as well as by varying the network structure.


5     Conclusion and Future Research

We implemented a DBN-based business process prediction approach that is based
on the Bayes Net Toolbox for MATLAB. Our approach contributes to the body of
knowledge by introducing a process-aware business process prediction approach
that can handle dynamic context attributes and is based on probabilistic graphical
models. It enables and calls for future research on which types of dynamic context
attributes can be used for business process instance predictions and which
structures work best. This is also the major limitation of our work: we developed
and initially evaluated the approach, but in-depth discussions and studies, as well
as suitable data logs, are needed to improve the accuracy of our DBN predictions
until they outperform, e.g., RegPFA.

6 https://data.4tu.nl/repository/uuid:3926db30-f712-4394-aebc-75976070e91f

   Acknowledgments: This work is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation program under
the Marie Skłodowska-Curie Grant agreement No. 645751.

References
 1. Borkowski, M., Fdhila, W., Nardelli, M., Rinderle-Ma, S., Schulte, S.: Event-based failure
    prediction in distributed business processes. Information Systems (2017)
 2. Breuker, D., Matzner, M., Delfmann, P., Becker, J.: Comprehensible predictive models for
    business processes. MIS Quarterly 40(4), 1009–1034 (2016)
 3. Di Francescomarino, C., Dumas, M., Maggi, F.M., Teinemaa, I.: Clustering-based predictive
    process monitoring. IEEE Transactions on Services Computing (2016)
 4. Di Francescomarino, C., Ghidini, C., Maggi, F.M., Milani, F.: Predictive process monitoring
    methods: Which one suits me best? In: BPM. pp. 462–479 (2018)
 5. Di Francescomarino, C., Ghidini, C., Maggi, F.M., Petrucci, G., Yeshchenko, A.: An eye into
    the future: Leveraging a-priori knowledge in predictive business process monitoring. In: BPM.
    pp. 252–268. Springer, Cham (2017)
 6. Folino, F., Guarascio, M., Pontieri, L.: Discovering context-aware models for predicting
    business process performances. In: OTM. pp. 287–304. Springer (2012)
 7. Frey, M., Emrich, A., Fettke, P., Loos, P.: Event entry time prediction in financial business
    processes using machine learning: A use case from loan applications. In: HICSS. pp. 1386–1394.
    IEEE Computer Society (2018)
 8. Hevner, A.R.: Design science in information systems research. MIS Quarterly 28(1), 75–105
    (2004)
 9. Leontjeva, A., Conforti, R., Di Francescomarino, C., Dumas, M., Maggi, F.M.: Complex
    Symbolic Sequence Encodings for Predictive Monitoring of Business Processes. pp. 297–313.
    Springer, Cham (2015)
10. Maggi, F.M., Di Francescomarino, C., Dumas, M., Ghidini, C.: Predictive monitoring of business
    processes. In: CAISE. pp. 457–472. Springer (2014)
11. Márquez-Chamorro, A.E., Resinas, M., Ruiz-Cortés, A.: Predictive monitoring of business
    processes: a survey. IEEE Transactions on Services Computing (2017)
12. Márquez-Chamorro, A.E., Resinas, M., Ruiz-Cortés, A., Toro, M.: Run-time prediction of
    business process indicators using evolutionary decision rules. Expert Syst. Appl. 87(C), 1–14
    (2017)
13. Murphy, K.P., Russell, S.: Dynamic bayesian networks: representation, inference and learning
    (2002)
14. Navarin, N., Vincenzi, B., Polato, M., Sperduti, A.: LSTM networks for data-aware remaining
    time prediction of business process instances. CoRR (2017)
15. Peffers, K., Tuunanen, T., Rothenberger, M.A., Chatterjee, S.: A Design Science Research
    Methodology for Information Systems Research. Journal of Management Information Systems
    24(3), 45–77 (2007)
16. Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Data-aware remaining time prediction of
    business process instances. In: IJCNN. pp. 816–823 (2014)
17. Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Time and activity sequence prediction of
    business process instances. CoRR abs/1602.07566 (2016)
18. Rosemann, M., Recker, J.C., Flender, C., Ansell, P.D.: Understanding context-awareness in
    business process design (2006)
19. Senderovich, A., Di Francescomarino, C., Ghidini, C., Jorbina, K., Maggi, F.M.: Intra and
    inter-case features in predictive process monitoring: A tale of two dimensions. In: BPM.
    vol. 10445, pp. 306–323. Springer (2017)
20. Teinemaa, I., Dumas, M., Maggi, F.M., Di Francescomarino, C.: Predictive business process
    monitoring with structured and unstructured data. In: BPM. vol. 9850, pp. 401–417. Springer
    (2016)
21. Unuvar, M., Lakshmanan, G.T., Doganata, Y.N.: Leveraging path information to generate
    predictions for parallel business processes. Knowledge and Information Systems 47(2), 433–461
    (2016)
22. Verenich, I., Dumas, M., La Rosa, M., Maggi, F.M., Di Francescomarino, C.: Complex symbolic
    sequence clustering and multiple classifiers for predictive process monitoring. In: BPM
    Workshops. vol. 256, pp. 218–229. Springer (2015)



