<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>processpredictR: predictive process monitoring using bupaR</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivan Esin</string-name>
          <email>ivan.esin@student.uhasselt.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitri Beloshitskiy</string-name>
          <email>dmitri.beloshitskiy@student.uhasselt.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gert Janssenswillen</string-name>
          <email>gert.janssenswillen@uhasselt.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ICPM'23 Demo Track: International Conference on Process Mining</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Predictive Process Analytics, Predictive Process Monitoring</institution>
          ,
          <addr-line>Transformer, bupaR</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>UHasselt - Faculty of Business Economics</institution>
          ,
          <addr-line>Agoralaan, 3590 Diepenbeek</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
<year>2023</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
<p>This demo paper introduces processpredictR, a new library for predictive process monitoring in the bupaR-ecosystem. The library provides functionalities with different levels of customization, from completely standard off-the-shelf models to tools for advanced customization of preprocessing and model configuration.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The field of predictive business process monitoring aims to improve the execution of processes
by analyzing event logs and foreseeing future events and process outcomes. Over the past
decade, much research has been done in this field, comparing different model architectures and
improving predictive accuracy. Implementing these state-of-the-art predictive models is often
done in an ad-hoc way, using generic tools such as TensorFlow and Keras, creating a barrier
to smooth adoption by practitioners. Notable exceptions on this front are Apromore [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
Nirdizati [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this demo paper, we extend the available off-the-shelf process prediction tool
support with processpredictR.
      </p>
      <p>
        processpredictR is an extension of the bupaR ecosystem for process analysis with R [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
The models provided are based on the Transformer architecture proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The tool is
designed to be used with varying levels of customisation, supporting entirely standardised
models — which can be used to get familiar with predictive process monitoring without the
need for complex configuration — as well as models that can be highly customized, while
still relying on processpredictR for typical steps such as data preparation and evaluation.
      </p>
      <p>Figure 1: The main workflow of processpredictR: split_train_test(), create_model(), compile(), fit(), predict() and evaluate(), with stack_layer() for custom models.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Features</title>
      <sec id="sec-3-1">
        <title>2.1. Main workflow</title>
        <p>In the next paragraphs, we briefly describe the main workflow, as shown in Figure 1.</p>
        <p>Prepare examples. The first step is preparing the dataset using the prepare_examples
function. When preparing the examples for prediction, the user can select one of five
different prediction tasks: next activity, next time, remaining trace, remaining time, and outcome.
The latter, outcome, can be defined in various ways, e.g. based on the final activity, some
attribute, or some other logical condition. At this point, it is also possible to specify any
additional features that should be used in the prediction, next to the default ones, i.e. the activity
prefix and time features. An example of the preparation step is shown in Listing 1, lines 2-4.</p>
        <p>Listing 1: Coding workflow.
library(processpredictR)
examples &lt;- prepare_examples(log,
    task = "outcome",
    features = c(..., ...))

split &lt;- split_train_test(examples, split = 0.7)
train &lt;- split$train_df
test &lt;- split$test_df

model &lt;- create_model(
    train,                # training set
    custom = FALSE,       # default architecture
    num_heads = 4,        # number of attention heads
    output_dim_emb = 36,  # embeddings output dimensions
    dim_ff = 64)          # dimensions of feed-forward network

model &lt;- compile(model)

hist &lt;- fit(model, train_data = train, epochs = 10)

predictions &lt;- predict(model, test_data = test, output = "append")

confusion_matrix(predictions)
plot(predictions)

evaluate(model, test_data = test)</p>
        <p>The output of this step is a data table with examples. A case containing n activity instances is
turned into n − 1 examples, each consisting of an activity prefix, the target variable (depending
on the specified task), and any other features as specified in the call.</p>
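        <p>To make this concrete, the prefix-generation step can be sketched in a few lines of Python (a hypothetical illustration of the idea only; make_prefix_examples is not a processpredictR function, and the package itself works in R):</p>

```python
# Hypothetical sketch, not processpredictR internals: turn one case's
# activity sequence into prefix examples for next-activity prediction.
def make_prefix_examples(activities):
    """A case with n activity instances yields n - 1 examples:
    each activity prefix paired with the activity that follows it."""
    examples = []
    for i in range(1, len(activities)):
        prefix = activities[:i]   # the activity prefix (model input)
        target = activities[i]    # the next activity (prediction target)
        examples.append((prefix, target))
    return examples

case = ["register", "check", "decide", "notify"]
for prefix, target in make_prefix_examples(case):
    print(prefix, "->", target)  # a 4-activity case yields 3 examples
```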
        <p>Split train test. The preprocessed dataset can be divided into train and test sets using the
split_train_test function, simplifying dataset separation for model training and evaluation.
This function takes the output of prepare_examples() and generates two data frames, as
detailed in Listing 1, lines 6-8.</p>
        <p>Note that the event log is split chronologically, ensuring that all train examples precede test
examples. Additionally, the split is based on case proportions rather than individual examples,
preventing the division of the last case between the training and test sets, which would otherwise
lead to observations scattered between both sets.</p>
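        <p>The idea behind this case-based chronological split can be sketched as follows (a hypothetical Python illustration; split_train_test itself is implemented in R and its internals may differ):</p>

```python
# Hypothetical sketch of a chronological, case-based train/test split:
# cases are ordered by start time and split as whole cases,
# so no single case is divided between train and test.
def split_cases(cases, split=0.7):
    """cases: list of (case_id, start_time) pairs."""
    ordered = sorted(cases, key=lambda c: c[1])  # chronological order
    cut = round(len(ordered) * split)            # proportion of cases, not examples
    train_ids = [cid for cid, _ in ordered[:cut]]
    test_ids = [cid for cid, _ in ordered[cut:]]
    return train_ids, test_ids

cases = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]
print(split_cases(cases, split=0.6))  # (['A', 'B', 'C'], ['D', 'E'])
```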
        <p>
          Create model. The next workflow step involves defining the model’s structure. The
create_model() function considers the task and dataset parameters (e.g. additional features,
output possibilities, case length) and initializes a default transformer architecture (based on
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]). create_model() automates layer assembly and predefines dataset-based hyperparameters,
providing an out-of-the-box solution. Moreover, the model’s complexity can be manually
fine-tuned by adjusting hyperparameters such as the number of attention heads, the embedding
output dimensions, and the feed-forward network dimensions (Listing 1, lines 10-15).
        </p>
        <p>Compile. To configure the training process, the model must be compiled, specifying the
optimization procedure, loss function and evaluation metrics. This is done automatically by the
compile function (see Listing 1, line 17).</p>
        <p>Fit. The training of the model is facilitated with the provided fit method (see Listing 1, line
19). It allows flexibility in selecting hyperparameters (e.g. the number of training
epochs, the batch size, and the validation split).</p>
        <p>Predict. Once the model is trained, predictions can be made using the provided predict
method. The function can return up to three different types of output: raw predicted values as
returned by the generic keras predict function (without argmax applied), an array of predicted
values, or a data frame combining the input data with the predicted values (as illustrated in
Listing 1, line 21). Additionally, for classification tasks, processpredictR allows the user to
quickly compute and visualize the confusion matrix (Listing 1, lines 23-24).</p>
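        <p>A confusion matrix simply tallies how often each actual label is predicted as each label. A minimal Python sketch of that tallying (illustrative only; confusion_counts is hypothetical and not the package’s confusion_matrix function):</p>

```python
# Hypothetical sketch: tally a confusion matrix from actual and
# predicted labels of a classification task.
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

actual = ["accepted", "rejected", "accepted", "rejected"]
predicted = ["accepted", "accepted", "accepted", "rejected"]
for (a, p), n in sorted(confusion_counts(actual, predicted).items()):
    print(a, p, n)
```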
        <p>Evaluate. The evaluate function allows the user to quickly evaluate the performance of the
model on the test set, returning the values of the loss and accuracy metrics (see Listing 1, line 26).</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Customization</title>
        <p>Aside from the default models used in the main workflow, processpredictR provides
customization options. Users can include extra categorical and/or numerical features alongside the
examples and adjust the model’s complexity.</p>
        <p>Additional features. Additional features to be used by the model, beyond the examples (activity
sequences) and default features, can be defined when using prepare_examples(). The features
can be either numerical variables or categorical factors. Numerical variables are automatically
scaled using min-max normalization, while factors are automatically converted to one-hot encoded
variables.</p>
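        <p>Both default transformations can be sketched in a few lines (a hypothetical Python illustration of the underlying idea; processpredictR applies these automatically in R):</p>

```python
# Hypothetical sketch of the two default feature transformations:
# min-max scaling for numeric features, one-hot encoding for factors.
def min_max(values):
    """Scale numeric values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(levels, value):
    """Encode a categorical value as a one-hot vector over its levels."""
    return [1 if value == lvl else 0 for lvl in levels]

print(min_max([10, 20, 30]))                  # [0.0, 0.5, 1.0]
print(one_hot(["gold", "silver"], "silver"))  # [0, 1]
```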
        <p>
          Model complexity and custom layers. The default complexity settings for the models are
determined by the following parameters, which can be adjusted by the user: the number of
attention heads (num_heads = 4), the output dimensions of the embeddings (output_dim_emb =
36), and the dimensions of the feed-forward network (dim_ff = 64) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>More advanced flexibility is offered by allowing the creation of a custom model. In such a
case, processpredictR provides a partial model, containing only the input layers and the
encoder block of the model, which the user can then complete with generic keras layers using
the provided stack_layer method.</p>
        <p>Advanced customization. The methods described thus far abstract away many preprocessing
and configuration steps. However, one may be interested in custom preprocessing, a custom model
architecture, or simply a more detailed interpretation and understanding of the model
parameters. Hence, additional auxiliary functions are made available, which together with
generic keras methods offer even greater flexibility.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Maturity</title>
      <p>
        While the processpredictR library was published only in January 2023, the back-end is based
on the tried and tested functionalities of the keras library [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Furthermore, it is embedded in
the bupaR ecosystem [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which has built up a strong user base over the last couple of years.
      </p>
    </sec>
    <sec id="sec-5">
      <title>4. Further materials</title>
      <p>The library is published on CRAN (https://cran.r-project.org/package=processpredictR) and
can be installed as a regular R-package. Usage also requires a Python distribution to be
installed; more information on the installation can be found on docs.bupar.net
(https://bupaverse.github.io/docs/install.html#Installing_bupaR), where a more detailed tutorial
on the workflow as well as the customization options is available
(https://bupaverse.github.io/docs/predict_workflow.html). A four-minute screen-cast on
processpredictR can be found here: https://tinyurl.com/processpredictR</p>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions and Future Work</title>
      <p>This demo paper introduced processpredictR, an easy-to-use tool for predictive process
monitoring embedded in the bupaR-ecosystem. Next to default off-the-shelf models, the library
provides multiple avenues to customize predictive models within a specific context. Future
efforts will be focused on improving the default model architectures used, as well as on more
advanced visual tools to aid the interpretability of the obtained models and predictions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>La Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Reijers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>García-Bañuelos</surname>
          </string-name>
          ,
          <article-title>Apromore: An advanced process model repository</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>38</volume>
          (
          <year>2011</year>
          )
          <fpage>7029</fpage>
          -
          <lpage>7040</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Simonetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kasekamp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <article-title>Nirdizati 2.0: New features and redesigned backend</article-title>
          ,
          <source>Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM 2019</source>
          <volume>2420</volume>
          (
          <year>2019</year>
          )
          <fpage>154</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Janssenswillen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Depaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Swennen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vanhoof</surname>
          </string-name>
          ,
          <article-title>bupaR: Enabling reproducible business process analysis</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>163</volume>
          (
          <year>2019</year>
          )
          <fpage>927</fpage>
          -
          <lpage>930</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z. A.</given-names>
            <surname>Bukhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Dijkman</surname>
          </string-name>
          ,
          <article-title>ProcessTransformer: Predictive Business Process Monitoring with Transformer Network</article-title>
          , arXiv:2104.00721 [cs] (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          , Attention Is All You Need, arXiv:1706.03762 [cs] (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chollet</surname>
          </string-name>
          , et al.,
          <source>Keras</source>
          ,
          <year>2015</year>
          . URL: https://github.com/fchollet/keras.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>