<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging Multi-view Deep Learning for Next Activity Prediction⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vincenzo Pasquadibisceglie</string-name>
          <email>vincenzo.pasquadibisceglie@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Annalisa Appice</string-name>
          <email>annalisa.appice@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanna Castellano</string-name>
          <email>giovanna.castellano@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donato Malerba</string-name>
          <email>donato.malerba@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Consorzio Interuniversitario Nazionale per l'Informatica - CINI</institution>
          ,
          <country>Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Informatics, Università degli Studi di Bari Aldo Moro</institution>
          ,
          <addr-line>via Orabona, 4 - 70125 Bari</addr-line>
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Predicting the next activity in a running trace is a fundamental problem in business process monitoring since such predictive information may allow analysts to intervene proactively and prevent undesired behaviors. This paper describes a predictive process approach that couples multi-view learning and deep learning, in order to gain accuracy by accounting for the variety of information possibly recorded in event logs. Experiments with benchmark event logs show the accuracy of the proposed approach compared to several recent state-of-the-art methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Predictive process mining</kwd>
        <kwd>Next activity prediction</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Multi-view Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Predictive process mining now plays a fundamental role in business
scenarios, as it is emerging as an effective means to monitor the execution of
running business processes. In particular, knowing in advance the next activity
of a running process instance may foster optimal management of resources
and promptly trigger remedial operations. Recently, motivated
by the results achieved with deep artificial neural networks, significant interest
has arisen in applying deep learning to analyze event logs and gain accurate
insights into the future activities of the logged processes (e.g. [
        <xref ref-type="bibr" rid="ref1 ref5 ref6 ref8 ref9">1,5,6,8,9</xref>
        ]).
However, the common approach in these studies is to consider an event only from
the single perspective of the executed activities with their timestamps. Based
on these premises, we have recently proposed a richer representation that takes
into account different perspectives for each trace. In particular, in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we have
introduced a predictive process approach called MiDA (Multi vIew Deep learning
based approach for next Activity prediction) that yields accurate predictions of
the next activity in a running trace. The source code of the proposed approach
is available on the GitHub repository https://github.com/vinspdb/MiDA.
      </p>
      <p>MiDA combines multi-view learning with deep learning. Specifically, it resorts
to a multi-view input scheme that injects
each characteristic-based view of an event into a deep neural network with Long
Short-Term Memory (LSTM) layers. These layers are able to process the
multi-view information by taking into account the sequential nature of event logs in
business processes. In short, the advantage of our proposal is that the
information collected along any process perspective can be, in principle, taken into
account to gain predictive accuracy. Experiments with various benchmark event
logs show the accuracy of the proposed approach compared to several recent
state-of-the-art methods. The paper is organized as follows. Section 2 reports
preliminary concepts, while Section 3 describes the proposed approach. In
Section 4 we describe the experimental setting and discuss the relevant results.
Finally, Section 5 draws conclusions.</p>
      <p>⋆ Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminary concepts</title>
      <p>The basic assumption is that the event log contains information on the activities
executed for specific traces of a certain process type, as well as their durations
and any other optional characteristics (e.g. resources, costs). An event e is thus
a complex entity characterized by a set of mandatory characteristics, namely
the activity and its timestamp (the date and time of occurrence, calculated
as the time elapsed from the start of the event). In addition, an event may be
associated with a set of optional characteristics, such as the resource triggering
the activity, the life cycle of the activity or the cost of completing the activity.
An event log is a set of events. Each event in the log is linked to a trace and is
globally unique. A trace σ represents the execution of a process instance. It is a
finite sequence of distinct events such that time is non-decreasing in the trace,
i.e. <inline-formula><tex-math>e_i.Timestamp \le e_j.Timestamp</tex-math></inline-formula> for <inline-formula><tex-math>1 \le i &lt; j \le |\sigma|</tex-math></inline-formula>, with <inline-formula><tex-math>|\sigma| = length(\sigma)</tex-math></inline-formula>.</p>
      <p>An event log <inline-formula><tex-math>L = \{\sigma_i\}_{i=1}^{N}</tex-math></inline-formula> is a bag of N traces. By accounting for the
structure of events, traces of an event log can be characterized by different views.
A view is a description of the traces along a specific event characteristic
(perspective). Therefore, every event log can be defined on the mandatory views
that are associated with the activities and the timestamps, as well as on
additional views associated with the optional characteristics of events. A prefix trace
<inline-formula><tex-math>\sigma_i^k = \langle e_1, e_2, \ldots, e_k \rangle</tex-math></inline-formula> is a sub-sequence of a trace starting from the beginning of
the trace. From each trace σ we can derive several prefix traces <inline-formula><tex-math>\sigma^k</tex-math></inline-formula>
with <inline-formula><tex-math>1 \le k = |\sigma^k| \le |\sigma|</tex-math></inline-formula>. Hence, a trace is a complete process instance (started
and ended), while a prefix trace is an instance in execution (running trace).</p>
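      <p>The definitions above can be illustrated with a minimal Python sketch. The dictionary-based event record, the function names and the toy activity and resource values are assumptions made for illustration only; they are not taken from the MiDA implementation.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List

Event = Dict[str, object]  # hypothetical event record: one key per perspective


def prefix_traces(trace: List[Event]) -> List[List[Event]]:
    """Return every prefix of `trace`, from length 1 up to len(trace)."""
    return [trace[:k] for k in range(1, len(trace) + 1)]


def is_time_ordered(trace: List[Event]) -> bool:
    """Check that timestamps are non-decreasing along the trace."""
    stamps = [event["timestamp"] for event in trace]
    return all(a <= b for a, b in zip(stamps, stamps[1:]))


# Toy trace with the two mandatory views plus one optional view (resource).
trace = [
    {"activity": "submit", "timestamp": 0.0, "resource": "r1"},
    {"activity": "accept", "timestamp": 12.5, "resource": "r2"},
    {"activity": "send offer", "timestamp": 40.0, "resource": "r2"},
]
prefixes = prefix_traces(trace)
```
]]></preformat>
      <p>Here a trace of three events yields three prefix traces, of length 1, 2 and 3, matching the bound on k stated above.</p>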
    </sec>
    <sec id="sec-3">
      <title>MiDA</title>
      <p>In MiDA, each prefix trace is represented on every mandatory perspective recorded
in the log (activities and timestamps), as well as on every additional
perspective possibly recorded in the log (e.g. resource, life cycle, cost). In particular,
we consider both categorical attributes (e.g. the activities and the resources)
and numerical attributes (e.g. the timestamp) to represent each event. Hence,
given a prefix trace <inline-formula><tex-math>\sigma_i^k = \langle e_{i1}, e_{i2}, \ldots, e_{ik} \rangle</tex-math></inline-formula>, each event <inline-formula><tex-math>e_{ij} \in \sigma_i^k</tex-math></inline-formula> is defined by both
categorical and numerical attributes. We indicate by <inline-formula><tex-math>A_C</tex-math></inline-formula> the set of categorical
attributes and by <inline-formula><tex-math>A_N</tex-math></inline-formula> the set of numerical attributes characterizing an event. A
padding technique is adopted to obtain equal-length prefix traces, with length
equal to <inline-formula><tex-math>AVG_L</tex-math></inline-formula> (the average trace length).</p>
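      <p>As a rough illustration of the view extraction and padding step, the sketch below projects a prefix trace onto one attribute and brings the resulting view to a fixed length. The function names are hypothetical, and three details are assumptions not stated in the text: padding is applied on the left, the padding value is 0, and longer prefixes keep only their most recent events. Rounding the average trace length to an integer is likewise an assumption.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Sequence


def extract_view(prefix: Sequence[Dict[str, object]], attribute: str) -> List[object]:
    """Project a prefix trace onto a single attribute, i.e. one view."""
    return [event[attribute] for event in prefix]


def pad_view(view: Sequence[object], length: int, pad_value: object = 0) -> List[object]:
    """Return `view` as a list of exactly `length` items.

    Shorter views are left-padded with `pad_value`; longer views keep their
    most recent `length` items. Both choices are assumptions of this sketch.
    """
    kept = list(view)[max(0, len(view) - length):]
    return [pad_value] * (length - len(kept)) + kept


# Toy log: trace lengths 2, 4 and 6 give an average length of 4.
trace_lengths = [2, 4, 6]
avg_len = round(sum(trace_lengths) / len(trace_lengths))

prefix = [
    {"activity": 2, "timestamp": 0.0},
    {"activity": 1, "timestamp": 12.5},
]
activity_view = pad_view(extract_view(prefix, "activity"), avg_len)
timestamp_view = pad_view(extract_view(prefix, "timestamp"), avg_len, pad_value=0.0)
```
]]></preformat>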
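      <p>Every categorical attribute in <inline-formula><tex-math>A_C</tex-math></inline-formula> is converted into a numerical
representation, in order to be processed by a neural network. For each categorical attribute
<inline-formula><tex-math>att_l \in A_C</tex-math></inline-formula> having vocabulary <inline-formula><tex-math>V_l</tex-math></inline-formula>, we define a coding function
<disp-formula><tex-math>f : V_l \to [0, 1, 2, \ldots, |V_l|],</tex-math></disp-formula>
that uniquely assigns an integer value to each categorical value in the
vocabulary <inline-formula><tex-math>V_l</tex-math></inline-formula> of attribute <inline-formula><tex-math>att_l</tex-math></inline-formula>. On the other hand, the numerical attributes in <inline-formula><tex-math>A_N</tex-math></inline-formula>,
such as those related to temporal information of the events, do not require any
coding, since their real values can be directly processed by the neural network.</p>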
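      <p>A coding function of this kind could be sketched as follows. The helper names are hypothetical. Reserving the code 0 for padding, so that actual values receive the codes 1 to |V|, is an assumption of this sketch; it is compatible with the codomain given above but is not stated in the text.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Hashable, Iterable, List


def build_coding(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign a distinct integer in 1..|V| to each value of the vocabulary.

    Code 0 is left unused so that it can serve as a padding symbol
    (an assumption of this sketch).
    """
    vocabulary = sorted(set(values), key=str)
    return {value: index for index, value in enumerate(vocabulary, start=1)}


def encode(view: Iterable[Hashable], coding: Dict[Hashable, int]) -> List[int]:
    """Replace every categorical value in a view by its integer code."""
    return [coding[value] for value in view]


# Toy resource view observed in a log.
coding = build_coding(["r2", "r1", "r2", "r3"])
encoded = encode(["r3", "r1"], coding)
```
]]></preformat>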
      <p>
        However, the integer representation of categorical attributes
introduced above is not directly suitable for processing by a neural network,
due to the continuous nature of neural computation. So, to treat both
integer-valued views and real-valued views of traces in a unified manner, we use the
entity embedding method [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to automatically learn a multi-dimensional
real-valued representation of categorical views. Given the integer-valued
representation of a categorical view <inline-formula><tex-math>x = (x_1, x_2, \ldots, x_{AVG_L})</tex-math></inline-formula>, we feed it into an extra layer
of linear neurons, called embedding layer, that maps each integer value <inline-formula><tex-math>x_i</tex-math></inline-formula> to
an entity embedding, i.e. a fixed-size vector <inline-formula><tex-math>y_i \in \mathbb{R}^d</tex-math></inline-formula>. Hence a 1D integer-valued
vector x (size <inline-formula><tex-math>AVG_L</tex-math></inline-formula>) is mapped to a 2D real-valued matrix Y (size <inline-formula><tex-math>d \times AVG_L</tex-math></inline-formula>).
The matrix Y, called embedding matrix, is jointly learned with the model during
training of the neural network. The size d of an embedding layer is d = [D/2],
where D is the cardinality of the vocabulary V. Then the outputs of the
embedding layers are concatenated into a single vector that provides a high-level
representation of the multi-view information related to events in traces. This
high-level representation is fed into a recurrent neural network module composed
of two stacked LSTM layers. The first LSTM layer provides a sequence output
to feed the second LSTM layer.
      </p>
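      <p>The data flow through the embedding layers can be sketched in plain Python. This is only an illustration of the shapes involved: the embedding tables are random and fixed here, whereas in the approach described above they are learned jointly with the network. All function names are hypothetical, and rounding D/2 up is an assumption, since the text does not say how [D/2] is rounded. Note that the sketch stores one row per time step, i.e. the transpose of the d × AVG_L matrix Y.</p>
      <preformat><![CDATA[
```python
import math
import random
from typing import List


def embedding_size(vocabulary_size: int) -> int:
    """d = D/2, rounded up here (the rounding direction is an assumption)."""
    return math.ceil(vocabulary_size / 2)


def init_embedding_table(vocabulary_size: int, seed: int = 0) -> List[List[float]]:
    """One d-dimensional row per integer code 0..D (row 0 is the padding code)."""
    rng = random.Random(seed)
    d = embedding_size(vocabulary_size)
    return [[rng.uniform(-0.05, 0.05) for _ in range(d)]
            for _ in range(vocabulary_size + 1)]


def embed(view: List[int], table: List[List[float]]) -> List[List[float]]:
    """Map each integer code x_i to its embedding vector y_i."""
    return [table[code] for code in view]


def concatenate_views(embedded_views: List[List[List[float]]]) -> List[List[float]]:
    """Concatenate the per-event vectors of all views, time step by time step."""
    return [sum(step_vectors, []) for step_vectors in zip(*embedded_views)]


activity_view = [0, 0, 2, 1]   # padded integer codes, vocabulary size 5
resource_view = [0, 0, 1, 1]   # padded integer codes, vocabulary size 2
timestamp_view = [[0.0], [0.0], [0.0], [12.5]]  # numerical view, one value per step

sequence = concatenate_views([
    embed(activity_view, init_embedding_table(5)),
    embed(resource_view, init_embedding_table(2, seed=1)),
    timestamp_view,
])
# `sequence` has 4 time steps, each with 3 + 1 + 1 = 5 features.
```
]]></preformat>
      <p>The resulting sequence of per-event feature vectors is the kind of input that a stacked LSTM module consumes one time step at a time.</p>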
      <p>The LSTM approach is used as the core of our deep learning architecture
since it is suitable for processing sequences, such as those underlying a business
process event log. The LSTM recurrent module used in our deep architecture
also includes two Batch Normalization layers that are interleaved with the two
LSTM layers, in order to accelerate the learning process. Finally, the output
of the LSTM module is fed into a softmax layer, which computes the probabilities
of the different classes (activities); the final output (i.e. the next activity)
is derived from these probabilities. The network is trained with
the Backpropagation algorithm, applied with
early stopping to avoid overfitting. In particular, the training phase is stopped
when the loss on the validation set has not improved for 20
consecutive epochs. To minimize the loss function we use the Nadam optimizer. The
maximum number of epochs is set to 200. The hyper-parameters
(learning rate in [0.00001, 0.01], LSTM unit size among 50, 75 and
100, and batch size in [2<sup>5</sup>, 2<sup>10</sup>]) are optimized with SMAC [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], using 20% of the training set
as the validation set. The cross-entropy
loss function is used for optimization.
      </p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Event log</th><th>Perspectives</th></tr>
          </thead>
          <tbody>
            <tr><td>BPI12</td><td>activity, timestamp, resource, loan amount</td></tr>
            <tr><td /><td>activity, timestamp, resource, impact, org group, org role, org country, org involved, product, resource country</td></tr>
            <tr><td /><td>activity, timestamp, resource, impact, org group, org role, org country, org involved, product, resource country</td></tr>
            <tr><td /><td>activity, timestamp, resource, channel, department, group, org group, responsible</td></tr>
            <tr><td /><td>activity, timestamp, resource, monthly cost, credit score, first withdrawal amount, offered amount, number of terms, action</td></tr>
            <tr><td /><td>activity, timestamp, resource, org, project, task, role</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        Table 2. Method: MiDA, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
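      <p>The early stopping rule described above (stop after 20 consecutive epochs without improvement of the validation loss, with at most 200 epochs) can be sketched independently of any deep learning library. The callback run_epoch is hypothetical: it stands for one training epoch followed by the computation of the validation loss. The toy loss curve is invented purely to exercise the loop.</p>
      <preformat><![CDATA[
```python
from typing import Callable, List, Tuple


def train_with_early_stopping(
    run_epoch: Callable[[int], float],
    max_epochs: int = 200,
    patience: int = 20,
) -> Tuple[int, float]:
    """Run epochs until the validation loss stops improving.

    `run_epoch(epoch)` is a hypothetical callback that trains for one epoch
    and returns the validation loss. Training stops once `patience`
    consecutive epochs have passed without a new best loss, or after
    `max_epochs` epochs. Returns (best_epoch, best_loss).
    """
    best_epoch, best_loss = -1, float("inf")
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            best_epoch, best_loss = epoch, val_loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_loss


# Toy validation-loss curve with its minimum at epoch 10.
executed_epochs: List[int] = []


def toy_epoch(epoch: int) -> float:
    executed_epochs.append(epoch)
    return (epoch - 10) ** 2 / 100 + 0.5


best_epoch, best_loss = train_with_early_stopping(toy_epoch)
```
]]></preformat>
      <p>With this toy curve the loop keeps the model of epoch 10 as the best one and stops 20 epochs later, well before the limit of 200 epochs.</p>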
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        To evaluate the effectiveness of our approach, we have
conducted a range of experiments on eight benchmark event logs, which are
available at https://data.4tu.nl. Table 1
summarizes the characteristics of the considered logs. The main objective of
these experiments is to compare the performance of MiDA to that
of the most recent state-of-the-art deep learning methods that address the task
of predicting the next activity of a running trace. We compare our method to
those of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Table 2 reports the characteristics of the compared
methods. We run the state-of-the-art methods using the sets of hyper-parameters
considered in the reference studies. The source codes of these approaches are
publicly available. Hence, we evaluate all the methods on the same event log
splits. In particular, for each event log, we evaluate the performance of each
compared approach by partitioning the event log into training and testing traces
according to a 3-fold cross validation.
      </p>
      <p>
        Fig. 1: Comparison between MiDA and the related methods of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in terms of Fscore. Mean and standard deviation of the metric are reported.
      </p>
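      <p>A trace-level 3-fold partition of an event log could be produced along the following lines. This is a sketch under assumptions: the function name and the trace identifiers are hypothetical, and shuffling the traces with a fixed seed before splitting is a choice of this example, not something stated in the text.</p>
      <preformat><![CDATA[
```python
import random
from typing import List, Sequence, Tuple


def k_fold_trace_splits(
    trace_ids: Sequence[str], k: int = 3, seed: int = 42
) -> List[Tuple[List[str], List[str]]]:
    """Partition trace identifiers into k folds; return (train, test) pairs.

    Whole traces are assigned to a fold, so all prefixes of a trace end up
    on the same side of a split.
    """
    shuffled = list(trace_ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i, test_fold in enumerate(folds):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        splits.append((train, test_fold))
    return splits


trace_ids = [f"case_{n}" for n in range(10)]
splits = k_fold_trace_splits(trace_ids)
```
]]></preformat>
      <p>Fixing the seed makes the same splits reusable across all compared methods, which is what evaluating them on identical event log splits requires.</p>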
      <p>Figure 1 collects the Fscore metric of MiDA and the baselines. These results
provide empirical evidence that MiDA is systematically more accurate than
the evaluated baselines in terms of Fscore.</p>
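      <p>For reference, an F-score over next-activity predictions can be computed as in the sketch below. The text does not state how the per-activity scores are averaged; macro-averaging (an unweighted mean over activities) is an assumption of this example, and the labels are toy values.</p>
      <preformat><![CDATA[
```python
from typing import Hashable, Sequence


def macro_f_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores (macro-averaging)."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy next-activity labels: per-class F1 is 2/3 for "a", 2/3 for "b", 1 for "c".
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
score = macro_f_score(y_true, y_pred)
```
]]></preformat>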
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        In this paper, we have illustrated a novel multi-input, deep learning-based,
business process predictive approach recently proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This approach can take
advantage of all the characteristics possibly recorded with events. In particular,
we couple a multi-view learning approach with a deep learning architecture, in
order to gain predictive accuracy from the diversity of data in each view without
suffering from the curse of dimensionality. The experiments performed on several
event logs confirm the effectiveness of the proposed approach.
      </p>
      <p>One limitation of the proposed approach is the lack of prescription and
explanation with predictions. A research direction is that of enriching traditional
business process mining approaches, which are able to discover interpretable
models of processes, with the predictive ability of a deep learning architecture,
taking advantage of the interpretability of the model for prescriptive purposes.
Additional directions for further work include extending the proposed
approach to deal with activity imbalance, i.e.
activities that occur less frequently in an event log. For example,
training data augmentation techniques may be explored, in order to achieve a balanced
condition in the learning stage.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The research of Vincenzo Pasquadibisceglie is funded by PON RI 2014-2020
Big Data Analytics for Process Improvement in Organizational Development
CUP H94F18000270006.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Camargo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rojas</surname>
            ,
            <given-names>O.G.</given-names>
          </string-name>
          :
          <article-title>Learning accurate LSTM models of business processes</article-title>
          . In: Hildebrandt, T.T., van Dongen, B.F., Röglinger, M., Mendling, J. (eds.)
          <source>International Conference on Business Process Management, BPM 2019</source>
          .
          <series>LNCS</series>
          , vol.
          <volume>11675</volume>
          , pp.
          <fpage>286</fpage>
          -
          <lpage>302</lpage>
          . Springer (
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-030-26619-6_19
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Evermann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rehse</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fettke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Predicting process behaviour using deep learning</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>100</volume>
          ,
          <fpage>129</fpage>
          -
          <lpage>140</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1016/j.dss.2017.04.003, Smart Business Process Management
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berkhahn</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Entity embeddings of categorical variables</article-title>
          .
          <source>CoRR abs/1604</source>
          .06737 (
          <year>2016</year>
          ), http://arxiv.org/abs/1604.06737
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leyton-Brown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Sequential model-based optimization for general algorithm configuration</article-title>
          . In: Coello, C.A.C. (ed.)
          <source>Learning and Intelligent Optimization - 5th International Conference, Selected Papers</source>
          .
          <series>LNCS</series>
          , vol.
          <volume>6683</volume>
          , pp.
          <fpage>507</fpage>
          -
          <lpage>523</lpage>
          . Springer (
          <year>2011</year>
          ). https://doi.org/10.1007/978-3-642-25566-3_40
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Pasquadibisceglie</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appice</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castellano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malerba</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Using convolutional neural networks for predictive process analytics</article-title>
          . In:
          <source>International Conference on Process Mining, ICPM 2019</source>
          . pp.
          <fpage>129</fpage>
          -
          <lpage>136</lpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1109/ICPM.2019.00028
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pasquadibisceglie</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appice</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castellano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malerba</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Predictive process mining meets computer vision</article-title>
          . In: Fahland, D., Ghidini, C., Becker, J., Dumas, M. (eds.)
          <source>Business Process Management Forum, BPM 2020</source>
          . pp.
          <fpage>176</fpage>
          -
          <lpage>192</lpage>
          . Springer International Publishing, Cham (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pasquadibisceglie</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Appice</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castellano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malerba</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A multi-view deep learning approach for predictive business process monitoring</article-title>
          .
          <source>IEEE Transactions on Services Computing</source>
          pp.
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          (
          <year>2021</year>
          ). https://doi.org/10.1109/TSC.2021.3051771
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Rama-Maneiro</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Deep learning for predictive business process monitoring: Review and benchmark</article-title>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Tax</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verenich</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>La Rosa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Predictive business process monitoring with LSTM neural networks</article-title>
          . In: Dubois, E., Pohl, K. (eds.)
          <source>International Conference on Advanced Information Systems Engineering, CAISE 2017</source>
          . pp.
          <fpage>477</fpage>
          -
          <lpage>492</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>