<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Business Process Representation Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Pfeiffer</string-name>
          <email>peter.pfeiffer@dfki.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Business Process Data, Representation Learning, Process Analytics</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>German Research Center for Artificial Intelligence (DFKI)</institution>
          ,
          <addr-line>Campus D3_2, 66123 Saarbrücken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Saarland Informatics Campus</institution>
          ,
          <addr-line>66123 Saarbrücken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>34</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>Data stored in information systems gathered from the execution of business processes is a rich source of information. Process mining aims to extract knowledge from such data, usually captured in event logs, in order to understand and improve business processes. From a data-science perspective, event log data is a very interesting yet complex data modality. It not only describes the process from the control-flow perspective, but also contains additional information such as the entities and organizations involved, temporal aspects, and much more. While there is a lot of work on applying existing machine learning techniques to event log data to solve specific problems, little work has focused on how to learn from such data effectively. This work presents the idea of developing representation learning models for event logs, i.e., neural-network-based methods specifically designed for this data modality, which learn generic and rich representations of events and cases. The representations are expected to be used for solving different business problems such as process prediction, anomaly detection, or other process mining tasks more efficiently and effectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Business Process Data</kwd>
        <kwd>Representation Learning</kwd>
        <kwd>Process Analytics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Description</title>
      <p>
        Extracting and gaining knowledge from data of business process executions, e.g., as provided
by information systems, has gained a lot of attention over the past years. In the research field of
process mining, a large variety of methods to analyse such data have been developed, where event
logs are the standard data format for storing such information and performing analyses on [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. While the majority of process mining methods extract hand-crafted features from the
information in the event log, like footprint matrices describing how activities relate to each
other, some use machine learning (ML) methods based on neural networks to make predictions about
the future state of running process instances [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. On other data modalities like images or text, it has been shown that neural-network-based
methods perform equally well as, or outperform, traditional feature-based methods in image [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or language understanding tasks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For event log data, this has mainly been shown for process prediction [
        <xref ref-type="bibr" rid="ref2 ref5 ref6">2, 5, 6</xref>
        ]. Recently, a study showed that process discovery can be solved with graph neural networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], reaching performance comparable to feature-based methods. However, many problems in
process mining still rely on hand-crafted instead of learned features.
      </p>
      <p>
        Part of the success of neural-network-based methods on other data modalities like images and
text is due to their ability to learn and generate a rich representation of the concepts in the
data in the form of a feature vector, which can then be used for a variety of tasks. For instance,
neural-network-based language models are trained to fill gaps in sentences and to predict whether
two sentences follow each other, which teaches them effective representations of
words and sentences [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] without relying on labels created by domain experts. Afterwards, they
can be fine-tuned on a variety of specific tasks using labeled data. Such approaches belong to
representation learning, a field of machine learning that deals with learning a representation
of data that “makes it easier to extract useful information when building classifiers or other
predictors” [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. After pre-training with such a self-supervised training method, the same model
can be used to solve various downstream tasks utilizing the learned representation. If the
pre-trained model produces “good” representations, it is sufficient to add simple, task-specific
neural networks to solve certain problems. The idea behind such pre-training is that the features
learned thereby are useful for the downstream tasks. Thus, representation learning aims at
incorporating features into the representation that are useful across different settings and
tasks, enabling transfer learning, domain adaptation and multi-task learning [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
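      <p>A minimal PyTorch sketch of this two-stage idea follows; the encoder, the dimensions and
the classification task are illustrative assumptions rather than a concrete model from the
literature. Stage one would optimize the encoder on a self-supervised objective; stage two
freezes it and trains only a small task-specific head on labeled data.</p>
      <preformat>
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for any pre-trainable representation model."""
    def __init__(self, vocab_size=100, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, tokens):              # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.emb(tokens))
        return hidden.mean(dim=1)           # one vector per sequence

encoder = Encoder()
# Stage 1 (omitted here): self-supervised pre-training of the encoder,
# e.g., predicting masked tokens on large amounts of unlabeled data.

# Stage 2: freeze the encoder, train only a simple task-specific head.
for param in encoder.parameters():
    param.requires_grad = False
head = nn.Linear(64, 2)                     # e.g., binary case classification
optimizer = torch.optim.Adam(head.parameters())

tokens = torch.randint(0, 100, (8, 12))     # dummy batch of token sequences
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(head(encoder(tokens)), labels)
loss.backward()
optimizer.step()
      </preformat>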
      <p>
        This two-stage strategy, which has led to great success in machine learning, makes solving
many tasks more effective, as indicated by high accuracy on various downstream tasks.
One example is BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a transformer-based model that was pre-trained on a very large
corpus of textual data in a self-supervised fashion to generate representations of words and
sentences. The learned representations have been used for various different downstream tasks
and set new state-of-the-art results in question answering, sentence classification and other
tasks that require reasoning over words and sentences. For other data modalities, approaches
that adapt these ideas [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] perform equally well as or better than existing methods.
      </p>
      <p>For data describing business processes, there is no comparable approach that can produce
representations effective for solving different tasks. As the modality of event data differs from
text or images, pre-trained models like BERT cannot be applied directly. While textual data
is often described as a single sequence of words, a case consists of events with attributes that
describe what action has been performed at what time (and is not necessarily a sequence).
Additional attributes can be added, e.g., who performed the activity or what objects were
involved. Events can have different numbers of attributes, which can be of different types, e.g.,
categorical, numerical and temporal, and have different scales. Furthermore, some attributes
change with each event while others are fixed for the whole case. These characteristics, made
concrete in the sketch below, make it challenging to learn representations from event log data
with neural networks. Furthermore, we argue that it is difficult to process them with existing
network architectures like LSTMs or BERT, as these do not fully adapt to and account for the
characteristics of this data.</p>
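      <p>The following toy Python sketch, with invented attribute names and values, illustrates
these characteristics: events carry heterogeneous and variable attribute sets, while other
attributes are fixed at case level.</p>
      <preformat>
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    activity: str                                   # categorical
    timestamp: datetime                             # temporal
    attributes: dict = field(default_factory=dict)  # varies per event

@dataclass
class Case:
    case_id: str
    case_attributes: dict   # fixed for the whole case
    events: list

case = Case(
    case_id="order-4711",
    case_attributes={"customer_type": "premium"},        # case-level, constant
    events=[
        Event("create order", datetime(2022, 5, 2, 9, 0),
              {"resource": "Alice", "amount": 129.90}),  # categorical + numerical
        Event("approve order", datetime(2022, 5, 2, 14, 30),
              {"resource": "Bob"}),                      # different attribute set
    ],
)
      </preformat>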
      <p>
        In order to enable ML-based analysis of event log data on a large scale, more research is
required on how to learn rich representations of the concepts describing business processes,
for instance, to make such networks aware of what event logs consist of by learning the concepts
of events, cases and processes. Existing neural-network-based approaches for event logs
either focus on solving one task, like next step prediction [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or anomaly detection [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], or learn representations using only a subset of the information available in the event log.
For instance, representation learning approaches for cases [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ] or events [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] include categorical attributes, but no numerical or temporal information. This makes such
approaches less generic, as they fail to include all relevant information found in the event log
and only work for certain tasks. Thus, the learned representations of such approaches are
task-specific. Finding ways to learn a generic representation is one of the main objectives of
representation learning and could, applied to event logs, allow such methods to be used for more
tasks. Some work [
        <xref ref-type="bibr" rid="ref14 ref16 ref17">14, 16, 17</xref>
        ] indicates that approaches able to learn rich representations of cases or events, containing
and combining information from multiple event or case perspectives, could be helpful for solving
certain tasks on such data. However, this is hard to achieve using hand-crafted features.
      </p>
      <p>This PhD project is about developing methods that can learn rich representations of different
concepts in data gathered from the execution of business processes, to be used for solving
analytical tasks more effectively. Thus, the research problem is to develop specialized neural
network architectures and training objectives for learning from such data, as well as
applications and assessments that demonstrate and measure the effectiveness of the
representations. This includes, for instance, adapting and adjusting successful training
procedures from other modalities to work on event log data, as well as encoding approaches that
preserve the structure and semantics of the data. Furthermore, we want to investigate what
characteristics the representations should have to be effective.</p>
      <p>
        In general, a good representation is one that makes solving subsequent tasks easier [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Some general-purpose priors exist which are not task-specific and can be followed when
developing representation learning methods [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, as the representations in this work are expected to be effective for solving
process mining tasks, they should have characteristics that support problem solving on event
data. The following characteristics are desirable from our current point of view, but a more
systematic approach of collecting them and aligning them with process mining tasks is planned
as part of the project. First, representations should be effective for solving different tasks
and trainable with small quantities of manually labeled data. Furthermore, the approach should
be able to learn representations for events as well as for cases, which requires a hierarchical
perception of the data, as pictured in the sketch below. Event representations are expected to
contain features describing the semantics of the activity and all its attributes, its position in
the case, as well as its context, i.e., the events nearby. Similarly, representations of cases
should aggregate the semantics of all events as well as behavioural characteristics of the case
itself. Another desired property is domain adaptation, i.e., that representations can also be
created for event logs on which the model was not trained. However, while textual data and the
features describing it do not change much within one language, different processes might exhibit
very different behaviour, which makes it challenging to transfer certain features learned from
one set of event logs to another. Thus, we want to investigate which features are transferable
and how to transfer them with little effort.
      </p>
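      <p>The hierarchical perception of events and cases can be pictured as follows; mean pooling
is merely a stand-in for whatever learned aggregation an actual model would use.</p>
      <preformat>
import torch

def case_representation(event_vectors: torch.Tensor) -> torch.Tensor:
    # event_vectors: (num_events, dim) -> one case vector: (dim,)
    # Mean pooling is a placeholder for a learned aggregation.
    return event_vectors.mean(dim=0)

event_vecs = torch.randn(5, 64)   # five 64-dimensional event representations
case_vec = case_representation(event_vecs)
assert case_vec.shape == (64,)
      </preformat>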
      <p>Certain process mining tasks might ultimately require an additional training phase, as the
generic characteristics learned are not sufficient or the features required are specific to the
event log. Thus, domain knowledge (labeled data) or a different training objective is needed.
For instance, for some tasks, such as case classification, it could be helpful if the case
representation contained information about which cases fit the underlying process and which do
not. Such features are difficult to learn generically and might require an additional training
phase or labeled data on the event log of interest.</p>
      <p>
        What makes representation learning different from other machine learning tasks is that no
objective or target function exists that can be directly optimized during training to obtain
the desired characteristics. Rather, training methods have to be developed that “shape” the
representations into the desired form. Nevertheless, pre-training to learn certain generic
characteristics has been shown to be beneficial on other modalities [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which makes investigating this idea for event data interesting.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology and Techniques</title>
      <p>
        The research project follows the design science methodology [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], including exploratory phases within its cycles that steer the development of a
representation learning approach for business process data. The problem and opportunity
identification, i.e., to design a representation learning approach for event log data that makes
event log analysis more efficient, is part of the relevance cycle. Furthermore, acceptance
criteria must be defined. For investigating which characteristics the representations should
have to be effective, we follow the general priors for good representations [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and systematically collect tasks and problems the approach will be applied to. From these
observations we will derive the features that need to be contained in the representation for
process mining tasks. As discussed earlier, it is very hard to directly measure whether the
representations contain the desired characteristics. Instead, representation learning approaches
are usually evaluated on a variety of different tasks. If they are able to solve the tasks
accurately, the representations are considered effective. We follow the same evaluation
procedure by assessing whether the learned event and case representations can be used to solve
different process mining tasks, like process prediction or anomaly detection, with performance
similar or superior to existing algorithms, measured in terms of established metrics like
precision, recall and accuracy; a sketch of this protocol follows below. Furthermore, we will
try to assess how the learned representations can be utilized for tasks not considered so far.
      </p>
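      <p>The evaluation protocol can be sketched as follows: the learned representations are kept
frozen and a simple classifier on top is scored with the established metrics. The vectors and
labels below are synthetic placeholders, not real evaluation data.</p>
      <preformat>
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X = np.random.randn(200, 64)       # stand-in for learned case representations
y = np.random.randint(0, 2, 200)   # stand-in for task labels, e.g., outcomes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # simple classifier
pred = probe.predict(X_te)

print("accuracy: ", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall:   ", recall_score(y_te, pred))
      </preformat>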
      <p>State-of-the-art machine learning techniques will be used, i.e., neural network architectures
and training methods for learning representations, combined with knowledge from the process
mining field. Experience from both domains is brought together to develop new methods to process
and learn from event log data. Thereby, additions to the knowledge base of both domains are
made, i.e., in learning (hierarchical) representations of complex data modalities as well as in
how to learn and use such representations to solve process mining tasks.</p>
      <p>
        In the design cycle, new representation learning approaches are developed in an iterative
and exploratory fashion. Knowledge from the corresponding domains is used to enhance the
approach and to test its effectiveness on various tasks. Each representation learning method
consists of a neural network architecture and a self-supervised pre-training phase. Both need to
be combined appropriately to create valuable feature vectors. Techniques applied throughout
the design are, for instance, network architectures like transformer models [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], specifically, customized transformer architectures for modalities like time series [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which are further extended to appropriately capture the concepts in event logs. For
training, self-supervised learning techniques similar to BERT are adopted, e.g., reconstructing
missing parts of the input or predicting characteristics of higher-level concepts in the data, as
sketched below. They will be combined with process analytics knowledge to find effective
learning objectives that fit business process data.
      </p>
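      <p>One such BERT-like objective, reconstructing a masked part of the input, can be sketched
as follows; the vocabulary size, masking scheme and encoder configuration are illustrative
assumptions rather than the project's final design.</p>
      <preformat>
import torch
import torch.nn as nn

VOCAB, DIM, MASK_ID = 50, 32, 0

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
    num_layers=2,
)
embed = nn.Embedding(VOCAB, DIM)
to_vocab = nn.Linear(DIM, VOCAB)

tokens = torch.randint(1, VOCAB, (8, 10))      # batch of attribute-token sequences
masked = tokens.clone()
positions = torch.randint(0, 10, (8,))
masked[torch.arange(8), positions] = MASK_ID   # hide one token per sequence

hidden = encoder(embed(masked))                # (8, 10, DIM)
logits = to_vocab(hidden)                      # (8, 10, VOCAB)
target = tokens[torch.arange(8), positions]    # the hidden originals
loss = nn.functional.cross_entropy(
    logits[torch.arange(8), positions], target)
loss.backward()                                # trains the encoder to reconstruct
      </preformat>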
    </sec>
    <sec id="sec-3">
      <title>3. Solutions and Results</title>
      <p>
        Results achieved so far in this PhD project include the Multi-Perspective Process Network
(MPPN) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which learns representations of cases. With respect to the network architecture
and training method, an image-encoding approach for time series was applied that allows
categorical, numerical and temporal information in the event log to be processed in the same
way, using a self-supervised pre-training phase and an architecture based on convolutional
neural networks. Instead of training embeddings for the different attributes in the event log
each time, we transform all perspectives into distinct image representations and use pre-trained
convolutional neural networks to extract features describing each perspective. We first pre-train
MPPN on the next-event prediction task by predicting several attribute values of the next event
at once. Thereby, it learns the general characteristics of the event log, i.e., how activities and
attributes influence each other, before being applied to a certain task.
      </p>
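      <p>
        The image-encoding step can be illustrated with a Gramian Angular Summation Field, the
transformation named in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]; the simplified NumPy sketch below turns one numerical perspective into a 2D matrix that a
CNN could consume.
      </p>
      <preformat>
import numpy as np

def gasf(series: np.ndarray) -> np.ndarray:
    """Gramian Angular Summation Field of a 1D series (simplified)."""
    lo, hi = series.min(), series.max()
    x = 2 * (series - lo) / (hi - lo) - 1        # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))       # angular encoding
    return np.cos(phi[:, None] + phi[None, :])   # pairwise angular sums

amounts = np.array([10.0, 12.5, 11.0, 20.0, 18.5])  # toy numerical attribute
image = gasf(amounts)                               # (5, 5) "image"
print(image.shape)
      </preformat>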
      <p>
        It has been demonstrated [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] that the learned representations are effective for solving
different process prediction tasks and, without additional training, suitable for case retrieval,
i.e., retrieving contextually similar cases for a case of interest. Nevertheless, MPPN has some
shortcomings. The main disadvantages are that a lot of semantic information is lost when
transforming perspectives to 2D images and that it does not learn representations for events.
Losing the semantic information, e.g., activity and attribute names and values, makes the
approach less generic, as it requires training one MPPN per event log, which prevents domain
adaptation.
      </p>
      <p>In the next iteration, a new approach following the two-stage scheme illustrated in Figure 1
is being developed that overcomes the issues of MPPN. Again, the model will be pre-trained
on a self-supervised task to generate representations which can afterwards be utilized to solve
different problems. The new approach learns representations of events and cases simultaneously,
incorporating as much of the semantic information found in the event log as possible, using a
new architecture that is more efficient and flexible.</p>
      <p>
        In the pre-training phase, the objective is to train the model N to recognize and interpret
the different concepts in the data and how they relate to each other, by predicting different
characteristics of business processes on event and case level to learn important features. This
involves, e.g., reconstructing missing event values in order to learn features on event level
and predicting case or process characteristics to learn higher-level features. By including and
encoding as much semantic information from the event log as possible, such as attribute and
activity names, the representations should be rich in information and the approach generic.
This is achieved by splitting each event into distinct tokens, where each token carries a certain
part of the information [
        <xref ref-type="bibr" rid="ref10 ref19">19, 10</xref>
        ], which enables a more flexible and generic way of processing and learning from event data.
For instance, the activity of each event is encoded as a token by contextualising the activity's
name and combining it with the contextualized attribute name. The same is done for other
attributes that contain semantic information. Depending on the attribute type, appropriate
encoding techniques are used to transform the information into tokens, which serve as input to
the neural network. Similar to the positional encoding used in BERT [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], an “event encoding” is added that indicates which tokens belong to the same event; a sketch
of this token construction follows below. Thus, each token contains information on which
attribute it represents, which value or content the attribute has, as well as to which event it
belongs. Using this token-based encoding approach, which embeds information about how event
log data is structured into the input given to N, allows data from the event log to be processed
generically and N to interpret the information as intended by the data structure. Additionally,
special tokens like the [CLS] token in BERT are added, which can be used to solve classification
tasks on cases or events.
      </p>
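      <p>The token construction can be sketched as follows; the vocabularies, sizes and the additive
combination of embeddings are illustrative assumptions. Each attribute of each event becomes one
token embedding, and an event encoding marks which tokens belong to the same event.</p>
      <preformat>
import torch
import torch.nn as nn

DIM = 32
attr_name_emb = nn.Embedding(10, DIM)    # which attribute a token represents
attr_value_emb = nn.Embedding(100, DIM)  # the attribute's (categorical) value
event_emb = nn.Embedding(20, DIM)        # which event the token belongs to

# Toy case: 2 events x 2 attributes (activity, resource) = 4 tokens.
attr_names = torch.tensor([0, 1, 0, 1])     # 0 = "activity", 1 = "resource"
attr_values = torch.tensor([5, 42, 7, 42])  # encoded attribute values
event_index = torch.tensor([0, 0, 1, 1])    # token-to-event assignment

tokens = (attr_name_emb(attr_names)
          + attr_value_emb(attr_values)
          + event_emb(event_index))         # (4, DIM) transformer input
      </preformat>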
      <p>
        Splitting and encoding the data into tokens allows them to be processed with transformer-based
architectures [
        <xref ref-type="bibr" rid="ref10 ref19">19, 10</xref>
        ]. After pre-training N on a large set of synthetic and real-world business
processes and event logs, it will be fine-tuned or directly applied on different tasks to measure
its effectiveness. For some tasks, no fine-tuning using labeled data for the target task may
be required, for instance unsupervised ones such as anomaly detection [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or clustering-based tasks like behaviour mining [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or event abstraction [
        <xref ref-type="bibr" rid="ref16 ref21">21, 16</xref>
        ]. For other tasks, the representations need to be fine-tuned, e.g., for process prediction as
demonstrated in previous work [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. For yet other tasks, additional labeled data might be required, as indicated by the dotted
arrow in Figure 1.
      </p>
      <p>In order to demonstrate that the representations are helpful, the performance will be compared
to existing techniques using common datasets and standardised evaluation procedures, for
example, next-step and outcome prediction tasks on the BPIC event logs, or classifying traces
into fitting and unfitting ones using the data provided by the Process Discovery Contest (PDC,
https://www.tf-pm.org/competitions-awards/discovery-contest). A dedicated representation
learning benchmark is also planned, which combines different process mining tasks and datasets
to test and compare approaches on.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>Designing customized neural-network-based architectures and training methods for event logs
that are generic and learn to interpret the modality, i.e., learn representations of events and
cases including their semantics, is novel. The idea differs from existing approaches, which apply
network architectures developed for other data modalities, like LSTMs, and training strategies,
like next step prediction, to event log data, in that we aim for approaches that learn the
concepts in the data and can be used for various tasks. By separating the training from the
application phase, representations can be learned once and then applied for solving different
tasks. This is supposed to be more effective than designing different approaches for different
problems. Furthermore, having a rich representation of events and cases makes problem solving
easier, as a simple classifier trained on a few samples can be sufficient for solving a specific
task.</p>
      <p>Using a feature vector representation also brings some limitations, as the feature vectors
produced by N are not as explainable and understandable by humans as hand-crafted features.
Furthermore, pre-training a representation learning model requires a lot of data and careful
parameter optimization. Not all tasks in all domains benefit from using one representation,
which might also apply to this project. However, once the model is pre-trained, it can be used
on different datasets and easily fine-tuned to various tasks.</p>
      <p>The results achieved so far indicate that the representations can work for different predictive
tasks as well as for retrieval. In the future, we expect to gain insights into how the
architecture of the neural network, the encoding methods and the training objective have to be
designed for learning effective representations of event logs, and which features are useful for
solving process mining tasks. As the data modality of event logs is challenging, learning
representations is an interesting problem not only from a process mining but also from a machine
learning perspective. This could enable solving other process mining tasks, which still rely on
hand-crafted features, more effectively, enabling ML-based analysis in process mining on a
larger scale.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <source>Process Mining: A 360 Degree Overview</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>34</lpage>
          .
          doi:10.1007/978-3-031-08848-3_1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Evermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Rehse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fettke</surname>
          </string-name>
          ,
          <article-title>Predicting process behaviour using deep learning</article-title>
          ,
          <source>Decision Support Systems</source>
          <volume>100</volume>
          (
          <year>2017</year>
          )
          <fpage>129</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>25</volume>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: NAACL-HLT</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tax</surname>
          </string-name>
          , I. Teinemaa,
          <string-name>
            <surname>S. J. van Zelst</surname>
          </string-name>
          ,
          <article-title>An interdisciplinary comparison of sequence modeling methods for next-element prediction</article-title>
          ,
          <source>Software and Systems Modeling</source>
          <volume>19</volume>
          (
          <year>2020</year>
          )
          <fpage>1345</fpage>
          -
          <lpage>1365</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kratsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Manderscheid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Röglinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seyfried</surname>
          </string-name>
          ,
          <article-title>Machine learning in business process monitoring: A comparison of deep learning and classical approaches used for outcome prediction</article-title>
          ,
          <source>Business &amp; Information Systems Engineering</source>
          <volume>63</volume>
          (
          <year>2021</year>
          )
          <fpage>261</fpage>
          -
          <lpage>276</lpage>
          .
          doi:10.1007/s12599-020-00645-0.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sommers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Menkovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <article-title>Process discovery using graph neural networks</article-title>
          ,
          <source>in: 3rd International Conference on Process Mining (ICPM)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <article-title>Representation learning: A review and new perspectives</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          (
          <year>2013</year>
          )
          <fpage>1798</fpage>
          -
          <lpage>1828</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <source>Deep Learning</source>
          , MIT Press,
          <year>2016</year>
          . URL: http://www.deeplearningbook.org.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zerveas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jayaraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhamidipaty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <article-title>A transformer-based framework for multivariate time series representation learning</article-title>
          ,
          <source>in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &amp; Data Mining, Association for Computing Machinery</source>
          ,
          <year>2021</year>
          , p.
          <fpage>2114</fpage>
          -
          <lpage>2124</lpage>
          .
          doi:10.1145/3447548.3467401.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaegle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gimeno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carreira</surname>
          </string-name>
          ,
          <article-title>Perceiver: General perception with iterative attention</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4651</fpage>
          -
          <lpage>4664</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nolle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seeliger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mühlhäuser</surname>
          </string-name>
          ,
          <article-title>BINet: Multivariate business process anomaly detection using deep learning</article-title>
          ,
          <source>in: Business Process Management</source>
          , Springer International Publishing,
          <year>2018</year>
          , pp.
          <fpage>271</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Luettgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seeliger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nolle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mühlhäuser</surname>
          </string-name>
          ,
          <article-title>Case2vec: Advances in representation learning for business processes</article-title>
          ,
          <source>Process Mining Workshops, ICPM 2020</source>
          , Springer International Publishing,
          <year>2020</year>
          , pp.
          <fpage>162</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Seeliger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Luettgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nolle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mühlhäuser</surname>
          </string-name>
          ,
          <article-title>Learning of process representations using recurrent neural networks</article-title>
          ,
          <source>in: International Conference on Advanced Information Systems Engineering</source>
          , Springer International Publishing,
          <year>2021</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>P. De Koninck</surname>
          </string-name>
          , S. vanden Broucke, J. De Weerdt,
          <article-title>act2vec, trace2vec, log2vec, and model2vec: Representation learning for business processes</article-title>
          <source>in: Business Process Management</source>
          , Springer International Publishing,
          <year>2018</year>
          , pp.
          <fpage>305</fpage>
          -
          <lpage>321</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rebmann</surname>
          </string-name>
          ,
          <article-title>Abstracting low-level event data for meaningful process analysis</article-title>
          ,
          <source>in: Proceedings of the Demonstration &amp; Resources Track, Best BPM Dissertation Award, and Doctoral Consortium at BPM 2021, co-located with the 19th International Conference on Business Process Management</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Abb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bormann</surname>
          </string-name>
          , H. van der Aa,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Rehse</surname>
          </string-name>
          ,
          <article-title>Trace clustering for user behavior mining</article-title>
          ,
          <source>in: 30th European Conference on Information Systems (ECIS</source>
          <year>2022</year>
          ),
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Hevner</surname>
          </string-name>
          ,
          <article-title>A three cycle view of design science research</article-title>
          ,
          <source>Scandinavian Journal of Information Systems</source>
          <volume>19</volume>
          (
          <year>2007</year>
          )
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lahann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fettke</surname>
          </string-name>
          ,
          <article-title>Multivariate business process representation learning utilizing gramian angular fields and convolutional neural networks</article-title>
          ,
          <source>Business Process Management</source>
          , Springer International Publishing,
          <year>2021</year>
          , pp.
          <fpage>327</fpage>
          -
          <lpage>344</lpage>
          .
          doi:10.1007/978-3-030-85469-0_21.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>S. J. van Zelst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. de Leoni</surname>
          </string-name>
          , A. Koschmider,
          <article-title>Event abstraction in process mining: literature review and taxonomy</article-title>
          ,
          <source>Granular Computing</source>
          <volume>6</volume>
          (
          <year>2021</year>
          )
          <fpage>719</fpage>
          -
          <lpage>736</lpage>
          .
          doi:10.1007/s41066-020-00226-2.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>