Abstracting Low-level Event Data for Meaningful
Process Analysis
Adrian Rebmann
Data and Web Science Group, University of Mannheim, Mannheim, Germany


Abstract
Most process mining techniques assume that event data are recorded at the right level of detail for
analysis, i.e., the level of activities. Low-level recording and high flexibility of the underlying
process lead to results of limited value when process discovery techniques are applied, as the output
models are overly complex. In this PhD project, we aim to develop approaches for event abstraction
that reduce the complexity of process mining results while ensuring meaningful analyses by
incorporating previously disregarded perspectives. Specifically, we will focus on three research
streams, developing solutions that consider the meaning of events, take the purpose of analyses into
account, and address online settings. In this paper, we outline the current state and future plans
for each of these research streams.

Keywords
Process analysis, Event abstraction, Semantic labeling, Purpose-driven abstraction, Stream-based
process mining


1. Introduction
Process mining enables the analysis of processes based on sequences of events recorded by
information systems. This leads to actionable insights into how a process is really executed.
The majority of the available process mining techniques still makes the assumption that event
data are captured on the level of activities relevant for analysis.
   However, this is often not the case, e.g., when events are recorded on the level of clicks in a
system. Besides fine-granular event recording, mixed-granular recording and an inherent
flexibility of the underlying process also lead to high variability in the event sequences [1]. Applying
process discovery techniques to such data can lead to results that are hard to interpret due
to a high complexity of the output models. To tackle this issue, event abstraction approaches
have been proposed recently [2, 1]. While such approaches incorporate different perspectives,
none of them actually considers the meaning of events, e.g., by taking into account the specific
actions applied to certain objects in a process instance. This can be problematic, since the
meaning can be crucial for the decision whether or not to group events. For instance, consider
two events “check insurance” and “draw blood sample” that always occur close to each other in
the traces of an event log. If only the control flow is considered, these are prime candidates

Proceedings of the Demonstration & Resources Track, Best BPM Dissertation Award, and Doctoral Consortium at BPM
2021 co-located with the 19th International Conference on Business Process Management, BPM 2021, Rome, Italy,
September 6-10, 2021
rebmann@informatik.uni-mannheim.de (A. Rebmann)
ORCID: 0000-0001-7009-4637 (A. Rebmann)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
to be grouped. However, from a semantic perspective grouping them into the same high-level
activity is undesirable as they are clearly different activities. A further problem is that users
of current event abstraction approaches have no control over the characteristics of the output
event log. Such control is necessary to account for the specific purpose of a user’s process analysis
or to incorporate their knowledge about high-level activities. For instance, a user who aims to
analyze the resource perspective does not want low-level events to be grouped if these were
performed by different resources. Finally, especially in online settings, event data is typically
recorded on a low level, for instance in the form of sensor readings or click stream data. This
calls for online abstraction techniques to reduce complexity and data volume in a timely manner,
to enable the application of stream-based process mining techniques, such as online process
discovery [3]. However, there is limited work on online event abstraction in process mining [1].
   In this PhD project, we aim to overcome these gaps. First, we aim to use the meaning of
events extracted from textual payload of events to guide abstraction techniques towards better
results. Moreover, we consider characteristics to which abstracted event sequences should
adhere. These can be defined by a user of the technique, as they know best what the purpose of
their overall analysis is. This allows for more meaningful, purpose-driven abstractions. Finally,
we will focus on online scenarios, developing approaches to group events in real-time.
   The next section discusses the state of the art of event abstraction in process mining. Section
3 gives an overview of the three research streams addressed in this project, their current state,
and the respective research plans. Section 4 concludes the paper.


2. State of the Art
Two recent literature reviews give a comprehensive overview of the state of the art in event
abstraction in process mining, which indicates an increasing interest in the topic [2, 1]. As shown
there, the assumptions these approaches make differ greatly. The most prominent assumption is
that the target level of abstraction is known (cf. [4, 5]). This knowledge is assumed to come from
existing process models, or from a set of event sequences on two levels for which a mapping
is known. This, however, can only be assumed for a fraction of real-life scenarios. Thus,
unsupervised or semi-supervised event abstraction approaches have been proposed (cf. [6, 7]).
In this project, we focus on such unsupervised and semi-supervised approaches and consider
event abstraction as the meaningful grouping of low-level events into higher-level activities.
   In a related approach, Richetti et al. [8] suggest using semantic similarity between event labels
to aggregate events. However, the actual event abstraction is not described. Regarding
purpose-driven event abstraction, the most closely related approach was developed by Mannhardt et
al. [9], which takes as input the activities defined as low-level process models. This requires
a lot of effort from the user and, most importantly, very detailed knowledge of how activities
manifest themselves in low-level event sequences. With respect to event abstraction in streaming
scenarios, related approaches can be found in work that combines process mining with online
activity recognition [10]. Moreover, Complex Event Processing (CEP) techniques offer concepts
for event abstraction in process mining [11].


3. Current State and Research Plans
In this PhD project we aim to develop solutions to event abstraction in three research streams.
We (1) consider the meaning of events, (2) take the analysis purpose into account, and (3)
address online and offline scenarios with their own specific challenges. We follow a design
science paradigm [12] throughout the project. The objective of a design-oriented approach is
the development of artifacts that address relevant, unresolved problems. The relevance of the
artifacts to be developed is derived from the increasing complexity of processes in organizations,
their increasing flexibility, and the increasing level of detail of recorded event data. This calls
for solutions to reduce the complexity of event data in offline as well as online settings.
   For the first two research streams, initial concepts and approaches have already been devel-
oped, while the third will be addressed in future work. In the following, we give an overview of
the current state and the plans for each of the research streams.

3.1. Using semantic process information for event abstraction
Events frequently contain textual information that allows reasoning about the meaning of
underlying actions, objects, and other semantic components in the process. These components can
provide valuable information about the level of abstraction and about related events, which can be
used for abstraction.
Our work on extracting semantic components from the events recorded in event logs yielded a
first publication in this project [13]. The goal here is to label the values of event attributes with
semantic roles. To achieve this, our approach follows three main steps, visualized in Figure 1.


[Figure: Event log L → 1. Data type categorization → 2. Instance-level labeling → 3. Attribute-level
classification → Augmented event log L’. Intermediate results shown: labeled textual values and
attribute classes.]

Figure 1: Main steps of the semantic extraction approach.

   The outcome of the developed approach can already be used for event abstraction tasks. For
instance, we can group together multiple actions applied to a business object, lifting the log to an
object-centric perspective. However, we plan to extend this work to be able to group low-level
events into meaningful stages in a process using the meaning of the extracted components and
how they behave throughout a process instance. Furthermore, once events are grouped, there is
a need for a meaningful name for this higher-level event. Thus, we aim to develop an approach
to automatically derive such names taking into account behavioral relations in the process as
well as extracted semantic information. We plan to evaluate these techniques based on publicly
available real-life event logs, which were shown to contain diverse natural language labels [13].
Several also contain information about higher levels of the respective process, e.g., attributes
indicating the subprocess, which can be used to validate our results.
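
The object-centric grouping mentioned above can be illustrated with a minimal Python sketch. It is not the approach of [13]: the event data are hypothetical, and the function `split_label` is a deliberately naive stand-in for semantic role labeling that assumes the first token of a label is the action and the remainder is the business object.

```python
from collections import defaultdict

# Hypothetical low-level events: (case id, event label).
events = [
    ("c1", "create order"),
    ("c1", "check order"),
    ("c1", "approve order"),
    ("c1", "create invoice"),
    ("c1", "send invoice"),
]


def split_label(label):
    """Naive stand-in for semantic role labeling (an assumption):
    the first token is the action, the remainder is the business object."""
    action, _, obj = label.partition(" ")
    return action, obj


def group_by_object(events):
    """Collect, per case and business object, the actions applied to it."""
    groups = defaultdict(list)
    for case_id, label in events:
        action, obj = split_label(label)
        groups[(case_id, obj)].append(action)
    return dict(groups)


grouped = group_by_object(events)
for (case_id, obj), actions in grouped.items():
    print(case_id, obj, actions)
```

Each key of `grouped` corresponds to one candidate high-level event per case and object. Real event labels are far less regular than in this example, which is why a learned labeling step is needed in practice.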

3.2. Purpose-driven event log abstraction
We also aim to account for the specific knowledge a user can bring to an event log abstraction
task and consider the purpose of subsequent process analyses. For instance, a user who wants to
analyze an event log with respect to the resource perspective wants to be sure that each activity
in the abstracted log was performed by a single actor per case. This should be incorporated by
an event abstraction approach, leading to more purpose-oriented event logs. Hence, the goal is
to find an optimal grouping of a set of event classes, while user-defined constraints are met.
These constraints can be defined in a declarative manner on the entire grouping, the high-level
activities, and specific activity instances including their attributes in the abstracted log. We are
currently developing an approach that achieves this following the steps visualized in Figure 2.

[Figure: Inputs: event log L and constraints R → 1. Compute group candidates → 2. Find an optimal
grouping → 3. Create an abstracted event log → Output: abstracted event log.]

Figure 2: Main steps of the constraint-based event log abstraction approach.

  We evaluate this approach by employing it in case studies using real-life logs to show the
value of considering constraints in event abstraction. Furthermore, we assess its abstraction
quality based on the complexity reduction of process models discovered after abstraction.
  Using this approach, the user can then incorporate their knowledge or their idea of the
characteristics to which an activity should adhere. Thus, event groups become more meaningful
and produce more understandable results through just the right amount of abstraction. This
contributes to the overall goal of balancing simplicity and utility of abstracted event data.
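
To make the idea of grouping event classes under user-defined constraints more concrete, the following Python sketch enumerates all groupings of a small set of event classes, keeps those that satisfy a constraint, and picks one. Several choices here are assumptions made for illustration only: the toy log, the function names, the brute-force enumeration (feasible only for a handful of classes), and the notion of "optimal" as "fewest groups". The constraint encodes the resource example from above, i.e., each group is performed by a single resource per case.

```python
def partitions(items):
    """Yield all set partitions of a list of event classes."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into a group of its own ...
        yield [[first]] + part
        # ... or add it to each existing group in turn.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def single_resource_per_case(group, log):
    """Constraint: within each case, all events of the group share one resource."""
    for trace in log:
        resources = {res for cls, res in trace if cls in group}
        if len(resources) > 1:
            return False
    return True


def best_grouping(log, constraint):
    """Return a feasible grouping with the fewest groups (assumed objective),
    or None if no grouping satisfies the constraint."""
    classes = sorted({cls for trace in log for cls, _ in trace})
    feasible = [
        p for p in partitions(classes) if all(constraint(g, log) for g in p)
    ]
    best = min(feasible, key=len, default=None)
    return None if best is None else {frozenset(g) for g in best}


# Hypothetical log: each trace is a list of (event class, resource) pairs.
log = [
    [("fill form", "clerk"), ("scan form", "clerk"), ("approve", "manager")],
    [("fill form", "clerk"), ("scan form", "clerk"), ("approve", "manager")],
]

grouping = best_grouping(log, single_resource_per_case)
print(grouping)
```

In this toy log, "fill form" and "scan form" can be merged because the same resource performs them in every case, whereas "approve" must stay separate. The number of partitions grows very quickly with the number of event classes, so a realistic approach would need precomputed group candidates and a proper optimization step rather than exhaustive enumeration.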

3.3. Event abstraction over event streams
The third scenario shifts the focus from post hoc to online event abstraction. The challenge
here is the sheer volume and velocity of arriving events, which calls for efficient single-pass
solutions for event abstraction. Therefore, in this third research stream we will build on the
results of the previous ones to develop techniques for event abstraction in streaming settings.
Existing work on stream-based process mining, such as online conformance checking [14, 15]
and online concept drift detection [16, 17] can serve as a starting point. An initial literature
study was conducted for this research stream, which revealed the potential of adopting ideas
from other fields. Especially CEP [11] and stream-based sequential pattern mining [18] can
serve as a foundation on which to build event abstraction techniques for online settings.
   As a first step, we aim to apply parts of our purpose-driven abstraction approach, as constraints
can be used to guide event pattern detection in a stream. Yet, the computational complexity of
mining such patterns calls for approximation techniques to make such an approach feasible.
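
As a rough illustration of what single-pass abstraction over a stream could look like, the following Python sketch processes each event exactly once and emits a high-level event whenever a case leaves its current group. It is only a sketch under strong assumptions: the mapping `group_of` from event classes to high-level activities is assumed to be known in advance (e.g., from an offline step), the stream is finite, and neither constraints nor approximation are covered. All names and data are hypothetical.

```python
def abstract_stream(stream, group_of):
    """Single pass over (case id, event class) pairs.

    Yields (case id, high-level activity, number of low-level events)
    whenever a case switches to a different high-level activity.
    """
    open_groups = {}  # case id -> [current high-level activity, event count]
    for case_id, event_class in stream:
        # Event classes without a mapping are kept as their own activity.
        activity = group_of.get(event_class, event_class)
        current = open_groups.get(case_id)
        if current is not None and current[0] == activity:
            current[1] += 1  # same group continues, just count the event
            continue
        if current is not None:
            yield (case_id, current[0], current[1])  # close the previous group
        open_groups[case_id] = [activity, 1]
    # Flush groups that are still open (only possible for a finite stream).
    for case_id, (activity, count) in open_groups.items():
        yield (case_id, activity, count)


# Hypothetical interleaved stream of two cases.
stream = [
    ("c1", "fill form"),
    ("c2", "fill form"),
    ("c1", "scan form"),
    ("c1", "approve"),
    ("c2", "scan form"),
]
group_of = {"fill form": "prepare form", "scan form": "prepare form"}

abstracted = list(abstract_stream(stream, group_of))
print(abstracted)
```

The state kept per open case grows with the number of concurrently running cases, so an unbounded stream would additionally require a strategy for forgetting inactive cases.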


4. Conclusion
We gave an overview of the current state and the research plans of the proposed PhD project.
Therein, we develop solutions to event abstraction that consider the meaning of events and
the purpose of subsequent process analyses. Moreover, both offline and online settings are
addressed. All developed concepts and solutions serve the overarching goal of enabling more
meaningful process analyses through event abstraction.


References
 [1] S. J. van Zelst, F. Mannhardt, M. de Leoni, A. Koschmider, Event abstraction in process
     mining: literature review and taxonomy, Granular Computing 2 (2020).
 [2] K. Diba, K. Batoulis, M. Weidlich, M. Weske, Extraction, correlation, and abstraction of
     event data for process mining, Wiley Interdisciplinary Reviews 10 (2020) 1–31.
 [3] S. J. van Zelst, B. F. van Dongen, W. M. van der Aalst, Event stream-based process discovery
     using abstract representations, Knowledge and Information Systems 54 (2018) 407–435.
 [4] T. Baier, J. Mendling, M. Weske, Bridging abstraction layers in process mining, Information
     Systems 46 (2014) 123–139.
 [5] N. Tax, N. Sidorova, R. Haakma, W. M. van der Aalst, Event abstraction for process mining
     using supervised learning techniques, in: Proceedings of SAI, Springer, 2016, pp. 251–269.
 [6] M. de Leoni, S. Dündar, Event-log abstraction using batch session identification and
     clustering, Proceedings of the ACM Symposium on Applied Computing (2020) 36–44.
 [7] C. W. Günther, A. Rozinat, W. M. van der Aalst, Activity mining by global trace segmentation,
     Lecture Notes in Business Information Processing 43 (2010) 128–139.
 [8] P. H. P. Richetti, F. A. Baião, F. M. Santoro, Declarative process mining: Reducing discovered
     models complexity by pre-processing event logs, in: S. Sadiq, P. Soffer, H. Völzer (Eds.),
     BPM, Springer, Cham, 2014, pp. 400–407.
 [9] F. Mannhardt, M. de Leoni, H. A. Reijers, Guided process
     discovery–a pattern-based approach, Information Systems 76 (2018) 1–18.
[10] F. Mannhardt, R. Bovo, M. Oliveira, S. Julier, A taxonomy for combining activity recognition
     and process discovery in industrial environments, in: H. Yin, D. Camacho, P. Novais, A. J.
     Tallón-Ballesteros (Eds.), IDEAL 2018, Springer, Cham, 2018, pp. 84–93.
[11] P. Soffer, A. Hinze, A. Koschmider, H. Ziekow, C. Di Ciccio, B. Koldehofe, O. Kopp,
     A. Jacobsen, J. Sürmeli, W. Song, From event streams to process models and back: Challenges
     and opportunities, Information Systems 81 (2019) 181–200.
[12] A. R. Hevner, S. T. March, J. Park, S. Ram, Design science in information systems research,
     MIS Quarterly (2004) 75–105.
[13] A. Rebmann, H. van der Aa, Extracting semantic process information from the natural
     language in event logs, in: CAiSE, Springer, 2021, pp. 57–74.
[14] A. Burattin, S. J. van Zelst, A. Armas-Cervantes, B. F. van Dongen, J. Carmona, Online
     conformance checking using behavioural patterns, in: BPM, Springer, 2018, pp. 250–267.
[15] D. Schuster, S. J. van Zelst, Online process monitoring using incremental state-space
     expansion: An exact algorithm, in: D. Fahland, C. Ghidini, J. Becker, M. Dumas (Eds.),
     BPM, Springer, Cham, 2020, pp. 147–164.
[16] M. Hassani, Concept drift detection of event streams using an adaptive window, in: ECMS,
     2019, pp. 230–239.
[17] F. Stertz, S. Rinderle-Ma, J. Mangler, Analyzing process concept drifts based on sensor
     event streams during runtime, in: D. Fahland, C. Ghidini, J. Becker, M. Dumas (Eds.), BPM,
     Springer, Cham, 2020, pp. 202–219.
[18] M. Hassani, S. J. van Zelst, W. M. van der Aalst, On the application of sequential pattern
     mining primitives to process discovery: Overview, outlook and opportunity identification,
     Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (2019).