Agnes Koschmider, Judith Michael (eds.): EMISA Workshop 2021
12 CEUR-WS.org Proceedings


Data Science Methods for Declarative Process Mining


Nico Grohmann,1 Gottfried Vossen2


Abstract: Process mining has recently drawn the attention of a large audience from both research and
industry for the analysis of business processes based on execution data. In addition, declarative approaches
to process modeling and mining, which are especially suitable for complex and loosely-structured
processes, have attracted research interest in recent years. Several tools and approaches allow the
generation of declarative process models (most commonly in the form of the DECLARE language) from
event data. This research proposal focuses on the discovery step for declarative processes and suggests
supplementing or enriching it with commonly known techniques from data science. The assumption is
that data science methods like association rule mining or sequential pattern mining can also discover
certain types of declarative constraints and additionally generate activation, time, and correlation
conditions. These conditions can support the practical understanding of declarative process models
and deliver additional insights into the analyzed processes.

Keywords: Digitalization; Declarative Process Mining; Data Science Methods


1    Introduction
Business Process Management (BPM) has a long tradition as the field of research that deals
with all aspects of processes in organizations. It focuses on the understanding and modeling
of existing processes and tries to identify deviations from desired process executions. With
current technological advances in business and information systems, it is now possible
not only to model processes by hand, but also to use the traces of process executions in an IT system
to discover the real-world processes. Such approaches are subsumed under the term
process mining, a relatively new research discipline. Process mining connects the
disciplines of process science (BPM) and data science by exploiting data sources for the analysis
of processes. The Process Mining Manifesto [VdAAdM12] outlined the foundations
of this technology and set standards in terminology and research foci; this line of work was continued
by van der Aalst [Va16]. Process mining and BPM in general are very much
focused on the imperative modeling paradigm, which creates process models as diagrams in
the form of, e.g., Petri nets or the Business Process Model and Notation (BPMN).
However, researchers have investigated a different style of process modeling, namely the
declarative modeling paradigm. With this paradigm, process models are no longer depicted
1 University of Muenster, Department of Information Systems, Leonardo-Campus 3, 48149 Muenster, Germany

 nico.grohmann@wi.uni-muenster.de
2 University of Muenster, Department of Information Systems, Leonardo-Campus 3, 48149 Muenster, Germany

 gottfried.vossen@wi.uni-muenster.de


Copyright © 2021 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                                Data Science and Declarative Process Mining 13

as graphs but rather as constraints and conditions that restrict the model at hand. Any behavior
that is not ruled out by these constraints is allowed, whereas imperative modeling explicitly
states the allowed flows. Researchers have also started to use the declarative style for process
mining. For example, Alman et al. [Al20] developed a tool that covers several aspects of declarative
process mining, such as discovery, conformance checking, and a multi-perspective extension that
allows the introduction of additional data constraints. The declarative modeling paradigm is
especially useful for complex, loosely-structured processes that would result in a “spaghetti”
diagram if all possible flows were modeled. As process mining bridges the gap between
process and data science, it seems natural to use data science techniques to support process
mining, here specifically process discovery for the declarative paradigm. Examples
of such techniques include association rule mining and sequential pattern mining. In this paper, we
investigate how data science techniques can support the process discovery step for declarative
process models. Besides applying data mining techniques to event data, we also
suggest evaluating existing declarative process mining algorithms and indicating how
these algorithms and data science techniques could complement each other for declarative
process discovery.

The remainder of this paper is organized as follows: Section 2 introduces the fundamental
terms and concepts related to the research field and focus of this paper. Section 3 presents
the actual research proposal and planned contribution. Furthermore, it shows the current
status and preliminary results of our research. Finally, Section 4 concludes the paper and
outlines future research opportunities.


2   Background
Process mining exploits event data generated by the execution of real business processes for
process analysis. This data is left as traces like logs, database entries or message interchanges
in information systems that support a process. To use it for process mining, the event data
has to be transformed and integrated into an event log that can be imported into process
mining tools. There is a variety of commercial tools, such as Celonis or Signavio, and of
applications originating from research, such as Disco or ProM, which offer process mining
functionality to different extents. The most prominent functionality is process discovery, which
generates process models from event
data. In doing so, performance indicators for all real executions can be created. Conformance
checking denotes the activity of checking whether a given (possibly hand-made) process
model matches its actual execution and, if not, where the deviations lie. Process discovery
is also possible when using the declarative modeling paradigm.
One prominent notation for declarative process models is DECLARE [PSVdA07].
DECLARE includes a set of constraint templates for describing and restricting a
process. Maggi et al. [MMVdA11] present an approach to automatically discover DECLARE
models from event data. Their algorithm takes a set of candidate constraints and then checks
whether they are satisfied in a given event log. Tools like RuM [Al20] can generate declarative
14 Nico Grohmann, Gottfried Vossen

process models from event log input and express them using a set of DECLARE constraints.
Figure 1 shows a declarative process model for a sepsis treatment process in a hospital.


                Fig. 1: Declarative Model of a Sepsis Treatment Process [Al20].
It shows four instances of unary and six instances of binary DECLARE constraints. Unary
constraints are applied to a single activity of the process. Binary constraints define
relationships between two activities. A user can specify attributes for each of the activities.
Besides the DECLARE constraints, RuM allows users to specify activation, correlation,
and time conditions. Activation conditions define what has to hold for
the constraint to become active. Correlation conditions define relationships between the attributes of two activities,
e.g., both activities must have the same name assigned; they are therefore only possible on binary
constraints. The same holds for time conditions, which specify the time window in which
the second activity has to be executed after the first, e.g., between 2 and 5 hours after it.
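
The interplay of the three condition types can be illustrated with a minimal Python sketch. It is not taken from RuM; the event layout (plain dictionaries), the attribute `urgent`, and the function name `fulfils` are assumptions made purely for illustration. The sketch also assumes that a constraint whose activation condition does not hold cannot be violated.

```python
from datetime import datetime, timedelta

def fulfils(activation, target, activation_cond, correlation_cond, min_delay, max_delay):
    """Check one activation/target event pair against a binary constraint with conditions.

    Events are plain dicts with a "time" key plus arbitrary attributes (assumed layout).
    """
    if not activation_cond(activation):
        return True  # assumption: an inactive constraint cannot be violated
    delay = target["time"] - activation["time"]
    # Correlation condition and time window must both hold for the pair.
    return correlation_cond(activation, target) and min_delay <= delay <= max_delay

# Hypothetical events; "urgent" is an invented attribute used as activation condition.
antibiotics = {"activity": "IV_Antibiotics", "name": "Max", "urgent": True,
               "time": datetime(2021, 1, 1, 8, 0)}
triage = {"activity": "ER_Sepsis_Triage", "name": "Max",
          "time": datetime(2021, 1, 1, 11, 0)}

ok = fulfils(
    antibiotics, triage,
    activation_cond=lambda a: a["urgent"],                  # activation condition
    correlation_cond=lambda a, t: a["name"] == t["name"],   # "same name"
    min_delay=timedelta(hours=2), max_delay=timedelta(hours=5),  # time condition
)
print(ok)
```

Here the second activity happens three hours after the first and carries the same name, so the pair fulfils the constraint under the stated assumptions.
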
Our goal is to include data science methods in the discovery process. The first two methods
we have used are association rule mining and sequential pattern mining. Association rule
mining generates if/then-statements that uncover existence relationships between items.
Sequential pattern mining additionally takes the sequential order of events into account
and generates rules that allow the deduction of temporal orders. We select these methods
because we think that the set (or a subset) of DECLARE constraints is easily transferable
to outputs of rule learning algorithms and vice versa. Additionally, rules and sequential
patterns as an alternative representation of DECLARE constraints can contribute to the
understanding of declarative process models. The following section introduces the concrete
research question and outlines the contribution of our research.


3   Research Proposal

We claim that it is possible to generate certain DECLARE constraints and conditions from
event logs using association rule mining and sequential pattern mining if the event log is in
the usual format for process mining, including a case id, an activity name, and timestamp
information for each line. After transforming the event data so that all activities
belonging to one case are in one line of the data set, frequent itemset mining calculates
itemsets of activities that frequently occur together in instances of the process. These itemsets then serve as
input for the creation of association rules. The result is a set of association rules that
have one or more activities as the premise and one or more activities as the conclusion.
All in all, the output of frequent itemset, association rule, and sequential pattern mining is
combined to generate DECLARE models as well as relations and conditions on additional
attributes of an event log, corresponding to the multi-perspective extension of DECLARE.
Considering the process depicted in Figure 1, for instance, there could be an association
rule that has ER_Registration as the premise and IV_Antibiotics as the conclusion with a
support (proportion of transactions that contain all activities of the rule) of 1 and a confidence (conditional
probability of the conclusion given the premise) of 1; a support of 1 implies a confidence of 1
because the premise then occurs in every case. We demonstrate our deliberations on an imaginary sample
scenario; the process model in Figure 1 does not indicate these values. One could derive from that
rule that ER_Registration and IV_Antibiotics are part of the process in one hundred percent
of the cases. This could result in the introduction of an EXISTENCE constraint in the
DECLARE notation for both activities.
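
The transformation and rule generation steps described above can be sketched in plain Python as follows. This is a brute-force illustration over activity pairs, not an Apriori implementation and not the authors' tooling; the event log, its values, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical event log: one (case id, activity, timestamp) tuple per line.
EVENT_LOG = [
    ("c1", "ER_Registration", 1), ("c1", "ER_Triage", 2), ("c1", "IV_Antibiotics", 3),
    ("c2", "ER_Registration", 1), ("c2", "ER_Triage", 2),
    ("c3", "ER_Registration", 1), ("c3", "IV_Antibiotics", 2), ("c3", "ER_Triage", 3),
    ("c4", "ER_Registration", 1), ("c4", "ER_Triage", 5),
]

def to_transactions(log):
    """Group activities by case so that each case becomes one transaction (a set)."""
    cases = defaultdict(set)
    for case_id, activity, _timestamp in log:
        cases[case_id].add(activity)
    return list(cases.values())

def support(itemset, transactions):
    """Proportion of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def pairwise_rules(transactions, min_support, min_confidence):
    """Enumerate rules premise -> conclusion over single activities (brute force)."""
    activities = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(activities, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for premise, conclusion in ((a, b), (b, a)):
            # confidence = support(premise and conclusion) / support(premise)
            confidence = pair_support / support({premise}, transactions)
            if confidence >= min_confidence:
                rules.append((premise, conclusion, pair_support, confidence))
    return rules

transactions = to_transactions(EVENT_LOG)
rules = pairwise_rules(transactions, min_support=0.5, min_confidence=0.5)

# Activities with support 1 are candidates for an EXISTENCE constraint;
# pairs with support 1 are candidates for a COEXISTENCE constraint.
existence_candidates = [a for a in sorted(set().union(*transactions))
                        if support({a}, transactions) == 1.0]
coexistence_candidates = [(p, c) for p, c, s, _ in rules if s == 1.0 and p < c]

for rule in rules:
    print(rule)
```

In this toy log, only ER_Registration and ER_Triage occur in every case, so only they would be proposed as EXISTENCE and COEXISTENCE candidates. Lower thresholds would yield more candidate rules at the risk of proposing constraints that rarely hold.
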

Furthermore, we can introduce a COEXISTENCE constraint for the activities
ER_Registration and IV_Antibiotics. One point worth noting, however, is that we cannot
conclude anything about the temporal order of the activities. To this end, we use sequential
pattern mining, more concretely the Generalized Sequential Pattern (GSP) algorithm [SA96].
The GSP algorithm is capable of generating sequential patterns that take the timestamp
information of an activity into account, which is highly relevant when analyzing
processes. Applying the GSP algorithm to an event log results in a set of sequential
patterns whose elements are transactions, which in this case are activities
of the process. If we assume that the GSP algorithm discovers a sequential pattern for
IV_Antibiotics and ER_Sepsis_Triage with a support of 1, we can introduce a
DECLARE RESPONSE constraint for these two activities, meaning that whenever the former
activity is executed, the latter one has to follow eventually. The generation of SUCCESSION
constraints is also possible. In this way, association rule and sequential pattern
mining can perform an important intermediate step for declarative process mining from
which the way to explicit declarative constraints in the DECLARE notation is not very far.
Sequential pattern mining is capable of considering additional attributes of the event log, e.g.,
date and time, cost, or resource information. When these are included in the GSP algorithm, it
can discover patterns for activity and attribute sequences. For instance, a sequential pattern
of IV_Antibiotics/name=Max and name=Max could be interpreted as a “same name” correlation condition
for the SUCCESSION constraint for IV_Antibiotics and ER_Sepsis_Triage,
given that this pattern also appears for other names. Similar deliberations are possible for
timestamp information when trying to construct the time conditions in the multi-perspective
DECLARE approach. Without the use of MP-Declare mining these conditions have to be
added manually after discovering the process model. Further investigations, also regarding
activation conditions, have to follow here. One idea is that an implementation of our
approach can generate recommendations for activation conditions. For instance, a pattern of
ER_Registration/name=Sara with a support of 1 could lead to the proposal to introduce
name=Sara as an activation condition for the INIT constraint of ER_Registration. Human
users can then manually decide whether this should be a valid activation condition or whether the
pattern found is merely a statistical artifact that does not represent any condition on the
process.
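
The step from ordered patterns to RESPONSE candidates and correlation conditions can be sketched as follows. This is deliberately not the GSP algorithm: it only counts one given ordered pair of activities by brute force. The log, the attribute `name`, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical event log: (case id, activity, timestamp, name).
EVENT_LOG = [
    ("c1", "ER_Registration", 1, "Sara"),
    ("c1", "IV_Antibiotics", 2, "Max"),
    ("c1", "ER_Sepsis_Triage", 3, "Max"),
    ("c2", "ER_Registration", 1, "Sara"),
    ("c2", "IV_Antibiotics", 4, "Eva"),
    ("c2", "ER_Sepsis_Triage", 6, "Eva"),
    ("c3", "ER_Registration", 2, "Sara"),
    ("c3", "ER_Sepsis_Triage", 3, "Max"),
    ("c3", "IV_Antibiotics", 5, "Max"),
    ("c3", "ER_Sepsis_Triage", 7, "Max"),
]

def to_traces(log):
    """Build one time-ordered list of (activity, name) pairs per case."""
    traces = defaultdict(list)
    for case_id, activity, _timestamp, name in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case_id].append((activity, name))
    return list(traces.values())

def pattern_support(first, second, traces):
    """Share of traces in which a `first` event is later followed by a `second` event."""
    hits = 0
    for trace in traces:
        acts = [a for a, _ in trace]
        # Checking the earliest `first` event suffices for "some first, then second".
        if first in acts and second in acts[acts.index(first) + 1:]:
            hits += 1
    return hits / len(traces)

def response_holds(first, second, traces):
    """RESPONSE(first, second): every `first` event is eventually followed by `second`."""
    for trace in traces:
        acts = [a for a, _ in trace]
        for i, activity in enumerate(acts):
            if activity == first and second not in acts[i + 1:]:
                return False
    return True

def same_name_ratio(first, second, traces):
    """Share of `first` events whose next `second` event carries the same name."""
    pairs = matches = 0
    for trace in traces:
        for i, (activity, name) in enumerate(trace):
            if activity != first:
                continue
            followers = [n for a, n in trace[i + 1:] if a == second]
            if followers:
                pairs += 1
                matches += followers[0] == name
    return matches / pairs if pairs else None

traces = to_traces(EVENT_LOG)
print(pattern_support("IV_Antibiotics", "ER_Sepsis_Triage", traces))
print(response_holds("IV_Antibiotics", "ER_Sepsis_Triage", traces))
print(same_name_ratio("IV_Antibiotics", "ER_Sepsis_Triage", traces))
```

A pattern support of 1 only states that the ordered pair occurs in every case, which is why the sketch verifies the RESPONSE semantics separately before a constraint is proposed. A same-name ratio close to 1 across several names would suggest a “same name” correlation condition, which a human user could then confirm or reject.
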
A sequential pattern of IV_Antibiotics/name=Max and ER_Triage could also be expressed in
the form of an Event-Condition-Action (ECA) rule. ECA rules originate from active databases,
where they are used, for example, to describe triggers. When an event happens (E), a
condition is checked (C). A true condition then triggers the execution of an action (A). For
the example depicted in Figure 1, the event could be the execution of activity IV_Antibiotics,
the condition the check whether attribute name has value ’Max’, and the action the execution
of activity ER_Triage afterwards. ECA rules could in this way represent an alternative
and additional notation for declarative constraints on a process that is possibly easier to
understand for practitioners or process owners who are not familiar with the DECLARE
notation or (declarative) process models in general. One can choose different support and
confidence settings for frequent itemset, association rule, and sequential pattern mining.
Support and confidence measures are comparable to the constraint support parameters in
the RuM tool. Higher values generate more precise process models but can lead to missing
out on special cases and rarer process executions.
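
As a minimal sketch of the ECA notation, the example rule could be represented in Python as shown below. The class `EcaRule`, its method `fire`, and the dictionary-based event layout are assumptions for illustration only and do not correspond to an existing tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Event = Dict[str, str]  # assumed layout: {"activity": ..., "name": ...}

@dataclass
class EcaRule:
    event: str                          # E: activity whose execution triggers the rule
    condition: Callable[[Event], bool]  # C: check on the attributes of that event
    action: str                         # A: activity expected to be executed afterwards

    def fire(self, occurred: Event) -> Optional[str]:
        """Return the expected follow-up activity, or None if the rule does not apply."""
        if occurred["activity"] == self.event and self.condition(occurred):
            return self.action
        return None

# The example from the text: IV_Antibiotics with name 'Max' leads to ER_Triage.
rule = EcaRule(
    event="IV_Antibiotics",
    condition=lambda e: e.get("name") == "Max",
    action="ER_Triage",
)

print(rule.fire({"activity": "IV_Antibiotics", "name": "Max"}))
print(rule.fire({"activity": "IV_Antibiotics", "name": "Eva"}))
```

Keeping event, condition, and action as three separate fields mirrors how the rule would be read aloud, which is the property that could make this notation more accessible to practitioners.
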


4   Conclusion and Outlook

Our research proposal focuses on the study and application of data science methods for
declarative process mining and how they can play a role in discovering patterns leading
to constraints on the process. We have shown what such patterns could look like using
association rule and sequential pattern mining. Next, we have to further investigate how
these discovered patterns can lead to constraints, especially to the constraints available
in the DECLARE language for declarative process models. In this context, we require a
comparative evaluation of declarative mining algorithms and our approach. Besides the
discovery of declarative constraints, data science methods can also help to automatically
discover activation, time, and correlation conditions for the multi-perspective DECLARE
approach or ECA rules instead of manually adding them after the discovery process. A
positive side effect of using data science methods for declarative process discovery is that it
could improve the transparency of the discovered constraints in comparison to the outcome of
a traditional mining algorithm, especially for people without a technical or data science
background. We will continue by applying the proposed methods to other event logs to
further validate our ideas.

References
[Al20]        Alman, Anti; Di Ciccio, Claudio; Haas, Dominik; Maggi, Fabrizio Maria; Nolte,
              Alexander: Rule Mining with RuM. In (van Dongen, Boudewijn F.; Montali, Marco;
              Wynn, Moe Thandar, eds): ICPM. IEEE, 2020.

[MMVdA11] Maggi, Fabrizio M; Mooij, Arjan J; Van der Aalst, Wil MP: User-guided discovery of
          declarative process models. In: 2011 IEEE symposium on computational intelligence
          and data mining (CIDM). IEEE, pp. 192–199, 2011.

[PSVdA07]     Pesic, Maja; Schonenberg, Helen; Van der Aalst, Wil MP: Declare: Full support for
              loosely-structured processes. In: 11th IEEE international enterprise distributed object
              computing conference (EDOC 2007). IEEE, pp. 287–287, 2007.

[SA96]        Srikant, Ramakrishnan; Agrawal, Rakesh: Mining sequential patterns: Generalizations
              and performance improvements. In: International conference on extending database
              technology. Springer, pp. 1–17, 1996.

[Va16]        Van der Aalst, Wil: Process Mining - Data Science in Action. Springer-Verlag Berlin
              Heidelberg, 2016.

[VdAAdM12] Van der Aalst, Wil; Adriansyah, Arya; de Medeiros, Ana Karla Alves et al.: Process
           Mining Manifesto. In: Business Process Management Workshops, volume 99, pp.
           169–194. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.