=Paper= {{Paper |id=Vol-2673/paperDR05 |storemode=property |title=Dynamic Pattern-based Case Filters using Regular Expressions |pdfUrl=https://ceur-ws.org/Vol-2673/paperDR05.pdf |volume=Vol-2673 |authors=Thomas Vogelgesang,Janina Nakladal,Jerome Geyer-Klingeberg,Peyman Badakhshan |dblpUrl=https://dblp.org/rec/conf/bpm/0001NGB20 }} ==Dynamic Pattern-based Case Filters using Regular Expressions== https://ceur-ws.org/Vol-2673/paperDR05.pdf
      Dynamic Pattern-based Case Filters using
               Regular Expressions

Thomas Vogelgesang, Janina Nakladal, Jerome Geyer-Klingeberg, and Peyman
                              Badakhshan

                          Celonis SE, Munich, Germany
              Corresponding author: j.geyerklingeberg@celonis.com
                           https://www.celonis.com/



       Abstract. Process mining allows for a fact-based analysis of business
       processes by discovering descriptive process models. However, providing
       an overall view of the process is not enough. To gain valuable insights
       and identify potentials for process improvement, it is crucial to filter the
       underlying event log to a subset of cases of interest. Usually this is done
       by attribute-based filters, e.g. filtering for a specific vendor, production
       line or time period. However, sometimes the interesting cases are defined
       by a more complex pattern, e.g. by a sequence of certain events. In this
       demo, we present a new feature of Celonis that allows the analysts to
       filter for such complex patterns by defining a regular expression. Due
       to its integration into the Celonis Process Query Language, it can be
       applied to arbitrary analysis components. This allows for ad-hoc filtering
       by experienced analysts as well as pre-defined filters for business users.

       Keywords: Process Mining · Regular Expression · Process Discovery.


1    Introduction
Process mining provides business analysts and process owners a fact-based view
on their processes, e.g. by discovering a descriptive process model [2]. However,
just providing an overall view of all cases is not sufficient. To gain valuable
insights into the processes, the users must be able to focus on particular cases of
interest. For example, this can be orders with a delayed shipment or customer
journeys that resulted in a low customer satisfaction. To understand what goes
wrong in the process and identify possible root causes for this, it is crucial to
drill-down the event log to the relevant cases.
    Usually, these drill-downs are performed along dimensions stored with the
data (e.g. region, vendor, etc.) or based on KPIs like throughput time. However,
all of them are based on simple attribute values but do not consider the process
behavior, such as the flow of activities. For example, a user might be interested in
cases where a delivery block has been set but has never been removed afterwards.
    In this demo, we present a novel feature of Celonis Process Mining which
allows to filter cases based on user-defined process flow patterns. These patterns
are defined as regular expressions that are processed by a specific filter operator




Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
2        Thomas Vogelgesang et al.




                                     USER
                  SEARCH                                 INSIGHTS

(.)*
 REGULAR
EXPRESSION                           REPLAY                              PROCESS MAP




    TRANSLATION        DFA           FILTER         EVENT LOG          DRILL-DOWN

                   Fig. 1. Concept overview of pattern-based filter.


integrated into the Celonis Process Query Language (PQL). This allows for ap-
plying the filter to a specific analysis component (e.g. a table or chart) or even
the entire analysis. As shown in this demo, advanced users can also interactively
filter the cases by entering a regular expression describing the process flow pat-
tern of interest. This enables them to dynamically explore the processes in depth
in order to find potential flaws or undesired behaviors in the process.
     In the remainder of this paper, we provide an overview of the pattern-based
case filters and its capabilities in Section 2. Section 3 discusses related work and
Section 4 describes the presented demo and its setting.


2      Pattern-based Case Filters

The searched behavior is defined as a regular expression – a well-known concept
for pattern-matching in programming. In contrast to programming, we do not
apply them to strings but to activity sequences. The syntax is inspired by the
widely used Perl syntax but has some adoptions to its application to event logs.
    Figure 1 gives an overview of our approach. The user-defined regular expres-
sion is translated into a deterministic finite automaton (DFA) by applying the
Berry-Sethi-algorithm [3] and powerset construction [5]. Then the event log is
replayed on the DFA to solve the acceptance problem. All cases not matching
the pattern are filtered out and the process map is drilled-down accordingly.
    To use the pattern-based case filters in an analysis, the Celonis PQL provides
the match process regex(column, pattern ) operator. The first parameter is
the activity column in the data model which stores the executed activity for each
event in the event log. The second parameter is the regular expression describing
the searched pattern. The result of the operator is a new integer column attached
to the case table. For each case matching the pattern, the value is 1 while for
non-matching cases the value is 0. This allows for the seamless integration of the
             Dynamic Pattern-based Case Filters using Regular Expressions       3

operator into any PQL filter formula defined in the analysis. Binding the regular
expression to a variable allows to dynamically adjust the pattern by selecting
pre-defined patterns or by editing it in an ad-hoc fashion in a text-field. The
following language constructs can be used to define regular expressions which
can also be nested to define complex patterns.
Activities: Activities are the primitives of the regular expressions and are iden-
   tified by their single-quoted name (e.g. 'Create Invoice'). If the activity
   should match multiple activities with similar name, it is also possible to
   use wildcard matching for activities (e.g. LIKE 'Create%'). Instead of the
   activity name, the ANY keyword can be used to match any arbitrary activity.
Sequences: A sequence defines a directly follows relationship between two reg-
   ular expressions. Usually, this is expressed by concatenating two symbols of
   the regular expression. However, as we have activity names as primitives, this
   would result in a very confusing syntax. Therefore, we use >> to express a
   sequence of two activities (e.g., 'Create Invoice' >> 'Clear Invoice').
Choices: To choose between multiple valid regular expressions, we separate
   the options with | (e.g. 'Change Quantity' | 'Change Price'). Alterna-
   tively, we can also give a comma-separated list of activity names surrounded
   by squared brackets to define a choice (e.g. ['Change Quantity', 'Change
   Price']). While the former is applicable to any regular expression, the lat-
   ter only accepts activities as the primitives. Though the latter allows to in-
   vert the set of activities. For example, [ ! 'Change Quantity', 'Change
   Price'] matches all activities except Change Quantity and Change Price.
Quantifiers: With quantifiers (∗ , +, ?) one can declare that a regular expres-
   sion should not match only once, but with a certain cardinality. A regular
   expression must be surrounded by brackets when applying a quantifier to it.
   While ('Create Invoice')∗ matches any arbitrary number of occurrence,
   ('Create Invoice')+ requires the activity Create Invoice to occur at least
   once. With ('Create Invoice')? one can mark it as optional, i.e. Create
   Invoice occurs once or not.
Match at start/end: To declare that a regular expression must match at the
   start (i.e. from the first event of the trace), we can prepend a circumflex
   (^) to it. Analogously, we can append $ to the regular expression if it must
   match at the end. If the regular expression is not marked to match at the
   start / end, we implicitly prepend / append (ANY)∗ to the regular expres-
   sion for convenience in order to match the pattern anywhere within a trace.
   Please note that this may interfere with other quantifiers. For example, the
   regular expression ('Create Invoice')? will also match traces having two
   or more consecutive Create Invoice activities, as the additional activities are
   consumed by the implicitly added (ANY)∗ .
    Due to the usage of activities as primitives, regular expressions may be-
come quite long. To reduce the user’s effort to construct complex expressions
and to improve their readability, regular expressions can be assigned to an alias
which then can be referenced in other regular expressions. Note that the regu-
lar expressions must be comma-separated and the last regular expression must
4       Thomas Vogelgesang et al.

not have an alias, e.g. 'Change Quantity' | 'Change Price' AS A, 'Create
Invoice' AS B, A >> (ANY)∗ >> A >> B
    Instead of the activity column, it is also possible to use any other string
column of the activity table. This enables the user to define search patterns not
only over the activities but also over other event attributes like the resource.
Due to its integration into PQL, also columns of other type can be converted
into strings so effectively each column of the activity table can be used.
    The pattern-based case filters are available in the Celonis Intelligent Business
Cloud (IBC) and in Celonis Process Mining 4.5. It is accessible to thousands of
users and actively used by various Celonis customers.
    The Celonis Academic Alliance1 offers free licenses for academic purposes.


3     Related Work

Filtering event logs by a specific pattern of behavior can also be achieved by
other approaches. For example, LTL Checker [1] allows the user to define such
patterns in linear temporal logic and filter the event log to cases matching the
LTL program. However, regular expressions are usually less verbose than LTL
programs. Furthermore, regular expressions are a well-known and widely used
concept (e.g. in programming) which makes it easier to adopt by users.
     Similar results can be achieved by drawing the behavioral pattern as pro-
cess model, then checking its conformance [4] with the event log and finally
filtering the event log to all conforming cases. However, conformance checking
algorithms (especially alignment-based approaches) are computational expen-
sive. While creating DFAs from regular expressions is highly complex too, the
evaluation of traces on DFAs has linear run-time. To our experience, the fast
evaluation usually outweighs the initial effort, especially in real-world scenarios
with huge event logs. Besides, drawing process models does not integrate well
with query languages.


4     Demo Description

In this demo, we show a ready-to-use analysis running in the Celonis IBC.
The components of the analysis are configured to apply a filter containing the
process match regex operator. The regular expressions used in the component
filters are bound to a global variable to share the same regular expression among
different components. To change this variable one can enter a new regular ex-
pression into one of the text fields. Alternatively, it is also possible to select a
pre-defined regular expression from the drop-down menu on the left-hand side of
the analysis. This shows that the new filter is capable to be used either by power
users that interactively filter the cases by defining a regular expression ad-hoc
or by less experienced users who select some prepared filters from a menu.
1
    https://www.celonis.com/academic-signup
               Dynamic Pattern-based Case Filters using Regular Expressions         5




      Fig. 2. Filtered process map showing cases with unresolved delivery blocks.


    For this demo, we use a demo data set of a standard order-to-cash (O2C)
process. The data consists of more than 6,000,000 events of almost 1,000,000
cases with almost 500 different traces.
    Figure 2 shows a screenshot of the demo. The process map is filtered to all
cases with an unresolved delivery block. This is done by the regular expression
'Set Delivery Block' >> ([! 'Release Delivery'])∗ $ entered in the text
field below. As indicated by the numbers on the left-hand side, the filtered data
comprises 8,605 cases forming 53 unique variants (traces). A screencast showing
a short walk-through of the demo is available online2 .

References
1. van der Aalst, W., de Beer, H.T., van Dongen, B.: Process mining and verification
   of properties: An approach based on temporal logic. In: R. Meersman et. al (ed.)
   On the Move to Meaningful Internet Systems 2005. LNCS, vol. 3760, pp. 130–147.
   Springer (2005)
2. van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition.
   Springer (2016)
3. Berry, G., Sethi, R.: From regular expressions to deterministic automata. Theor.
   Comput. Sci. 48(3), 117–126 (1986)
4. Carmona, J., van Dongen, B.F., Solti, A., Weidlich, M.: Conformance Checking -
   Relating Processes and Models. Springer (2018)
5. Rabin, M.O., Scott, D.S.: Finite automata and their decision problems. IBM Journal
   of Research and Development 3(2), 114–125 (1959)

2
    https://bit.ly/2UmC3G5