<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dynamic Pattern-based Case Filters using Regular Expressions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Vogelgesang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janina Nakladal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jerome Geyer-Klingeberg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peyman Badakhshan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Celonis SE</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process mining allows for a fact-based analysis of business processes by discovering descriptive process models. However, providing an overall view of the process is not enough. To gain valuable insights and identify potential for process improvement, it is crucial to filter the underlying event log to a subset of cases of interest. Usually this is done with attribute-based filters, e.g. filtering for a specific vendor, production line or time period. However, sometimes the interesting cases are defined by a more complex pattern, e.g. by a sequence of certain events. In this demo, we present a new feature of Celonis that allows analysts to filter for such complex patterns by defining a regular expression. Due to its integration into the Celonis Process Query Language, it can be applied to arbitrary analysis components. This allows for ad-hoc filtering by experienced analysts as well as pre-defined filters for business users.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Regular Expression</kwd>
        <kwd>Process Discovery</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Process mining provides business analysts and process owners with a fact-based view
of their processes, e.g. by discovering a descriptive process model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However,
just providing an overall view of all cases is not sufficient. To gain valuable
insights into the processes, users must be able to focus on particular cases of
interest, for example orders with a delayed shipment or customer
journeys that resulted in low customer satisfaction. To understand what goes
wrong in the process and to identify possible root causes, it is crucial to
drill down the event log to the relevant cases.
      </p>
      <p>Usually, these drill-downs are performed along dimensions stored with the
data (e.g. region, vendor, etc.) or based on KPIs like throughput time. However,
all of them are based on simple attribute values and do not consider the process
behavior, such as the flow of activities. For example, a user might be interested in
cases where a delivery block has been set but has never been removed afterwards.</p>
      <p>In this demo, we present a novel feature of Celonis Process Mining which
allows users to filter cases based on user-defined process flow patterns. These patterns
are defined as regular expressions that are processed by a specific filter operator
integrated into the Celonis Process Query Language (PQL). This allows for
applying the filter to a specific analysis component (e.g. a table or chart) or even
to the entire analysis. As shown in this demo, advanced users can also interactively
filter the cases by entering a regular expression describing the process flow
pattern of interest. This enables them to dynamically explore the processes in depth
in order to find potential flaws or undesired behavior in the process.</p>
      <p>[Figure 1: overview of the approach. Diagram labels: User, Regular Expression, Translation, DFA, Event Log, Replay, Filter, Process Map, Drill-Down, Search, Insights.]</p>
      <p>In the remainder of this paper, we provide an overview of the pattern-based
case filters and their capabilities in Section 2. Section 3 discusses related work and
Section 4 describes the presented demo and its setting.</p>
    </sec>
    <sec id="sec-2">
      <title>Pattern-based Case Filters</title>
      <p>The searched behavior is defined as a regular expression, a well-known concept
for pattern matching in programming. In contrast to programming, we do not
apply regular expressions to strings but to activity sequences. The syntax is inspired by the
widely used Perl syntax, with some adaptations for its application to event logs.</p>
      <p>
        Figure 1 gives an overview of our approach. The user-defined regular
expression is translated into a deterministic finite automaton (DFA) by applying the
Berry-Sethi algorithm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and powerset construction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Then the event log is
replayed on the DFA to solve the acceptance problem. All cases not matching
the pattern are filtered out and the process map is drilled down accordingly.
      </p>
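      <p>The replay step can be illustrated with a minimal Python sketch. It assumes a hand-written DFA for the pattern 'Create Invoice' &gt;&gt; 'Clear Invoice' matched anywhere in a trace. The state numbering, the function names (step, replay, filter_cases) and the toy event log are hypothetical and are not taken from the Celonis implementation, which derives the DFA automatically from the regular expression.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical DFA for: 'Create Invoice' >> 'Clear Invoice', anywhere in a trace.
# State 0: no progress, state 1: last activity was 'Create Invoice',
# state 2: pattern found (accepting and absorbing).
ACCEPTING = {2}


def step(state, activity):
    """Return the successor state for one replayed event."""
    if state == 2:
        return 2
    if activity == "Create Invoice":
        return 1
    if state == 1 and activity == "Clear Invoice":
        return 2
    return 0


def replay(trace):
    """Replay one trace on the DFA: one transition per event (linear time)."""
    state = 0
    for activity in trace:
        state = step(state, activity)
    return state in ACCEPTING


def filter_cases(event_log):
    """Group events by case (assumed to be sorted by time) and keep matches."""
    traces = defaultdict(list)
    for case_id, activity in event_log:
        traces[case_id].append(activity)
    return {case_id for case_id, trace in traces.items() if replay(trace)}


# Toy event log of (case id, activity) pairs, invented for illustration.
event_log = [
    ("c1", "Create Order"), ("c1", "Create Invoice"), ("c1", "Clear Invoice"),
    ("c2", "Create Invoice"), ("c2", "Change Price"), ("c2", "Clear Invoice"),
    ("c3", "Create Invoice"), ("c3", "Create Invoice"), ("c3", "Clear Invoice"),
]
matching_cases = filter_cases(event_log)
```
      </preformat>
      <p>In this toy log, case c2 is filtered out because Change Price interrupts the directly-follows relation, while c1 and c3 are kept.</p>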
      <p>To use the pattern-based case filters in an analysis, Celonis PQL provides
the match_process_regex(column, pattern) operator. The first parameter is
the activity column in the data model, which stores the executed activity for each
event in the event log. The second parameter is the regular expression describing
the searched pattern. The result of the operator is a new integer column attached
to the case table. For each case matching the pattern, the value is 1, while for
non-matching cases the value is 0. This allows for the seamless integration of the
operator into any PQL filter formula defined in the analysis. Binding the regular
expression to a variable makes it possible to adjust the pattern dynamically by selecting
pre-defined patterns or by editing it ad hoc in a text field. The
following language constructs can be used to define regular expressions; they
can also be nested to define complex patterns.</p>
      <p>Activities: Activities are the primitives of the regular expressions and are
identified by their single-quoted name (e.g. 'Create Invoice'). If a primitive
should match multiple activities with similar names, it is also possible to
use wildcard matching for activities (e.g. LIKE 'Create%'). Instead of the
activity name, the ANY keyword can be used to match any arbitrary activity.</p>
      <p>Sequences: A sequence defines a directly-follows relationship between two
regular expressions. Usually, this is expressed by concatenating two symbols of
the regular expression. However, as we have activity names as primitives, this
would result in a very confusing syntax. Therefore, we use &gt;&gt; to express a
sequence of two activities (e.g. 'Create Invoice' &gt;&gt; 'Clear Invoice').</p>
      <p>Choices: To choose between multiple valid regular expressions, we separate
the options with | (e.g. 'Change Quantity' | 'Change Price').
Alternatively, we can give a comma-separated list of activity names surrounded
by square brackets to define a choice (e.g. ['Change Quantity', 'Change
Price']). While the former is applicable to any regular expression, the
latter only accepts activities as primitives. The latter, however, allows the
set of activities to be inverted. For example, [ ! 'Change Quantity', 'Change
Price'] matches all activities except Change Quantity and Change Price.</p>
      <p>Quantifiers: With quantifiers (*, +, ?) one can declare that a regular
expression should match not just once, but with a certain cardinality. A regular
expression must be surrounded by parentheses when applying a quantifier to it.
While ('Create Invoice')* matches any number of occurrences,
('Create Invoice')+ requires the activity Create Invoice to occur at least
once. With ('Create Invoice')? one can mark it as optional, i.e. Create
Invoice occurs once or not at all.</p>
      <p>Match at start/end: To declare that a regular expression must match at the
start (i.e. from the first event of the trace), we can prepend a circumflex
(^) to it. Analogously, we can append $ to the regular expression if it must
match at the end. If the regular expression is not marked to match at the
start or end, we implicitly prepend or append (ANY)* to the regular
expression for convenience, so that the pattern matches anywhere within a trace.
Note that this may interfere with other quantifiers. For example, the
regular expression ('Create Invoice')? will also match traces having two
or more consecutive Create Invoice activities, as the additional activities are
consumed by the implicitly added (ANY)*.</p>
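      <p>The effect of the implicit padding can be reproduced with Python's re module as an analogy. The sketch below assumes that every activity is encoded as a single character, so that a trace becomes a string. The encoding table and the helper names (encode, matches) are hypothetical, and Python's regular expression syntax only stands in for the PQL syntax described above.</p>
      <preformat>
```python
import re

# Hypothetical one-character codes for activities (illustration only).
CODES = {"Create Order": "o", "Create Invoice": "i", "Clear Invoice": "c"}


def encode(trace):
    """Turn a list of activity names into a string of one-character codes."""
    return "".join(CODES[activity] for activity in trace)


def matches(pattern, trace, at_start=False, at_end=False):
    """Emulate the implicit (ANY)* padding: unless the pattern is anchored
    with ^ or $, arbitrary activities may precede or follow the match."""
    prefix = "" if at_start else ".*"
    suffix = "" if at_end else ".*"
    padded = prefix + "(?:" + pattern + ")" + suffix
    return re.fullmatch(padded, encode(trace)) is not None


two_invoices = ["Create Order", "Create Invoice", "Create Invoice", "Clear Invoice"]

# ('Create Invoice')? without anchors: the padding consumes the extra events.
unanchored = matches("i?", two_invoices)

# ^ ('Create Invoice')? $ : the whole trace must be zero or one Create Invoice.
anchored = matches("i?", two_invoices, at_start=True, at_end=True)
single = matches("i?", ["Create Invoice"], at_start=True, at_end=True)
```
      </preformat>
      <p>Without anchors, the optional pattern matches every trace, including the one with two consecutive Create Invoice events. Only the anchored variant restricts the trace as a whole.</p>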
      <p>Due to the usage of activities as primitives, regular expressions may
become quite long. To reduce the user's effort in constructing complex expressions
and to improve their readability, regular expressions can be assigned to an alias
which can then be referenced in other regular expressions. Note that the
regular expressions must be comma-separated and the last regular expression must
not have an alias, e.g. 'Change Quantity' | 'Change Price' AS A, 'Create
Invoice' AS B, A &gt;&gt; (ANY)* &gt;&gt; A &gt;&gt; B.</p>
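      <p>One way to read the alias mechanism is as textual substitution into the final expression. The following Python sketch works under that assumption. How Celonis actually resolves aliases is not described here, and the function names (split_top_level, substitute, expand_aliases) as well as the added parentheses are hypothetical.</p>
      <preformat>
```python
import re


def split_top_level(expression):
    """Split at commas that are outside quotes and square brackets."""
    parts, current, depth, quoted = [], [], 0, False
    for char in expression:
        if char == "'":
            quoted = not quoted
        elif not quoted and char == "[":
            depth += 1
        elif not quoted and char == "]":
            depth -= 1
        if char == "," and not quoted and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def substitute(text, aliases):
    """Replace alias names that occur outside quoted activity names."""
    if not aliases:
        return text
    names = "|".join(re.escape(name) for name in aliases)
    pieces = re.split(r"('[^']*')", text)  # odd indices hold quoted names
    for index in range(0, len(pieces), 2):
        pieces[index] = re.sub(
            r"\b(?:" + names + r")\b",
            lambda match: aliases[match.group(0)],
            pieces[index],
        )
    return "".join(pieces)


def expand_aliases(expression):
    """Inline every 'pattern AS name' definition into the final pattern."""
    *definitions, final = split_top_level(expression)
    aliases = {}
    for definition in definitions:
        body, name = definition.rsplit(" AS ", 1)
        aliases[name.strip()] = "(" + substitute(body.strip(), aliases) + ")"
    return substitute(final, aliases)


expanded = expand_aliases(
    "'Change Quantity' | 'Change Price' AS A, 'Create Invoice' AS B, "
    "A >> (ANY)* >> A >> B"
)
```
      </preformat>
      <p>Each alias body is wrapped in parentheses before substitution so that operator precedence in the final expression is not affected. The keyword ANY is left untouched because only whole-word alias names are replaced.</p>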
      <p>Instead of the activity column, it is also possible to use any other string
column of the activity table. This enables the user to define search patterns not
only over the activities but also over other event attributes like the resource.
Due to the integration into PQL, columns of other types can also be converted
into strings, so effectively every column of the activity table can be used.</p>
      <p>The pattern-based case filters are available in the Celonis Intelligent Business
Cloud (IBC) and in Celonis Process Mining 4.5. They are accessible to thousands of
users and actively used by various Celonis customers.</p>
      <p>The Celonis Academic Alliance<sup>1</sup> offers free licenses for academic purposes.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Filtering event logs by a specific pattern of behavior can also be achieved by
other approaches. For example, LTL Checker [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] allows the user to define such
patterns in linear temporal logic and to filter the event log to cases matching the
LTL program. However, regular expressions are usually less verbose than LTL
programs. Furthermore, regular expressions are a well-known and widely used
concept (e.g. in programming), which makes them easier for users to adopt.
      </p>
      <p>
        Similar results can be achieved by drawing the behavioral pattern as a
process model, then checking its conformance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with the event log and finally
filtering the event log to all conforming cases. However, conformance checking
algorithms (especially alignment-based approaches) are computationally
expensive. While creating DFAs from regular expressions is highly complex too, the
evaluation of traces on DFAs runs in time linear in the trace length. In our experience, the fast
evaluation usually outweighs the initial effort, especially in real-world scenarios
with huge event logs. Besides, drawing process models does not integrate well
with query languages.
      </p>
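      <p>The trade-off between construction cost and evaluation cost can be made concrete with a small Python sketch of the powerset construction. It assumes an epsilon-free NFA given as a dictionary. The automaton, the abstract symbols "block" and "other" and all function names are hypothetical examples and do not reflect the Celonis implementation.</p>
      <preformat>
```python
def powerset_construction(start, transitions, alphabet):
    """Build the reachable DFA states (sets of NFA states) of an
    epsilon-free NFA given as {(state, symbol): set of successor states}."""
    start_set = frozenset([start])
    dfa = {}
    todo = [start_set]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            target = frozenset(
                successor
                for state in current
                for successor in transitions.get((state, symbol), ())
            )
            dfa[current][symbol] = target
            todo.append(target)
    return start_set, dfa


def accepts(start_set, dfa, nfa_accepting, trace):
    """Replay a trace on the DFA: one dictionary lookup per event."""
    current = start_set
    for symbol in trace:
        current = dfa[current][symbol]
    return not current.isdisjoint(nfa_accepting)


# Hypothetical NFA: "the third event from the end is a delivery block".
# Activities are abstracted to the symbols "block" and "other" for brevity.
ALPHABET = ["block", "other"]
NFA = {
    (0, "block"): {0, 1}, (0, "other"): {0},
    (1, "block"): {2}, (1, "other"): {2},
    (2, "block"): {3}, (2, "other"): {3},
}
NFA_ACCEPTING = {3}

start_set, dfa = powerset_construction(0, NFA, ALPHABET)
number_of_dfa_states = len(dfa)
```
      </preformat>
      <p>For this NFA with four states, the construction yields eight DFA states, and patterns of this shape double the number of DFA states with every additional position. Once the DFA exists, however, each trace is evaluated with a single lookup per event.</p>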
    </sec>
    <sec id="sec-4">
      <title>Demo Description</title>
      <p>In this demo, we show a ready-to-use analysis running in the Celonis IBC.
The components of the analysis are configured to apply a filter containing the
match_process_regex operator. The regular expressions used in the component
filters are bound to a global variable so that the same regular expression is shared among
different components. To change this variable, one can enter a new regular
expression into one of the text fields. Alternatively, it is also possible to select a
pre-defined regular expression from the drop-down menu on the left-hand side of
the analysis. This shows that the new filter can be used both by power
users who interactively filter the cases by defining a regular expression ad hoc
and by less experienced users who select prepared filters from a menu.</p>
      <p><sup>1</sup> https://www.celonis.com/academic-signup</p>
      <p>For this demo, we use a demo data set of a standard order-to-cash (O2C)
process. The data consists of more than 6,000,000 events belonging to almost 1,000,000
cases with almost 500 different traces.</p>
      <p>Figure 2 shows a screenshot of the demo. The process map is filtered to all
cases with an unresolved delivery block. This is done by the regular expression
'Set Delivery Block' &gt;&gt; ([! 'Release Delivery'])* $ entered in the text
field below. As indicated by the numbers on the left-hand side, the filtered data
comprises 8,605 cases forming 53 unique variants (traces). A screencast showing
a short walk-through of the demo is available online<sup>2</sup>.</p>
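      <p>The semantics of this filter, together with the case and variant counts, can be sketched in plain Python. The sketch mirrors the 0/1 result column of the operator, but the event log, the activity Ship Goods and the function names (has_unresolved_block, summarize) are invented for illustration. It does not reproduce the demo data or the PQL evaluation.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def has_unresolved_block(trace):
    """True if some 'Set Delivery Block' is never followed by
    'Release Delivery' before the trace ends."""
    for index, activity in enumerate(trace):
        if activity != "Set Delivery Block":
            continue
        if "Release Delivery" not in trace[index + 1:]:
            return True
    return False


def summarize(event_log):
    """Return the 0/1 flag per case plus the number of matching cases
    and the number of distinct matching variants (traces)."""
    traces = defaultdict(list)
    for case_id, activity in event_log:
        traces[case_id].append(activity)
    flags = {
        case_id: int(has_unresolved_block(trace))
        for case_id, trace in traces.items()
    }
    variants = Counter(
        tuple(traces[case_id]) for case_id, flag in flags.items() if flag == 1
    )
    return flags, sum(flags.values()), len(variants)


# Toy event log of (case id, activity) pairs, assumed to be sorted by time.
event_log = [
    ("c1", "Create Order"), ("c1", "Set Delivery Block"), ("c1", "Ship Goods"),
    ("c2", "Create Order"), ("c2", "Set Delivery Block"),
    ("c2", "Release Delivery"), ("c2", "Ship Goods"),
    ("c3", "Create Order"), ("c3", "Set Delivery Block"), ("c3", "Ship Goods"),
    ("c4", "Set Delivery Block"), ("c4", "Release Delivery"),
    ("c4", "Set Delivery Block"),
]
flags, matching_cases, matching_variants = summarize(event_log)
```
      </preformat>
      <p>In the toy log, three of the four cases have an unresolved delivery block, and they form two variants because c1 and c3 share the same trace.</p>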
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Beer</surname>
            ,
            <given-names>H.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Process mining and verification of properties: An approach based on temporal logic</article-title>
          . In: Meersman, R., et al. (eds.)
          <source>On the Move to Meaningful Internet Systems 2005</source>
          . LNCS, vol.
          <volume>3760</volume>
          , pp.
          <fpage>130</fpage>
          –
          <lpage>147</lpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <source>Process Mining - Data Science in Action</source>
          , Second Edition
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Berry</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sethi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>From regular expressions to deterministic automata</article-title>
          .
          <source>Theor. Comput. Sci</source>
          .
          <volume>48</volume>
          (
          <issue>3</issue>
          ),
          <fpage>117</fpage>
          –
          <lpage>126</lpage>
          (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>B.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>Conformance Checking - Relating Processes and Models</source>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rabin</surname>
            ,
            <given-names>M.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scott</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          :
          <article-title>Finite automata and their decision problems</article-title>
          .
          <source>IBM Journal of Research and Development</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <fpage>114</fpage>
          –
          <lpage>125</lpage>
          (
          <year>1959</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>