<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Discovering Hierarchical Process Models Using ProM</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>R.P. Jagadeesh Chandra Bose</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric H.M.W. Verbeek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wil M.P. van der Aalst</string-name>
          <email>w.m.p.v.d.aalstg@tue.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>6</institution>
          ,
          <addr-line>Best</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mathematics and Computer Science, University of Technology</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Philips Healthcare</institution>
          ,
          <addr-line>Veenpluis 5</addr-line>
        </aff>
      </contrib-group>
      <fpage>33</fpage>
      <lpage>40</lpage>
      <abstract>
        <p>Process models can be seen as \maps" describing the operational processes of organizations. Traditional process discovery algorithms have problems dealing with ne-grained event logs and lessstructured processes. The discovered models (i.e., \maps") are spaghettilike and are di cult to comprehend or even misleading. One of the reasons for this can be attributed to the fact that the discovered models are at (without any hierarchy). In this paper, we demonstrate the discovery of hierarchical process models using a set of interrelated plugins implemented in ProM.3 The hierarchy is enabled through the automated discovery of abstractions (of activities) with domain signi cance.</p>
      </abstract>
      <kwd-group>
        <kwd>process discovery</kwd>
        <kwd>process maps</kwd>
        <kwd>hierarchical models</kwd>
        <kwd>abstractions</kwd>
        <kwd>common execution patterns</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We have applied process mining techniques in over 100 organizations. These
practical experiences revealed two problems: (a) processes tend to be less
structured than what stakeholders expect, and (b) events logs contain ne-grained
events whereas stakeholders would like to view processes at a more coarse-grained
level. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we showed that common execution patterns (e.g., tandem arrays,
maximal repeats etc.) manifested in an event log can be used to create powerful
abstractions. These abstractions are used in our two-phase approach to process
discovery [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The rst phase comprises of pre-processing the event log based on
abstractions (bringing the log to the desired level of granularity) and the
second phase deals with discovering the process maps while providing a seamless
zoom-in/out facility. Figure 1 highlights the di erence between the traditional
approach to process discovery and our two-phase approach. Note that the process
model (map) discovered using the two-phase approach is much simpler.
      </p>
      <p>
        The two-phase approach to process discovery [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] enables the discovery of
hierarchical process models. In this paper, we demonstrate the discovery of
hierarchical process models using a chain of plugins implemented in ProM. The chain
of plugins and their order of application is illustrated in Figure 2.
3 ProM is an extensible framework that provides a comprehensive set of
tools/plugins for the discovery and analysis of process models from event logs. See
http://www.processmining.org for more information and to download ProM.
      </p>
      <p>Filtering</p>
      <p>Event Log
s a m b c u d n j e
s a m q f h l l h g i k e
s a m f g h l h i k q e
s a m b c d n u j e
s a m f h l g i h l h k q e
s a m q f g i h l h k e
s a m q f g h l h i k e
s a m p c u d n r e
s a m b d n c u j e
s a m p d n c u r e</p>
      <p>Two-phase</p>
      <p>Approach
Abstractions de ned over
common execution patterns</p>
      <sec id="sec-1-1">
        <title>Filter Plugins</title>
        <p>Traditional
Approach
Transformed
Log
X b Z j e
X q Y Y e
X Y Y q e
X b Z Z j e
X Y Y Y q e
X q Y Y Y e
X q Y Y Y e
X p Z r e
X b Z j e
X p Z r e</p>
      </sec>
      <sec id="sec-1-2">
        <title>Pattern</title>
      </sec>
      <sec id="sec-1-3">
        <title>Abstractions</title>
      </sec>
      <sec id="sec-1-4">
        <title>Fuzzy Miner</title>
        <p>
          The event log may rst be cleansed using some simple lters (e.g., adding
arti cial start/end events, ltering events of a particular transaction type such as
considering only `complete' events etc.). The Pattern Abstractions plugin is then
applied on this ltered log one or several times. The Pattern Abstractions plugin
has been implemented as a log visualizer in ProM and caters to the discovery
of common execution patterns, the de nition of abstractions over them, and the
pre-processing of the event log with these abstractions. The transformed log
(preprocessed log with abstractions) obtained in iteration i is used as the input for
the Pattern Abstractions plugin in iteration i + 1. It is this repetitive application
of the Pattern Abstractions plugin that enables the de nition of multiple levels
of hierarchy (new abstractions can be de ned over existing abstractions). During
the pre-processing phase, for each de ned abstraction, the Pattern Abstractions
plugin generates a sub-log that captures the manifestation of execution patterns
de ned by that abstraction as its process instances. The Fuzzy Miner plugin
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is then applied on the transformed log obtained after the last iteration. The
Fuzzy Miner plugin in ProM has been enhanced to utilize the availability of
sub-logs for the de ned abstractions. Process models are discovered for each of
the sub-logs and are displayed upon zooming in on its corresponding abstract
activity.
        </p>
        <p>Running Example. We use the work ow of a simple digital photo copier as
our running example. The copier supports photocopying, scanning and printing
of documents in both color and gray modes. The scanned documents can be sent
to the user via email or FTP. Upon receipt of a job, the copier rst generates
an image of the document and subsequently processes the image to enhance
its quality. Depending on whether the job request is for a copy/scan or print,
separate procedures are followed to generate an image. For print requests, the
document is rst interpreted and then a rasterization procedure is followed to
form an image. The image is then written on the drum, developed, and fused on
to the paper.</p>
        <p>
          We have modeled this work ow of the copier in CPN tools [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and generated
event logs by simulation. We use one such event log in this paper. The event
log consists of 100 process instances, 76 event classes and 40; 995 events. The
event log contains ne-grained events pertaining to di erent procedures (e.g.,
image processing, image generation etc.) mentioned above. An analyst may not
be interested in such low level details. We demonstrate the discovery of the
work ow at various levels of abstractions for this event log.
2
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Pattern Abstractions Plugin</title>
      <p>The basic building blocks of the Pattern Abstractions plugin are shown in
Figure 3. Figures 4 and 5 illustrate these building blocks.</p>
      <p>Discover
Common
Execution
Patterns</p>
      <p>Compute
Pattern
Metrics</p>
      <p>Filter
Patterns</p>
      <p>Form &amp;</p>
      <p>Select
Abstractions</p>
      <p>Transform</p>
      <p>
        Log
{ Discover Common Execution Patterns: The Pattern Abstractions plugin
supports the discovery of tandem arrays (loop patterns) and maximal repeats
(common subsequence of activities within a process instance or across
process instances) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These can be uncovered in linear time and space with
respect to the length of the traces.
Filter Patterns
      </p>
      <p>Compute
Pattern Metrics</p>
      <p>Discover Common
Execution Patterns
Uncovered Patterns</p>
      <p>
        Pattern
Metric Values
{ Compute Pattern Metrics: Various metrics (e.g, overlapping and
non-overlapping frequency counts, instance count etc.) to assess the signi cance of
the uncovered patterns are supported.
{ Filter Patterns: It could be the case that too many patterns are uncovered
from the event log. To manage this, features to lter patterns that are less
signi cant are supported.
{ Form and Select Abstractions: Abstractions are de ned over the ltered
patterns. Patterns that are closely related are grouped together to form
abstractions. The approach for forming abstractions is presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Furthermore, various features to edit/select abstractions such as merging two or
more abstractions and deleting activities related to a particular abstraction
are supported. Figure 5 depicts a few abstractions de ned over loop patterns
for the copier event log e.g., half-toning, a procedure for enhancing the image
quality, is uncovered as an abstraction.
{ Transform Log: The event log is pre-processed by replacing activity
subsequences corresponding to abstractions. A replaced activity subsequence is
captured as a process instance in the sub-log for the corresponding abstract
activity.
      </p>
      <p>At any iteration, if n abstractions are selected, the Pattern Abstractions plugin
generates a transformed log, and n sub-logs (one for each of the n chosen
abstractions). We recommend to process for loop patterns in the initial iterations
and maximal repeats in the subsequent iterations. For the example event log, we
have performed three iterations. The transformed log after the third iteration
has 19 event classes and 1601 events. In the process, we have de ned various
abstractions such as half-toning, image processing, capture image, etc.</p>
      <p>Select Abstractions</p>
      <p>Form Abstractions
Transform Log</p>
      <p>The Pattern Abstractions plugin supports additional features such as
visualizing patterns and exporting the traces that contain the patterns.
3</p>
      <p>
        (Enhanced) Fuzzy Miner Plugin
The Fuzzy Miner [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ] is a process miner that mines an event log for a family
of process models using a \map" metaphor. As many maps exist that show the
city of Amsterdam at di erent levels of abstraction, also di erent maps exist
for a process model mined from an event log. In this map metaphor, an object
of interest in Amsterdam (like the Rijksmuseum or the Anne Frank House)
corresponds to a node in the process model, where streets (like the Kalverstraat
or the PC Hooftstraat) correspond to edges in the model. For sake of convenience,
we call a single map a fuzzy instance whereas we call a family of maps (like all
Amsterdam maps) a fuzzy model.
      </p>
      <p>Like high-level maps only show major objects of interest and major streets,
high-level fuzzy instances also show only major elements (nodes and edges). For
this purpose, the Fuzzy Miner computes from the log a signi cance weight for
every element and an additional correlation weight for every edge. The higher
these weights are, the more major the element is considered to be. Furthermore,
the Fuzzy Miner uses a number of thresholds: Only elements that meet these
thresholds are shown. As such, these thresholds correspond to the required level
of abstraction: The higher these thresholds are, the higher the level of abstraction
is. For sake of completeness we mention here that a fuzzy instance may contain
clusters of minor nodes: If some objects of interest on the Amsterdam map are
too minor to be shown by themselves on some map, they may be shown as a
single (and major) object provided that they are close enough. For this reason,
the Fuzzy Miner rst attempts to cluster minor nodes into major cluster nodes,
and only if that does not work it will remove the minor node from the map.</p>
      <p>Figure 6 shows an example fuzzy model (left-hand side) and fuzzy instance
(right-hand side). Note that both views show a fuzzy instance, but the fuzzy
model view allows the user to change the thresholds (by changing the sliders)
whereas the fuzzy instance view does not. The signi cance of a node is displayed
as part of its label (for example, the node \Transfer Image" has a signi cance of
0:253), the signi cance of an edge is visualized using its wideness (the wider the
edge, the more signi cant it is), and the correlation of an edge is visualized using
its color contrast (the darker the edge is, the more correlated its input node and
its output node are). The octagonal shaped nodes in the right-hand side view
correspond to the cluster nodes (one of the cluster nodes contain 4 activities
and the other contains 11 activities). All activities on the left hand side except
\Job Complete" are contained in a cluster node on the right. Apparently, the
signi cance weights for these nodes (0:262, 0:253, 0:250, 0:296 and 0:403) were
too low to be shown, which indicates that the corresponding threshold was set
to at least 0:403. Furthermore, the node \Interpret" (on the right) is highly
selfcorrelated, whereas the nodes \Transfer Image" and \Send SMTP" (on the left)
are moderately correlated.</p>
      <p>The Fuzzy Miner has been enhanced to utilize the availability of sub-logs
obtained from the Pattern Abstractions plugin for the chosen abstractions. Fuzzy
models are discovered for each of the sub-logs and are displayed upon zooming in
on its corresponding abstract activity. Abstract activities are di erentiated from
other activities by means of a distinct color (a darker shade of blue, see also
Figure 6).</p>
      <p>Figure 7 depicts the top-level process model of the copier example. This
model is generated from the transformed log obtained after the third iteration of
Pattern Abstractions plugin. The upper branch of the process model corresponds
to the creation of the document image for print requests while the lower branch
corresponds to image creation for copy/scan requests. The two branches meet
after the image is formed and the image is subjected to some image processing
functionality. The document is then printed or sent to the user via email or
FTP. The lower level details of image creation, image processing, print image
have been abstracted in this model. The Pattern Abstractions plugin enables the
discovery of such abstractions with strong domain (functional) signi cance. Upon
zooming in on the Image Processing abstraction, the process model depicted in
Figure 8 is shown. This sub-process in turn contains another abstract activity
viz., Half Toning (the level of hierarchy is two). Zooming in on this abstract
activity displays the sub-process de ning it as depicted in Figure 8. Figure 9
depicts two other abstractions.</p>
      <p>Interpretation of
pages in a document
to print</p>
      <p>Rendering and
Screening of the
document</p>
      <p>Printing the Image
Capturing the Image
of the Document to
Copy/Scan</p>
      <p>Image Processing</p>
      <p>In this fashion, using the chain of plugins presented in this paper, one can
discover hierarchical process models.
4</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>We demonstrated the discovery of hierarchical process models using a chain of
plugins implemented in ProM. The repetitive application of Pattern Abstractions
plugin enables the discovery of multiple levels of hierarchy. We can use this
approach to create maps that (i) depict desired traits, (ii) eliminate irrelevant
details, (iii) reduce complexity, and (iv) improve comprehensibility.
Acknowledgments R.P.J.C. Bose and W.M.P. van der Aalst are grateful to
Philips Healthcare for funding the research in process mining.</p>
      <sec id="sec-3-1">
        <title>Half Toning Fig. 8: The sub-process captured for the abstraction `Image Processing' (in the toplevel model). This sub-process in turn contains another abstraction viz., `Half Toning'. Upon zooming in on `Half Toning', the sub-process de ning that is shown.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bose</surname>
            ,
            <given-names>R.P.J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.:</given-names>
          </string-name>
          <article-title>Abstractions in Process Mining: A Taxonomy of Patterns</article-title>
          . In Dayal, U.,
          <string-name>
            <surname>Eder</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koehler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reijers</surname>
          </string-name>
          , H., eds.: Business Process Management. Volume
          <volume>5701</volume>
          of LNCS., Springer-Verlag (
          <year>2009</year>
          )
          <volume>159</volume>
          {
          <fpage>175</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bose</surname>
            ,
            <given-names>R.P.J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Mining Context-Dependent and Interactive Business Process Maps using Execution Patterns</article-title>
          . In zur Muehlen,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Su</surname>
          </string-name>
          , J., eds.:
          <article-title>BPM 2010 Workshops</article-title>
          . Volume
          <volume>66</volume>
          of LNBIP., Springer-Verlag (
          <year>2011</year>
          )
          <volume>109</volume>
          {
          <fpage>121</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Gunther, C.,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          : Fuzzy Mining:
          <article-title>Adaptive Process Simpli cation Based on Multi-perspective Metrics</article-title>
          .
          <source>In: International Conference on Business Process Management (BPM</source>
          <year>2007</year>
          ).
          <article-title>Volume 4714 of LNCS</article-title>
          ., Springer-Verlag (
          <year>2007</year>
          )
          <volume>328</volume>
          {
          <fpage>343</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Vinter</given-names>
            <surname>Ratzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Wells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Lassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.M.</given-names>
            ,
            <surname>Laursen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Qvortrup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.F.</given-names>
            ,
            <surname>Stissing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.S.</given-names>
            ,
            <surname>Westergaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          :
          <article-title>CPN Tools for Editing, Simulating, and Analysing Coloured Petri Nets</article-title>
          .
          <source>In: 24th International Conference on Applications and Theory of Petri Nets (ICATPN)</source>
          .
          <article-title>Volume 2679 of LNCS</article-title>
          ., Springer (
          <year>2003</year>
          )
          <volume>450</volume>
          {
          <fpage>462</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Xia</surname>
          </string-name>
          , J.:
          <article-title>Automatic Determination of Graph Simpli cation Parameter Values for Fuzzy Miner</article-title>
          .
          <source>Master's thesis</source>
          , Eindhoven University of Technology (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>