<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Excavating event logs with DiSCover</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eric Verbeek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>Groene Loper 5, 5612 AE Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This extended abstract introduces the Xcavate ProM plug-in, which is built on top of the DiSCover plug-in and the Log Skeleton filter plug-in. The Xcavate plug-in allows the user to specify two ranges of noise levels. It then discovers a Petri net for every combination of noise levels in these ranges and returns the discovered net that scores best on a combined metric. This combined metric includes a fitness metric, a precision metric, and a simplicity metric. As such, the Xcavate plug-in can handle event logs with different noise levels and return the best-scoring net.</p>
      </abstract>
      <kwd-group>
        <kwd>ProM</kwd>
        <kwd>DiSCover plug-in</kwd>
        <kwd>Event Logs</kwd>
        <kwd>Log Skeletons</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. DiSCover</title>
    </sec>
    <sec id="sec-2">
      <title>2. Log Skeleton filter</title>
    </sec>
    <sec id="sec-3">
      <title>3. Xcavate</title>
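      <list list-type="bullet">
        <list-item>
          <p>Precision: A (relative) weight for the precision metric<sup>4</sup>.</p>
        </list-item>
        <list-item>
          <p>Simplicity: A (relative) weight for the simplicity metric.</p>
        </list-item>
        <list-item>
          <p>Select classifier: The classifier used to determine the activity of each event.</p>
        </list-item>
        <list-item>
          <p>Number of threads to use for replay: The number of threads the replayer may use (replay is required to determine fitness and/or precision).</p>
        </list-item>
        <list-item>
          <p>Maximal number of transitions: The maximal number of transitions a discovered Petri net may contain for it to be replayed.</p>
        </list-item>
        <list-item>
          <p>Prefer WF net: Whether WF nets are preferred over non-WF nets.</p>
        </list-item>
      </list>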
      <p>The first two parameters (the thresholds for the Log Skeleton filter plug-in and the relative
thresholds for the DiSCover plug-in) determine the ‘excavation site’. For every combination of
these two thresholds, the plug-in filters the event log using the Log Skeleton filter plug-in and
then runs the DiSCover plug-in on the filtered log using the relative threshold. Provided that the
discovered Petri net does not exceed the maximal number of transitions, it is then scored using the
three weights. In the end, the Xcavate plug-in returns the best-scoring Petri net discovered at the
excavation site. If WF nets are preferred, it returns the best-scoring WF net.</p>
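      <p>The excavation described above amounts to a grid search over the two thresholds. The Python sketch below illustrates that loop under stated assumptions. The names <monospace>excavate</monospace>, <monospace>filter_log</monospace>, <monospace>discover</monospace>, <monospace>Net</monospace>, and the metric callables are hypothetical stand-ins and not the ProM or Xcavate API. The three relative weights are assumed to be normalized by their sum. When WF nets are preferred but none is scored, the sketch is assumed to fall back to the best-scoring non-WF net.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins: none of these names are the ProM or Xcavate API.


@dataclass
class Net:
    name: str
    transitions: int
    is_wf_net: bool


@dataclass
class Weights:
    fitness: float
    precision: float
    simplicity: float


def combined_score(net, log, w, fitness, precision, simplicity) -> float:
    """Weighted sum of the three metrics, normalized by the total weight
    (an assumption; it does not change which net ranks first)."""
    total = w.fitness + w.precision + w.simplicity
    raw = (w.fitness * fitness(net, log)
           + w.precision * precision(net, log)
           + w.simplicity * simplicity(net))
    return raw / total if total else 0.0


def excavate(log, skeleton_thresholds: Sequence[int],
             discover_thresholds: Sequence[int], weights: Weights,
             filter_log: Callable, discover: Callable, fitness: Callable,
             precision: Callable, simplicity: Callable,
             max_transitions: int, prefer_wf: bool = True) -> Optional[Net]:
    best = None     # (score, net) over all scored nets
    best_wf = None  # (score, net) over scored WF nets only
    # The excavation site: every combination of the two thresholds.
    for t_skel, t_disc in product(skeleton_thresholds, discover_thresholds):
        filtered = filter_log(log, t_skel)  # Log Skeleton filter step
        net = discover(filtered, t_disc)    # DiSCover with a relative threshold
        if net.transitions > max_transitions:
            continue  # too large to replay, so this net is not scored
        # Fitness and precision replay the input log, not the filtered one.
        score = combined_score(net, log, weights, fitness, precision, simplicity)
        if best is None or score > best[0]:
            best = (score, net)
        if net.is_wf_net and (best_wf is None or score > best_wf[0]):
            best_wf = (score, net)
    # Assumption: fall back to the best non-WF net if no WF net was scored.
    if prefer_wf and best_wf is not None:
        return best_wf[1]
    return best[1] if best is not None else None


# Toy usage with made-up components (no real filtering, discovery, or replay).
def toy_filter(log, t):
    return log


def toy_discover(log, t):
    return Net(name=f"t{t}", transitions=4 + 2 * t, is_wf_net=(t % 2 == 1))


toy_args = dict(
    log=["abc", "acb"],
    skeleton_thresholds=[0],
    discover_thresholds=[0, 1, 2, 3],
    weights=Weights(fitness=1, precision=1, simplicity=1),
    filter_log=toy_filter,
    discover=toy_discover,
    fitness=lambda net, log: 1.0,
    precision=lambda net, log: net.transitions / 10,
    simplicity=lambda net: 1 / net.transitions,
    max_transitions=8,
)
print(excavate(**toy_args).name)                   # t1: best WF net
print(excavate(**toy_args, prefer_wf=False).name)  # t2: best net overall
```
      </preformat>
      <p>In the toy usage, the best-scoring net overall (<monospace>t2</monospace>) is not a WF net, so the sketch returns <monospace>t1</monospace> when WF nets are preferred and <monospace>t2</monospace> otherwise. The net for threshold 3 exceeds the maximal number of transitions and is never scored.</p>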
    </sec>
    <sec id="sec-4">
      <title>4. Examples</title>
      <p>
        As a first example, we run the Xcavate plug-in on the BPIC 2012 event log [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], restricted to the A- and O-events (that is, all W-events have been filtered out). For the
parameters, we accept the default values, except for the two thresholds, for which we select all
possible values. We then start the Xcavate plug-in by selecting the Finish button. The right-hand
side of Figure 2 shows the net that the plug-in eventually returns. Compared to the left-hand side
(which shows the net as discovered in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), the excavated net is indeed relaxed sound, and it contains fewer silent transitions (one
instead of eleven).
      </p>
      <p>
        As a second example, we run the Xcavate plug-in on the PDC 2022 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] data set, using the interval
[0,5] for the threshold of the Log Skeleton filter plug-in, the interval [0,5] for the relative
threshold of the DiSCover plug-in, and several weight configurations, each written as a triple
(fitness weight, precision weight, simplicity weight). For the weight configuration (50,55,30),
this resulted in a PDC 2022 score of 90.67%, which is comparable to the 89.53% of the winner of
that contest and to the 90.41% achieved at that time by the best configuration of the
(non-competing) original DiSCover plug-in. As the nets discovered by the new DiSCover plug-in are
typically much simpler, we conclude that we obtain comparable results with simpler nets. As an
example, the bottom part of Figure 3 shows the net discovered by the Xcavate plug-in for the most
complex scenario of the PDC 2022 data set. Compared to the top part (which shows the net as
discovered in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), the excavated net is indeed simpler, although it is still too complex to understand at
first glance.
        <fn id="fn4">
          <label>4</label>
          <p>This metric relies on an optimal alignment, and its actual value may vary depending on
the optimal alignment that is found. As a result, when this metric is used (weight &gt; 0), the
excavation may be non-deterministic.</p>
        </fn>
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Links</title>
      <p>From https://www.promtools.org/ you can download the latest releases of ProM. At the time
of writing, the latest release is ProM 6.13. To use the Xcavate plug-in, you need to have the
DiSCover package (version 6.13.105 or later) installed in ProM.</p>
      <p>You can download a screencast on how to excavate Petri nets from event logs from the ‘Latest
downloads’ section of https://hverbeek.win.tue.nl/. In this screencast, we load two event logs,
start the Xcavate plug-in, configure it, run it, and show the best discovered nets.</p>
      <p>On https://svn.win.tue.nl/repos/prom/Packages/DiSCover/Trunk you will find the sources
for the Xcavate and DiSCover plug-ins. ProM and the plug-ins are Open Source.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The DiSCover plug-in as introduced at the ICPM 2022 conference has been improved in two
ways. First, the discovered nets are now relaxed sound. This results from first filtering the log
and then constructing the components from the directly-follows matrix of the filtered log. As a
result, all components are discovered from the same set of traces, which yields a relaxed sound
net. Second, the discovered nets can now be simpler. This results from selecting a minimal set of
components that covers all activities. This may lead to a loss of precision, as a relation between
two activities is lost if these activities do not end up in a common component. However, as the
previously discovered models were often too complex, and as the score for the PDC 2022 contest is
still comparable, we consider this possible loss of precision acceptable.</p>
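      <p>To illustrate why filtering before construction matters, the Python sketch below builds a single directly-follows matrix (a counter over activity pairs) from the filtered log, so that everything derived from it is based on the same set of traces. The trace representation, the helper names <monospace>directly_follows</monospace> and <monospace>filter_then_df</monospace>, and the filter predicate (dropping traces with a noisy activity <monospace>x</monospace>) are hypothetical; the actual Log Skeleton and DiSCover filters are not reproduced here.</p>
      <preformat preformat-type="code">
```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence


def directly_follows(traces: Iterable[Sequence[str]]) -> Counter:
    """Count, for every pair (a, b), how often b directly follows a in the traces."""
    df: Counter = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df


def filter_then_df(log: List[Sequence[str]],
                   keep: Callable[[Sequence[str]], bool]) -> Counter:
    """Filter the log once, then build the one matrix all components are read from."""
    filtered = [trace for trace in log if keep(trace)]
    return directly_follows(filtered)


log = [("a", "b", "c"), ("a", "c", "b"), ("a", "x", "c")]
# Hypothetical filter: drop every trace that contains the noisy activity "x".
df = filter_then_df(log, keep=lambda trace: "x" not in trace)
print(sorted(df.items()))
```
      </preformat>
      <p>Because the matrix is built once from the filtered traces, every component read from it agrees on which traces were kept, which is the property the text above attributes relaxed soundness to.</p>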
      <p>The Xcavate plug-in has been built on top of the improved DiSCover plug-in and extends it in
two ways. First, it puts the Log Skeleton filter plug-in in front of the DiSCover plug-in, which
allows it to filter out a different kind of noise than the DiSCover plug-in can filter out. Second,
it allows the user to provide two collections of noise thresholds: one for the Log Skeleton filter
plug-in and one for the built-in (relative) filter in the DiSCover plug-in. The Xcavate plug-in then
discovers a net for every possible pair of thresholds and compares the discovered nets on their
scores. This score is the weighted sum of (1) a fitness metric, (2) a precision metric, and (3) a
simplicity metric. For the fitness metric and the precision metric, the input log is replayed on the
discovered net. The discovered net with the best score is then returned by the Xcavate plug-in. By
setting different weights for the metrics, the user can influence the result.</p>
      <p>For future work, we aim to speed up (where possible) the construction of the components. This
construction currently requires a step that computes the set of maximal sets of activities that do
not directly follow each other. In a later step, this set is reduced to a minimal set of sets that
covers all activities. The first step has a high computational complexity and may cause problems.
Although the running times of this step are perfectly acceptable for many event logs, for some
other event logs this step takes prohibitively long. Possibly, by exploiting the fact that we only
need a minimal set later, we could simplify the first step and hence alleviate this problem.</p>
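      <p>The two steps of the component construction can be sketched as follows, under a literal reading of the text that is an assumption on our part. Two activities are compatible if neither directly follows the other. The first step enumerates all maximal sets of pairwise compatible activities, here with the Bron-Kerbosch algorithm. This is a swapped-in technique that need not match the DiSCover implementation. The second step greedily selects sets until all activities are covered and then drops redundant sets. The resulting cover is inclusion-minimal but not necessarily of minimum size. The sketch also hints at the complexity problem, because the number of maximal sets can grow exponentially with the number of activities. All function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def compatibility_graph(activities: Iterable[str],
                        df: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Link a and b (a != b) when neither directly follows the other."""
    acts = list(activities)
    graph: Dict[str, Set[str]] = {a: set() for a in acts}
    for a in acts:
        for b in acts:
            if a != b and (a, b) not in df and (b, a) not in df:
                graph[a].add(b)
    return graph


def maximal_sets(graph: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Bron-Kerbosch with pivoting: all maximal sets of pairwise linked activities."""
    found: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            found.append(frozenset(r))  # r cannot be extended any further
            return
        pivot = max(p | x, key=lambda v: len(graph[v].intersection(p)))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p.intersection(graph[v]), x.intersection(graph[v]))
            p = p - {v}
            x = x | {v}

    if graph:
        expand(set(), set(graph), set())
    return found


def minimal_cover(sets: List[FrozenSet[str]],
                  activities: Iterable[str]) -> List[FrozenSet[str]]:
    """Greedy cover, then drop redundant sets (inclusion-minimal, not minimum)."""
    uncovered = set(activities)
    chosen: List[FrozenSet[str]] = []
    while uncovered:
        best = max(sets, key=lambda s: len(s.intersection(uncovered)))
        if not best.intersection(uncovered):
            raise ValueError("the sets do not cover all activities")
        chosen.append(best)
        uncovered -= best
    for s in list(chosen):
        others = set().union(*(o for o in chosen if o is not s))
        if s.issubset(others):
            chosen.remove(s)  # every activity in s is covered elsewhere
    return chosen


# Toy directly-follows relation: a -> b, a -> c, b -> d, c -> d.
activities = ["a", "b", "c", "d"]
df = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
sets = maximal_sets(compatibility_graph(activities, df))
cover = minimal_cover(sets, activities)
print(sorted(sorted(s) for s in sets))
print(sorted(sorted(s) for s in cover))
```
      </preformat>
      <p>In the toy relation, the pairs a and d, and b and c, are the only pairs where neither activity directly follows the other. The sketch therefore finds the maximal sets {a, d} and {b, c}, and both are needed to cover all four activities.</p>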
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H. M. W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <article-title>Discovering an S-Coverable WF-net using DiSCover</article-title>
          ,
          in
          <source>Proceedings of the 2022 4th International Conference on Process Mining (ICPM 2022)</source>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polyvyanyy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Weber</surname>
          </string-name>
          , Eds. IEEE,
          <year>2022</year>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>71</lpage>
          . [Online]. Available: https://hverbeek.win.tue.nl/wp-content/papercite-data/pdf/verbeek22.pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H. M. W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <article-title>The Log Skeleton Visualizer in ProM 6.9: The winning contribution to the Process Discovery Contest 2019</article-title>
          ,
          <source>International Journal on Software Tools for Technology Transfer</source>
          , vol.
          <volume>24</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>549</fpage>
          -
          <lpage>561</lpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          . (
          <year>2012</year>
          , 4)
          <article-title>BPI Challenge 2012</article-title>
          . https://dx.doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f. [Online]. Available: https://data.4tu.nl/articles/dataset/BPI_Challenge_2012/12689204
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H. M. W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <article-title>Process Discovery Contest 2022</article-title>
          ,
          <source>4TU.ResearchData</source>
          ,
          <year>2022</year>
          . [Online]. Available: https://icpmconference.org/2022/process-discovery-contest/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>