<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Log Shifting for Enterprise Collaboration Systems: A Supervised Event Abstraction Approach (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Jonas Blatt, Patrick Delfmann, Florian Schwade and Petra Schubert, Institute for IS Research, University of Koblenz-Landau, Universitätsstrasse 1</institution>
          ,
          <addr-line>56070 Koblenz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Event abstraction is a pre-processing method in Process Mining that creates an interpretable log and, in turn, interpretable process models. To this end, one or more events are aggregated into one event at a higher level of abstraction. The ProM plugin presented in this paper provides such an approach by training a log shifting model from a known high-level log and the related low-level log. This log shifting model can then be used to convert any low-level log of the same application. First evaluation results show that this approach is applicable. The log shifter presented in this paper is particularly tailored to Enterprise Collaboration Systems (ECS), which come with special low-level log characteristics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Overly complex models, also referred to as spaghetti
models, are a common problem in process discovery. They
are hard to read and often have no further value for the
business [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In such cases, the event log must be pre-processed
before a discovery algorithm is applied to mine
a process model from it. One promising
approach is event abstraction, which aggregates
events from a lower abstraction level (e.g., system logs or
database operations) into events at a higher abstraction level [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
With event abstraction, process discovery may yield
a process model with fewer activities and fewer process flow
alternatives. The resulting process model is thus
easier for humans to comprehend and interpret.
      </p>
      <p>
        In the last decade, several event abstraction approaches
were developed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To describe event abstraction, authors
use terms such as Activity Mining [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Event Log
Abstraction [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Activity Clustering [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Log Lifting [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and Sequence
Clustering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Some of these approaches address specific
domains (e.g., Healthcare [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Internet of Things [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Customer
Journey [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), while others address general business processes. Some
consider intersections of low-level events, others do not.
The approaches further differ in the underlying data mining
techniques (e.g., supervised vs. unsupervised), the
concrete family of algorithms applied, and the amount of human
interaction required [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In general, existing approaches mainly
use clustering algorithms (e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) and/or machine learning
approaches (e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) to generate the high-level log.
      </p>
      <p>
        Social Process Mining (SPM) aims to mine rather
unstructured collaboration processes from Social Software. As
Enterprise Collaboration Systems (ECS) support collaboration and
communication in companies, they have emerged as the core components
of the digital workplace [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and are thus valuable sources for
mining and understanding collaboration processes. Due to the
characteristics of ECS, event abstraction comes with special
requirements. First, many ECS operate on multiple databases
that react differently to user actions, so one user action may
result in multiple events in the low-level log generated by the
system. Second, as the databases are independent of each
other, the distance between the timestamps of two low-level events
is not always the same; moreover, the low-level events belonging to
two high-level events might overlap temporally.
Third, for the same reason, the order of the low-level events
might vary slightly. Fourth, different sets of low-level
events that belong to the same high-level activity may
not contain the same low-level activities, depending on the
usage behaviour of the user. Fifth, there may be low-level
events for which no high-level event was triggered. Finally, logs
generated by most ECS are not interpretable for data scientists.
Hence, to generate interpretable logs, it is necessary to
observe an ECS for a sufficiently long time, record the user
actions (e.g., by click path analysis), and thereby obtain a
high-level log that can then be used, together with the
corresponding low-level log, to train a log shifter. We therefore
developed the log shifting tool presented in this paper, which
satisfies the requirements for ECS logs outlined above.
      </p>
      <p>The remainder of this paper is structured as follows:
Section II gives an overview and description of the presented
tool, and Section III presents the evaluation of the log shifting
tool and an outlook on future work on event abstraction.</p>
    </sec>
    <sec id="sec-2">
      <title>II. TOOL DESCRIPTION</title>
      <p>In a nutshell, we train a shifting model by examining
known high-level traces and the corresponding low-level
traces. The trained shifting model can then be used to shift an
arbitrary low-level trace of the observed system to a previously
unknown high-level trace. This approach is illustrated in
Figure 1.</p>
      <p>We use our approach in an ECS with more than 4,000
users, in which we can observe and track user activity
using a tailored observer that records a high-level log
representing the real business activities in the process. The
observation is scheduled for 3 months in order to generate a sufficient
number of traces. The observed high-level log can be used in
combination with the low-level log generated by the system
itself to create and train a new log shifting model. This
model can then be used to convert arbitrary low-level logs from
the system into high-level logs. We are thus able to shift a) the
earlier low-level log (recorded before we started the observation) or b)
the low-level log from another instance of the same system.</p>
      <p>Fig. 1: Log Shifting Approach (diagram labels: High-Level Log Trace, Log Shifting Model, evaluate, Shifter, Low-Level Log Trace, High-Level Log Trace)</p>
      <p>
        In short, the algorithm works as follows: The shifting model
stores the frequencies of the low-level
events that were mapped to each high-level event in the
training phase (by temporal proximity and user equality). In
the shifting phase, we use sliding windows, n-grams and the
Damerau–Levenshtein distance [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to calculate an error score
for each possible combination of mapped windows of low-level
events inside a trace. The windows of low-level events
with the lowest error score are then mapped to new high-level
events. This procedure is repeated until no low-level
events are left. The timestamp of each new high-level event is
calculated from the timestamps of the mapped low-level events
(min, max, or mean).
      </p>
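      <p>To make the shifting phase more concrete, the following Python sketch shows one possible way to score windows with a restricted Damerau–Levenshtein (optimal string alignment) distance and to repeat a greedy mapping until no low-level events remain. It is a simplified illustration and not the plugin's implementation: all function names, the single reference pattern per high-level activity, the length-normalised error score and the example activity names are assumptions made here. The frequency information and n-grams of the shifting model are omitted, and the sketch maps every low-level event to some high-level event.</p>
      <preformat><![CDATA[
```python
from statistics import mean


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance
    between two sequences of activity labels."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def best_window(labels, patterns, max_len):
    """Return (error, start, end, high_level_label) for the contiguous window
    that is closest to any reference pattern (assumed scoring: edit distance
    divided by the longer of window and pattern)."""
    best = None
    for start in range(len(labels)):
        for end in range(start + 1, min(start + max_len, len(labels)) + 1):
            window = labels[start:end]
            for label, pattern in patterns.items():
                error = osa_distance(window, pattern) / max(len(window), len(pattern))
                if best is None or error < best[0]:
                    best = (error, start, end, label)
    return best


def aggregate_timestamp(timestamps, mode="min"):
    """Combine the timestamps of the mapped low-level events (min, max, or mean)."""
    if mode == "min":
        return min(timestamps)
    if mode == "max":
        return max(timestamps)
    if mode == "mean":
        return mean(timestamps)
    raise ValueError(f"unknown mode: {mode}")


def shift_trace(events, patterns, max_len, mode="min"):
    """Greedily map windows of (activity, timestamp) events to high-level events
    until no low-level events are left. Removing a window makes its neighbours
    adjacent, which is a simplification of this sketch."""
    remaining = list(events)
    high_level = []
    while remaining:
        labels = [activity for activity, _ in remaining]
        _, start, end, label = best_window(labels, patterns, max_len)
        stamps = [ts for _, ts in remaining[start:end]]
        high_level.append((label, aggregate_timestamp(stamps, mode)))
        del remaining[start:end]
    return sorted(high_level, key=lambda event: event[1])


# Hypothetical reference patterns and low-level trace (names are invented).
patterns = {
    "Create Post": ["idx_update", "db_insert", "notify"],
    "View Post": ["db_read"],
}
events = [("db_insert", 10), ("idx_update", 11), ("notify", 13), ("db_read", 20)]
print(shift_trace(events, patterns, max_len=3))
# [('Create Post', 10), ('View Post', 20)]
```
]]></preformat>
      <p>In this example, the first two low-level events arrive in swapped order compared to the assumed pattern. Because the distance counts an adjacent transposition as a single edit, the window is still assigned to the same high-level event, which reflects the requirement that the order of low-level events may vary slightly.</p>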
      <p>The algorithm is implemented as a ProM<sup>1</sup> plugin and can
also be used as a command line program (see quick start
instructions<sup>2</sup>). Both the low-level log and the high-level log
use the XES format. The generated
log shifting model is a serializable Java class, which can be
imported into and exported from ProM. The workflow in
the ProM plugin is as follows: First, the low-level log, the
high-level log and the training parameters are used to create
the shifting model. Then, a low-level log and the previously
generated shifting model can be used to create a new high-level
log. A short introduction to the plugin can be found on
YouTube<sup>3</sup>.</p>
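      <p>The training step of this workflow can be pictured with the following Python sketch. It attributes each low-level event to the temporally closest high-level event of the same user and counts how often each low-level activity was mapped to each high-level activity. This is only an illustration under assumptions made here: the function name, the event tuples, the <monospace>max_gap</monospace> parameter and the example data are hypothetical and do not describe the plugin's Java classes or its actual training parameters.</p>
      <preformat><![CDATA[
```python
from collections import Counter, defaultdict


def train_shifting_model(high_level, low_level, max_gap):
    """Count low-level activities per high-level activity.

    high_level, low_level: lists of (activity, user, timestamp) tuples.
    A low-level event is attributed to the nearest-in-time high-level event
    of the same user if their timestamps differ by at most max_gap;
    low-level events without such a partner are ignored.
    """
    model = defaultdict(Counter)
    for low_activity, low_user, low_ts in low_level:
        candidates = [
            (abs(low_ts - ts), activity)
            for activity, user, ts in high_level
            if user == low_user and abs(low_ts - ts) <= max_gap
        ]
        if candidates:
            _, nearest = min(candidates)  # smallest temporal distance wins
            model[nearest][low_activity] += 1
    return dict(model)


# Hypothetical observed high-level log and system-generated low-level log.
high = [("Create Post", "alice", 100), ("View Post", "bob", 101)]
low = [
    ("db_insert", "alice", 100),
    ("idx_update", "alice", 102),
    ("db_read", "bob", 101),
    ("cron_job", "system", 101),   # no high-level event for this user
    ("db_insert", "alice", 500),   # too far away from any high-level event
]
model = train_shifting_model(high, low, max_gap=5)
for activity, counts in model.items():
    print(activity, dict(counts))
```
]]></preformat>
      <p>Low-level events that have no high-level partner, such as the two last events in the example, do not contribute to the counts. This mirrors the requirement that some low-level events are not triggered by any high-level event.</p>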
    </sec>
    <sec id="sec-3">
      <title>III. EVALUATION AND OUTLOOK</title>
      <p>First evaluations with the BPIC20 logs<sup>4</sup> and synthetically
created low-level logs in different configurations,
corresponding to the typical characteristics of ECS low-level logs
introduced in Section I, show that the algorithm performs
well and within a reasonable calculation time. For each activity in
the original logs, we created one or more low-level activities
(the number differs per configuration). The events in the logs
were then replaced by these low-level events. In doing so, we
randomized the order of the synthetic low-level events and adjusted the
mean temporal distance between them. For each low-level
log configuration, we trained multiple log shifting models
(with different training parameters) and calculated the evaluation
accuracy, the training duration and the shifting duration. In summary,
the algorithm performs well (&gt;= 95% accuracy) and within a
reasonable calculation time if the training parameters are
configured appropriately for the log properties (a suitable configuration can be
determined automatically by hyperparameter optimization).</p>
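      <p>The construction of the synthetic low-level logs can be sketched in Python as follows. The sketch replaces each high-level event by one or more invented low-level events, randomizes their order and spaces them with a configurable mean temporal distance. The function names, the naming scheme of the low-level activities, the exponentially distributed gaps and the position-wise accuracy measure are assumptions made for this illustration; the text above does not specify how these steps or the accuracy were implemented.</p>
      <preformat><![CDATA[
```python
import random


def make_mapping(activities, max_children, rng):
    """Assign each high-level activity one or more synthetic low-level activities."""
    return {
        activity: [f"{activity}_ll{k}" for k in range(rng.randint(1, max_children))]
        for activity in activities
    }


def lower_trace(trace, mapping, mean_gap, rng):
    """Replace each (activity, timestamp) event by its low-level events.

    The low-level events of one high-level event are emitted in random order,
    separated by exponentially distributed gaps with mean `mean_gap`.
    Sorting by timestamp lets events of neighbouring high-level events overlap.
    """
    low = []
    for activity, ts in trace:
        children = list(mapping[activity])
        rng.shuffle(children)
        t = ts
        for child in children:
            low.append((child, t))
            t += rng.expovariate(1.0 / mean_gap)
    return sorted(low, key=lambda event: event[1])


def accuracy(expected, predicted):
    """Share of trace positions whose high-level label was reproduced."""
    if not expected and not predicted:
        return 1.0
    hits = sum(1 for e, p in zip(expected, predicted) if e == p)
    return hits / max(len(expected), len(predicted))


rng = random.Random(42)  # fixed seed so that a configuration is reproducible
mapping = make_mapping(["Submit", "Approve"], max_children=3, rng=rng)
low_trace = lower_trace([("Submit", 0.0), ("Approve", 60.0)], mapping, 2.0, rng)
print(mapping)
print(low_trace)
print(accuracy(["Submit", "Approve"], ["Submit", "Reject"]))  # 0.5
```
]]></preformat>
      <p>Varying <monospace>max_children</monospace> and <monospace>mean_gap</monospace> in such a generator yields different low-level log configurations, for each of which a shifted log can be compared with the original high-level log.</p>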
      <p>As outlined above, we are currently recording a high-level
log from UniConnect, an ECS based on HCL Connections,
which is the market-leading integrated ECS and widely used in
practice. In the coming months, we will use the log shifting
tool in the context of SPM to train a model that generates
an interpretable log suitable for process discovery.
The resulting model can be used for all instances of HCL
Connections. In future work, we will also consider further
optimizations of the log shifting approach. For instance, we
plan to implement hyperparameter optimization routines for
determining the best parameter configuration. We will also
evaluate the applicability of the log shifting model to other
ECS.</p>
    </sec>
    <sec id="sec-4">
      <title>ACKNOWLEDGMENT</title>
      <p>This work was supported by the Deutsche
Forschungsgemeinschaft (Grant DE 1983/12-1).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Veiga</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          , “
          <article-title>Understanding Spaghetti Models with Sequence Clustering for ProM</article-title>
          ,” in
          <source>Lecture Notes in Business Information Processing</source>
          , vol.
          <volume>43</volume>
          ,
          <year>2009</year>
          , pp.
          <fpage>92</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>van Zelst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          , M. de Leoni, and
          <string-name>
            <given-names>A.</given-names>
            <surname>Koschmider</surname>
          </string-name>
          , “
          <article-title>Event abstraction in process mining: literature review and taxonomy</article-title>
          ,”
          <source>Granular Computing</source>
          , vol.
          <volume>6</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>719</fpage>
          -
          <lpage>736</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Günther</surname>
          </string-name>
          , A. Rozinat, and W. M. van der Aalst, “
          <article-title>Activity Mining by Global Trace Segmentation</article-title>
          ,” in
          <source>Computer Graphics Forum - CGF</source>
          , vol.
          <volume>43</volume>
          ,
          <year>2009</year>
          , pp.
          <fpage>128</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>de Leoni</surname>
          </string-name>
          and S. Dündar, “
          <article-title>Event-log abstraction using batch session identification and clustering</article-title>
          ,”
          <source>Proceedings of the ACM Symposium on Applied Computing</source>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>44</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Rehse</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Fettke</surname>
          </string-name>
          , “
          <article-title>Clustering Business Process Activities for Identifying Reference Model Components</article-title>
          ,” in BPM Workshops,
          <string-name>
            <given-names>F.</given-names>
            <surname>Daniel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. Z.</given-names>
            <surname>Sheng</surname>
          </string-name>
          , and H. Motahari, Eds. Cham: Springer International Publishing,
          <year>2019</year>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Tello</surname>
          </string-name>
          , G. Gianini,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mizouni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Damiani</surname>
          </string-name>
          ,
          <article-title>Machine Learning-Based Framework for Log-Lifting in Business Process Mining Applications</article-title>
          . Springer International Publishing,
          <year>2019</year>
          , vol.
          <volume>11675</volume>
          LNCS.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alharbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bulpitt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , “
          <article-title>Improving Pattern Detection in Healthcare Process Mining Using an Interval-Based Event Selection Method</article-title>
          ,” in
          <source>Lecture Notes in Business Information Processing</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tax</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sidorova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Haakma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. M.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Mining Process Model Descriptions of Daily Life through Event Abstraction</article-title>
          ,”
          <source>Intelligent Systems and Applications. IntelliSys 2016. Studies in Computational Intelligence</source>
          , vol.
          <volume>751</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bernard</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Andritsos</surname>
          </string-name>
          , “
          <article-title>CJM-ab: Abstracting customer journey maps using process mining</article-title>
          ,”
          <source>Lecture Notes in Business Information Processing</source>
          , vol.
          <volume>317</volume>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>56</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Williams</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Schubert</surname>
          </string-name>
          , “
          <article-title>Designs for the Digital Workplace</article-title>
          ,” in
          <source>CENTERIS - Conference on ENTERprise Information Systems</source>
          , vol.
          <volume>138</volume>
          .
          <publisher-name>Elsevier B.V.</publisher-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>478</fpage>
          -
          <lpage>485</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Damerau</surname>
          </string-name>
          , “
          <article-title>A technique for computer detection and correction of spelling errors</article-title>
          ,”
          <source>Commun. ACM</source>
          , vol.
          <volume>7</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>176</lpage>
          , Mar.
          <year>1964</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>